Summary: | Master's === National Defense University, Management College === Department of Information Management === 100 === With the rapid spread of the Internet and innovation in information technology, the Internet has become an indispensable source of information, and electronic newspapers are one of the best tools for acquiring new knowledge and following current events. Currently, most research on news retrieval improves precision and recall through semantic or concept expansion. However, this approach yields a large volume of news to be processed. News classification, on the other hand, helps readers focus on particular topics, but the classification results may not meet readers' expectations.
In this study, military news from an e-paper is collected as a document corpus. The words in the corpus are identified with the CKIP system developed by Academia Sinica and are used to construct a news ontology that improves the precision and recall of news retrieval. This study also uses TF-IDF to select representative terms for each news item. Next, instead of relying on experts or algorithms to classify news in an ambiguous way, this research adopts a naïve 5W method to classify the words appearing in the news and a clustering approach to group news with similar features. To reduce the amount of recalled news, the study keeps news with higher information gain and discards news with lower information gain. Furthermore, a statistical method is applied to generate news summaries automatically. Finally, index terms are used as image annotations to describe images, and a news retrieval platform is implemented.
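The TF-IDF term selection mentioned above can be illustrated with a minimal Python sketch. This is not the implementation used in the thesis: the function name `top_tfidf_terms`, the parameter `k`, and the specific weighting (term frequency as count divided by document length, inverse document frequency as ln(N/df)) are assumptions made for illustration, and each news item is assumed to be already segmented into tokens (for example by CKIP).

```python
import math
from collections import Counter


def top_tfidf_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each tokenized document.

    docs: list of token lists, one list per news item (hypothetical input format).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))

    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            # tf = relative frequency in this document; idf = ln(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results
```

Under this weighting, a term that occurs in every news item receives a score of zero, so only terms that distinguish an item from the rest of the corpus are kept as its representatives.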
|