Summary: | Master's Thesis === National Central University === Graduate Institute of Information Management === 100 === With the rapid development of the Internet, information websites of all kinds have grown steadily in number, and users can easily obtain a great deal of information from search engines and portals such as Google and Yahoo!. However, Jansen et al. pointed out that users typically enter only 2.35 keywords, and such vague or incomplete queries return a large number of websites, leading to information overload. Past research has often used information categorization or filtering to reduce the cost of accessing information, but these methods achieve good results only when a large amount of training data is available. Recent studies have proposed the Normalized Google Distance (NGD), which submits keywords to Google's search engine and uses the returned result counts to compute an abstract distance between two words, from which the similarity of the documents containing them can be inferred. However, NGD relies on Google's online search service, and under high-frequency querying Google will refuse to serve further requests. To solve this problem, this study proposes a method that uses Wikipedia to build an offline search engine, since Wikipedia offers structured concepts and content of high purity. Experiments show that when the user queries the offline Wikipedia database, the proposed method still provides stable filtering performance and saves the user a great deal of time.
|
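For context, the standard definition of the Normalized Google Distance between two terms x and y is given below; it is stated here as a reference formula and is not quoted from the thesis itself:

NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log N - \min\{\log f(x), \log f(y)\}}

where f(x) is the number of pages returned for the query x, f(x, y) is the number of pages containing both x and y, and N is the total number of pages indexed. The same formula can be evaluated against an offline Wikipedia index by replacing the Google result counts with article counts from the local database, which is the substitution the proposed method relies on.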