Summary: | Master's === National Sun Yat-sen University === Graduate Institute of Information Management === 92 === A literature digital library is one of the most important resources for preserving the assets of civilization. To make information search more effective and efficient, many systems provide a browsing interface that eases the task of finding articles. A browsing interface is associated with a subject directory, which guides users to articles that meet their information needs. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group articles in a literature digital library is the theme of this thesis.
Previous work used either document classification or document clustering to assign articles to a set of article clusters based on their content. We observed that the articles that meet a single user's information need do not necessarily fall in a single cluster. In this thesis, we propose to use both the Web log and article content in clustering articles. We propose two hybrid approaches: a document categorization based method and a document clustering based method. These alternatives were compared with other content-based methods. The document categorization based method was found to effectively reduce the number of required click-throughs, at the expense of a slight increase in entropy, which measures the content heterogeneity of each generated cluster.
|
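The summary does not say how entropy is computed, so the following is only a minimal sketch of one common reading: Shannon entropy over the subject labels of the articles in each cluster, averaged across clusters with weights proportional to cluster size. The function names `cluster_entropy` and `weighted_entropy`, and the use of subject labels as the basis for heterogeneity, are assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the subject labels in one cluster.

    Assumption: heterogeneity is judged by a known subject label per
    article; the thesis may define it differently.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted average entropy over all clusters."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Hypothetical data: one homogeneous cluster and one evenly mixed cluster.
clusters = [["db", "db"], ["db", "ir"]]
print(weighted_entropy(clusters))
```

Under this reading, a cluster whose articles all share one subject scores 0, and the score grows as subjects mix, so a "slight increase of entropy" would correspond to clusters that are somewhat less topically pure.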