Summary: | Master's === National Chung Cheng University (國立中正大學) === Graduate Institute of Computer Science and Information Engineering === 90 === In this thesis, we study the problem of integrating documents from different sources into a comprehensive topic hierarchy. Our
objective is to develop efficient techniques that improve the
accuracy of traditional categorization methods by incorporating
categorization information provided by the data sources into the
categorization process. On the World Wide Web,
categorization information is often available from information
sources. For example, news from newspapers, books from publishers, items from e-commerce sites, and even web pages archived by web portals are already categorized. Moreover, many of the topic hierarchies adopted by current information sources are highly related. We believe that this categorization information can be used to improve classification accuracy. We present several techniques that explore relations between topic hierarchies and incorporate categorization information from source hierarchies into traditional classification methods such as Bayesian methods
and support vector machines. Experiments on collections from
Openfind and Yam, and from Google and Yahoo, well-known web
sites in Taiwan and the USA, respectively, show that incorporating
categorization information from source hierarchies can
significantly improve classification accuracy.
|