Enhance Performance of Unsupervised Text Categorization by Using External Information



Bibliographic Details
Main Authors: Chun-Chih Chang, 張駿志
Other Authors: Chung-Chian Hsu
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/60800446714501803893
Description
Summary: Master's thesis === National Yunlin University of Science and Technology (國立雲林科技大學) === Department of Information Management === 102 === With the rapid growth of online text, organizing text data effectively has become a major issue. Text classification is the task of assigning documents to predefined categories, and many supervised classification methods have been proposed for it. Supervised learning methods have disadvantages, however. The biggest bottleneck is that they require a large amount of training data to classify well. Unlabeled documents are abundant and easy to collect, but labeled documents are difficult to obtain because labeling is usually done manually, which is time-consuming. To overcome these disadvantages and achieve better classification accuracy without labeled data, we propose combining three external sources, “Wikipedia”, “WordNet” and “Google distance”, for unsupervised text classification. The experimental results show that the combination of Wikipedia with WordNet achieves better performance than the individual methods.
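
Illustrative note: the summary does not define "Google distance". Assuming it refers to the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, a minimal sketch of how such a distance could label a document without training data is given below. The function names, the hit counts, and the rule of picking the category name with the smallest average distance are all hypothetical and are not taken from the thesis.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page counts.

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total number of indexed pages (an assumed constant).
    """
    if min(fx, fy, fxy) <= 0:
        return math.inf  # terms never co-occur: treat as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def nearest_category(doc_terms, categories, hits, pair_hits, n):
    """Pick the category name with the smallest mean NGD to the document terms.

    Hypothetical decision rule for illustration only; doc_terms must be non-empty.
    """
    def mean_distance(cat):
        dists = [
            ngd(hits[t], hits[cat], pair_hits.get(frozenset((t, cat)), 0), n)
            for t in doc_terms
        ]
        return sum(dists) / len(dists)
    return min(categories, key=mean_distance)

# Made-up counts, chosen only to exercise the functions.
hits = {"goal": 1000, "sports": 1000, "politics": 1000}
pair_hits = {
    frozenset(("goal", "sports")): 100,
    frozenset(("goal", "politics")): 10,
}
label = nearest_category(["goal"], ["sports", "politics"], hits, pair_hits, 10**6)
```

In this sketch a term pair that co-occurs on more pages gets a smaller distance, so the document is assigned to the category name its terms co-occur with most. How the thesis actually combines this signal with Wikipedia and WordNet is not stated in the summary.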