Enhance Performance of Unsupervised Text Categorization by Using External Information
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | zh-TW |
Published: | 2014 |
Online Access: | http://ndltd.ncl.edu.tw/handle/60800446714501803893 |
Summary: | Master's thesis === National Yunlin University of Science and Technology === Department of Information Management === Academic Year 102 === With the rapid growth of online text, organizing text data effectively has become a major issue. Text classification is the task of assigning documents to predefined categories, and many supervised classification methods have been proposed for it. However, supervised learning has drawbacks, the biggest being that it requires a large amount of training data to achieve good classification performance. Unlabeled documents are easy to collect and abundant, but labeled documents are hard to obtain because labeling is usually done manually, which is time-consuming. To overcome these drawbacks and achieve better classification accuracy without labeled data, we propose combining three external sources, "Wikipedia", "WordNet" and "Google distance", for unsupervised text classification. Experimental results show that combining Wikipedia with WordNet achieves better performance than each individual method used alone. |
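This record does not say how the three external sources are combined, so the following is only a minimal sketch of one ingredient, under stated assumptions. It assumes "Google distance" means the standard Normalized Google Distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts pages containing the given terms and N is the number of indexed pages. It also assumes an unsupervised setup in which each document is assigned to the category name it is semantically closest to. A hypothetical in-memory `INDEX` of word sets stands in for search-engine page counts, and `LABELS`, `page_count`, `ngd` and `classify` are illustrative names. The Wikipedia and WordNet components are not modeled, and this is not the thesis's actual method.

```python
import math
from typing import List, Set

# Hypothetical toy "index": each set is one page's vocabulary. It stands in
# for the web page counts a search engine would supply for NGD.
INDEX: List[Set[str]] = [
    {"football", "goal", "team", "sports"},
    {"tennis", "match", "sports", "player"},
    {"goal", "player", "team", "sports"},
    {"stock", "market", "finance", "bank"},
    {"bank", "loan", "finance", "interest"},
    {"market", "price", "finance", "stock"},
]

# Hypothetical category names; they are the only supervision the sketch uses.
LABELS = ["sports", "finance"]


def page_count(index: List[Set[str]], *terms: str) -> int:
    """Number of pages that contain every one of the given terms."""
    return sum(1 for page in index if all(t in page for t in terms))


def ngd(x: str, y: str, index: List[Set[str]]) -> float:
    """Normalized Google Distance between terms x and y over `index`."""
    fx, fy = page_count(index, x), page_count(index, y)
    fxy = page_count(index, x, y)
    if fxy == 0:
        # The terms never co-occur (or one is unseen): maximally distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(len(index)) - min(lx, ly)
    if denom == 0:
        # Both terms appear on every page, so they always co-occur.
        return 0.0
    return (max(lx, ly) - lxy) / denom


def classify(doc: List[str], labels: List[str], index: List[Set[str]]) -> str:
    """Assign `doc` to the label whose name is closest to its terms.

    Each term contributes exp(-NGD), which lies in [0, 1]; an infinite
    distance contributes exactly 0.0, so unrelated terms add nothing.
    """
    def score(label: str) -> float:
        return sum(math.exp(-ngd(term, label, index)) for term in doc)

    return max(labels, key=score)


print(classify(["goal", "team", "player"], LABELS, INDEX))  # sports
print(classify(["stock", "loan", "price"], LABELS, INDEX))  # finance
```

Using only category names as seeds is what keeps the sketch label-free. In a fuller version, `page_count` would be backed by real hit counts and the per-label score could be blended with WordNet- or Wikipedia-based relatedness.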