Co-Occurrence Based Feature Selection in Automatic Text Classification

Bibliographic Details
Main Authors: Lin Cheng Nan, 林政男
Other Authors: Lee Yue-Shi
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/06738940498276905238
Description
Summary: Master's thesis === Ming Chuan University (銘傳大學) === Graduate Institute of Information Management === 92 === Text classification is an important research topic in text mining. Its objective is to assign a new document to a category using a model defined in advance during the training phase. Preprocessing new documents in this way gives each one a category, so that users can find the information they want. Automatic text classification usually has two phases: feature selection and design of the classification function. We use two feature selection techniques, auto-tag and non-auto-tag, and three classifiers: VSM, kNN, and SVM. As feature units we consider single terms and co-occurrences, and we measure the accuracy of the three classifiers with both feature units. In the experimental results, VSM achieves better accuracy with the auto-tag technique, whereas kNN and SVM achieve better accuracy with the non-auto-tag technique than with auto-tag.
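
The difference between the two feature units named in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record does not say how co-occurrence is defined, so the sketch assumes that two distinct terms co-occur when they appear in the same document, and it stands in for the VSM classifier with a simple category-centroid comparison by cosine similarity. The function names and the toy training set are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def single_term_features(tokens):
    """Feature unit 1: each term counted on its own."""
    return Counter(tokens)


def cooccurrence_features(tokens):
    """Feature unit 2: unordered pairs of distinct terms.

    Assumption: the co-occurrence window is the whole document.
    """
    vocab = sorted(set(tokens))
    return Counter(combinations(vocab, 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled_docs, featurize):
    """Sum the feature vectors of each category into one centroid."""
    centroids = {}
    for tokens, label in labelled_docs:
        centroids.setdefault(label, Counter()).update(featurize(tokens))
    return centroids


def classify(tokens, centroids, featurize):
    """Return the category whose centroid is most similar to the document."""
    vec = featurize(tokens)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy training data, for illustration only.
TRAIN = [
    ("stock market price rise".split(), "finance"),
    ("bank interest rate market".split(), "finance"),
    ("team win match goal".split(), "sports"),
    ("player score goal match".split(), "sports"),
]

if __name__ == "__main__":
    new_doc = "market price bank".split()
    for featurize in (single_term_features, cooccurrence_features):
        centroids = train_centroids(TRAIN, featurize)
        print(featurize.__name__, classify(new_doc, centroids, featurize))
```

Swapping the `featurize` argument is the only change needed to compare the two feature units on the same classifier, which mirrors the comparison the summary describes. A kNN or SVM classifier could consume the same feature vectors.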