Summary: | Master's === Ming Chuan University === Graduate Institute of Information Management === 92 === Text classification is an important research subject in text mining. The objective is to assign a new document to a category using a model built during the training phase. This technology can be used to preprocess new documents by assigning each one a category, so that users can more easily find the information they want.
Automatic text classification usually has two phases: feature selection and classification function design. We use two feature selection techniques, auto-tag and non-auto-tag, and three classifiers: VSM, kNN, and SVM. For the feature unit, we consider single terms and co-occurrences. We then measure the accuracy of the three classifiers with each of the two feature units. In the experimental results, VSM achieves better accuracy with auto-tag, whereas kNN and SVM achieve better accuracy with non-auto-tag than with auto-tag.
|
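
The two feature units named in the summary, single terms and co-occurrences, can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy corpus `TRAIN`, all function names, and the definition of co-occurrence as an unordered pair of distinct terms appearing in the same document are assumptions made for illustration. Documents are compared by cosine similarity in a vector space and labelled by a kNN vote; the auto-tag and non-auto-tag feature selection steps and the SVM classifier are not shown.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: (tokens, category). Not data from the thesis.
TRAIN = [
    ("stock market price rises".split(), "finance"),
    ("bank raises interest rate".split(), "finance"),
    ("team wins football match".split(), "sports"),
    ("player scores in football final".split(), "sports"),
]

def single_term_features(tokens):
    """Feature unit 1: every term is a feature, weighted by its count."""
    return Counter(tokens)

def cooccurrence_features(tokens):
    """Feature unit 2 (assumed definition): every unordered pair of
    distinct terms that appear in the same document is a feature."""
    vocab = sorted(set(tokens))
    return Counter(combinations(vocab, 2))

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(query, training, k=1):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def classify(text, extract, k=1):
    """Classify whitespace-tokenized text using the given feature extractor."""
    training = [(extract(tokens), label) for tokens, label in TRAIN]
    return knn_classify(extract(text.split()), training, k)

if __name__ == "__main__":
    for extract in (single_term_features, cooccurrence_features):
        print(extract.__name__, classify("football match tonight", extract))
```

Swapping the `extract` argument is the only change needed to compare the two feature units on the same classifier, which mirrors the kind of comparison the summary describes.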