Co-Occurrence Based Feature Selection in Automatic Text Classification

Master's === Ming Chuan University === Graduate Institute of Information Management === 92 === Text classification is an important research subject in text mining. The objective is to determine a new document’s category using a model predefined in the training phase. We can use this technology to preprocess a new document and assign it a category, then the user w...


Bibliographic Details
Main Authors: Lin Cheng Nan, 林政男
Other Authors: Lee Yue-Shi
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/06738940498276905238
id ndltd-TW-092MCU00396017
record_format oai_dc
spelling ndltd-TW-092MCU00396017 2015-10-13T16:22:46Z http://ndltd.ncl.edu.tw/handle/06738940498276905238 Co-Occurrence Based Feature Selection in Automatic Text Classification 以共現語詞為基礎的特徵選取在文件自動分類上之研究 Lin Cheng Nan 林政男 Master's Ming Chuan University Graduate Institute of Information Management 92 Text classification is an important research subject in text mining. The objective is to determine a new document’s category using a model predefined in the training phase. We can use this technology to preprocess a new document and assign it a category, so that users can find the information they want. Automatic text classification usually has two phases: feature selection and function design. We use two feature selection techniques, auto-tag and non-auto-tag, and three classifiers: VSM, kNN, and SVM. For the feature unit, we use single terms and co-occurrences. We then apply both feature units and measure the accuracy of the three classifiers. In the experimental results, VSM achieves better accuracy with the auto-tag technique, while kNN and SVM achieve better accuracy with the non-auto-tag technique than with auto-tag. Lee Yue-Shi 李御璽 2004 degree thesis ; thesis 38 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Ming Chuan University === Graduate Institute of Information Management === 92 === Text classification is an important research subject in text mining. The objective is to determine a new document’s category using a model predefined in the training phase. We can use this technology to preprocess a new document and assign it a category, so that users can find the information they want. Automatic text classification usually has two phases: feature selection and function design. We use two feature selection techniques, auto-tag and non-auto-tag, and three classifiers: VSM, kNN, and SVM. For the feature unit, we use single terms and co-occurrences. We then apply both feature units and measure the accuracy of the three classifiers. In the experimental results, VSM achieves better accuracy with the auto-tag technique, while kNN and SVM achieve better accuracy with the non-auto-tag technique than with auto-tag.
author2 Lee Yue-Shi
author_facet Lee Yue-Shi
Lin Cheng Nan
林政男
author Lin Cheng Nan
林政男
spellingShingle Lin Cheng Nan
林政男
Co-Occurrence Based Feature Selection in Automatic Text Classification
author_sort Lin Cheng Nan
title Co-Occurrence Based Feature Selection in Automatic Text Classification
title_short Co-Occurrence Based Feature Selection in Automatic Text Classification
title_full Co-Occurrence Based Feature Selection in Automatic Text Classification
title_fullStr Co-Occurrence Based Feature Selection in Automatic Text Classification
title_full_unstemmed Co-Occurrence Based Feature Selection in Automatic Text Classification
title_sort co-occurrence based feature selection in automatic text classification
publishDate 2004
url http://ndltd.ncl.edu.tw/handle/06738940498276905238
work_keys_str_mv AT linchengnan cooccurrencebasedfeatureselectioninautomatictextclassification
AT línzhèngnán cooccurrencebasedfeatureselectioninautomatictextclassification
AT linchengnan yǐgòngxiànyǔcíwèijīchǔdetèzhēngxuǎnqǔzàiwénjiànzìdòngfēnlèishàngzhīyánjiū
AT línzhèngnán yǐgòngxiànyǔcíwèijīchǔdetèzhēngxuǎnqǔzàiwénjiànzìdòngfēnlèishàngzhīyánjiū
_version_ 1717769950041473024