Co-Occurrence Based Feature Selection in Automatic Text Classification
Master's thesis === Ming Chuan University === Graduate Institute of Information Management === 92 === Text classification is an important research subject in text mining. The objective is to assign a new document to a category using a model defined in the training phase. This technology can be used to preprocess a new document and assign it a category, then the user w...
Main Authors: | Lin Cheng Nan, 林政男 |
---|---|
Other Authors: | Lee Yue-Shi |
Format: | Others |
Language: | zh-TW |
Published: | 2004 |
Online Access: | http://ndltd.ncl.edu.tw/handle/06738940498276905238 |
id |
ndltd-TW-092MCU00396017 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-092MCU003960172015-10-13T16:22:46Z http://ndltd.ncl.edu.tw/handle/06738940498276905238 Co-Occurrence Based Feature Selection in Automatic Text Classification 以共現語詞為基礎的特徵選取在文件自動分類上之研究 Lin Cheng Nan 林政男 碩士 銘傳大學 資訊管理研究所 92 Text classification is an important research subject in the Text Mining. The objective is to judge a new document’s category using pre-defined model in the training phase. We can use this technology to preprocess the new document to give a category, then the user would find the information they wanted. Automatic text classification usually has two phases: feature selection and function designed. We use two feature select technology: auto-tag, non-auto-tag, and three classifiers: VSM, kNN, and SVM. For the feature unit, we have single term and co-occurrence. Then we use two feature units and try to find the accuracy in the three classifiers. In the experiment results, VSM using auto-tag technology has the better accuracy, and kNN and SVM using non-auto-tag technology have the better accuracy than auto-tag. Lee Yue-Shi 李御璽 2004 學位論文 ; thesis 38 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === Ming Chuan University === Graduate Institute of Information Management === 92 === Text classification is an important research subject in text mining. The objective is to assign a new document to a category using a model defined in the training phase. This technology can be used to preprocess a new document and assign it a category, so that users can find the information they want.
Automatic text classification usually has two phases: feature selection and classification function design. We use two feature selection techniques, auto-tag and non-auto-tag, and three classifiers: VSM, kNN, and SVM. For the feature unit, we consider single terms and co-occurrences. We then apply both feature units and compare the accuracy of the three classifiers. In the experimental results, VSM achieves better accuracy with the auto-tag technique, whereas kNN and SVM achieve better accuracy with the non-auto-tag technique than with auto-tag.
|
author2 |
Lee Yue-Shi |
author_facet |
Lee Yue-Shi Lin Cheng Nan 林政男 |
author |
Lin Cheng Nan 林政男 |
spellingShingle |
Lin Cheng Nan 林政男 Co-Occurrence Based Feature Selection in Automatic Text Classification |
author_sort |
Lin Cheng Nan |
title |
Co-Occurrence Based Feature Selection in Automatic Text Classification |
title_short |
Co-Occurrence Based Feature Selection in Automatic Text Classification |
title_full |
Co-Occurrence Based Feature Selection in Automatic Text Classification |
title_fullStr |
Co-Occurrence Based Feature Selection in Automatic Text Classification |
title_full_unstemmed |
Co-Occurrence Based Feature Selection in Automatic Text Classification |
title_sort |
co-occurrence based feature selection in automatic text classification |
publishDate |
2004 |
url |
http://ndltd.ncl.edu.tw/handle/06738940498276905238 |
work_keys_str_mv |
AT linchengnan cooccurrencebasedfeatureselectioninautomatictextclassification AT línzhèngnán cooccurrencebasedfeatureselectioninautomatictextclassification AT linchengnan yǐgòngxiànyǔcíwèijīchǔdetèzhēngxuǎnqǔzàiwénjiànzìdòngfēnlèishàngzhīyánjiū AT línzhèngnán yǐgòngxiànyǔcíwèijīchǔdetèzhēngxuǎnqǔzàiwénjiànzìdòngfēnlèishàngzhīyánjiū |
_version_ |
1717769950041473024 |
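
The abstract above contrasts two feature units, single terms and co-occurrences, and names VSM as one of the classifiers. The following is a minimal Python sketch of that idea, not the thesis's actual method: the record does not say how co-occurrence is defined or how the auto-tag and non-auto-tag selection steps work, so the sliding-window pairing, the `window` parameter, the centroid-based cosine classifier, all function names, and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def single_terms(tokens):
    """Single-term feature unit: every token is a feature."""
    return Counter(tokens)


def cooccurrences(tokens, window=3):
    """Co-occurrence feature unit (assumed definition): unordered pairs of
    distinct terms that fall inside a window of `window` consecutive tokens."""
    feats = Counter()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                feats[tuple(sorted((term, other)))] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs, extract):
    """Sum the feature counts of all training documents per category."""
    centroids = {}
    for tokens, label in docs:
        centroids.setdefault(label, Counter()).update(extract(tokens))
    return centroids


def classify(tokens, centroids, extract):
    """Vector-space classification: pick the category whose centroid is
    most similar (by cosine) to the document's feature vector."""
    feats = extract(tokens)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Hypothetical toy corpus, not data from the thesis.
TRAIN = [
    ("stock market price rise".split(), "finance"),
    ("bank interest rate rise".split(), "finance"),
    ("team win final match".split(), "sports"),
    ("player score goal match".split(), "sports"),
]

if __name__ == "__main__":
    query = "market price fall".split()
    for extract in (single_terms, cooccurrences):
        centroids = train_centroids(TRAIN, extract)
        print(extract.__name__, classify(query, centroids, extract))
```

Passing the feature extractor as an argument keeps the classifier unchanged while the feature unit varies, which mirrors the comparison the abstract describes. A kNN or SVM classifier could be swapped in behind the same `extract` interface.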