Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques
碩士 === 國立高雄應用科技大學 === 電機工程系碩士班 === 92 === “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) is built on statistical learning, n...
Main Authors: | Po-Yi Li, 李柏毅 |
---|---|
Other Authors: | Chung-Hong Lee |
Format: | Others |
Language: | zh-TW |
Published: | 2004 |
Online Access: | http://ndltd.ncl.edu.tw/handle/65318851471594812159 |
id |
ndltd-TW-092KUAS0442031 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-092KUAS0442031 2015-10-13T16:22:46Z http://ndltd.ncl.edu.tw/handle/65318851471594812159 Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques SupportVectorMachine技術應用於中文文件自動分類之探討 Po-Yi Li 李柏毅 碩士 國立高雄應用科技大學 電機工程系碩士班 92 “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) is built on statistical learning, neural networks, and optimization techniques. The major features of SVM are (1) the capacity to deal with both linear and non-linear problems, and (2) the absence of a limit on the total number of data items tested (the data size). As a result, the SVM algorithm offers an effective solution to the difficulties of text categorization with large-scale data. This research is based mainly on the Support Vector Machine (SVM) learning algorithm and proposes a feature selection strategy for classifying Chinese documents. Across several experimental settings, we discuss the differences among several feature selection strategies and verify their impact on the performance of SVM-based classification tasks. Based on the analysis of these strategies, we selected one of them for the implementation of our classification system and combined different kernel functions with various parameters in the SVM algorithm to set up the document categorization experiments. Our experimental results indicate that, with the selected feature selection strategy, the SVM algorithm produces satisfactory performance for document classification. We also demonstrate that, with only 500 dimensions, our system achieves outstanding categorization accuracy. Finally, we conducted several experiments comparing neural network and kNN classifiers with our implemented SVM classifier for document categorization. The SVM classifier outperforms the others. Chung-Hong Lee 李俊宏 2004 學位論文 ; thesis 110 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
碩士 === 國立高雄應用科技大學 === 電機工程系碩士班 === 92 === “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) is built on statistical learning, neural networks, and optimization techniques. The major features of SVM are (1) the capacity to deal with both linear and non-linear problems, and (2) the absence of a limit on the total number of data items tested (the data size). As a result, the SVM algorithm offers an effective solution to the difficulties of text categorization with large-scale data.
This research is based mainly on the Support Vector Machine (SVM) learning algorithm and proposes a feature selection strategy for classifying Chinese documents. Across several experimental settings, we discuss the differences among several feature selection strategies and verify their impact on the performance of SVM-based classification tasks. Based on the analysis of these strategies, we selected one of them for the implementation of our classification system and combined different kernel functions with various parameters in the SVM algorithm to set up the document categorization experiments. Our experimental results indicate that, with the selected feature selection strategy, the SVM algorithm produces satisfactory performance for document classification. We also demonstrate that, with only 500 dimensions, our system achieves outstanding categorization accuracy. Finally, we conducted several experiments comparing neural network and kNN classifiers with our implemented SVM classifier for document categorization. The SVM classifier outperforms the others.
|
author2 |
Chung-Hong Lee |
author_facet |
Chung-Hong Lee Po-Yi Li 李柏毅 |
author |
Po-Yi Li 李柏毅 |
spellingShingle |
Po-Yi Li 李柏毅 Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques |
author_sort |
Po-Yi Li |
title |
Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques |
title_short |
Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques |
title_full |
Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques |
title_fullStr |
Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques |
title_full_unstemmed |
Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques |
title_sort |
automatic text categorization of chinese document using support vector machine techniques |
publishDate |
2004 |
url |
http://ndltd.ncl.edu.tw/handle/65318851471594812159 |
work_keys_str_mv |
AT poyili automatictextcategorizationofchinesedocumentusingsupportvectormachinetechniques AT lǐbǎiyì automatictextcategorizationofchinesedocumentusingsupportvectormachinetechniques AT poyili supportvectormachinejìshùyīngyòngyúzhōngwénwénjiànzìdòngfēnlèizhītàntǎo AT lǐbǎiyì supportvectormachinejìshùyīngyòngyúzhōngwénwénjiànzìdòngfēnlèizhītàntǎo |
_version_ |
1717769877534539776 |