Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques

Master's thesis === National Kaohsiung University of Applied Sciences (國立高雄應用科技大學) === Master's Program, Department of Electrical Engineering === 92 === “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) builds on statistical learning, n...

Bibliographic Details
Main Authors: Po-Yi Li, 李柏毅
Other Authors: Chung-Hong Lee
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/65318851471594812159
id ndltd-TW-092KUAS0442031
record_format oai_dc
spelling ndltd-TW-092KUAS0442031 2015-10-13T16:22:46Z http://ndltd.ncl.edu.tw/handle/65318851471594812159 Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques SupportVectorMachine技術應用於中文文件自動分類之探討 Po-Yi Li 李柏毅 Master's thesis National Kaohsiung University of Applied Sciences (國立高雄應用科技大學) Master's Program, Department of Electrical Engineering 92 Chung-Hong Lee 李俊宏 2004 學位論文 ; thesis 110 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Kaohsiung University of Applied Sciences (國立高雄應用科技大學) === Master's Program, Department of Electrical Engineering === 92 === “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) builds on statistical learning, neural networks, and optimization techniques. SVM has two major features: (1) it can handle both linear and non-linear problems, and (2) the total number of data items tested (the data size) is not limited. As a result, the SVM algorithm offers an effective solution to the difficulties of text categorization at large data scale. This research is based mainly on the SVM learning algorithm and proposes a feature selection strategy for classifying Chinese documents. Across several experimental settings, we discuss the differences among several feature selection strategies and verify their impact on the performance of SVM-based classification tasks. Based on this analysis, we selected one strategy for our classification system and combined different kernel functions with various parameters in the SVM algorithm to set up the document categorization experiments. Our experimental results indicate that, with the selected feature selection strategy, the SVM algorithm produces satisfactory document classification performance. We also demonstrate that our system achieves outstanding categorization accuracy with only 500 dimensions. Finally, we conducted several experiments comparing neural network and kNN classifiers with our SVM classifier for document categorization; the SVM classifier outperformed the others.
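The abstract mentions combining different kernel functions with various parameters but does not name them. As a rough illustration only, the sketch below defines three commonly used SVM kernels (linear, polynomial, RBF) in plain Python. The choice of kernels, the parameter names degree, gamma and coef0, and the toy document vectors are all assumptions made for this example, not details taken from the thesis.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical term-weight vectors for two documents (3 features each).
doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 1.0, 2.0]

for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, kernel(doc_a, doc_b))
```

In an SVM, swapping the kernel changes the similarity measure between documents without changing the training procedure, which is why kernel type and its parameters are natural variables to sweep in experiments of this kind.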
author2 Chung-Hong Lee
author_facet Chung-Hong Lee
Po-Yi Li
李柏毅
author Po-Yi Li
李柏毅
author_sort Po-Yi Li
title Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques
publishDate 2004
url http://ndltd.ncl.edu.tw/handle/65318851471594812159