Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques

Master's === National Kaohsiung University of Applied Sciences === Master's Program, Department of Electrical Engineering === 92 === “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) was constructed based on statistical learning, n...

Bibliographic Details
Main Authors:	Po-Yi Li, 李柏毅
Other Authors:	Chung-Hong Lee
Format:	Others
Language:	zh-TW
Published:	2004
Online Access:	http://ndltd.ncl.edu.tw/handle/65318851471594812159

id	ndltd-TW-092KUAS0442031
record_format	oai_dc
spelling	ndltd-TW-092KUAS0442031 2015-10-13T16:22:46Z http://ndltd.ncl.edu.tw/handle/65318851471594812159 Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques SupportVectorMachine技術應用於中文文件自動分類之探討 Po-Yi Li 李柏毅 Master's, National Kaohsiung University of Applied Sciences, Master's Program, Department of Electrical Engineering, 92 Chung-Hong Lee 李俊宏 2004 thesis 110 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Kaohsiung University of Applied Sciences === Master's Program, Department of Electrical Engineering === 92 === “Automatic text categorization” uses machine learning techniques to classify heterogeneous texts through an implemented classification system. The theory of the Support Vector Machine (SVM) builds on statistical learning, neural networks, and optimization techniques. The major features of SVM are (1) the capacity to deal with both linear and non-linear problems, and (2) no limit on the total number of data items tested (data size). As a result, the SVM algorithm offers an effective way to handle the difficulties of text categorization at large data scales. This research is based mainly on the SVM learning algorithm and proposes a feature selection strategy for classifying Chinese documents. Across several experimental settings, we discuss the differences among several feature selection strategies and verify their impact on the performance of SVM-based classification tasks. Based on that analysis, we chose one strategy for our classification system and combined different kernel functions with various parameters in the SVM algorithm to set up the document categorization experiments. Our experimental results indicate that, with the chosen feature selection strategy, the SVM algorithm achieves satisfactory performance on document classification. We also demonstrate that our system achieves outstanding categorization accuracy with only 500 dimensions. Finally, we conducted several experiments comparing neural network and kNN classifiers with our SVM classifier for document categorization. The SVM classifier outperformed the others.
author2	Chung-Hong Lee
author_facet	Chung-Hong Lee Po-Yi Li 李柏毅
author	Po-Yi Li 李柏毅
spellingShingle	Po-Yi Li 李柏毅 Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques
author_sort	Po-Yi Li
title	Automatic Text Categorization of Chinese Document Using Support Vector Machine Techniques
title_sort	automatic text categorization of chinese document using support vector machine techniques
publishDate	2004
url	http://ndltd.ncl.edu.tw/handle/65318851471594812159