A Hybrid Chinese Feature Selection Method for Knowledge Document Classification

Master's thesis === National Cheng Kung University === Department of Industrial and Information Management (In-Service Program) === 101 === Enterprises use knowledge management systems to train employees, and industry knowledge documents are an important source of explicit knowledge. Classifying knowledge documents is an important task for enterprises today. For selecting fe...

Bibliographic Details
Main Authors:	Kuan-Chung Kuo, 郭冠忠
Other Authors:	Hei-Chia Wang
Format:	Others
Language:	zh-TW
Published:	2013
Online Access:	http://ndltd.ncl.edu.tw/handle/61154689719734252916

id	ndltd-TW-101NCKU5041064
record_format	oai_dc
spelling	ndltd-TW-101NCKU5041064 2016-03-18T04:42:17Z http://ndltd.ncl.edu.tw/handle/61154689719734252916 A Hybrid Chinese Feature Selection Method for Knowledge Document Classification 利用混合式中文特徵選取法於知識文件分類 Kuan-Chung Kuo 郭冠忠 Master's thesis, National Cheng Kung University, Department of Industrial and Information Management (In-Service Program), 101 Hei-Chia Wang 王惠嘉 2013 thesis 45 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Cheng Kung University === Department of Industrial and Information Management (In-Service Program) === 101 === Enterprises use knowledge management systems to train employees, and industry knowledge documents are an important source of explicit knowledge. Classifying knowledge documents is therefore an important task for enterprises today. Because the features selected affect classification accuracy, text pre-processing is necessary before knowledge documents are classified. Chinese sentences, however, are hard to segment during pre-processing because Chinese terms are not separated by white space. Two approaches to Chinese segmentation are common: dictionary-based and statistics-based. Unknown terms are a persistent problem for dictionary-based segmentation systems: no dictionary can cover all terms, because new terms are coined continually. To address this problem, this study combined two dictionary-based Chinese segmentation systems (Stanford Chinese Word Segmenter and the CKIP segmentation system) with one statistical method (n-grams), calculated the TF-ICF (Term Frequency-Inverse Category Frequency) score of each term to select the final features, and then classified and validated the results with an SVM classifier. The study found that the hybrid Chinese feature selection method yields better classification accuracy than a method that uses a single Chinese segmentation system, and that TF-ICF performs better than TF and TF-IDF. The hybrid Chinese feature selection method can thus improve the accuracy of Chinese knowledge document classification.
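The record gives no formula for TF-ICF, so the following is only a minimal sketch under stated assumptions: it uses one common formulation, tf(t, c) * log(|C| / cf(t)), where cf(t) is the number of categories containing term t, and it uses character n-grams as the statistical candidate terms. The function names (`char_ngrams`, `tf_icf`, `top_features`) and the toy corpus are hypothetical and are not taken from the thesis; the dictionary-based segmenters and the SVM step are left out.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Return all character n-grams of length n_min..n_max, shortest first."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tf_icf(docs_by_category):
    """Score each term per category as tf(t, c) * log(|C| / cf(t)).

    docs_by_category maps a category label to a list of tokenised documents.
    ASSUMPTION: this is one common TF-ICF formulation, not necessarily the
    one used in the thesis.
    """
    # Term frequency of every term inside each category.
    tf = {cat: Counter(tok for doc in docs for tok in doc)
          for cat, docs in docs_by_category.items()}
    # Category frequency: in how many categories does the term occur?
    cf = Counter(tok for counts in tf.values() for tok in counts)
    n_cat = len(docs_by_category)
    return {cat: {tok: freq * math.log(n_cat / cf[tok])
                  for tok, freq in counts.items()}
            for cat, counts in tf.items()}


def top_features(scores, k):
    """Keep the k highest-scoring terms of each category."""
    return {cat: sorted(terms, key=terms.get, reverse=True)[:k]
            for cat, terms in scores.items()}


# Hypothetical toy corpus: two categories, bigram candidates only.
corpus = {
    "km": [char_ngrams("知識管理", 2, 2)],
    "ml": [char_ngrams("機器學習", 2, 2), char_ngrams("知識學習", 2, 2)],
}
scores = tf_icf(corpus)
selected = top_features(scores, k=2)
```

In this toy corpus the bigram 知識 occurs in both categories, so its score is zero and it is not informative for telling the categories apart, which is the behaviour an inverse-category-frequency weight is meant to produce. In a hybrid setting, tokens from the dictionary-based segmenters could be added to the same token lists before scoring, and the selected terms would then serve as features for a classifier.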
author2	Hei-Chia Wang
author_facet	Hei-Chia Wang Kuan-Chung Kuo 郭冠忠
author	Kuan-Chung Kuo 郭冠忠
spellingShingle	Kuan-Chung Kuo 郭冠忠 A Hybrid Chinese Feature Selection Method for Knowledge Document Classification
author_sort	Kuan-Chung Kuo
title	A Hybrid Chinese Feature Selection Method for Knowledge Document Classification
publishDate	2013
url	http://ndltd.ncl.edu.tw/handle/61154689719734252916

Similar Items