A Hybrid Chinese Feature Selection Method for Knowledge Document Classification
Master's thesis === National Cheng Kung University === Department of Industrial and Information Management (in-service program) === 101
Main Authors: | Kuan-Chung Kuo, 郭冠忠 |
---|---|
Other Authors: | Hei-Chia Wang, 王惠嘉 |
Format: | Others |
Language: | zh-TW |
Published: | 2013 |
Online Access: | http://ndltd.ncl.edu.tw/handle/61154689719734252916 |
id |
ndltd-TW-101NCKU5041064 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-101NCKU5041064 2016-03-18T04:42:17Z http://ndltd.ncl.edu.tw/handle/61154689719734252916 A Hybrid Chinese Feature Selection Method for Knowledge Document Classification (Chinese title: 利用混合式中文特徵選取法於知識文件分類) Kuan-Chung Kuo 郭冠忠 Master's thesis, National Cheng Kung University, Department of Industrial and Information Management (in-service program), 101 Hei-Chia Wang 王惠嘉 2013 thesis 45 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Enterprises use knowledge management systems to train employees, and industry knowledge documents are an important source of explicit knowledge, so classifying knowledge documents is significant work for enterprises today. Text pre-processing is necessary before classification in order to select the features that affect classification accuracy. Unfortunately, Chinese sentences are difficult to segment during pre-processing, because written Chinese places no white space between words. Two approaches to Chinese segmentation are currently common: one is dictionary-based, the other statistics-based.
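The two approaches can be contrasted in a minimal Python sketch. Greedy forward maximum matching stands in for a dictionary-based segmenter, and overlapping character n-grams stand in for the statistics-based approach. Both are simplified illustrations chosen here, not the internals of the CKIP or Stanford systems; the function names and toy dictionaries are hypothetical.

```python
# Minimal sketch contrasting dictionary-based and statistics-based segmentation.
# The dictionaries used below are hypothetical toy examples; real segmenters
# rely on much larger lexicons and trained models.


def forward_max_match(text, dictionary, max_len=4):
    """Dictionary-based: at each position take the longest dictionary word.

    Characters not covered by any dictionary word fall back to single
    characters, which is how unknown terms get broken apart.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a match (or 1 char).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def char_ngrams(text, n=2):
    """Statistics-based building block: every overlapping run of n characters.

    A statistical method would count these across a corpus and keep
    frequent n-grams as candidate terms.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(forward_max_match("知識文件分類", {"知識", "文件", "分類"}))
    # ['知識', '文件', '分類']
    print(forward_max_match("雲端運算知識", {"知識"}))
    # ['雲', '端', '運', '算', '知識']
    print(char_ngrams("雲端運算", 2))
    # ['雲端', '端運', '運算']
```

With a dictionary that contains only 知識, the out-of-dictionary term 雲端運算 (cloud computing) falls apart into single characters, while its bigrams 雲端, 端運 and 運算 remain available as candidates whose corpus frequencies a statistical method could then evaluate.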
Unknown terms are a persistent problem for dictionary-based Chinese segmentation systems: no dictionary can cover every term, because new terms are coined continually. To address this problem, this study combined two dictionary-based Chinese segmentation systems, the Stanford Chinese Word Segmenter and the CKIP segmentation system, with one statistics-based method, the n-gram method. It then calculated the TF-ICF (Term Frequency-Inverse Category Frequency) score of each term to select the final features, and classified and validated the documents with an SVM classifier. The study found that the hybrid Chinese feature selection method achieved higher classification accuracy than methods using a single Chinese segmentation system, and that TF-ICF performed better than TF and TF-IDF. The hybrid Chinese feature selection method can therefore improve the accuracy of Chinese knowledge document classification.
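The TF-ICF feature selection step can be sketched in Python. The abstract does not give the exact formula, so this sketch assumes one common formulation, TF-ICF(t, c) = tf(t, c) × log(|C| / cf(t)), where |C| is the number of categories and cf(t) is the number of categories containing term t. The function names, the merge step that pools several segmenters' output, and the top-k selection rule are hypothetical; the SVM classification that follows in the study is not shown.

```python
import math
from collections import Counter


def merge_segmentations(*token_lists):
    """Hypothetical hybrid step: concatenate the tokens that several segmenters
    produced for the same document so every method's candidate terms are kept.
    A term found by more than one method is counted once per method."""
    return [tok for tokens in token_lists for tok in tokens]


def tf_icf(category_docs):
    """Score every term per category as tf(t, c) * log(|C| / cf(t)).

    category_docs maps a category label to a list of token lists (one per
    document). tf(t, c) is the raw count of t across the category's documents;
    cf(t) is the number of categories in which t occurs at least once.
    """
    term_counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in category_docs.items()
    }
    num_categories = len(term_counts)
    # Iterating a Counter yields each distinct term once, so this counts categories.
    category_freq = Counter(t for counts in term_counts.values() for t in counts)
    return {
        label: {t: tf * math.log(num_categories / category_freq[t])
                for t, tf in counts.items()}
        for label, counts in term_counts.items()
    }


def select_features(scores, k):
    """Union of the k highest-scoring terms of each category (ties broken by term)."""
    selected = set()
    for term_scores in scores.values():
        ranked = sorted(term_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        selected.update(term for term, _ in ranked[:k])
    return selected


if __name__ == "__main__":
    docs = {
        "finance": [["報表", "成本"], ["成本", "預算"]],
        "quality": [["良率", "成本"], ["檢驗", "良率"]],
    }
    scores = tf_icf(docs)
    print(sorted(select_features(scores, 1)))  # ['報表', '良率']
```

In this toy example 成本 occurs in both categories, so its ICF factor is log(2/2) = 0 and it is never selected, whereas 報表 and 良率 are each confined to one category. Replacing the log factor with 1 reduces the score to plain TF, and replacing |C| and cf(t) with the document count and document frequency gives the IDF weighting used in TF-IDF.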
|
author2 |
Hei-Chia Wang |
author |
Kuan-Chung Kuo 郭冠忠 |
title |
A Hybrid Chinese Feature Selection Method for Knowledge Document Classification |
publishDate |
2013 |
url |
http://ndltd.ncl.edu.tw/handle/61154689719734252916 |