Comparing Representations for Chinese Text Categorization
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Others |
Language: | en_US |
Published: | 2001 |
Online Access: | http://ndltd.ncl.edu.tw/handle/54063113676904681277 |
Summary: | Master's thesis === National Chung Cheng University (國立中正大學) === Graduate Institute of Computer Science and Information Engineering === 89 === In this thesis, we study the effects of various representations of Chinese documents on automatic text categorization. We compare word-based and n-gram-based representations combined with weighting factors such as term frequency (TF), inverse document frequency (IDF), and inverse class frequency (ICF). Experiments on the CNA news collection show that the bigram representation achieves performance close to that of statistical word-based representations, and that weighting methods combining TF, IDF, and ICF achieve the best performance. |
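
The summary names a bigram representation and a weighting that combines TF, IDF and ICF but does not give the formulas. The following is a minimal sketch only, assuming overlapping character bigrams and the common logarithmic forms IDF(t) = log(N / df(t)) and ICF(t) = log(C / cf(t)); the function names `char_bigrams` and `tf_idf_icf` are hypothetical, and the thesis itself may define these quantities differently.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def tf_idf_icf(docs, labels):
    """Weight each bigram in each document by TF * IDF * ICF.

    Assumed definitions (not taken from the thesis):
      IDF(t) = log(N / df(t)), N = number of documents
      ICF(t) = log(C / cf(t)), C = number of classes
    """
    tokenized = [char_bigrams(d) for d in docs]
    n_docs = len(docs)
    n_classes = len(set(labels))

    df = Counter()      # number of documents containing each term
    term_classes = {}   # set of classes in which each term occurs
    for tokens, label in zip(tokenized, labels):
        for term in set(tokens):
            df[term] += 1
            term_classes.setdefault(term, set()).add(label)

    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        weighted.append({
            term: count
            * math.log(n_docs / df[term])
            * math.log(n_classes / len(term_classes[term]))
            for term, count in tf.items()
        })
    return weighted


# Toy usage with made-up documents and class labels.
docs = ["股市上漲", "股市下跌", "球賽開打"]
labels = ["finance", "finance", "sports"]
weights = tf_idf_icf(docs, labels)
```

Under these assumed definitions, a bigram that occurs in every document or in every class receives weight zero, so bigrams concentrated in a single class dominate the document vector.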