Comparing Representations for Chinese Text Categorization

Bibliographic Details
Main Authors: Sheng-Bin Chiu, 邱聖斌
Other Authors: Jyh-Jong Tsay
Format: Others
Language: en_US
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/54063113676904681277
Description
Summary: Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 89 === In this thesis, we study how different representations of Chinese documents affect automatic text categorization. We compare word-based and n-gram-based representations in combination with weighting factors such as term frequency (TF), inverse document frequency (IDF), and inverse class frequency (ICF). Experiments on the CNA news collection show that the bigram representation achieves performance close to that of statistical word-based representations, and that weighting methods combining TF, IDF, and ICF achieve the best performance.
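To make the bigram representation and the combined weighting concrete, the following is a minimal sketch, not the thesis's implementation. The summary does not give the exact formulas, so the sketch assumes the common logarithmic forms IDF(t) = log(N / df(t)) and ICF(t) = log(C / cf(t)), where N is the number of documents, C the number of classes, df(t) the number of documents containing term t, and cf(t) the number of classes containing t. The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return the overlapping character bigrams of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tf_idf_icf(docs, labels):
    """Weight every bigram of every document by TF * IDF * ICF.

    Assumed formulas (not taken from the thesis):
      IDF(t) = log(N / df(t)),  ICF(t) = log(C / cf(t))
    """
    tokenized = [char_bigrams(d) for d in docs]
    n_docs = len(docs)
    n_classes = len(set(labels))

    df = Counter()        # number of documents containing each bigram
    term_classes = {}     # set of classes in which each bigram occurs
    for tokens, label in zip(tokenized, labels):
        for t in set(tokens):
            df[t] += 1
            term_classes.setdefault(t, set()).add(label)

    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within the document
        weighted.append({
            t: count
            * math.log(n_docs / df[t])
            * math.log(n_classes / len(term_classes[t]))
            for t, count in tf.items()
        })
    return weighted


# Hypothetical toy collection: two finance headlines and one sports headline.
docs = ["今日股市上漲", "今日股市下跌", "今日球隊獲勝"]
labels = ["finance", "finance", "sports"]
weights = tf_idf_icf(docs, labels)
```

Under these assumed formulas, a bigram that occurs in every document or in every class (here "今日") receives weight zero, while a bigram confined to one document of one class (such as "市上") is weighted more heavily than one shared across documents of the same class (such as "股市").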