A Study of Multiple Classifier Systems in Automated Text Categorization

Bibliographic Details
Main Authors: Yuan-Gu Wei, 魏源谷
Other Authors: Jyh-Jong Tsay
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/58157330409643777309
Description
Summary: Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 90 === Automatic text categorization, the task of assigning predefined class (category) labels to text documents, is one of the main techniques for organizing and locating information in huge text collections such as the Internet. Many approaches, including linear classifiers, decision trees, Bayesian methods, neural networks, and support vector machines (SVMs), have been extensively studied and used to implement classifier systems for text categorization and web page classification. Although a great deal of effort has gone into each of these methods, further performance improvement from any single method is reaching its limit. Multiple classifier systems, which aim to combine the strengths of individual classifiers to improve overall performance, have been widely studied recently. In this thesis, we study the development of multiple classifier systems for automated text categorization. We investigate and propose approaches to fundamental issues such as classifier combination, classifier subset selection, and static and dynamic classifier selection. We use these ideas to develop efficient combination-based and selection-based multiple classifier systems. Experiments show that our approaches significantly improve on the classification accuracy of individual classifiers for web page collections from web portals. In addition, we propose a cascaded class reduction method in which a sequence of classifiers is cascaded to successively reduce the set of possible classes. We show that by cascading Naive Bayes and SVMs, we can improve the classification accuracy of SVMs while reducing their running time.
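
The cascaded class reduction idea in the summary can be illustrated with a minimal sketch. The record does not give the thesis's actual algorithm, so everything here is an assumption made for illustration: the names `cascade_classify`, `cheap_scorer`, and `weighted_scorer` are hypothetical, the keyword tables are toy data, and the two scorers are simple stand-ins for a fast first-stage classifier (such as Naive Bayes) and a costlier second-stage classifier (such as an SVM). The sketch only shows the control flow: each stage scores the surviving classes and passes its top few to the next stage.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A scorer maps (document, candidate classes) to one score per candidate.
Scorer = Callable[[str, Sequence[str]], Dict[str, float]]

# Toy data (assumption): keyword sets and weights, not taken from the thesis.
KEYWORDS = {
    "sports": {"team", "game", "match"},
    "finance": {"stock", "market", "bank"},
    "travel": {"hotel", "flight", "beach"},
}
WEIGHTS = {
    "sports": {"team": 1.0, "game": 1.0},
    "finance": {"stock": 2.0, "market": 2.0},
    "travel": {"hotel": 1.0},
}


def cheap_scorer(doc: str, candidates: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a fast first stage: count distinct keyword hits."""
    words = set(doc.lower().split())
    return {c: float(len(words & KEYWORDS[c])) for c in candidates}


def weighted_scorer(doc: str, candidates: Sequence[str]) -> Dict[str, float]:
    """Stand-in for a costlier second stage: sum per-class word weights."""
    words = doc.lower().split()
    return {c: sum(WEIGHTS[c].get(w, 0.0) for w in words) for c in candidates}


def cascade_classify(
    doc: str, classes: Sequence[str], stages: List[Tuple[Scorer, int]]
) -> str:
    """Run the stages in order; each stage keeps only its top-k classes."""
    candidates = list(classes)
    for scorer, keep in stages:
        scores = scorer(doc, candidates)
        # Stable sort by descending score, then truncate to the top `keep`.
        candidates = sorted(candidates, key=lambda c: -scores[c])[:keep]
    return candidates[0]


doc = "the team discussed the stock market before the game"
label = cascade_classify(
    doc, list(KEYWORDS), [(cheap_scorer, 2), (weighted_scorer, 1)]
)
print(label)  # finance
```

In this sketch the second stage scores only the two classes that survive the first stage, which is the source of the running-time saving the summary describes. Whether accuracy also improves depends on the real classifiers and data, which this toy example does not model.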