Summary: | Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 90 === Automatic text categorization, defined as the task of
assigning predefined class (category) labels to text documents, is one of the main techniques for organizing and locating information in huge text collections such as those found on the Internet. Many approaches, including linear classifiers, decision trees, Bayesian methods, neural networks, and support vector machines (SVMs), have been extensively studied and used to implement classifier systems for text categorization as well as for web page classification. Although considerable effort has been spent on each of these methods, we are reaching the limit of further performance improvement. Multiple classifier systems, which aim to combine the strengths of individual classifiers to improve overall performance, have been widely studied recently.
In this thesis, we study the development of multiple classifier
systems for automated text categorization. We investigate and
propose various approaches to fundamental issues such as
classifier combination, classifier subset selection, and static
and dynamic classifier selection. We use these ideas to develop
efficient combination-based as well as selection-based multiple
classifier systems. Experiments show that our approaches
significantly improve on the classification accuracy of the individual
classifiers for web page collections from web portals. In
addition, we propose a cascaded class reduction method in
which a sequence of classifiers is cascaded to successively
reduce the set of possible classes. We show that by cascading
Naive Bayes and SVMs, we can improve the classification accuracy
of SVMs while reducing their running time.
|
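The cascaded class reduction idea described in the summary can be illustrated with a minimal sketch. This is not the thesis's implementation: the function `cascade_classify`, the `keep` parameter, the toy keyword scorers, and the class names are all hypothetical. The cheap keyword-count scorer stands in for a fast first stage such as Naive Bayes, and the weighted scorer stands in for a slower final stage such as an SVM. The sketch only shows the control flow: each early stage shrinks the candidate class set, so the last stage scores fewer classes.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer type: given a document and the current candidate
# classes, return one score per candidate (higher means more likely).
Scorer = Callable[[str, Sequence[str]], Dict[str, float]]


def cascade_classify(doc: str, classes: Sequence[str],
                     stages: List[Scorer], keep: List[int]) -> str:
    """Run all but the last scorer to shrink the candidate set, then let
    the last scorer pick the final class among the survivors.

    keep[i] is the number of classes retained after stage i (assumed API).
    """
    candidates = list(classes)
    for scorer, k in zip(stages[:-1], keep):
        scores = scorer(doc, candidates)
        # Keep only the k best-scoring classes for the next stage.
        candidates = sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]
    final_scores = stages[-1](doc, candidates)
    return max(candidates, key=lambda c: final_scores[c])


# Toy data, invented purely for illustration.
CLASSES = ["sports", "finance", "travel", "tech"]
KEYWORDS = {
    "sports": {"game", "team", "score"},
    "finance": {"stock", "market", "score"},
    "travel": {"flight", "hotel"},
    "tech": {"software", "market"},
}
WEIGHTS = {
    "sports": {"team": 2.0, "game": 1.0},
    "finance": {"stock": 3.0, "market": 2.0},
    "travel": {"flight": 2.0, "hotel": 2.0},
    "tech": {"software": 3.0},
}
stage2_inputs: List[List[str]] = []  # records what the final stage was asked to score


def cheap_scorer(doc: str, candidates: Sequence[str]) -> Dict[str, float]:
    # Stand-in for a fast first-stage classifier: count keyword hits.
    words = set(doc.lower().split())
    return {c: float(len(words & KEYWORDS[c])) for c in candidates}


def costly_scorer(doc: str, candidates: Sequence[str]) -> Dict[str, float]:
    # Stand-in for a slower final-stage classifier: weighted keyword sum.
    stage2_inputs.append(list(candidates))
    words = doc.lower().split()
    return {c: sum(WEIGHTS[c].get(w, 0.0) for w in words) for c in candidates}


result = cascade_classify("the stock market score rose", CLASSES,
                          [cheap_scorer, costly_scorer], keep=[2])
print(result)
```

In this toy run the first stage narrows four classes down to two, so the final stage scores only those two candidates. With real classifiers, the saving would come from the expensive model evaluating fewer classes per document.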