Accuracy and Diversity in Ensembles of Text Categorisers
Error-Correcting Out Codes (ECOC) ensembles of binary classifiers are used in Text Cate- gorisation to improve the accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its corresponding decomposition ma- trix, which at...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Centro Latinoamericano de Estudios en Informática
2005-12-01
|
Series: | CLEI Electronic Journal |
Online Access: | http://clei.org/cleiej-beta/index.php/cleiej/article/view/319 |
Summary: | Error-Correcting Out Codes (ECOC) ensembles of binary classifiers are used in Text Cate- gorisation to improve the accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its corresponding decomposition ma- trix, which at the same time depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition for diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator that we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and we compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments where these measures and functions are tested using two distinct document corpora with several configurations in each. The analysis of the results shows a weak relationship between the ensemble accuracy and its diversity.
|
---|---|
ISSN: | 0717-5000 |