Accuracy and Diversity in Ensembles of Text Categorisers

Bibliographic Details
Main Authors: Juan Jose García Adeva, Ulises Cervino Beresi, Rafael A. Calvo
Format: Article
Language: English
Published: Centro Latinoamericano de Estudios en Informática 2005-12-01
Series: CLEI Electronic Journal
Online Access: http://clei.org/cleiej-beta/index.php/cleiej/article/view/319
Description
Summary: Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its corresponding decomposition matrix, which in turn depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition of diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator, which we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments in which these measures and functions are tested on two distinct document corpora, with several configurations of each. The analysis of the results shows a weak relationship between the ensemble accuracy and its diversity.
ISSN: 0717-5000