Analyzing and pruning ensembles utilizing bias and variance theory

Bibliographic Details
Main Author: Zor, Cemre
Published: University of Surrey 2014
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.654766
Description
Summary: Ensemble methods are widely preferred over single classifiers because of the advantages they offer in terms of accuracy, complexity and flexibility. The aim of this doctoral study is to understand and analyze ensembles while offering new design and pruning techniques. Bias-variance frameworks have been used as the main means of analysis, and Error Correcting Output Coding (ECOC) has been studied as a case study within each chapter. ECOC is a powerful multiclass ensemble classification technique in which multiple two-class base classifiers are trained on relabeled versions of the multiclass training data. The relabeling information is obtained from a preset code matrix. The main idea behind this procedure is to solve the original multiclass problem by combining the decision boundaries obtained from simpler two-class decompositions. While ECOC is one of the best solutions to multiclass problems, it is still suboptimal. In this thesis, we first present two algorithms that iteratively update the ECOC framework to improve its performance without the need for retraining. As a second step, to explain the underlying reasons for the improved performance of ensembles and to give hints for their design, we use bias and variance analysis. The ECOC framework has been analyzed theoretically using the Tumer and Ghosh (T&G) bias-variance model, and its performance has been linked to that of its base classifiers. Accordingly, design hints for ECOC have been proposed. Moreover, James's definition of bias and variance has been used in experiments to explain why ECOC succeeds compared with single multiclass classifiers and bagging ensembles. Furthermore, for bias-variance analysis, we have established the missing links between some of the popular theories in the literature (those of Geman, T&G and James) by providing closed-form solutions. The final contribution of this thesis is on ensemble pruning.
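The ECOC training and decoding procedure described in the summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 4-class code matrix, the nearest-centroid base learner (`train_centroid`) and Hamming-distance decoding (`predict_ecoc`) are hypothetical choices made for the example. Each column of the matrix defines one two-class problem, every training sample is relabeled with its class's bit in that column, and a test point is assigned to the class whose codeword is closest to the vector of base-classifier outputs.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
BinaryClassifier = Callable[[Point], int]

# Hypothetical 4-class code matrix: rows are class codewords, columns are
# two-class problems. It lists all 7 distinct splits of the four classes,
# so any two codewords differ in 4 positions.
CODE_MATRIX = [
    [+1, +1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, +1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, +1, -1, +1, -1, +1, -1],
]


def _mean(points: Sequence[Point]) -> Point:
    return tuple(sum(p[d] for p in points) / len(points) for d in range(len(points[0])))


def _sq_dist(a: Point, b: Point) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def train_centroid(xs: Sequence[Point], ys: Sequence[int]) -> BinaryClassifier:
    """Hypothetical base learner: two-class nearest centroid on labels in {+1, -1}."""
    pos = _mean([x for x, y in zip(xs, ys) if y == +1])
    neg = _mean([x for x, y in zip(xs, ys) if y == -1])
    # Equidistant points are assigned to the +1 side.
    return lambda x: +1 if _sq_dist(x, pos) <= _sq_dist(x, neg) else -1


def train_ecoc(xs, labels, code, train_binary) -> List[BinaryClassifier]:
    """Train one base classifier per code-matrix column on relabeled data."""
    classifiers = []
    for j in range(len(code[0])):
        # Relabel every sample with its class's bit in column j.
        relabeled = [code[c][j] for c in labels]
        classifiers.append(train_binary(xs, relabeled))
    return classifiers


def predict_ecoc(x: Point, classifiers, code) -> int:
    """Hamming decoding: return the class whose codeword best matches the outputs."""
    outputs = [h(x) for h in classifiers]
    distances = [sum(o != b for o, b in zip(outputs, row)) for row in code]
    return min(range(len(code)), key=distances.__getitem__)


# Toy data: two points per class near the corners of a square.
xs = [(0, 0), (1, 1), (0, 5), (1, 6), (5, 0), (6, 1), (5, 5), (6, 6)]
labels = [0, 0, 1, 1, 2, 2, 3, 3]
classifiers = train_ecoc(xs, labels, CODE_MATRIX, train_centroid)
queries = [(0.5, 0.5), (0.5, 5.5), (5.5, 0.5), (5.5, 5.5)]
print([predict_ecoc(q, classifiers, CODE_MATRIX) for q in queries])  # prints [0, 1, 2, 3]
```

In this toy run, the column that separates classes {0, 3} from {1, 2} is an XOR-like problem that the nearest-centroid learner cannot solve (both group centroids coincide), so that base classifier errs on classes 1 and 2. Because every pair of codewords differs in four positions, Hamming decoding still recovers the correct class, which illustrates the error-correcting idea behind ECOC.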
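Of the bias-variance theories named in the summary, Geman's is the classical squared-loss decomposition. It is restated here for reference in standard textbook notation, which is an assumption and not taken from the thesis: $D$ is a random training set, $f(x;D)$ is the predictor trained on it, $\bar f(x) = \mathbb{E}_D[f(x;D)]$ is its average prediction, and the test label $y$ is drawn from $p(y \mid x)$ independently of $D$.

```latex
\[
\mathbb{E}_{D,y}\!\left[(y - f(x;D))^2\right]
= \underbrace{\mathbb{E}_{y}\!\left[(y - \mathbb{E}[y \mid x])^2\right]}_{\text{noise}}
+ \underbrace{\left(\bar f(x) - \mathbb{E}[y \mid x]\right)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{D}\!\left[(f(x;D) - \bar f(x))^2\right]}_{\text{variance}}
\]
```

The cross terms vanish because $y$ is independent of $D$, $\mathbb{E}_y[y - \mathbb{E}[y \mid x]] = 0$ and $\mathbb{E}_D[f(x;D) - \bar f(x)] = 0$. James's definitions extend bias and variance to general loss functions such as 0/1 loss, and the T&G model analyzes the classification error added on top of the Bayes error.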
To increase efficiency and decrease computational and storage costs without sacrificing, and preferably while enhancing, generalization performance, two novel pruning algorithms for bagging and ECOC ensembles have been proposed. The proposed methods, which are shown to achieve better results than the state of the art, are analyzed theoretically and experimentally. The analysis also draws on bias and variance theory.
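The summary does not describe the two proposed pruning algorithms, so the sketch below instead shows a generic greedy forward-selection pruner of the kind often used as a baseline for bagging ensembles; it is not the thesis's method. It assumes each member's predictions on a held-out validation set are precomputed, and the helper names (`vote_accuracy`, `greedy_prune`) and the toy data are hypothetical. Starting from an empty subensemble, it repeatedly adds the member whose inclusion gives the highest plurality-vote validation accuracy, stopping at `k` members.

```python
from collections import Counter
from typing import List, Sequence


def vote_accuracy(member_preds: Sequence[Sequence[int]], y: Sequence[int]) -> float:
    """Validation accuracy of a plurality vote; ties go to the smallest label."""
    correct = 0
    for i, label in enumerate(y):
        votes = Counter(preds[i] for preds in member_preds)
        # max() keeps the first maximal item, so sorting breaks ties deterministically.
        winner = max(sorted(votes), key=lambda c: votes[c])
        correct += winner == label
    return correct / len(y)


def greedy_prune(member_preds: List[List[int]], y: List[int], k: int) -> List[int]:
    """Greedy forward selection of k members (generic baseline, hypothetical helper)."""
    selected: List[int] = []
    remaining = list(range(len(member_preds)))
    while remaining and len(selected) < k:
        # Pick the member whose addition maximizes subensemble vote accuracy;
        # ties go to the lowest index.
        best = max(
            remaining,
            key=lambda m: vote_accuracy([member_preds[j] for j in selected + [m]], y),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy validation labels and precomputed predictions of four bagged members.
y_val = [0, 1, 1, 0, 1, 0]
val_preds = [
    [0, 1, 1, 0, 1, 0],  # member 0: all correct
    [1, 0, 0, 1, 0, 1],  # member 1: all wrong
    [0, 1, 0, 0, 1, 1],  # member 2: 4/6 correct
    [0, 0, 1, 0, 1, 0],  # member 3: 5/6 correct
]
kept = greedy_prune(val_preds, y_val, k=3)
print(kept)  # prints [0, 2, 3]
print(vote_accuracy([val_preds[i] for i in kept], y_val))  # prints 1.0
print(vote_accuracy(val_preds, y_val))
```

In this toy run the three-member subensemble reaches 1.0 validation accuracy while the full four-member vote reaches 4/6, because the always-wrong member 1 is dropped. Since the same validation set drives both selection and scoring, this figure is optimistic; generalization should be checked on separate test data.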