Randomized ensemble methods for classification trees

Bibliographic Details
Main Author: Kobayashi, Izumi
Other Authors: Buttrey, Samuel E.
Published: Monterey, California. Naval Postgraduate School 2012
Online Access: http://hdl.handle.net/10945/9801
Description
Summary: Approved for public release; distribution is unlimited. We propose two methods of constructing ensembles of classifiers. One method directly injects randomness into classification tree algorithms by choosing a split randomly at each node, with probabilities proportional to the measure of goodness for a split. We combine this method with a stopping rule that uses permutation of the outputs. The other method perturbs the outputs and constructs a classifier using the perturbed data. In both methods, the final classifier is given by an unweighted vote of the individual classifiers. These methods are compared with bagging, AdaBoost, and random forests on thirteen commonly used data sets. The results show that, on average, our methods perform better than bagging and comparably to AdaBoost and random forests. Additional computation shows that our perturbation method could improve its performance by perturbing both the inputs and the outputs and by combining a sufficiently large number of trees. Plots of strength and correlation show an interesting relationship. We also explore combining our proposed methods with sampling subsets of the training set. The results of a few trials show that this combination could improve the performance of our proposed methods.
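
The three building blocks named in the summary (a split drawn with probability proportional to its goodness, perturbed outputs, and an unweighted vote) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not say which goodness measure or perturbation scheme the thesis uses, so the Gini gain, the uniform relabeling with probability `flip_prob`, and every function name below are assumptions made only for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed goodness basis)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def candidate_splits(X, y):
    """List (feature, threshold, gain) for every split with positive Gini gain."""
    n = len(y)
    parent = gini(y)
    splits = []
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            children = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - children
            if gain > 0:
                splits.append((j, t, gain))
    return splits


def sample_split(X, y, rng):
    """Draw one split with probability proportional to its gain, or None."""
    splits = candidate_splits(X, y)
    if not splits:
        return None
    weights = [gain for _, _, gain in splits]
    return rng.choices(splits, weights=weights, k=1)[0]


def perturb_outputs(y, classes, flip_prob, rng):
    """Assumed scheme: relabel each output with probability flip_prob,
    choosing uniformly among the other classes."""
    perturbed = []
    for label in y:
        if rng.random() < flip_prob:
            perturbed.append(rng.choice([c for c in classes if c != label]))
        else:
            perturbed.append(label)
    return perturbed


def majority_vote(predictions):
    """Unweighted vote: predictions holds one list of labels per classifier."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]
```

In an ensemble, `sample_split` would replace the usual "take the best split" step when growing each tree, `perturb_outputs` would be applied to the training labels before fitting each member of the second method, and `majority_vote` would combine the members' predictions in either case. Passing a seeded `random.Random` instance as `rng` keeps runs reproducible.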