Stochastic Stepwise Ensembles for Variable Selection

Ensemble methods such as AdaBoost, Bagging, and Random Forests have attracted much attention in the statistical learning community over the last 15 years. Zhu and Chipman (2006) proposed the idea of using ensembles for variable selection. Their implementation used a parallel genetic algorithm (PGA). In...

Full description

Bibliographic Details
Main Author: Xin, Lu
Language: en
Published: 2009
Subjects:
Online Access:http://hdl.handle.net/10012/4369
id ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-4369
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-OWTU.10012-4369 2013-10-04T04:09:07Z
Xin, Lu
2009-04-30T19:32:56Z
2009-04-30
http://hdl.handle.net/10012/4369
Ensemble methods such as AdaBoost, Bagging, and Random Forests have attracted much attention in the statistical learning community over the last 15 years. Zhu and Chipman (2006) proposed the idea of using ensembles for variable selection. Their implementation used a parallel genetic algorithm (PGA). In this thesis, I propose a stochastic stepwise ensemble for variable selection, which improves upon PGA. Traditional stepwise regression (Efroymson 1960) combines forward and backward selection: one step of forward selection is followed by one step of backward selection. In the forward step, each variable not already included is added to the current model, one at a time, and the one that best improves the objective function is retained. In the backward step, each variable already included is deleted from the current model, one at a time, and the one whose removal best improves the objective function is discarded. The algorithm continues until no improvement can be made by either the forward or the backward step. Instead of adding or deleting one variable at a time, the Stochastic Stepwise Algorithm (STST) adds or deletes a group of variables at a time, where the group size is decided at random. In traditional stepwise regression, the group size is one and every candidate variable is assessed. When the group size is larger than one, as is often the case for STST, the total number of possible variable groups can be quite large. Instead of evaluating all possible groups, only a few randomly selected groups are assessed and the best one is chosen. From a methodological point of view, the improvement of the STST ensemble over PGA comes from a more structured way of constructing the ensemble; this gives better control over the strength-diversity tradeoff established by Breiman (2001). In fact, PGA has no mechanism for controlling this fundamental tradeoff. Empirically, the improvement is most prominent when a true variable in the model has a small coefficient relative to the other true variables. I show empirically that PGA has a much higher probability of missing that variable.
en
Stochastic Stepwise
Ensemble
Parallel Genetic Algorithm
Variable Selection
statistical learning
Stochastic Stepwise Ensembles for Variable Selection
Thesis or Dissertation
Statistics and Actuarial Science
Master of Mathematics
Statistics
collection NDLTD
language en
sources NDLTD
topic Stochastic Stepwise
Ensemble
Parallel Genetic Algorithm
Variable Selection
statistical learning
Statistics
spellingShingle Stochastic Stepwise
Ensemble
Parallel Genetic Algorithm
Variable Selection
statistical learning
Statistics
Xin, Lu
Stochastic Stepwise Ensembles for Variable Selection
description Ensemble methods such as AdaBoost, Bagging, and Random Forests have attracted much attention in the statistical learning community over the last 15 years. Zhu and Chipman (2006) proposed the idea of using ensembles for variable selection. Their implementation used a parallel genetic algorithm (PGA). In this thesis, I propose a stochastic stepwise ensemble for variable selection, which improves upon PGA. Traditional stepwise regression (Efroymson 1960) combines forward and backward selection: one step of forward selection is followed by one step of backward selection. In the forward step, each variable not already included is added to the current model, one at a time, and the one that best improves the objective function is retained. In the backward step, each variable already included is deleted from the current model, one at a time, and the one whose removal best improves the objective function is discarded. The algorithm continues until no improvement can be made by either the forward or the backward step. Instead of adding or deleting one variable at a time, the Stochastic Stepwise Algorithm (STST) adds or deletes a group of variables at a time, where the group size is decided at random. In traditional stepwise regression, the group size is one and every candidate variable is assessed. When the group size is larger than one, as is often the case for STST, the total number of possible variable groups can be quite large. Instead of evaluating all possible groups, only a few randomly selected groups are assessed and the best one is chosen. From a methodological point of view, the improvement of the STST ensemble over PGA comes from a more structured way of constructing the ensemble; this gives better control over the strength-diversity tradeoff established by Breiman (2001). In fact, PGA has no mechanism for controlling this fundamental tradeoff.
Empirically, the improvement is most prominent when a true variable in the model has a small coefficient relative to the other true variables. I show empirically that PGA has a much higher probability of missing that variable.
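The forward step of the STST procedure described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the residual sum of squares stands in for the unspecified objective function, and the group-size distribution, the `n_groups` parameter, and all function names are hypothetical.

```python
import random

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of an OLS fit on the given column subset
    (stands in here for the thesis's unspecified objective function)."""
    if not subset:
        return float(np.sum((y - y.mean()) ** 2))
    Xs = X[:, sorted(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.sum((y - Xs @ beta) ** 2))


def stst_forward_step(X, y, current, n_groups=5, rng=None):
    """One stochastic forward step: draw a few candidate groups of
    randomly chosen size from the variables not yet included, and add
    the group that most improves the objective -- or nothing, if no
    sampled group improves it."""
    rng = rng or random.Random()
    candidates = [j for j in range(X.shape[1]) if j not in current]
    if not candidates:
        return set(current)
    best_score, best_group = rss(X, y, current), None
    for _ in range(n_groups):
        size = rng.randint(1, len(candidates))   # random group size
        group = rng.sample(candidates, size)     # one random candidate group
        score = rss(X, y, current | set(group))
        if score < best_score:
            best_score, best_group = score, group
    return current | set(best_group) if best_group else set(current)
```

A backward step would mirror this, sampling groups of currently included variables to delete; alternating the two until neither improves the objective yields one STST member, and the ensemble repeats the whole procedure with independent random draws.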
author Xin, Lu
author_facet Xin, Lu
author_sort Xin, Lu
title Stochastic Stepwise Ensembles for Variable Selection
title_short Stochastic Stepwise Ensembles for Variable Selection
title_full Stochastic Stepwise Ensembles for Variable Selection
title_fullStr Stochastic Stepwise Ensembles for Variable Selection
title_full_unstemmed Stochastic Stepwise Ensembles for Variable Selection
title_sort stochastic stepwise ensembles for variable selection
publishDate 2009
url http://hdl.handle.net/10012/4369
work_keys_str_mv AT xinlu stochasticstepwiseensemblesforvariableselection
_version_ 1716600174545993728