evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R

Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as spli...


Bibliographic Details
Main Authors: Thomas Grubinger, Achim Zeileis, Karl-Peter Pfeiffer
Format: Article
Language: English
Published: Foundation for Open Access Statistics 2014-10-01
Series: Journal of Statistical Software
Online Access: http://www.jstatsoft.org/index.php/jss/article/view/2189
id doaj-538d7ace92f548f2873f3b3bfb31bdca
record_format Article
spelling doaj-538d7ace92f548f2873f3b3bfb31bdca 2020-11-24T23:24:27Z eng Foundation for Open Access Statistics Journal of Statistical Software 1548-7660 2014-10-01 61112910.18637/jss.v061.i01793 evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R Thomas Grubinger Achim Zeileis Karl-Peter Pfeiffer Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the evtree package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++ while the partykit package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. evtree is compared to the open-source CART implementation rpart, conditional inference trees (ctree), and the open-source C4.5 implementation J48. A benchmark study of predictive accuracy and complexity is carried out in which evtree achieved at least similar and most of the time better results compared to rpart, ctree, and J48. Furthermore, the usefulness of evtree in practice is illustrated in a textbook customer classification task. http://www.jstatsoft.org/index.php/jss/article/view/2189
collection DOAJ
language English
format Article
sources DOAJ
author Thomas Grubinger
Achim Zeileis
Karl-Peter Pfeiffer
author_sort Thomas Grubinger
title evtree: Evolutionary Learning of Globally Optimal Classification and Regression Trees in R
publisher Foundation for Open Access Statistics
series Journal of Statistical Software
issn 1548-7660
publishDate 2014-10-01
description Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the evtree package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++ while the partykit package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. evtree is compared to the open-source CART implementation rpart, conditional inference trees (ctree), and the open-source C4.5 implementation J48. A benchmark study of predictive accuracy and complexity is carried out in which evtree achieved at least similar and most of the time better results compared to rpart, ctree, and J48. Furthermore, the usefulness of evtree in practice is illustrated in a textbook customer classification task.
url http://www.jstatsoft.org/index.php/jss/article/view/2189