A Partial Least Squares based algorithm for parsimonious variable selection

Bibliographic Details
Main Authors: Mehmood, Tahir; Martens, Harald; Sæbø, Solve; Warringer, Jonas; Snipen, Lars
Format: Article
Language: English
Published: BMC 2011-12-01
Series: Algorithms for Molecular Biology
Online Access: http://www.almob.org/content/6/1/27
Volume/Issue: 6(1), article 27
DOI: 10.1186/1748-7188-6-27
ISSN: 1748-7188

Abstract

Background
In genomics, a commonly encountered problem is to extract a subset of variables out of a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Such problems must address maximum understandability with the smallest number of selected variables, consistency of the selected variables, and the variation of model performance on test data.

Results
We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables. This significantly improves the understandability of the model. Within the approach we have tested and compared three different criteria commonly used in the Partial Least Squares modeling paradigm for variable selection: loading weights, regression coefficients and variable importance on projections. The algorithm is applied to a problem of identifying codon variations discriminating different bacterial taxa, which is of particular interest in classifying metagenomic samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection.

Conclusions
A regularized elimination algorithm based on Partial Least Squares produces results that increase understandability and consistency and reduce the classification error on test data compared with standard approaches.
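The abstract names three PLS-based importance criteria (loading weights, regression coefficients, variable importance on projections) and an elimination step that accepts a marginal loss in performance in exchange for fewer variables. The following is a minimal NumPy sketch of those ingredients under stated assumptions, not the published algorithm: it fits PLS1 (a single continuous response) by NIPALS, drops one variable per round, and uses a relative tolerance `tol` on cross-validated RMSE as a stand-in for the authors' regularization rule. The helpers `backward_eliminate` and `cv_rmse`, the parameter `tol`, and the use of the largest absolute weight as the "loading weights" criterion are hypothetical illustrations. The paper's own application is classification of bacterial taxa, not regression.

```python
import numpy as np

def pls1_nipals(X, y, n_comp):
    """PLS1 by NIPALS on a column-centred X (n x p) and centred y (n,).

    Returns loading weights W (p x A), regression coefficients b (p,),
    scores T (n x A) and y-loadings q (A,).
    """
    E, f = X.copy(), y.copy()
    n, p = X.shape
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)           # unit-length loading weight
        t = E @ w                        # score vector
        tt = t @ t
        P[:, a] = E.T @ t / tt           # X-loading
        q[a] = f @ t / tt                # y-loading
        E = E - np.outer(t, P[:, a])     # deflate X
        f = f - q[a] * t                 # deflate y
        W[:, a], T[:, a] = w, t
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^{-1} q
    return W, b, T, q

def vip(W, T, q):
    """Variable importance on projections; the squared values sum to p."""
    ss = q**2 * np.sum(T**2, axis=0)     # y-variation explained per component
    return np.sqrt(W.shape[0] * (W**2 @ ss) / ss.sum())

def cv_rmse(X, y, n_comp, k=5):
    """Hypothetical helper: k-fold cross-validated RMSE of a PLS1 model."""
    idx = np.arange(len(y))
    sse = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        xm, ym = X[train].mean(axis=0), y[train].mean()
        _, b, _, _ = pls1_nipals(X[train] - xm, y[train] - ym, n_comp)
        sse += np.sum((y[fold] - ((X[fold] - xm) @ b + ym)) ** 2)
    return np.sqrt(sse / len(y))

def backward_eliminate(X, y, n_comp=2, tol=0.05, criterion="vip"):
    """Hypothetical helper: drop the least important variable per round,
    then return the smallest subset whose CV error is within (1 + tol)
    of the best error seen along the elimination path."""
    active, path = list(range(X.shape[1])), []
    while active:
        A = min(n_comp, len(active))
        Xa = X[:, active]
        path.append((list(active), cv_rmse(Xa, y, A)))
        if len(active) == 1:
            break
        W, b, T, q = pls1_nipals(Xa - Xa.mean(axis=0), y - y.mean(), A)
        scores = {
            "vip": vip(W, T, q),
            "coef": np.abs(b),
            "weights": np.abs(W).max(axis=1),  # assumed weight criterion
        }[criterion]
        active.pop(int(np.argmin(scores)))
    best = min(err for _, err in path)
    return min((s for s, err in path if err <= (1 + tol) * best), key=len)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 10))
    y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=60)
    print(backward_eliminate(X, y))
```

Raising `tol` mirrors the trade-off described in the abstract: a larger allowed loss in cross-validated error lets the search settle on a smaller variable subset. Switching `criterion` between `"vip"`, `"coef"` and `"weights"` compares the three importance measures on the same elimination path.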