A Partial Least Squares based algorithm for parsimonious variable selection
Main Authors: | Mehmood Tahir, Martens Harald, Sæbø Solve, Warringer Jonas, Snipen Lars |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2011-12-01 |
Series: | Algorithms for Molecular Biology |
Online Access: | http://www.almob.org/content/6/1/27 |
id | doaj-b7ece979844d47a1a77b15c0c51c54e3 |
---|---|
doi | 10.1186/1748-7188-6-27 |
collection | DOAJ |
author | Mehmood Tahir, Martens Harald, Sæbø Solve, Warringer Jonas, Snipen Lars |
issn | 1748-7188 |
description | Background: In genomics, a commonly encountered problem is to extract a subset of variables out of a large set of explanatory variables associated with one or several quantitative or qualitative response variables. An example is to identify associations between codon usage and phylogeny-based definitions of taxonomic groups at different taxonomic levels. Maximum understandability with the smallest number of selected variables, consistency of the selected variables, and variation of model performance on test data are issues to be addressed for such problems. Results: We present an algorithm balancing the parsimony and the predictive performance of a model. The algorithm is based on variable selection using reduced-rank Partial Least Squares with a regularized elimination. Allowing a marginal decrease in model performance results in a substantial decrease in the number of selected variables, which significantly improves the understandability of the model. Within this approach we have tested and compared three criteria commonly used for variable selection in the Partial Least Squares modeling paradigm: loading weights, regression coefficients and variable importance on projections. The algorithm is applied to the problem of identifying codon variations that discriminate between bacterial taxa, which is of particular interest in classifying metagenomics samples. The results are compared with a classical forward selection algorithm, the widely used Lasso algorithm, and Soft-threshold Partial Least Squares variable selection. Conclusions: A regularized elimination algorithm based on Partial Least Squares produces results with greater understandability and consistency, and a lower classification error on test data, than standard approaches. |
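
The abstract describes variable selection by backward elimination on a reduced-rank PLS model, ranked by one of three criteria (loading weights, regression coefficients, variable importance), with a final choice that trades a marginal loss of performance for far fewer variables. The following is a minimal NumPy sketch of that general idea, not the authors' published procedure. It assumes a single continuous response (PLS1) rather than the classification setting of the paper, and every function name, the fixed drop fraction `drop_frac`, the use of first-component loading weights, and the relative-tolerance rule `tol` for favouring parsimony are hypothetical choices made for illustration.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """PLS1 (single response) fitted with NIPALS on mean-centred data.

    Returns loading weights W (p x a), scores T (n x a), y-loadings q (a,)
    and regression coefficients b (p,) for the centred variables.
    """
    Xk = X - X.mean(axis=0)
    yc = y - y.mean()
    n, p = Xk.shape
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for k in range(n_comp):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)           # unit-length loading weight
        t = Xk @ w                       # score vector
        tt = t @ t
        P[:, k] = Xk.T @ t / tt          # X-loading
        q[k] = yc @ t / tt               # y-loading
        Xk = Xk - np.outer(t, P[:, k])   # deflate X (y need not be deflated in PLS1)
        W[:, k], T[:, k] = w, t
    b = W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q
    return W, T, q, b


def vip(W, T, q):
    """Variable importance in projection (VIP) for a PLS1 model."""
    ss = q**2 * np.sum(T**2, axis=0)     # y-variance explained per component
    Wn = W / np.linalg.norm(W, axis=0)   # normalise each weight vector
    return np.sqrt(W.shape[0] * (Wn**2 @ ss) / ss.sum())


def importance(X, y, a, criterion="vip"):
    """Score variables with one of the three PLS criteria named in the abstract."""
    W, T, q, b = pls1_nipals(X, y, a)
    if criterion == "vip":
        return vip(W, T, q)
    if criterion == "b":
        return np.abs(b)                 # assumes X columns on comparable scales
    return np.abs(W[:, 0])               # ASSUMPTION: first-component loading weights


def cv_rmse(X, y, a, n_folds=5, seed=0):
    """K-fold cross-validated RMSE of an a-component PLS1 model."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        *_, b = pls1_nipals(X[train], y[train], a)
        y_hat = y[train].mean() + (X[test] - X[train].mean(axis=0)) @ b
        sq_err += np.sum((y[test] - y_hat) ** 2)
    return np.sqrt(sq_err / len(y))


def backward_elimination(X, y, n_comp=2, criterion="vip",
                         drop_frac=0.2, tol=0.05, min_vars=2):
    """Hypothetical backward elimination with a parsimony-favouring final choice.

    ASSUMPTIONS (not the published algorithm): each step drops the
    `drop_frac` share of variables with the lowest criterion value; the
    returned subset is the smallest one whose CV RMSE is within a relative
    tolerance `tol` of the best RMSE on the elimination path.
    """
    keep = np.arange(X.shape[1])
    path = []                            # (variable indices, CV RMSE) per step
    while True:
        a = min(n_comp, len(keep))
        path.append((keep, cv_rmse(X[:, keep], y, a)))
        if len(keep) <= min_vars:
            break
        n_drop = min(max(1, int(drop_frac * len(keep))), len(keep) - min_vars)
        order = np.argsort(importance(X[:, keep], y, a, criterion))
        keep = np.sort(keep[order[n_drop:]])   # discard the least important
    best = min(err for _, err in path)
    within = [s for s, err in path if err <= (1 + tol) * best]
    return min(within, key=len), path
```

As a usage example, on simulated data where only columns 0 and 3 of `X` drive `y`, `backward_elimination(X, y)` should keep both columns, because any subset lacking either one has a much larger cross-validated error than the best subset on the path. Selecting the smallest subset within a tolerance of the best error, instead of the subset with the minimum error, is what lets a marginal loss in performance buy a large reduction in the number of variables.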