The partitioned LASSO-patternsearch algorithm with application to gene expression data

Abstract. Background: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcod...

Bibliographic Details
Main Authors: Shi Weiliang, Wahba Grace, Irizarry Rafael A, Bravo Hector, Wright Stephen J
Format: Article
Language: English
Published: BMC 2012-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/13/98
id doaj-548d93c3501b4c8db169d21d3ff3517c
record_format Article
spelling doaj-548d93c3501b4c8db169d21d3ff3517c 2020-11-25T00:26:06Z eng BMC BMC Bioinformatics 1471-2105 2012-05-01 13 1 98 10.1186/1471-2105-13-98
collection DOAJ
language English
format Article
sources DOAJ
author Shi Weiliang
Wahba Grace
Irizarry Rafael A
Bravo Hector
Wright Stephen J
title The partitioned LASSO-patternsearch algorithm with application to gene expression data
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2012-05-01
description Abstract. Background: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive in high-dimensional settings. Results: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage, where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. Conclusions: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several alternative methodologies.
url http://www.biomedcentral.com/1471-2105/13/98
_version_ 1725346045310795776