Data integration by multi-tuning parameter elastic net regression

Bibliographic Details
Main Authors: Jie Liu, Gangning Liang, Kimberly D Siegmund, Juan Pablo Lewinger
Format: Article
Language: English
Published: BMC 2018-10-01
Series: BMC Bioinformatics
Subjects: Data integration; Classification; Elastic net
Online Access: http://link.springer.com/article/10.1186/s12859-018-2401-1
id doaj-b23a285ad1b54a30b6208477ddc37f78
affiliations Jie Liu: Department of Preventive Medicine, USC Keck School of Medicine
Gangning Liang: USC Institute of Urology and the Catherine & Joseph Aresty Department of Urology, Norris Comprehensive Cancer Center, University of Southern California
Kimberly D Siegmund: Department of Preventive Medicine, USC Keck School of Medicine
Juan Pablo Lewinger: Department of Preventive Medicine, USC Keck School of Medicine
doi 10.1186/s12859-018-2401-1
collection DOAJ
issn 1471-2105
description Abstract
Background: To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlation structures. Subtle but important features may be missed by shrinking all features equally.
Results: We propose an elastic net (EN) model with a separate penalty tuning parameter for each platform, which can be fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning parameters. We found that the multi-tuning parameter EN yields more predictive models when the number of informative features differs among the platforms and there is no notable correlation between features from different platforms. Moreover, the multi-tuning parameter EN is robust: it shows no loss of predictive performance relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of the multi-tuning parameter EN on real cancer datasets.
Conclusion: The proposed multi-tuning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms, compared with the traditional method in which a single penalty parameter is used for all features across platforms.
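One way to write down the model the description proposes is sketched below. It assumes K platforms, a logistic log-likelihood l, coefficient sub-vectors beta^(k) for the features of platform k, and a single mixing parameter alpha shared by all platforms; this notation is introduced here for illustration and is not taken from the record.

```latex
(\hat\beta_0, \hat\beta) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\,\ell(\beta_0, \beta)
  + \sum_{k=1}^{K} \lambda_k \left[ \alpha \,\lVert \beta^{(k)} \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta^{(k)} \rVert_2^2 \right]
```

Setting lambda_1 = ... = lambda_K = lambda recovers the usual single tuning parameter EN. Writing lambda_k = lambda * p_k gives the equivalent form lambda * sum_j p_j [alpha |beta_j| + (1 - alpha)/2 beta_j^2], where p_j is the factor of the platform that feature j belongs to; this per-feature factor form is what penalty-factor options in standard software accept, for example the `penalty.factor` argument of the R package glmnet (which rescales the factors internally to sum to the number of variables, so its reported lambda values shift accordingly).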
topic Data integration
Classification
Elastic net
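To make "a separate tuning parameter for each platform" concrete, here is a minimal NumPy sketch of elastic net logistic regression in which each feature's penalty is scaled by the tuning parameter of its platform. It is not the authors' code: it swaps in a hand-written proximal gradient (ISTA) solver for the standard penalized regression software the description mentions, and the helper name `fit_multi_lambda_en`, the shared mixing parameter `alpha`, and the simulated two-platform data are all hypothetical.

```python
import numpy as np


def sigmoid(eta):
    # Numerically stable logistic function, equal to 1 / (1 + exp(-eta)).
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def soft_threshold(z, thr):
    # Elementwise soft-thresholding: sign(z) * max(|z| - thr, 0).
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)


def fit_multi_lambda_en(X, y, groups, lambdas, alpha=0.5, n_iter=5000, tol=1e-9):
    """Hypothetical helper: EN logistic regression with one lambda per platform.

    Minimizes (1/n) * sum_i [log(1 + exp(eta_i)) - y_i * eta_i]
              + sum_j lam_j * (alpha * |beta_j| + (1 - alpha) / 2 * beta_j**2),
    with eta = b + X @ beta and lam_j = lambdas[groups[j]], by proximal
    gradient descent (ISTA). The intercept b is not penalized.
    """
    n, p = X.shape
    lam = np.asarray(lambdas, dtype=float)[np.asarray(groups)]  # per-feature lambda
    # Step size 1/L, where L = ||[1, X]||_2^2 / (4n) bounds the Lipschitz
    # constant of the logistic-loss gradient (the ridge part is in the prox).
    L = np.linalg.norm(np.column_stack([np.ones(n), X]), 2) ** 2 / (4.0 * n)
    t = 1.0 / L
    beta, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        r = (sigmoid(b + X @ beta) - y) / n  # residuals / n; X.T @ r is the loss gradient
        # Gradient step on the loss, then the elastic net proximal map per feature.
        z = beta - t * (X.T @ r)
        beta_new = soft_threshold(z, t * lam * alpha) / (1.0 + t * lam * (1.0 - alpha))
        b_new = b - t * r.sum()
        step = max(np.max(np.abs(beta_new - beta)), abs(b_new - b))
        beta, b = beta_new, b_new
        if step < tol:
            break
    return b, beta


# Simulated data: platform A (20 features, 5 informative), platform B (100 noise features).
rng = np.random.default_rng(0)
n, p_a, p_b = 200, 20, 100
X = rng.standard_normal((n, p_a + p_b))
groups = np.array([0] * p_a + [1] * p_b)  # platform index of each column
beta_true = np.zeros(p_a + p_b)
beta_true[:5] = 1.0
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)

# Multi-tuning fit: light penalty on platform A, heavy penalty on platform B.
b_multi, beta_multi = fit_multi_lambda_en(X, y, groups, lambdas=[0.02, 5.0])
# Single-tuning fit: the same lambda for both platforms.
b_single, beta_single = fit_multi_lambda_en(X, y, groups, lambdas=[0.02, 0.02])

print("nonzero on platform B (multi): ", np.count_nonzero(beta_multi[groups == 1]))
print("nonzero on platform B (single):", np.count_nonzero(beta_single[groups == 1]))
```

In this simulated setting the heavy platform B penalty shrinks every platform B coefficient exactly to zero while the informative platform A features keep positive estimates; the single tuning parameter fit typically retains some noise features. The ridge part of the penalty is handled inside the proximal step, so the step size does not depend on the lambda values. The lambda pair is fixed by hand here; in practice each would be tuned, for example by cross-validation over a grid (not shown).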