eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models
Abstract Background Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage...
Main Authors: | Julián Candia, John S Tsang |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-04-01 |
Series: | BMC Bioinformatics |
Subjects: | Software, R package, Generalized linear models, Regression, Classification, Regularization |
Online Access: | http://link.springer.com/article/10.1186/s12859-019-2778-5 |
id |
doaj-55c0b1f0bb1e4071ab454d748f8002fd |
---|---|
record_format |
Article |
spelling |
doaj-55c0b1f0bb1e4071ab454d748f8002fd 2020-11-25T02:01:04Z eng BMC BMC Bioinformatics 1471-2105 2019-04-01 20 1 1 11 10.1186/s12859-019-2778-5 eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models Julián Candia 0 John S Tsang 1 Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, National Institutes of Health Trans-NIH Center for Human Immunology (CHI), National Institute of Allergy and Infectious Diseases, National Institutes of Health Abstract Background Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is controlled by a penalty parameter λ. The elastic net introduces a mixing parameter α to tune the shrinkage continuously from ridge to lasso. Selecting α objectively and determining which features contributed significantly to prediction after model fitting remain a practical challenge given the paucity of available software to evaluate performance and statistical significance. Results eNetXplorer builds on top of glmnet to address the above issues for linear (Gaussian), binomial (logistic), and multinomial GLMs. It provides new functionalities to empower practical applications by using a cross validation framework that assesses the predictive performance and statistical significance of a family of elastic net models (as α is varied) and of the corresponding features that contribute to prediction. The user can select which quality metrics to use to quantify the concordance between predicted and observed values, with defaults provided for each GLM. 
Statistical significance for each model (as defined by α) is determined based on comparison to a set of null models generated by random permutations of the response; the same permutation-based approach is used to evaluate the significance of individual features. In the analysis of large and complex biological datasets, such as transcriptomic and proteomic data, eNetXplorer provides summary statistics, output tables, and visualizations to help assess which subset(s) of features have predictive value for a set of response measurements, and to what extent those subset(s) of features can be expanded or reduced via regularization. Conclusions This package presents a framework and software for exploratory data analysis and visualization. By making regularized GLMs more accessible and interpretable, eNetXplorer guides the process to generate hypotheses based on features significantly associated with biological phenotypes of interest, e.g. to identify biomarkers for therapeutic responsiveness. eNetXplorer is also generally applicable to any research area that may benefit from predictive modeling and feature identification using regularized GLMs. The package is available under GPL-3 license at the CRAN repository, https://CRAN.R-project.org/package=eNetXplorer. http://link.springer.com/article/10.1186/s12859-019-2778-5 Software R package Generalized linear models Regression Classification Regularization |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Julián Candia John S Tsang |
author_facet |
Julián Candia John S Tsang |
author_sort |
Julián Candia |
title |
eNetXplorer: an R package for the quantitative exploration of elastic net families for generalized linear models |
title_sort |
enetxplorer: an r package for the quantitative exploration of elastic net families for generalized linear models |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2019-04-01 |
description |
Abstract Background Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is controlled by a penalty parameter λ. The elastic net introduces a mixing parameter α to tune the shrinkage continuously from ridge to lasso. Selecting α objectively and determining which features contributed significantly to prediction after model fitting remain a practical challenge given the paucity of available software to evaluate performance and statistical significance. Results eNetXplorer builds on top of glmnet to address the above issues for linear (Gaussian), binomial (logistic), and multinomial GLMs. It provides new functionalities to empower practical applications by using a cross validation framework that assesses the predictive performance and statistical significance of a family of elastic net models (as α is varied) and of the corresponding features that contribute to prediction. The user can select which quality metrics to use to quantify the concordance between predicted and observed values, with defaults provided for each GLM. Statistical significance for each model (as defined by α) is determined based on comparison to a set of null models generated by random permutations of the response; the same permutation-based approach is used to evaluate the significance of individual features. In the analysis of large and complex biological datasets, such as transcriptomic and proteomic data, eNetXplorer provides summary statistics, output tables, and visualizations to help assess which subset(s) of features have predictive value for a set of response measurements, and to what extent those subset(s) of features can be expanded or reduced via regularization. 
Conclusions This package presents a framework and software for exploratory data analysis and visualization. By making regularized GLMs more accessible and interpretable, eNetXplorer guides the process to generate hypotheses based on features significantly associated with biological phenotypes of interest, e.g. to identify biomarkers for therapeutic responsiveness. eNetXplorer is also generally applicable to any research area that may benefit from predictive modeling and feature identification using regularized GLMs. The package is available under GPL-3 license at the CRAN repository, https://CRAN.R-project.org/package=eNetXplorer. |
topic |
Software R package Generalized linear models Regression Classification Regularization |
url |
http://link.springer.com/article/10.1186/s12859-019-2778-5 |
_version_ |
1724959056587653120 |
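
The description above mentions two ideas that a short example can make concrete: the mixing parameter α that moves the penalty from ridge to lasso, and significance assessed against null models built from permuted responses. The Python sketch below is illustrative only and is not eNetXplorer code (the package itself is written in R). The function names `elastic_net_penalty` and `permutation_p_value` are hypothetical. The penalty assumes the glmnet parameterization, in which α = 0 gives ridge and α = 1 gives lasso. The add-one empirical p-value is a common convention that this record does not confirm the package uses, and the sketch also assumes that a higher score means better performance.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty in the glmnet parameterization (assumed here):
    lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).
    alpha = 0 is pure ridge, alpha = 1 is pure lasso."""
    ridge_term = sum(b * b for b in beta)
    lasso_term = sum(abs(b) for b in beta)
    return lam * ((1.0 - alpha) / 2.0 * ridge_term + alpha * lasso_term)


def permutation_p_value(observed, null_scores):
    """Empirical p-value of an observed performance score against scores
    from null models fitted to permuted responses.
    Assumes higher scores are better; the add-one correction keeps the
    estimate away from exactly zero (an assumed convention)."""
    at_least_as_good = sum(1 for s in null_scores if s >= observed)
    return (at_least_as_good + 1) / (len(null_scores) + 1)
```

In a workflow of the kind the abstract describes, the second function would be evaluated once per value of α, with `null_scores` collected from models refitted after shuffling the response vector.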