Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models

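Abstract

Background: The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables for a time-to-event outcome in high-dimensional data. However, these selection methods treat the variables as a homogeneous set and do not account for predictors that belong to functional groups; typically, genomic data can be grouped according to biological pathways or to the type of data collected. Another challenge is that the standard lasso penalization is known to have a high false discovery rate.

Results: We evaluated different penalizations in a Cox model for selecting grouped variables, with two aims: to further penalize variables that, in addition to having a low effect, belong to a group with a low overall effect; and to favor the selection of variables that, in addition to having a large effect, belong to a group with a large overall effect. We considered the case of prespecified and disjoint groups and proposed diverse weights for the adaptive lasso method. In particular, we proposed the product Max Single Wald by Single Wald weighting (MSW*SW), which takes into account information on both the individual biomarker and the group to which it belongs. Through simulations, we compared the selection and prediction ability of our approach with the standard lasso, the composite Minimax Concave Penalty (cMCP), the group exponential lasso (gel), the Integrative L1-Penalized Regression with Penalty Factors (IPF-Lasso), and the Sparse Group Lasso (SGL) methods. In addition, we illustrated the methods using gene expression data of 614 breast cancer patients.

Conclusions: The adaptive lasso with the MSW*SW weighting method incorporates the information in both the grouping structure and the individual variable. It outperformed the competitors by reducing the false discovery rate without severely increasing the false negative rate.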
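The abstract names the MSW*SW weighting but this record does not give its formula. The following is a minimal sketch of how such group-aware adaptive lasso weights might be computed, assuming (this is an assumption, not taken from the record) that each variable's penalty weight is the reciprocal of the product of its own single-variable Wald statistic and the largest single-variable Wald statistic in its group. The function name `msw_sw_weights`, the `eps` guard, and the toy numbers are hypothetical.

```python
from collections import defaultdict


def msw_sw_weights(wald, groups, eps=1e-12):
    """Illustrative penalty weights: 1 / (max Wald in group * own Wald).

    ASSUMPTION: the reciprocal-of-product form is a guess at the shape of
    the MSW*SW weighting; it is not stated in this record.
    """
    # Largest absolute single-variable Wald statistic within each group.
    group_max = defaultdict(float)
    for stat, g in zip(wald, groups):
        group_max[g] = max(group_max[g], abs(stat))
    # Small product (weak variable in a weak group) gives a large penalty.
    return [
        1.0 / max(group_max[g] * abs(stat), eps)
        for stat, g in zip(wald, groups)
    ]


# Toy single-variable Wald statistics for four biomarkers in two disjoint groups.
wald = [4.0, 1.0, 0.5, 0.25]
groups = ["A", "A", "B", "B"]
weights = msw_sw_weights(wald, groups)
```

Under this assumed form, a strong biomarker in a strong group receives the smallest penalty weight and a weak biomarker in a weak group the largest, which mirrors the two aims stated in the Results. Weights like these would typically be passed as per-variable penalty factors to a lasso-penalized Cox fit.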


Bibliographic Details
Main Authors: Shaima Belhechmi, Riccardo De Bin, Federico Rotolo, Stefan Michiels
Format: Article
Language: English
Published: BMC 2020-07-01
Series: BMC Bioinformatics
Subjects: Lasso penalty; High-dimensional; Biomarker selection; Pathways; Cox model; Precision medicine
Online Access: http://link.springer.com/article/10.1186/s12859-020-03618-y
id doaj-38326abb1e6e40fa92ea4f297801797a
record_format Article
spelling doaj-38326abb1e6e40fa92ea4f297801797a
2020-11-25T03:24:38Z
eng
BMC
BMC Bioinformatics
1471-2105
2020-07-01
Volume 21, issue 1, pages 1-20
10.1186/s12859-020-03618-y
Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models
Shaima Belhechmi (Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM U1018 Oncostat)
Riccardo De Bin (Department of Mathematics, University of Oslo)
Federico Rotolo (Biostatistics and Data Management Unit, Innate Pharma)
Stefan Michiels (Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM U1018 Oncostat)
http://link.springer.com/article/10.1186/s12859-020-03618-y
Lasso penalty
High-dimensional
Biomarker selection
Pathways
Cox model
Precision medicine
collection DOAJ
language English
format Article
sources DOAJ
author Shaima Belhechmi
Riccardo De Bin
Federico Rotolo
Stefan Michiels
title Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-07-01
description Abstract. Background: The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables for a time-to-event outcome in high-dimensional data. However, these selection methods treat the variables as a homogeneous set and do not account for predictors that belong to functional groups; typically, genomic data can be grouped according to biological pathways or to the type of data collected. Another challenge is that the standard lasso penalization is known to have a high false discovery rate. Results: We evaluated different penalizations in a Cox model for selecting grouped variables, with two aims: to further penalize variables that, in addition to having a low effect, belong to a group with a low overall effect; and to favor the selection of variables that, in addition to having a large effect, belong to a group with a large overall effect. We considered the case of prespecified and disjoint groups and proposed diverse weights for the adaptive lasso method. In particular, we proposed the product Max Single Wald by Single Wald weighting (MSW*SW), which takes into account information on both the individual biomarker and the group to which it belongs. Through simulations, we compared the selection and prediction ability of our approach with the standard lasso, the composite Minimax Concave Penalty (cMCP), the group exponential lasso (gel), the Integrative L1-Penalized Regression with Penalty Factors (IPF-Lasso), and the Sparse Group Lasso (SGL) methods. In addition, we illustrated the methods using gene expression data of 614 breast cancer patients. Conclusions: The adaptive lasso with the MSW*SW weighting method incorporates the information in both the grouping structure and the individual variable. It outperformed the competitors by reducing the false discovery rate without severely increasing the false negative rate.
topic Lasso penalty
High-dimensional
Biomarker selection
Pathways
Cox model
Precision medicine
url http://link.springer.com/article/10.1186/s12859-020-03618-y