Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer

Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced...


Bibliographic Details
Main Authors: Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Álvarez, Rosa E. Lillo, Sara López-Taruella, María del Monte-Millán, Antonio C. Picornell, Miguel Martín, Juan Romo
Format: Article
Language: English
Published: MDPI AG 2021-01-01
Series: Mathematics
Subjects: variable selection, high dimension, regularization, classification, sparse-group lasso
Online Access: https://www.mdpi.com/2227-7390/9/3/222
id doaj-ffd2cb097a2a4861bef04e28e3c4f182
record_format Article
spelling doaj-ffd2cb097a2a4861bef04e28e3c4f182
Record timestamp: 2021-01-24T00:02:05Z
Language: eng
Publisher: MDPI AG
Journal: Mathematics
ISSN: 2227-7390
Publication date: 2021-01-01
Volume 9, article 222
DOI: 10.3390/math9030222
Title: Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer
Authors and affiliations:
Juan C. Laria (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain)
M. Carmen Aguilera-Morillo (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain)
Enrique Álvarez (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)
Rosa E. Lillo (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain)
Sara López-Taruella (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)
María del Monte-Millán (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)
Antonio C. Picornell (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)
Miguel Martín (Department of Medical Oncology, Hospital General Universitario Gregorio Marañón, Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain)
Juan Romo (UC3M-BS Santander Big Data Institute, 28903 Getafe, Spain)
Abstract: Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.
URL: https://www.mdpi.com/2227-7390/9/3/222
Keywords: variable selection, high dimension, regularization, classification, sparse-group lasso
collection DOAJ
language English
format Article
sources DOAJ
author Juan C. Laria
M. Carmen Aguilera-Morillo
Enrique Álvarez
Rosa E. Lillo
Sara López-Taruella
María del Monte-Millán
Antonio C. Picornell
Miguel Martín
Juan Romo
title Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer
publisher MDPI AG
series Mathematics
issn 2227-7390
publishDate 2021-01-01
description Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.
topic variable selection
high dimension
regularization
classification
sparse-group lasso
url https://www.mdpi.com/2227-7390/9/3/222