A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.

Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to h...

Bibliographic Details
Main Authors: Nisha Puthiyedth, Carlos Riveros, Regina Berretta, Pablo Moscato
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2015-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0127702
id doaj-eafe9c08388a4bc8af1a37707ada96ce
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Nisha Puthiyedth
Carlos Riveros
Regina Berretta
Pablo Moscato
title A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2015-01-01
description Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcame problems arising from real-world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease.
url https://doi.org/10.1371/journal.pone.0127702