A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.

Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to h...


Bibliographic Details
Main Authors:	Nisha Puthiyedth, Carlos Riveros, Regina Berretta, Pablo Moscato
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2015-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0127702

id	doaj-eafe9c08388a4bc8af1a37707ada96ce
record_format	Article
spelling	doaj-eafe9c08388a4bc8af1a37707ada96ce; 2021-03-04T07:58:53Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10; 6; e0127702; 10.1371/journal.pone.0127702; A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.; Nisha Puthiyedth; Carlos Riveros; Regina Berretta; Pablo Moscato; https://doi.org/10.1371/journal.pone.0127702
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Nisha Puthiyedth Carlos Riveros Regina Berretta Pablo Moscato
author_facet	Nisha Puthiyedth Carlos Riveros Regina Berretta Pablo Moscato
author_sort	Nisha Puthiyedth
title	A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2015-01-01
description	Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease.
url	https://doi.org/10.1371/journal.pone.0127702


Similar Items