A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.
Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to h...
Main Authors: | Nisha Puthiyedth, Carlos Riveros, Regina Berretta, Pablo Moscato |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2015-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0127702 |
id |
doaj-eafe9c08388a4bc8af1a37707ada96ce |
---|---|
record_format |
Article |
spelling |
doaj-eafe9c08388a4bc8af1a37707ada96ce; 2021-03-04T07:58:53Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10; 6; e0127702; 10.1371/journal.pone.0127702; A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study.; Nisha Puthiyedth; Carlos Riveros; Regina Berretta; Pablo Moscato; https://doi.org/10.1371/journal.pone.0127702 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Nisha Puthiyedth, Carlos Riveros, Regina Berretta, Pablo Moscato |
author_facet |
Nisha Puthiyedth, Carlos Riveros, Regina Berretta, Pablo Moscato |
author_sort |
Nisha Puthiyedth |
title |
A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study. |
title_short |
A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study. |
title_full |
A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study. |
title_fullStr |
A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study. |
title_full_unstemmed |
A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study. |
title_sort |
new combinatorial optimization approach for integrated feature selection using different datasets: a prostate cancer transcriptomic study. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2015-01-01 |
description |
Background: The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods: We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results: Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcame problems arising from real-world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease. |
url |
https://doi.org/10.1371/journal.pone.0127702 |
_version_ |
1714808221348134912 |