Predictive modelling using pathway scores: robustness and significance of pathway collections

Abstract. Background: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We ai...

Bibliographic Details
Main Authors:	Marcelo P. Segura-Lepe, Hector C. Keun, Timothy M. D. Ebbels
Format:	Article
Language:	English
Published:	BMC 2019-11-01
Series:	BMC Bioinformatics
Subjects:	Pathways; Robustness; Predictive modelling
Online Access:	http://link.springer.com/article/10.1186/s12859-019-3163-0

id	doaj-214ae9cfec0c4d23b913d99dae08d675
record_format	Article
spelling	doaj-214ae9cfec0c4d23b913d99dae08d675; 2020-11-25T03:59:38Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2019-11-01; 201111; 10.1186/s12859-019-3163-0; Predictive modelling using pathway scores: robustness and significance of pathway collections; Marcelo P. Segura-Lepe [0]; Hector C. Keun [1]; Timothy M. D. Ebbels [2]; [0] Computational and Systems Medicine, Department of Surgery and Cancer, Sir Alexander Fleming building, Imperial College; [1] Division of Cancer, Department of Surgery and Cancer, Imperial College London, Hammersmith Hospital Campus; [2] Computational and Systems Medicine, Department of Surgery and Cancer, Sir Alexander Fleming building, Imperial College; Abstract. Background: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a ‘pathway space’. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity. Results: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases. Conclusions: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.; http://link.springer.com/article/10.1186/s12859-019-3163-0; Pathways; Robustness; Predictive modelling
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Marcelo P. Segura-Lepe; Hector C. Keun; Timothy M. D. Ebbels
author_sort	Marcelo P. Segura-Lepe
title	Predictive modelling using pathway scores: robustness and significance of pathway collections
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2019-11-01
description	Abstract. Background: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a ‘pathway space’. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity. Results: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases. Conclusions: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.
topic	Pathways; Robustness; Predictive modelling
url	http://link.springer.com/article/10.1186/s12859-019-3163-0

Similar Items