Prediction errors in learning drug response from gene expression data – influence of labeling, sample size, and machine learning algorithm

Bibliographic Details
Main Authors: Immanuel Bayer, Philip Groth, Sebastian Schneckener
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS ONE
Online Access:http://europepmc.org/articles/PMC3720898?pdf=render
ISSN: 1932-6203
Citation: PLoS ONE 8(7): e70294
DOI: 10.1371/journal.pone.0070294
Source: DOAJ, record doaj-419a40255b4f464f92ce7eeb8c81ae08

Description
Model-based prediction depends on many choices, ranging from the sample collection and prediction endpoint to the choice of algorithm and its parameters. Here we studied the effects of such choices, exemplified by predicting the sensitivity (as IC50) of cancer cell lines towards a variety of compounds. For this, we used three independent sample collections and applied several machine learning algorithms to predict a variety of drug-response endpoints. We compared the models for all combinations of sample collection, algorithm, drug, and labeling with an identically generated null model. The predictability of treatment effects varies among compounds, i.e., response could be predicted for some compounds but not for all. The choice of sample collection plays a major role in lowering the prediction error, as does sample size. However, no algorithm consistently outperformed the others, and there was no significant difference between regression and two- or three-class predictors in this experimental setting. These results indicate that response-modeling projects should direct their efforts mainly towards sample collection and data quality rather than method adjustment.
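The abstract does not state how the two- and three-class labels were derived from IC50, nor how the null model was generated. The sketch below is a rough illustration only and rests on several assumptions. Labels use quantile cut-offs, the classifier is a 1-nearest-neighbour model scored by leave-one-out error, and the null model reruns the identical pipeline on randomly permuted labels. The function names (`label_ic50`, `loo_error`, `null_comparison`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import random
import statistics


def label_ic50(ic50_values, n_classes):
    """Discretize continuous (log) IC50 values into 2 or 3 classes.

    Hypothetical labeling scheme: cut points are sample quantiles, so classes
    are roughly equal in size. Label 0 = lowest IC50 = most sensitive.
    """
    if n_classes not in (2, 3):
        raise ValueError("n_classes must be 2 or 3")
    cuts = statistics.quantiles(ic50_values, n=n_classes)  # n_classes - 1 cut points
    # Class index = number of cut points the value exceeds.
    return [sum(v > c for c in cuts) for v in ic50_values]


def loo_error(features, labels):
    """Leave-one-out misclassification rate of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, x in enumerate(features):
        # Nearest other sample by squared Euclidean distance.
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, features[j])),
        )
        errors += labels[nearest] != labels[i]
    return errors / len(features)


def null_comparison(features, labels, n_perm=100, seed=0):
    """Compare the real error with errors of the same pipeline on shuffled labels.

    Assumed null model: label permutation. Returns (observed error,
    mean null error, empirical p-value).
    """
    rng = random.Random(seed)
    observed = loo_error(features, labels)
    null_errors = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        null_errors.append(loo_error(features, shuffled))
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + sum(e <= observed for e in null_errors)) / (1 + n_perm)
    return observed, statistics.mean(null_errors), p_value


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: 40 hypothetical cell lines x 5 "genes"; gene 0 drives log IC50.
    expr = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
    log_ic50 = [2.0 * row[0] + rng.gauss(0, 0.5) for row in expr]
    for k in (2, 3):
        obs, null_mean, p = null_comparison(expr, label_ic50(log_ic50, k))
        print(f"{k} classes: error {obs:.2f}, null mean {null_mean:.2f}, p = {p:.3f}")
```

Suppose the observed error falls well below the permuted-label mean and the empirical p-value is small. That would suggest the expression features carry information about response to that compound. Under these assumptions, a compound whose error cannot be distinguished from the null would count as not predictable.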