Cell fishing: A similarity based approach and machine learning strategy for multiple cell lines-compound sensitivity prediction.

Predicting the sensitivity of cell lines to a given set of compounds is an important factor in optimizing in vitro assays. To date, the most common prediction strategies are based on machine learning or other quantitative structure-activity relationship (QSAR) approaches. In th...


Bibliographic Details
Main Authors: E Tejera, I Carrera, Karina Jimenes-Vargas, V Armijos-Jaramillo, A Sánchez-Rodríguez, M Cruz-Monteagudo, Y Perez-Castillo
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0223276
id doaj-bffb98601ca8405dbcd9acbae6068823
record_format Article
spelling doaj-bffb98601ca8405dbcd9acbae6068823; 2021-03-03T21:06:37Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2019-01-01; 14(10): e0223276; 10.1371/journal.pone.0223276; https://doi.org/10.1371/journal.pone.0223276
collection DOAJ
language English
format Article
sources DOAJ
author E Tejera
I Carrera
Karina Jimenes-Vargas
V Armijos-Jaramillo
A Sánchez-Rodríguez
M Cruz-Monteagudo
Y Perez-Castillo
author_sort E Tejera
title Cell fishing: A similarity based approach and machine learning strategy for multiple cell lines-compound sensitivity prediction.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2019-01-01
description Predicting the sensitivity of cell lines to a given set of compounds is an important factor in optimizing in vitro assays. To date, the most common prediction strategies are based on machine learning or other quantitative structure-activity relationship (QSAR) approaches. In the present research, we propose and discuss a straightforward strategy that is not based on any learned model but relies exclusively on the chemical similarity of a query compound to reference compounds with annotated activity against cell lines. We also compare the performance of the proposed method with machine learning predictions on the same problem. A curated database of compound-cell line associations derived from ChEMBL version 22 was created for algorithm construction and cross-validation. Validation was done using 10-fold cross-validation and by testing the models on new data obtained from ChEMBL version 25. In terms of accuracy, both methods perform similarly, with values around 0.65 across 750 cell lines in 10-fold cross-validation experiments. By combining both methods, it is possible to achieve a correct classification rate of 66% on more than 26,000 newly reported interactions comprising 11,000 new compounds. A web service implementing the described approaches (both similarity-based and machine learning-based models) is freely available at: http://bioquimio.udla.edu.ec/cellfishing.
url https://doi.org/10.1371/journal.pone.0223276
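
The similarity-based strategy summarized in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state which fingerprints, similarity measure, neighbour count, or threshold the paper uses. The sketch assumes Tanimoto similarity over hypothetical fingerprints stored as sets of bit indices, and the names `tanimoto`, `predict_sensitivity`, `k`, and `min_similarity`, as well as the example data, are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a fingerprint is the set of "on" bit indices of a compound.
Fingerprint = FrozenSet[int]
# Assumption: each reference compound carries active/inactive labels per cell line.
Reference = Tuple[Fingerprint, Dict[str, bool]]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_sensitivity(
    query: Fingerprint,
    references: List[Reference],
    k: int = 3,                   # hypothetical number of neighbours
    min_similarity: float = 0.3,  # hypothetical similarity cut-off
) -> Dict[str, bool]:
    """Predict per-cell-line activity from the most similar reference compounds.

    No model is trained: each of the k nearest references above the cut-off
    votes for its annotated cell lines, weighted by its similarity to the query.
    """
    scored = [(tanimoto(query, fp), labels) for fp, labels in references]
    scored.sort(key=lambda item: item[0], reverse=True)
    neighbours = [
        (sim, labels)
        for sim, labels in scored[:k]
        if sim >= min_similarity and sim > 0.0
    ]

    votes: Dict[str, Tuple[float, float]] = {}  # cell line -> (active weight, total weight)
    for sim, labels in neighbours:
        for cell_line, active in labels.items():
            pos, total = votes.get(cell_line, (0.0, 0.0))
            votes[cell_line] = (pos + (sim if active else 0.0), total + sim)

    # A cell line is predicted sensitive when at least half of the weight is "active".
    return {cl: pos / total >= 0.5 for cl, (pos, total) in votes.items()}


# Illustrative data only; these are not real ChEMBL annotations.
example_references: List[Reference] = [
    (frozenset({1, 2, 3, 4}), {"MCF7": True, "A549": False}),
    (frozenset({1, 2, 3, 5}), {"MCF7": True}),
    (frozenset({7, 8, 9}), {"A549": True}),
]
example_prediction = predict_sensitivity(frozenset({1, 2, 3}), example_references)
```

In this toy example the query shares three of four bits with the first two references and nothing with the third, so only the first two vote. A query with no sufficiently similar reference yields an empty prediction, which is one natural place where a combined scheme could fall back to a machine learning model.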