Automated recognition of functional compound-protein relationships in literature.

MOTIVATION: Much effort has been invested in the identification of protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use open...

Bibliographic Details
Main Authors:	Kersten Döring, Ammar Qaseem, Michael Becer, Jianyu Li, Pankaj Mishra, Mingjie Gao, Pascal Kirchner, Florian Sauter, Kiran K Telukunta, Aurélien F A Moumbock, Philippe Thomas, Stefan Günther
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2020-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0220925

id	doaj-bd52bdcda3eb4a59ac2ec64f731edda5
record_format	Article
spelling	doaj-bd52bdcda3eb4a59ac2ec64f731edda5 2021-03-03T21:31:57Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2020-01-01 15 3 e0220925 10.1371/journal.pone.0220925 Automated recognition of functional compound-protein relationships in literature. Kersten Döring, Ammar Qaseem, Michael Becer, Jianyu Li, Pankaj Mishra, Mingjie Gao, Pascal Kirchner, Florian Sauter, Kiran K Telukunta, Aurélien F A Moumbock, Philippe Thomas, Stefan Günther https://doi.org/10.1371/journal.pone.0220925
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Kersten Döring, Ammar Qaseem, Michael Becer, Jianyu Li, Pankaj Mishra, Mingjie Gao, Pascal Kirchner, Florian Sauter, Kiran K Telukunta, Aurélien F A Moumbock, Philippe Thomas, Stefan Günther
spellingShingle	Kersten Döring, Ammar Qaseem, Michael Becer, Jianyu Li, Pankaj Mishra, Mingjie Gao, Pascal Kirchner, Florian Sauter, Kiran K Telukunta, Aurélien F A Moumbock, Philippe Thomas, Stefan Günther Automated recognition of functional compound-protein relationships in literature. PLoS ONE
author_facet	Kersten Döring, Ammar Qaseem, Michael Becer, Jianyu Li, Pankaj Mishra, Mingjie Gao, Pascal Kirchner, Florian Sauter, Kiran K Telukunta, Aurélien F A Moumbock, Philippe Thomas, Stefan Günther
author_sort	Kersten Döring
title	Automated recognition of functional compound-protein relationships in literature.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2020-01-01
description	MOTIVATION: Much effort has been invested in the identification of protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use open-source software has so far been available for this task. METHOD: We created a new benchmark dataset of 2,613 sentences from abstracts containing annotations of proteins, small molecules, and their relationships. Two kernel methods, the shallow linguistic kernel and the all-paths graph kernel, were applied to classify these relationships as functional or non-functional. Furthermore, the benefit of interaction verbs in sentences was evaluated. RESULTS: In cross-validation on our benchmark dataset, the all-paths graph kernel (AUC value: 84.6%, F1 score: 79.0%) performs slightly better than the shallow linguistic kernel (AUC value: 82.5%, F1 score: 77.2%). Both models achieve state-of-the-art performance in the research area of relation extraction. Combining the shallow linguistic and all-paths graph kernels increased the overall performance slightly further. We used each of the two kernels to identify functional relationships in all PubMed abstracts (29 million) and provide the results, including recorded processing time. AVAILABILITY: The software for the tested kernels, the benchmark, the processed 29 million PubMed abstracts, all evaluation scripts, as well as the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline.
url	https://doi.org/10.1371/journal.pone.0220925

Similar Items