An integrated text mining framework for metabolic interaction network reconstruction

Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Due to the wide applicability of TM in situations involving complex relationships, it is valua...


Bibliographic Details
Main Authors: Preecha Patumcharoenpol, Narumol Doungpan, Asawin Meechai, Bairong Shen, Jonathan H. Chan, Wanwipa Vongsangnak
Format: Article
Language: English
Published: PeerJ Inc. 2016-03-01
Series: PeerJ
Subjects:
Online Access: https://peerj.com/articles/1811.pdf
id doaj-4f9849eab4454018be5cec5335c7024a
spelling doaj-4f9849eab4454018be5cec5335c7024a 2020-11-24T23:40:00Z eng
PeerJ Inc., PeerJ, ISSN 2167-8359, 2016-03-01, volume 4, article e1811, DOI 10.7717/peerj.1811
Author affiliations:
Preecha Patumcharoenpol: Systems Biology and Bioinformatics Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Narumol Doungpan: School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Asawin Meechai: Systems Biology and Bioinformatics Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Bairong Shen: Center for Systems Biology, Soochow University, Suzhou, China
Jonathan H. Chan: Systems Biology and Bioinformatics Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Wanwipa Vongsangnak: Center for Systems Biology, Soochow University, Suzhou, China
collection DOAJ
sources DOAJ
issn 2167-8359
description Text mining (TM) in biology is fast becoming a routine method for extracting and curating biological entities (e.g., genes, proteins, simple chemicals) and their relationships. Because TM applies widely to situations involving complex relationships, it is valuable to apply it to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for the extraction of metabolic events (Metabolic Event Extraction module, MEE) and one for the construction of a metabolic interaction network (Metabolic Interaction Network Reconstruction module, MINR). The proposed integrated TM framework performed well as judged by the standard measures of recall, precision and F-score. Evaluation of the MEE module on the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of production and consumption metabolic events, respectively. When the entity tagger for Gene and Protein (GP) and metabolite was tested on the test corpus, the F-score was greater than 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus, with an F-score >70%. Finally, applying the integrated TM framework to large-scale data (i.e., EcoCyc extraction data) to reconstruct a metabolic interaction network gave precisions of 69.93%, 70.63% and 46.71% for enzyme, metabolite and enzyme–metabolite interaction, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. The framework can help biologists extract metabolic events for further reconstruction of a metabolic interaction network. The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon.
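The description reports recall, precision and F-score without restating how the three relate. As a minimal sketch, assuming the standard balanced F-score (the harmonic mean of precision and recall; the record does not say which F-score variant the authors used), the measures can be computed from match counts as shown here. The function name `precision_recall_f1` and the counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of true positives,
    false positives and false negatives.

    Assumption: balanced F-score (beta = 1). Empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper):
# 6 extracted events match the gold annotation, 2 are spurious, 4 are missed.
p, r, f = precision_recall_f1(tp=6, fp=2, fn=4)
print(f"precision={p:.2%} recall={r:.2%} F1={f:.2%}")
```

Because the F-score is a harmonic mean, it stays close to the lower of the two inputs, so a system cannot score well by maximizing precision alone or recall alone.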
topic Corpus
Metabolic entities
Text mining (TM)
Integrated framework
Metabolic interaction network