An integrated text mining framework for metabolic interaction network reconstruction
Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Due to the wide applicability of TM in situations involving complex relationships, it is valua...
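Main Authors: | Preecha Patumcharoenpol, Narumol Doungpan, Asawin Meechai, Bairong Shen, Jonathan H. Chan, Wanwipa Vongsangnak |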
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc. 2016-03-01 |
Series: | PeerJ |
Subjects: | Corpus; Metabolic entities; Text mining (TM); Integrated framework; Metabolic interaction network |
Online Access: | https://peerj.com/articles/1811.pdf |
id |
doaj-4f9849eab4454018be5cec5335c7024a |
---|---|
record_format |
Article |
spelling |
doaj-4f9849eab4454018be5cec5335c7024a; 2020-11-24T23:40:00Z; eng; PeerJ Inc.; PeerJ; 2167-8359; 2016-03-01; 4; e1811; 10.7717/peerj.1811; An integrated text mining framework for metabolic interaction network reconstruction; Preecha Patumcharoenpol (0); Narumol Doungpan (1); Asawin Meechai (2); Bairong Shen (3); Jonathan H. Chan (4); Wanwipa Vongsangnak (5); (0) Systems Biology and Bioinformatics Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand; (1) School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand; (2) Systems Biology and Bioinformatics Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand; (3) Center for Systems Biology, Soochow University, Suzhou, China; (4) Systems Biology and Bioinformatics Laboratory, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand; (5) Center for Systems Biology, Soochow University, Suzhou, China
Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Due to the wide applicability of TM in situations involving complex relationships, it is valuable to apply TM to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for the extraction of metabolic events (the Metabolic Event Extraction module, MEE) and one for the construction of a metabolic interaction network (the Metabolic Interaction Network Reconstruction module, MINR). The proposed integrated TM framework performed well based on standard measures of recall, precision and F-score. Evaluation of the MEE module using the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of metabolic events for production and consumption, respectively. As for the testing of the entity tagger for Gene and Protein (GP) and metabolite with the test corpus, the obtained F-score was greater than 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus with F-score >70%. Finally, an application of our integrated TM framework to large-scale data (i.e., EcoCyc extraction data) for reconstructing a metabolic interaction network showed reasonable precision of 69.93%, 70.63% and 46.71% for enzymes, metabolites and enzyme–metabolite interactions, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. This framework can be a powerful tool that helps biologists to extract metabolic events for further reconstruction of a metabolic interaction network. The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon.
https://peerj.com/articles/1811.pdf; Corpus; Metabolic entities; Text mining (TM); Integrated framework; Metabolic interaction network |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Preecha Patumcharoenpol; Narumol Doungpan; Asawin Meechai; Bairong Shen; Jonathan H. Chan; Wanwipa Vongsangnak |
author_sort |
Preecha Patumcharoenpol |
title |
An integrated text mining framework for metabolic interaction network reconstruction |
title_sort |
integrated text mining framework for metabolic interaction network reconstruction |
publisher |
PeerJ Inc. |
series |
PeerJ |
issn |
2167-8359 |
publishDate |
2016-03-01 |
description |
Text mining (TM) in the field of biology is fast becoming a routine analysis for the extraction and curation of biological entities (e.g., genes, proteins, simple chemicals) as well as their relationships. Due to the wide applicability of TM in situations involving complex relationships, it is valuable to apply TM to the extraction of metabolic interactions (i.e., enzyme and metabolite interactions) through metabolic events. Here we present an integrated TM framework containing two modules: one for the extraction of metabolic events (the Metabolic Event Extraction module, MEE) and one for the construction of a metabolic interaction network (the Metabolic Interaction Network Reconstruction module, MINR). The proposed integrated TM framework performed well based on standard measures of recall, precision and F-score. Evaluation of the MEE module using the constructed Metabolic Entities (ME) corpus yielded F-scores of 59.15% and 48.59% for the detection of metabolic events for production and consumption, respectively. As for the testing of the entity tagger for Gene and Protein (GP) and metabolite with the test corpus, the obtained F-score was greater than 80% for the Superpathway of leucine, valine, and isoleucine biosynthesis. Mapping of enzyme and metabolite interactions through network reconstruction showed fair performance for the MINR module on the test corpus with F-score >70%. Finally, an application of our integrated TM framework to large-scale data (i.e., EcoCyc extraction data) for reconstructing a metabolic interaction network showed reasonable precision of 69.93%, 70.63% and 46.71% for enzymes, metabolites and enzyme–metabolite interactions, respectively. This study presents the first open-source integrated TM framework for reconstructing a metabolic interaction network. This framework can be a powerful tool that helps biologists to extract metabolic events for further reconstruction of a metabolic interaction network. The ME corpus, test corpus, source code, and virtual machine image with pre-configured software are available at www.sbi.kmutt.ac.th/ preecha/metrecon. |
topic |
Corpus; Metabolic entities; Text mining (TM); Integrated framework; Metabolic interaction network |
url |
https://peerj.com/articles/1811.pdf |
_version_ |
1725511416090198016 |
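
The description above reports results as recall, precision and F-score for extracted metabolic events and for the reconstructed enzyme–metabolite network. As a rough illustration only, the sketch below shows how such scores could be computed for a set of extracted interactions against a reference set. It is not the authors' code: the `(enzyme, metabolite, role)` encoding, the function names `build_network` and `precision_recall_f`, the exact-match criterion, and all entity names are assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical encoding (an assumption, not the paper's data model):
# one interaction = (enzyme, metabolite, role), role being
# "production" or "consumption".
Interaction = Tuple[str, str, str]


def build_network(events: Iterable[Interaction]) -> Set[Interaction]:
    """Collapse extracted events into unique edges, normalizing names by case and whitespace."""
    return {(e.strip().lower(), m.strip().lower(), r) for e, m, r in events}


def precision_recall_f(predicted: Set[Interaction],
                       gold: Set[Interaction]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-score (harmonic mean of the two)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Placeholder events standing in for text-mining output (not real data).
extracted_events = [
    ("Enzyme_A", "metabolite_x", "production"),
    ("enzyme_a", "metabolite_x", "production"),   # duplicate after normalization
    ("enzyme_b", "metabolite_y", "consumption"),
    ("enzyme_c", "metabolite_z", "production"),   # not in the reference set
]

# Placeholder reference interactions, e.g. from a curated pathway database.
reference = {
    ("enzyme_a", "metabolite_x", "production"),
    ("enzyme_b", "metabolite_y", "consumption"),
    ("enzyme_b", "metabolite_x", "production"),
    ("enzyme_d", "metabolite_w", "consumption"),
}

network = build_network(extracted_events)
precision, recall, f_score = precision_recall_f(network, reference)
print(f"edges={len(network)} precision={precision:.4f} "
      f"recall={recall:.4f} F={f_score:.4f}")
```

With these placeholder inputs, two of the three predicted edges match the four reference edges. The published evaluation may use different matching rules (for example, partial entity matches), so the sketch only illustrates how the three measures relate to each other.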