Content-rich biological network constructed by mining PubMed abstracts

Abstract. Background: The integration of the rapidly expanding corpus of information about the genome, transcriptome, and proteome, engendered by powerful technological advances, such as microarrays, and the availability of genomic sequence from multiple...


Bibliographic Details
Main Authors: Sharp Burt M, Chen Hao
Format: Article
Language: English
Published: BMC 2004-10-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/147
id doaj-16460e2bed3b4c4684b4e70f09d3f1ea
record_format Article
spelling doaj-16460e2bed3b4c4684b4e70f09d3f1ea 2020-11-25T00:27:33Z eng BMC BMC Bioinformatics 1471-2105 2004-10-01 5 1 147 10.1186/1471-2105-5-147 Content-rich biological network constructed by mining PubMed abstracts Sharp Burt M Chen Hao http://www.biomedcentral.com/1471-2105/5/147
collection DOAJ
language English
format Article
sources DOAJ
author Sharp Burt M
Chen Hao
title Content-rich biological network constructed by mining PubMed abstracts
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2004-10-01
description Abstract. Background: The integration of the rapidly expanding corpus of information about the genome, transcriptome, and proteome, engendered by powerful technological advances, such as microarrays, and the availability of genomic sequence from multiple species, challenges the grasp and comprehension of the scientific community. Despite the existence of text-mining methods that identify biological relationships based on the textual co-occurrence of gene/protein terms or similarities in abstract texts, knowledge of the underlying molecular connections on a large scale, which is prerequisite to understanding novel biological processes, lags far behind the accumulation of data. While computationally efficient, co-occurrence-based approaches fail to characterize biological interactions (e.g., inhibition or stimulation, directionality). Programs with natural language processing (NLP) capability have been created to address these limitations; however, they are in general not readily accessible to the public. Results: We present an NLP-based text-mining approach, Chilibot, which constructs content-rich relationship networks among biological concepts, genes, proteins, or drugs. Among its features is the ability to generate suggestions for new hypotheses. Lastly, we provide evidence that the connectivity of molecular networks extracted from the biological literature follows a power-law distribution, indicating scale-free topologies consistent with the results of previous experimental analyses. Conclusions: Chilibot distills scientific relationships from knowledge available throughout a wide range of biological domains and presents these in a content-rich graphical format, thus integrating general biomedical knowledge with the specialized knowledge and interests of the user. Chilibot (http://www.chilibot.net) is available free of charge to academic users.
url http://www.biomedcentral.com/1471-2105/5/147