Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

Abstract. Background: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by...


Bibliographic Details
Main Authors: Spackman K, Dubay C, Hersh WR, Cohen AM
Format: Article
Language: English
Published: BMC 2005-04-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/6/103
id doaj-1d56ad7260624eefbb2aeacdf991bf5f
record_format Article
spelling doaj-1d56ad7260624eefbb2aeacdf991bf5f 2020-11-25T00:38:53Z eng BMC BMC Bioinformatics 1471-2105 2005-04-01 6 1 103 10.1186/1471-2105-6-103
collection DOAJ
language English
format Article
sources DOAJ
author Spackman K
Dubay C
Hersh WR
Cohen AM
title Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2005-04-01
description Background: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. Results: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. Conclusion: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.
url http://www.biomedcentral.com/1471-2105/6/103