Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts
Abstract. Background: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by...
Main Authors: | Spackman K, Dubay C, Hersh WR, Cohen AM |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2005-04-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/6/103 |
id |
doaj-1d56ad7260624eefbb2aeacdf991bf5f |
---|---|
record_format |
Article |
spelling |
doaj-1d56ad7260624eefbb2aeacdf991bf5f 2020-11-25T00:38:53Z eng BMC BMC Bioinformatics 1471-2105 2005-04-01 6 1 103 10.1186/1471-2105-6-103 Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts Spackman K Dubay C Hersh WR Cohen AM http://www.biomedcentral.com/1471-2105/6/103 |
collection |
DOAJ |
sources |
DOAJ |
author |
Spackman K Dubay C Hersh WR Cohen AM |
title |
Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts |
issn |
1471-2105 |
description |
Abstract. Background: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. Results: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. Conclusion: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge. |