Enhancement of chemical entity identification in text using semantic similarity validation.
With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to efficiently retrieve this information. To tackle this issue, text mining tools have been applied, but despite their good performance they still provide many errors th...
Main Authors: | Tiago Grego, Francisco M Couto |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2013-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC3642108?pdf=render |
id |
doaj-1063b1f8c471430a97f015881c8cca3f |
---|---|
record_format |
Article |
spelling |
doaj-1063b1f8c471430a97f015881c8cca3f 2020-11-25T01:16:11Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2013-01-01 8 5 e62984 10.1371/journal.pone.0062984 Enhancement of chemical entity identification in text using semantic similarity validation. Tiago Grego Francisco M Couto With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to efficiently retrieve this information. To tackle this issue, text mining tools have been applied, but despite their good performance they still provide many errors that we believe can be filtered by using semantic similarity. Thus, this paper proposes a novel method that receives the results of chemical entity identification systems, such as Whatizit, and exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text. The method assigns a single validation score to each entity based on its similarities with the other entities also identified in the text. Then, by using a given threshold, the method selects a set of validated entities and a set of outlier entities. We evaluated our method using the results of two state-of-the-art chemical entity identification tools, three semantic similarity measures and two text window sizes. The method was able to increase precision without filtering a significant number of correctly identified entities. This means that the method can effectively discriminate the correctly identified chemical entities, while discarding a significant number of identification errors. For example, selecting a validation set with 75% of all identified entities, we were able to increase the precision by 28% for one of the chemical entity identification tools (Whatizit), maintaining in that subset 97% of the correctly identified entities. Our method can be directly used as an add-on by any state-of-the-art entity identification tool that provides mappings to a database, in order to improve their results. The proposed method is included in a freely accessible web tool at www.lasige.di.fc.ul.pt/webtools/ice/. http://europepmc.org/articles/PMC3642108?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tiago Grego Francisco M Couto |
spellingShingle |
Tiago Grego Francisco M Couto Enhancement of chemical entity identification in text using semantic similarity validation. PLoS ONE |
author_facet |
Tiago Grego Francisco M Couto |
author_sort |
Tiago Grego |
title |
Enhancement of chemical entity identification in text using semantic similarity validation. |
title_short |
Enhancement of chemical entity identification in text using semantic similarity validation. |
title_full |
Enhancement of chemical entity identification in text using semantic similarity validation. |
title_fullStr |
Enhancement of chemical entity identification in text using semantic similarity validation. |
title_full_unstemmed |
Enhancement of chemical entity identification in text using semantic similarity validation. |
title_sort |
enhancement of chemical entity identification in text using semantic similarity validation. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2013-01-01 |
description |
With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to efficiently retrieve this information. To tackle this issue, text mining tools have been applied, but despite their good performance they still provide many errors that we believe can be filtered by using semantic similarity. Thus, this paper proposes a novel method that receives the results of chemical entity identification systems, such as Whatizit, and exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text. The method assigns a single validation score to each entity based on its similarities with the other entities also identified in the text. Then, by using a given threshold, the method selects a set of validated entities and a set of outlier entities. We evaluated our method using the results of two state-of-the-art chemical entity identification tools, three semantic similarity measures and two text window sizes. The method was able to increase precision without filtering a significant number of correctly identified entities. This means that the method can effectively discriminate the correctly identified chemical entities, while discarding a significant number of identification errors. For example, selecting a validation set with 75% of all identified entities, we were able to increase the precision by 28% for one of the chemical entity identification tools (Whatizit), maintaining in that subset 97% of the correctly identified entities. Our method can be directly used as an add-on by any state-of-the-art entity identification tool that provides mappings to a database, in order to improve their results. The proposed method is included in a freely accessible web tool at www.lasige.di.fc.ul.pt/webtools/ice/. |
url |
http://europepmc.org/articles/PMC3642108?pdf=render |
work_keys_str_mv |
AT tiagogrego enhancementofchemicalentityidentificationintextusingsemanticsimilarityvalidation AT franciscomcouto enhancementofchemicalentityidentificationintextusingsemanticsimilarityvalidation |
_version_ |
1725150847728353280 |
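
The description above outlines a two-step filter: score each identified entity by its similarity to the other entities found in the same text, then split the entities at a threshold into a validated set and an outlier set. The following is only a minimal sketch of that idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the toy hierarchy `TOY_PARENTS` and its identifiers (these are not real ChEBI entries), the Jaccard overlap of ancestor sets as the similarity measure (the record does not name the three measures evaluated), the choice of the maximum similarity as the validation score, and the threshold value.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy is-a hierarchy (child -> parents). Not real ChEBI data.
TOY_PARENTS: Dict[str, List[str]] = {
    "chemical": [],
    "acid": ["chemical"],
    "alcohol": ["chemical"],
    "acetic_acid": ["acid"],
    "formic_acid": ["acid"],
    "ethanol": ["alcohol"],
    "lead_metal": ["chemical"],
}


def ancestors(term: str) -> FrozenSet[str]:
    """Return the term together with all of its ancestors in the toy hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in TOY_PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return frozenset(seen)


def jaccard_similarity(a: str, b: str) -> float:
    """Assumed stand-in measure: overlap of the two terms' ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def validation_scores(
    entities: List[str],
    sim: Callable[[str, str], float] = jaccard_similarity,
) -> Dict[str, float]:
    """Give each entity one score: its highest similarity to any other entity.

    Using the maximum is an assumption; an entity with no companions scores 0.0.
    The entity list is assumed to contain unique identifiers.
    """
    scores: Dict[str, float] = {}
    for entity in entities:
        others = [other for other in entities if other != entity]
        scores[entity] = max((sim(entity, other) for other in others), default=0.0)
    return scores


def split_by_threshold(
    scores: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split entities into (validated, outliers) at the given score threshold."""
    validated = [e for e, s in scores.items() if s >= threshold]
    outliers = [e for e, s in scores.items() if s < threshold]
    return validated, outliers


if __name__ == "__main__":
    found = ["acetic_acid", "formic_acid", "ethanol", "lead_metal"]
    scores = validation_scores(found)
    validated, outliers = split_by_threshold(scores, threshold=0.4)
    print(validated)  # ['acetic_acid', 'formic_acid']
    print(outliers)   # ['ethanol', 'lead_metal']
```

In this toy run the two acids support each other and pass the threshold, while the entities with no close relative in the text fall into the outlier set. Swapping in a different similarity function only requires passing it as `sim`.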