Enhancement of chemical entity identification in text using semantic similarity validation.

With the amount of chemical data being produced and reported in the literature growing at a fast pace, it is increasingly important to efficiently retrieve this information. To tackle this issue text mining tools have been applied, but despite their good performance they still provide many errors th...

Bibliographic Details
Main Authors:	Tiago Grego, Francisco M Couto
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2013-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC3642108?pdf=render

id	doaj-1063b1f8c471430a97f015881c8cca3f
record_format	Article
spelling	doaj-1063b1f8c471430a97f015881c8cca3f 2020-11-25T01:16:11Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2013-01-01 8 5 e62984 10.1371/journal.pone.0062984 Enhancement of chemical entity identification in text using semantic similarity validation. Tiago Grego Francisco M Couto http://europepmc.org/articles/PMC3642108?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Tiago Grego Francisco M Couto
title	Enhancement of chemical entity identification in text using semantic similarity validation.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2013-01-01
description	With the amount of chemical data produced and reported in the literature growing at a fast pace, it is increasingly important to retrieve this information efficiently. Text mining tools have been applied to tackle this issue, but despite their good performance they still produce many errors that we believe can be filtered out using semantic similarity. Thus, this paper proposes a novel method that receives the results of chemical entity identification systems, such as Whatizit, and exploits the semantic relationships in ChEBI to measure the similarity between the entities found in the text. The method assigns a single validation score to each entity based on its similarities with the other entities identified in the same text. Then, using a given threshold, the method separates the entities into a set of validated entities and a set of outlier entities. We evaluated our method using the results of two state-of-the-art chemical entity identification tools, three semantic similarity measures and two text window sizes. The method increased precision without filtering out a significant number of correctly identified entities. This means that the method can effectively discriminate the correctly identified chemical entities while discarding a significant number of identification errors. For example, selecting a validation set with 75% of all identified entities, we were able to increase the precision by 28% for one of the chemical entity identification tools (Whatizit), maintaining in that subset 97% of the correctly identified entities. Our method can be used directly as an add-on by any state-of-the-art entity identification tool that provides mappings to a database, in order to improve its results. The proposed method is included in a freely accessible web tool at www.lasige.di.fc.ul.pt/webtools/ice/.
url	http://europepmc.org/articles/PMC3642108?pdf=render
work_keys_str_mv	AT tiagogrego enhancementofchemicalentityidentificationintextusingsemanticsimilarityvalidation AT franciscomcouto enhancementofchemicalentityidentificationintextusingsemanticsimilarityvalidation
_version_	1725150847728353280
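
The description field above outlines a two-step procedure: score each identified entity by its similarity to the other entities in the same text, then split the entities at a threshold. The following is a minimal Python sketch of that idea, not the authors' implementation. The toy ancestor table, the Jaccard measure, and the choice of the maximum as the aggregation rule are all assumptions made for illustration; the record does not say which similarity measures or aggregation the paper uses, and the identifiers below are not real ChEBI entries.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy ancestor sets standing in for an is-a hierarchy such as
# ChEBI's. The names and relations are invented for illustration only.
TOY_ANCESTORS: Dict[str, FrozenSet[str]] = {
    "ethanol": frozenset({"ethanol", "alcohol", "organic", "chemical"}),
    "methanol": frozenset({"methanol", "alcohol", "organic", "chemical"}),
    "glucose": frozenset({"glucose", "sugar", "organic", "chemical"}),
    "lead": frozenset({"lead", "metal", "chemical"}),
}


def jaccard_similarity(a: str, b: str) -> float:
    """Assumed stand-in measure: Jaccard index of the two ancestor sets."""
    anc_a, anc_b = TOY_ANCESTORS[a], TOY_ANCESTORS[b]
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def validation_scores(
    entities: List[str],
    similarity: Callable[[str, str], float],
) -> Dict[str, float]:
    """Give each entity a single score based on the other entities.

    Assumption: the score is the highest similarity to any other entity
    in the same text window. An entity with no neighbours scores 0.0.
    The input is expected to contain each entity at most once.
    """
    scores: Dict[str, float] = {}
    for i, entity in enumerate(entities):
        others = [e for j, e in enumerate(entities) if j != i]
        scores[entity] = max(
            (similarity(entity, other) for other in others), default=0.0
        )
    return scores


def split_by_threshold(
    scores: Dict[str, float], threshold: float
) -> Tuple[List[str], List[str]]:
    """Separate entities into (validated, outliers) at the given threshold."""
    validated = [e for e, s in scores.items() if s >= threshold]
    outliers = [e for e, s in scores.items() if s < threshold]
    return validated, outliers


if __name__ == "__main__":
    found = ["ethanol", "methanol", "glucose", "lead"]
    scores = validation_scores(found, jaccard_similarity)
    validated, outliers = split_by_threshold(scores, threshold=0.3)
    print("validated:", validated)
    print("outliers:", outliers)
```

In this toy run, "lead" shares only the root ancestor with the other entities, so it falls below the threshold and is treated as an outlier. Raising or lowering the threshold trades the size of the validated set against its precision, which is the trade-off the description reports.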

Similar Items