Evaluation of Named Entity Recognition Algorithms in Short Texts
Abstract: One of the major consequences of the growth of social networks has been the generation of huge volumes of content. Text generated in social networks constitutes a new type of content: short, informal, sometimes ungrammatical, and noise-prone. Given the volume of...
Main Authors: | Edgar Casasola Murillo, Raquel Fonseca |
---|---|
Format: | Article |
Language: | English |
Published: | Centro Latinoamericano de Estudios en Informática, 2017-04-01 |
Series: | CLEI Electronic Journal |
Subjects: | Information Retrieval; Entity Recognition; ER; Named Entity Recognition; NER; corpus |
Online Access: | http://clei.org/cleiej-beta/index.php/cleiej/article/view/9 |
id |
doaj-b7cee2dc6b5947cf92f4ca1bf3be81bd |
---|---|
record_format |
Article |
spelling |
doaj-b7cee2dc6b5947cf92f4ca1bf3be81bd; 2020-11-25T00:35:18Z; eng; Centro Latinoamericano de Estudios en Informática; CLEI Electronic Journal; ISSN 0717-5000; 2017-04-01; vol. 20, no. 1; DOI 10.19153/cleiej.20.1.4; Evaluation of Named Entity Recognition Algorithms in Short Texts; Edgar Casasola Murillo (Universidad de Costa Rica); Raquel Fonseca (Universidad de Costa Rica); http://clei.org/cleiej-beta/index.php/cleiej/article/view/9; Information Retrieval; Entity Recognition; ER; Named Entity Recognition; NER; corpus |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Edgar Casasola Murillo; Raquel Fonseca |
author_facet |
Edgar Casasola Murillo; Raquel Fonseca |
author_sort |
Edgar Casasola Murillo |
title |
Evaluation of Named Entity Recognition Algorithms in Short Texts |
publisher |
Centro Latinoamericano de Estudios en Informática |
series |
CLEI Electronic Journal |
issn |
0717-5000 |
publishDate |
2017-04-01 |
description |
Abstract:
One of the major consequences of the growth of social networks has been the generation of huge volumes of content. Text generated in social networks constitutes a new type of content: short, informal, sometimes ungrammatical, and noise-prone. Given the volume of information produced every day, processing this data manually is impractical, which creates the need to explore and apply automatic processing strategies such as Entity Recognition (ER). It therefore becomes necessary to evaluate the performance of traditional ER algorithms on corpora with these characteristics. This paper presents the results of applying the AlchemyAPI and Dandelion API algorithms to a corpus provided by the SemEval-2015 Aspect Based Sentiment Analysis conference. The entities recognized by each algorithm were compared against those annotated in the collection in order to calculate precision and recall. Dandelion API obtained better results than AlchemyAPI on the given corpus.
Spanish Abstract (translated):
One of the main consequences of the current rise of social networks is the generation of large volumes of information. The text generated on these networks constitutes a new genre of text: short, informal, grammatically deficient, and prone to noise. Given the rate at which information is produced, manual processing is impractical, which creates the need to apply automatic processing strategies such as Entity Recognition (ER). The characteristics of this content also make it necessary to evaluate the performance of traditional algorithms on corpora extracted from these social networks. This work presents the results obtained by applying the AlchemyAPI and Dandelion API algorithms to a corpus provided by the SemEval-2015 Aspect Based Sentiment Analysis conference. The entities recognized by each algorithm were compared with those annotated in the collection in order to calculate precision and recall. Dandelion API obtained better results than AlchemyAPI on the given corpus.
|
topic |
Information Retrieval; Entity Recognition; ER; Named Entity Recognition; NER; corpus |
url |
http://clei.org/cleiej-beta/index.php/cleiej/article/view/9 |
work_keys_str_mv |
AT edgarcasasolamurillo evaluationofnamedentityrecognitionalgorithmsinshorttexts AT raquelfonseca evaluationofnamedentityrecognitionalgorithmsinshorttexts |
_version_ |
1725309170847055872 |
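
The abstract says that the entities recognized by each algorithm were compared against the annotated ones to calculate precision and recall. A minimal sketch of that kind of comparison follows. The matching rule (exact match after trimming and lowercasing) and the function name `precision_recall` are assumptions for illustration; the record does not state how the authors matched entities, and the sample entities are hypothetical, not taken from the paper.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of entity strings.

    Assumption: two entities match when they are equal after trimming
    whitespace and lowercasing. The paper may use a different criterion.
    """
    pred = {e.strip().lower() for e in predicted}
    ref = {e.strip().lower() for e in gold}
    true_positives = len(pred & ref)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical example data, not results from the paper.
gold = ["pizza", "service", "wine list"]
predicted = ["Pizza", "wine list", "waiter", "Manhattan"]

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of recognized entities that appear in the annotations, and recall is the share of annotated entities that were recognized, so an algorithm that returns many spurious entities loses precision while one that misses annotated entities loses recall.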