Learning adaptive representations for entity recognition in the biomedical domain

Abstract. Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine...


Bibliographic Details
Main Authors: Ivano Lauriola, Fabio Aiolli, Alberto Lavelli, Fabio Rinaldi
Format: Article
Language: English
Published: BMC 2021-05-01
Series: Journal of Biomedical Semantics
Subjects: Named entity recognition, Neural networks, Kernel methods, Ensemble
Online Access: https://doi.org/10.1186/s13326-021-00238-0
id doaj-a279e63e0b1742e49b9a623eb6e39414
record_format Article
spelling doaj-a279e63e0b1742e49b9a623eb6e39414; 2021-05-23T11:30:37Z; eng; BMC; Journal of Biomedical Semantics; 2041-1480; 2021-05-01; 121113; 10.1186/s13326-021-00238-0; Learning adaptive representations for entity recognition in the biomedical domain; Ivano Lauriola 0, Fabio Aiolli 1, Alberto Lavelli 2, Fabio Rinaldi 3; Department of Mathematics, University of Padova; Department of Mathematics, University of Padova; Fondazione Bruno Kessler; Fondazione Bruno Kessler; Abstract. Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications is the choice of the representation which describes data. Several representations have been proposed in the literature, some of which are based on a strong knowledge of the domain, and they consist of features manually defined by domain experts. Usually, these representations describe the problem well, but they require a lot of human effort and annotated data. On the other hand, general-purpose representations like word embeddings do not require human domain knowledge, but they could be too general for a specific task. Results: This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score. Conclusions: Our experiments show that the principled combination of general, domain specific, word-, and character-level representations improves the performance of entity recognition. We also discussed the contribution of each representation in the final solution.; https://doi.org/10.1186/s13326-021-00238-0; Named entity recognition; Neural networks; Kernel methods; Ensemble
collection DOAJ
language English
format Article
sources DOAJ
author Ivano Lauriola
Fabio Aiolli
Alberto Lavelli
Fabio Rinaldi
title Learning adaptive representations for entity recognition in the biomedical domain
publisher BMC
series Journal of Biomedical Semantics
issn 2041-1480
publishDate 2021-05-01
description Abstract. Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications is the choice of the representation which describes data. Several representations have been proposed in the literature, some of which are based on a strong knowledge of the domain, and they consist of features manually defined by domain experts. Usually, these representations describe the problem well, but they require a lot of human effort and annotated data. On the other hand, general-purpose representations like word embeddings do not require human domain knowledge, but they could be too general for a specific task. Results: This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score. Conclusions: Our experiments show that the principled combination of general, domain specific, word-, and character-level representations improves the performance of entity recognition. We also discussed the contribution of each representation in the final solution.
topic Named entity recognition
Neural networks
Kernel methods
Ensemble
url https://doi.org/10.1186/s13326-021-00238-0