A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine

Abstract. Background: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances the access to relevant information, and gold standard corpora are required to im...

Bibliographic Details
Main Authors: Leonardo Campillos-Llanos, Ana Valverde-Mateos, Adrián Capllonch-Carrión, Antonio Moreno-Sandoval
Format: Article
Language: English
Published: BMC 2021-02-01
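Series: BMC Medical Informatics and Decision Making
Subjects: Clinical Trials; Evidence-Based Medicine; Semantic Annotation; Inter-Annotator Agreement; Natural Language Processing
Online Access: https://doi.org/10.1186/s12911-021-01395-z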
id doaj-23ec51d897d64bcd879152e951fea49c
record_format Article
spelling doaj-23ec51d897d64bcd879152e951fea49c; 2021-02-23T09:18:04Z; eng; BMC; BMC Medical Informatics and Decision Making; 1472-6947; 2021-02-01; vol. 21, no. 1, pp. 1-19; 10.1186/s12911-021-01395-z
A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine
Leonardo Campillos-Llanos (Computational Linguistics Laboratory, Universidad Autónoma de Madrid)
Ana Valverde-Mateos (Medical Terminology Unit, Spanish Royal Academy of Medicine)
Adrián Capllonch-Carrión (Complejo Asistencial Hospital Benito Menni)
Antonio Moreno-Sandoval (Computational Linguistics Laboratory, Universidad Autónoma de Madrid)
Abstract. Background: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances the access to relevant information, and gold standard corpora are required to improve systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus. Methods: We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models. Results: This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) of average F-measure. Conclusions: Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html . The methods are generalizable to other languages with similar available sources.
https://doi.org/10.1186/s12911-021-01395-z
Clinical Trials; Evidence-Based Medicine; Semantic Annotation; Inter-Annotator Agreement; Natural Language Processing
collection DOAJ
language English
format Article
sources DOAJ
author Leonardo Campillos-Llanos
Ana Valverde-Mateos
Adrián Capllonch-Carrión
Antonio Moreno-Sandoval
title A clinical trials corpus annotated with UMLS entities to enhance the access to evidence-based medicine
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2021-02-01
description Abstract. Background: The large volume of medical literature makes it difficult for healthcare professionals to keep abreast of the latest studies that support Evidence-Based Medicine. Natural language processing enhances the access to relevant information, and gold standard corpora are required to improve systems. To contribute a new dataset for this domain, we collected the Clinical Trials for Evidence-Based Medicine in Spanish (CT-EBM-SP) corpus. Methods: We annotated 1200 texts about clinical trials with entities from the Unified Medical Language System semantic groups: anatomy (ANAT), pharmacological and chemical substances (CHEM), pathologies (DISO), and lab tests, diagnostic or therapeutic procedures (PROC). We doubly annotated 10% of the corpus and measured inter-annotator agreement (IAA) using F-measure. As a use case, we ran medical entity recognition experiments with neural network models. Results: This resource contains 500 abstracts of journal articles about clinical trials and 700 announcements of trial protocols (292 173 tokens). We annotated 46 699 entities (13.98% are nested entities). Regarding IAA, we obtained an average F-measure of 85.65% (±4.79, strict match) and 93.94% (±3.31, relaxed match). In the use case experiments, we achieved recognition results ranging from 80.28% (±0.99) to 86.74% (±0.19) of average F-measure. Conclusions: Our results show that this resource is adequate for experiments with state-of-the-art approaches to biomedical named entity recognition. It is freely distributed at: http://www.lllf.uam.es/ESP/nlpmedterm_en.html . The methods are generalizable to other languages with similar available sources.
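The description reports inter-annotator agreement as an F-measure under strict and relaxed matching. The record does not define either criterion, so the following is only a minimal sketch under stated assumptions: a strict match requires identical span offsets and the same semantic group, and a relaxed match requires overlapping spans with the same semantic group. The names `Entity` and `pairwise_f_measure` are hypothetical and are not taken from the authors' code.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    """One annotated span (hypothetical representation)."""
    start: int   # character offset where the span begins
    end: int     # character offset just past the end of the span
    label: str   # semantic group, e.g. ANAT, CHEM, DISO or PROC


def _matches(a: Entity, b: Entity, relaxed: bool) -> bool:
    # Assumption: both criteria require the same semantic group.
    if a.label != b.label:
        return False
    if relaxed:
        # Assumed relaxed criterion: the two spans share at least one character.
        return a.start < b.end and b.start < a.end
    # Assumed strict criterion: identical boundaries.
    return a.start == b.start and a.end == b.end


def pairwise_f_measure(ann_a: List[Entity], ann_b: List[Entity],
                       relaxed: bool = False) -> float:
    """F-measure between two annotators, treating annotator A as the reference."""
    if not ann_a and not ann_b:
        return 1.0  # both annotators marked nothing: full agreement
    if not ann_a or not ann_b:
        return 0.0
    # Precision: share of B's entities that find a match in A.
    matched_b = sum(any(_matches(a, b, relaxed) for a in ann_a) for b in ann_b)
    # Recall: share of A's entities that find a match in B.
    matched_a = sum(any(_matches(a, b, relaxed) for b in ann_b) for a in ann_a)
    precision = matched_b / len(ann_b)
    recall = matched_a / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented toy annotations, not data from the corpus.
    annotator_a = [Entity(0, 8, "CHEM"), Entity(20, 35, "DISO"), Entity(40, 46, "PROC")]
    annotator_b = [Entity(0, 8, "CHEM"), Entity(24, 35, "DISO"), Entity(50, 55, "ANAT")]
    print("strict :", pairwise_f_measure(annotator_a, annotator_b))
    print("relaxed:", pairwise_f_measure(annotator_a, annotator_b, relaxed=True))
```

In this toy case the second pair of spans differs only in its left boundary, so it counts under the relaxed criterion but not under the strict one, which mirrors why the relaxed figure in the description is the higher of the two. Averaging such scores over the doubly annotated documents would be one way to obtain a mean and a deviation, though the record does not say how the reported averages were computed.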
topic Clinical Trials
Evidence-Based Medicine
Semantic Annotation
Inter-Annotator Agreement
Natural Language Processing
url https://doi.org/10.1186/s12911-021-01395-z