A Bootstrapping Approach With CRF and Deep Learning Models for Improving the Biomedical Named Entity Recognition in Multi-Domains

Bibliographic Details
Main Authors:	Juae Kim, Youngjoong Ko, Jungyun Seo
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Biomedical named entity recognition; bootstrapping; information extraction; semi-supervised learning
Online Access:	https://ieeexplore.ieee.org/document/8703375/
ISSN:	2169-3536
Volume:	7
Pages:	70308-70318
DOI:	10.1109/ACCESS.2019.2914168
Affiliations:	Juae Kim and Jungyun Seo, Department of Computer Engineering, Sogang University, Seoul, South Korea; Youngjoong Ko (ORCID: https://orcid.org/0000-0002-0241-9193), Department of Computer Engineering, Dong-A University, Busan, South Korea
Source:	DOAJ (record doaj-1d68e46b55784456a64e270632ac549c)
Description:	Biomedical named entity recognition (biomedical NER) is a core component of biomedical text processing systems such as biomedical information retrieval and question answering systems. Many recent studies have applied machine learning to biomedical NER. Machine learning-based approaches generally require large annotated corpora to achieve high performance. However, manually creating a large, high-quality corpus is expensive because it requires biomedical experts. In addition, most existing corpora focus on a few specific sub-domains, such as diseases, proteins, and species, so a biomedical NER system trained on them provides limited information to biomedical text processing systems. In this paper, we propose a method for automatically generating a machine-labeled biomedical NER corpus that covers various sub-domains by using appropriate categories from the semantic groups of the Unified Medical Language System (UMLS). Starting from a small manually annotated corpus, we use a bootstrapping approach to automatically generate a large corpus and then build a biomedical NER system trained on this machine-labeled corpus. Finally, we train two machine learning-based classifiers, conditional random fields (CRFs) and long short-term memory (LSTM), on the machine-labeled data to improve performance. Experimental results show that the proposed method is effective: it achieves an F1-score 23.69% higher than that of a model trained only on the small manually annotated corpus.
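The description states only at a high level that a small manually annotated corpus is bootstrapped into a large machine-labeled one. As an illustration, here is a minimal sketch of a generic self-training loop of that kind. It is not the authors' implementation, and several pieces are assumptions. `ToyTagger` is a hypothetical frequency-based stand-in for the CRF and LSTM taggers. The confidence rule (the weakest token's label share), `threshold=0.9`, and `max_rounds=5` are illustrative choices. The toy sentences are invented, and their DISO/CHEM labels merely borrow UMLS semantic-group abbreviations.

```python
from collections import Counter, defaultdict

# Hypothetical toy data. BIO tags borrow UMLS semantic-group abbreviations
# (DISO = Disorders, CHEM = Chemicals & Drugs) purely for illustration.
SEED_SENTS = [
    ["Patients", "with", "asthma", "received", "aspirin"],
    ["Aspirin", "relieves", "fever"],
]
SEED_TAGS = [
    ["O", "O", "B-DISO", "O", "B-CHEM"],
    ["B-CHEM", "O", "B-DISO"],
]
UNLABELED = [
    ["Patients", "with", "diabetes", "received", "metformin"],
    ["Metformin", "relieves", "hyperglycemia"],
    ["The", "weather", "was", "mild"],
]


class ToyTagger:
    """Hypothetical stand-in for a CRF or LSTM tagger (not a real sequence model).

    A token's label comes from the labels seen for that token; for unknown
    tokens it backs off to the labels seen right after the previous token.
    """

    def fit(self, sentences, tag_seqs):
        self.by_token = defaultdict(Counter)
        self.by_prev = defaultdict(Counter)
        for tokens, tags in zip(sentences, tag_seqs):
            prev = "<s>"  # sentence-start marker
            for tok, tag in zip(tokens, tags):
                self.by_token[tok.lower()][tag] += 1
                self.by_prev[prev][tag] += 1
                prev = tok.lower()
        return self

    def predict(self, tokens):
        """Return (tags, confidence); confidence is the lowest per-token label share."""
        tags, confs = [], []
        prev = "<s>"
        for tok in tokens:
            # .get() does not create entries in a defaultdict.
            counts = self.by_token.get(tok.lower()) or self.by_prev.get(prev)
            if counts:
                tag, n = counts.most_common(1)[0]
                tags.append(tag)
                confs.append(n / sum(counts.values()))
            else:  # no evidence at all: guess "O" with zero confidence
                tags.append("O")
                confs.append(0.0)
            prev = tok.lower()
        return tags, min(confs, default=0.0)


def bootstrap(seed_sents, seed_tags, unlabeled, threshold=0.9, max_rounds=5):
    """Self-training: grow a machine-labeled corpus from a small seed corpus."""
    sents, tags = list(seed_sents), list(seed_tags)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = ToyTagger().fit(sents, tags)  # retrain on seed + accepted data
        remaining, added = [], 0
        for sent in pool:
            pred, conf = model.predict(sent)
            if conf >= threshold:  # keep only confident machine labels
                sents.append(sent)
                tags.append(pred)
                added += 1
            else:
                remaining.append(sent)
        pool = remaining
        if added == 0:  # nothing new was labeled: stop early
            break
    # Final tagger trained on the seed plus the machine-labeled sentences.
    return ToyTagger().fit(sents, tags), sents, tags


if __name__ == "__main__":
    model, sents, tags = bootstrap(SEED_SENTS, SEED_TAGS, UNLABELED)
    for s, t in zip(sents, tags):
        print(list(zip(s, t)))
```

On this toy data, the first unlabeled sentence is accepted in round 1. The "Metformin" sentence is rejected at first (the sentence-initial token is ambiguous) and is only accepted in round 2, after the newly machine-labeled sentence has tagged "metformin" as B-CHEM. The weather sentence never passes the threshold. Swapping in a real sequence model such as a CRF would change how labels and confidences are scored, but not the shape of the loop.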

Similar Items