Biomedical semantic indexing by deep neural network with multi-task learning

Abstract. Background: Biomedical semantic indexing is important for information retrieval and many other research fields in bioinformatics. It annotates biomedical citations with Medical Subject Headings. In the face of unbalanced category distribution in the training data, sampling methods are difficult...

Bibliographic Details
Main Authors: Yongping Du, Yunpeng Pan, Chencheng Wang, Junzhong Ji
Format: Article
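Language: English
Published: BMC 2018-12-01
Series: BMC Bioinformatics
Subjects: Multi-label classification; Biomedical semantic indexing; Data mining; Natural language processing; Multi-task learning; Word embedding
Online Access: http://link.springer.com/article/10.1186/s12859-018-2534-2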
id doaj-e340f32c35e84791a95399a0e34bfb78
record_format Article
spelling doaj-e340f32c35e84791a95399a0e34bfb78 2020-11-25T00:51:53Z eng BMC BMC Bioinformatics 1471-2105 2018-12-01 19 S20 1 11 10.1186/s12859-018-2534-2 Biomedical semantic indexing by deep neural network with multi-task learning Yongping Du (Faculty of Information Technology, Beijing University of Technology), Yunpeng Pan (Faculty of Information Technology, Beijing University of Technology), Chencheng Wang (Faculty of Information Technology, Beijing University of Technology), Junzhong Ji (Faculty of Information Technology, Beijing University of Technology) http://link.springer.com/article/10.1186/s12859-018-2534-2
collection DOAJ
language English
format Article
sources DOAJ
author Yongping Du
Yunpeng Pan
Chencheng Wang
Junzhong Ji
title Biomedical semantic indexing by deep neural network with multi-task learning
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-12-01
description Abstract. Background: Biomedical semantic indexing is important for information retrieval and many other research fields in bioinformatics. It annotates biomedical citations with Medical Subject Headings. In the face of unbalanced category distribution in the training data, sampling methods are difficult to apply to the semantic indexing task. Results: In this paper, we present a novel deep serial multi-task learning model. The primary task treats biomedical semantic indexing as a multi-label text classification problem that considers the relations among the labels. The auxiliary task is a regression task that predicts the number of MeSH headings for the citation and provides hints that help the network converge faster. The experimental results on the BioASQ-Task5A open dataset show that our model outperforms the state-of-the-art solution “MTI”, proposed by the US National Library of Medicine. Further, it not only achieves the highest precision among all the solutions in BioASQ-Task5A but also converges faster than some naive deep learning methods. Conclusions: Rather than running in parallel as in an ordinary multi-task structure, the tasks in our model are serial and tightly coupled. The model can achieve satisfactory performance without any handcrafted features.
topic Multi-label classification
Biomedical semantic indexing
Data mining
Natural language processing
Multi-task learning
Word embedding
url http://link.springer.com/article/10.1186/s12859-018-2534-2
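
The abstract describes two coupled tasks: a multi-label classifier that scores MeSH headings and an auxiliary regressor that predicts how many headings a citation should receive. The record does not say how the two outputs are combined, so the following is only a minimal sketch of one plausible coupling, in which the predicted count serves as a top-k cutoff over the label scores. The function name `select_labels`, the `min_labels` parameter, and all scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_labels(label_scores: Dict[str, float], predicted_count: float,
                  min_labels: int = 1) -> List[str]:
    """Keep the k highest-scoring labels, where k comes from the count regressor.

    Assumption: the abstract only says the auxiliary task predicts the number
    of MeSH headings; using that number as a top-k cutoff is illustrative.
    """
    # Round the regression output to an integer and clamp it to a valid range.
    k = max(min_labels, min(len(label_scores), round(predicted_count)))
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(label_scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Made-up scores for illustration only, not results from the paper.
scores = {"Humans": 0.97, "Neural Networks, Computer": 0.88,
          "Abstracting and Indexing": 0.74, "Animals": 0.12}
selected = select_labels(scores, predicted_count=2.6)
print(selected)
```

With a predicted count of 2.6 the cutoff rounds to three, so the three highest-scoring headings are kept and the low-scoring one is dropped. A fixed score threshold would be the usual alternative; a per-citation count avoids choosing one threshold for labels with very different frequencies.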