Biomedical semantic indexing by deep neural network with multi-task learning
Abstract. Background: Biomedical semantic indexing is important for information retrieval and many other research fields in bioinformatics. It annotates biomedical citations with Medical Subject Headings. In the face of an unbalanced category distribution in the training data, sampling methods are difficult...
Main Authors: | Yongping Du, Yunpeng Pan, Chencheng Wang, Junzhong Ji |
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-12-01 |
Series: | BMC Bioinformatics |
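Subjects: | Multi-label classification; Biomedical semantic indexing; Data mining; Natural language processing; Multi-task learning; Word embedding |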
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2534-2 |
id |
doaj-e340f32c35e84791a95399a0e34bfb78 |
---|---|
record_format |
Article |
spelling |
doaj-e340f32c35e84791a95399a0e34bfb78; 2020-11-25T00:51:53Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-12-01; 19; S20; 111; 10.1186/s12859-018-2534-2; Biomedical semantic indexing by deep neural network with multi-task learning; Yongping Du, Yunpeng Pan, Chencheng Wang, Junzhong Ji (all: Faculty of Information Technology, Beijing University of Technology); Background: Biomedical semantic indexing is important for information retrieval and many other research fields in bioinformatics. It annotates biomedical citations with Medical Subject Headings (MeSH). In the face of an unbalanced category distribution in the training data, sampling methods are difficult to apply to the semantic indexing task. Results: In this paper, we present a novel deep serial multi-task learning model. The primary task treats biomedical semantic indexing as a multi-label text classification problem that considers the relations among the labels. The auxiliary task is a regression task that predicts the number of MeSH headings for the citation and provides hints that help the network converge faster. Experimental results on the BioASQ-Task5A open dataset show that our model outperforms the state-of-the-art solution “MTI”, proposed by the US National Library of Medicine. Further, it not only achieves the highest precision among all the solutions in BioASQ-Task5A but also converges faster than some naive deep learning methods. Conclusions: Rather than running in parallel as in an ordinary multi-task structure, the tasks in our model are serial and tightly coupled. The model achieves satisfactory performance without any handcrafted features.; http://link.springer.com/article/10.1186/s12859-018-2534-2; Multi-label classification; Biomedical semantic indexing; Data mining; Natural language processing; Multi-task learning; Word embedding |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yongping Du, Yunpeng Pan, Chencheng Wang, Junzhong Ji |
author_sort |
Yongping Du |
title |
Biomedical semantic indexing by deep neural network with multi-task learning |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2018-12-01 |
description |
Background: Biomedical semantic indexing is important for information retrieval and many other research fields in bioinformatics. It annotates biomedical citations with Medical Subject Headings (MeSH). In the face of an unbalanced category distribution in the training data, sampling methods are difficult to apply to the semantic indexing task. Results: In this paper, we present a novel deep serial multi-task learning model. The primary task treats biomedical semantic indexing as a multi-label text classification problem that considers the relations among the labels. The auxiliary task is a regression task that predicts the number of MeSH headings for the citation and provides hints that help the network converge faster. Experimental results on the BioASQ-Task5A open dataset show that our model outperforms the state-of-the-art solution “MTI”, proposed by the US National Library of Medicine. Further, it not only achieves the highest precision among all the solutions in BioASQ-Task5A but also converges faster than some naive deep learning methods. Conclusions: Rather than running in parallel as in an ordinary multi-task structure, the tasks in our model are serial and tightly coupled. The model achieves satisfactory performance without any handcrafted features. |
topic |
Multi-label classification; Biomedical semantic indexing; Data mining; Natural language processing; Multi-task learning; Word embedding |
url |
http://link.springer.com/article/10.1186/s12859-018-2534-2 |
_version_ |
1725243476719697920 |
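
The abstract in the description field describes two coupled tasks: a multi-label classifier over MeSH headings and a regression that predicts how many headings a citation should receive. The record does not say how the two outputs are combined, so the following is only a minimal sketch of one plausible reading, assuming the predicted count is used to cut off the ranked label scores. The function name `select_labels`, its parameters, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_labels(label_scores: Dict[str, float], predicted_count: float,
                  min_labels: int = 1) -> List[str]:
    """Return the k highest-scoring labels, with k taken from a regression output.

    Assumption (not from the source): the auxiliary task's predicted label
    count is rounded and used as the cutoff for the classifier's ranking.
    """
    # Round the regression output and clamp it to a valid range.
    k = max(min_labels, min(len(label_scores), int(round(predicted_count))))
    # Sort by descending score; ties are broken by label name for determinism.
    ranked = sorted(label_scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Hypothetical classifier scores for one citation (illustrative values only).
scores = {
    "Humans": 0.97,
    "Neural Networks, Computer": 0.81,
    "Medical Subject Headings": 0.64,
    "Algorithms": 0.22,
}
selected = select_labels(scores, predicted_count=2.6)
print(selected)
```

A count-based cutoff of this kind avoids a fixed probability threshold, which can be hard to tune when label frequencies are heavily unbalanced; whether the authors' model works this way is not stated in this record.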