|
|
|
|
LEADER |
03611nam a2200517Ia 4500 |
001 |
10.1186-s12859-021-04421-z |
008 |
220427s2021 CNT 000 0 und d |
020 |
|
|
|a 1471-2105 (ISSN)
|
245 |
1 |
0 |
|a Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes
|
260 |
|
0 |
|b BioMed Central Ltd
|c 2021
|
856 |
|
|
|z View Fulltext in Publisher
|u https://doi.org/10.1186/s12859-021-04421-z
|
520 |
3 |
|
|a Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitively expensive, automated tools capable of accurately extracting these associations from biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework makes it possible to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes with a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. Conclusions: This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction. © 2021, The Author(s).
|
650 |
0 |
4 |
|a Biomedical relationship extraction
|
650 |
0 |
4 |
|a data mining
|
650 |
0 |
4 |
|a Data Mining
|
650 |
0 |
4 |
|a Deep learning
|
650 |
0 |
4 |
|a Deep neural networks
|
650 |
0 |
4 |
|a Ensemble learning
|
650 |
0 |
4 |
|a Extraction
|
650 |
0 |
4 |
|a human
|
650 |
0 |
4 |
|a Human phenotype ontology
|
650 |
0 |
4 |
|a Human proteins
|
650 |
0 |
4 |
|a Humans
|
650 |
0 |
4 |
|a Learning algorithms
|
650 |
0 |
4 |
|a Natural language processing systems
|
650 |
0 |
4 |
|a Neural Networks, Computer
|
650 |
0 |
4 |
|a Ontology's
|
650 |
0 |
4 |
|a phenotype
|
650 |
0 |
4 |
|a Phenotype
|
650 |
0 |
4 |
|a Protein phenotype relationship
|
650 |
0 |
4 |
|a Protein phenotype relationships
|
650 |
0 |
4 |
|a Proteins
|
650 |
0 |
4 |
|a Recurrent neural networks
|
650 |
0 |
4 |
|a Relationship extraction
|
650 |
0 |
4 |
|a Semi-supervised
|
650 |
0 |
4 |
|a Semi-supervised learning
|
650 |
0 |
4 |
|a supervised machine learning
|
650 |
0 |
4 |
|a Supervised Machine Learning
|
700 |
1 |
|
|a Kahanda, I.
|e author
|
700 |
1 |
|
|a Pourreza Shahri, M.
|e author
|
773 |
|
|
|t BMC Bioinformatics
|