Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes

Bibliographic Details
Main Authors: Kahanda, I. (Author), Pourreza Shahri, M. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: https://doi.org/10.1186/s12859-021-04421-z
LEADER 03611nam a2200517Ia 4500
001 10.1186-s12859-021-04421-z
008 220427s2021 CNT 000 0 und d
020 |a 1471-2105 (ISSN) 
245 1 0 |a Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04421-z 
520 3 |a Background: Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing because of its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from biomedical text are in high demand. However, while manually annotating the protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward. Results: In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning to classify human protein-phenotype co-mentions with the help of unlabeled data. The framework makes it possible to incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of the proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach achieves new state-of-the-art performance in classifying human protein-phenotype co-mentions, outperforming its supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists. Conclusions: This article presents a novel approach to human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction. © 2021, The Author(s). 
650 0 4 |a Biomedical relationship extraction 
650 0 4 |a Data Mining 
650 0 4 |a Deep learning 
650 0 4 |a Deep neural networks 
650 0 4 |a Ensemble learning 
650 0 4 |a Extraction 
650 0 4 |a human 
650 0 4 |a Human phenotype ontology 
650 0 4 |a Human proteins 
650 0 4 |a Humans 
650 0 4 |a Learning algorithms 
650 0 4 |a Natural language processing systems 
650 0 4 |a Neural Networks, Computer 
650 0 4 |a Ontology 
650 0 4 |a Phenotype 
650 0 4 |a Protein phenotype relationship 
650 0 4 |a Protein phenotype relationships 
650 0 4 |a Proteins 
650 0 4 |a Recurrent neural networks 
650 0 4 |a Relationship extraction 
650 0 4 |a Semi-supervised 
650 0 4 |a Semi-supervised learning 
650 0 4 |a Supervised Machine Learning 
700 1 |a Kahanda, I.  |e author 
700 1 |a Pourreza Shahri, M.  |e author 
773 |t BMC Bioinformatics