LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification

Background: Long noncoding RNAs (lncRNAs) have dense linkages with a plethora of important cellular activities. lncRNAs exert functions by linking with corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a f...


Bibliographic Details
Main Authors: Duan, Q. (Author), Peng, L. (Author), Tang, J. (Author), Tian, X. (Author), Xu, H. (Author), Zhou, L. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Subjects:
Online Access: View Fulltext in Publisher
LEADER 04460nam a2200601Ia 4500
001 10.1186-s12859-021-04485-x
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a LPI-HyADBS: a hybrid framework for lncRNA-protein interaction prediction integrating feature selection and classification 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04485-x 
520 3 |a Background: Long noncoding RNAs (lncRNAs) have dense linkages with a plethora of important cellular activities. lncRNAs exert their functions by linking with corresponding RNA-binding proteins. Since experimental techniques to detect lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, computation-based LPI identification methods have the following limitations: (1) Most methods were evaluated on a single dataset, so researchers may fail to measure their generalization ability. (2) The majority of methods were validated under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validations, especially cross validation on independent lncRNAs and independent proteins. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation. Results: Under a hybrid framework (LPI-HyADBS) integrating feature selection based on AdaBoost with classification models including a deep neural network (DNN), extreme gradient boosting (XGBoost), and an SVM with a penalty coefficient of misclassification (C-SVM), this work focuses on finding new LPIs. First, five datasets are arranged. Each dataset contains lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired based on Pyfeat. Third, the obtained features of lncRNAs and proteins are selected based on AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework is developed to integrate the classification results from the above three classifiers. 
LPI-HyADBS is compared to six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS has the best LPI prediction performance under the four different cross validations. In particular, LPI-HyADBS obtains better classification ability than the other six approaches on the constructed independent dataset. Case analyses suggest that ZNF667-AS1 may be associated with Q15717. Conclusions: Integrating a feature selection approach based on AdaBoost with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new linkages between lncRNAs and proteins. © 2021, The Author(s). 
650 0 4 |a Adaptive boosting 
650 0 4 |a amino acid sequence 
650 0 4 |a Amino Acid Sequence 
650 0 4 |a biology 
650 0 4 |a Classification (of information) 
650 0 4 |a Computational Biology 
650 0 4 |a Cross validation 
650 0 4 |a C-SVM 
650 0 4 |a Deep neural network 
650 0 4 |a Deep neural networks 
650 0 4 |a Ensemble learning 
650 0 4 |a Feature extraction 
650 0 4 |a Feature selection 
650 0 4 |a Feature selection and classification 
650 0 4 |a Features selection 
650 0 4 |a Forecasting 
650 0 4 |a genetics 
650 0 4 |a Hybrid framework 
650 0 4 |a Interaction prediction 
650 0 4 |a lncRNA-protein interaction 
650 0 4 |a Lncrna-protein interaction 
650 0 4 |a long untranslated RNA 
650 0 4 |a metabolism 
650 0 4 |a Neural Networks, Computer 
650 0 4 |a Protein interaction 
650 0 4 |a Proteins 
650 0 4 |a RNA binding protein 
650 0 4 |a RNA, Long Noncoding 
650 0 4 |a RNA-Binding Proteins 
650 0 4 |a Support vector machines 
650 0 4 |a Xgboost 
650 0 4 |a XGBoost 
700 1 |a Duan, Q.  |e author 
700 1 |a Peng, L.  |e author 
700 1 |a Tang, J.  |e author 
700 1 |a Tian, X.  |e author 
700 1 |a Xu, H.  |e author 
700 1 |a Zhou, L.  |e author 
773 |t BMC Bioinformatics
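
Note on the evaluation protocol: the abstract distinguishes 5-fold cross validation on lncRNAs, on proteins, and on lncRNA-protein pairs. The record does not state how the folds were built, so the following is only a minimal sketch of one common way to realise the three schemes. The function name `k_fold_splits`, its parameters, and the toy identifiers are hypothetical, and negative sampling is not covered.

```python
import random
from collections import defaultdict


def k_fold_splits(pairs, k=5, mode="pairs", seed=0):
    """Yield (train, test) lists of (lncRNA, protein) pairs.

    mode="lncRNAs":  whole lncRNAs are held out, so no test lncRNA
                     appears in the training fold.
    mode="proteins": whole proteins are held out.
    mode="pairs":    individual pairs are held out.
    Illustrative only; not the procedure confirmed by the source.
    """
    if mode == "lncRNAs":
        key = lambda pair: pair[0]
    elif mode == "proteins":
        key = lambda pair: pair[1]
    elif mode == "pairs":
        key = lambda pair: pair
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Group pairs by the entity that must stay together in one fold.
    groups = defaultdict(list)
    for pair in pairs:
        groups[key(pair)].append(pair)

    keys = sorted(groups)
    if len(keys) < k:
        raise ValueError("fewer groups than folds")
    random.Random(seed).shuffle(keys)

    for fold in range(k):
        # Every k-th shuffled group goes to the test side of this fold.
        test_keys = set(keys[fold::k])
        train = [p for g in keys if g not in test_keys for p in groups[g]]
        test = [p for g in keys if g in test_keys for p in groups[g]]
        yield train, test


if __name__ == "__main__":
    # Toy data: 10 hypothetical lncRNAs x 5 hypothetical proteins.
    toy = [(f"lnc{i}", f"prot{j}") for i in range(10) for j in range(5)]
    for train, test in k_fold_splits(toy, k=5, mode="lncRNAs"):
        held_out = sorted({lnc for lnc, _ in test})
        print(len(train), len(test), held_out)
```

Holding out whole lncRNAs or whole proteins is stricter than holding out pairs, because the model is scored on entities it has never seen during training; this matches the generalization concern raised in limitation (2) of the abstract.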