PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method

Bibliographic Details
Main Authors: Jun Wang, Huiwen Zheng, Yang Yang, Wanyue Xiao, Taigang Liu
Format: Article
Language: English
Published: Hindawi Limited, 2020-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2020/7297631
Record ID: doaj-bbcc80e84b784c8a8d84f4fb233a1f1f
Affiliations:
Jun Wang: College of Information, Shanghai Ocean University, Shanghai 201306, China
Huiwen Zheng: School of Engineering, University of Melbourne, Victoria 3010, Australia
Yang Yang: School of Information Management, Nanjing University, Nanjing 210023, China
Wanyue Xiao: School of Information, Syracuse University, Syracuse, NY 13244, USA
Taigang Liu: College of Information, Shanghai Ocean University, Shanghai 201306, China
Collection: DOAJ
ISSN: 2314-6133, 2314-6141
Description: DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activity. However, identifying DBPs with wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs based solely on protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) features extracted from the hidden Markov model (HMM) profile are used to represent each protein. Next, we build a stacked ensemble model with two stages of learning to identify DBPs. In the first stage, four base classifiers are trained on the HMM-based composition features. In the second stage, the prediction probabilities of these base classifiers serve as inputs to a meta-classifier, which makes the final prediction. On the PDB1075 benchmark dataset, jackknife cross-validation of PredDBP-Stack yields a balanced sensitivity and specificity of 92.47% and 92.36%, respectively, outperforming most existing classifiers. Our method also achieves superior performance and model robustness on the independent PDB186 dataset. These results demonstrate that PredDBP-Stack is an effective classifier for accurately identifying DBPs from protein sequence information alone.
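The description names the two HMM-profile features (AAC and TPC) and the reported metrics but gives no formulas in this record. The Python sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' code: it assumes the HMM profile has already been parsed and converted into an L × 20 matrix of per-position probabilities (file parsing and column order are not shown), computes AAC as the mean of each column, and computes TPC as a row-normalized sum of products between adjacent positions (the paper's exact normalization may differ). The function names `aac`, `tpc`, `encode`, and `sensitivity_specificity` are hypothetical. Sensitivity and specificity use the standard definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP).

```python
from typing import List, Sequence, Tuple

# Assumed input: an L x 20 matrix of per-position probabilities,
# one column per standard amino acid, already derived from an HMM profile.
Profile = Sequence[Sequence[float]]

NUM_AA = 20


def aac(profile: Profile) -> List[float]:
    """AAC (assumed form): mean of each profile column over all L positions."""
    length = len(profile)
    return [sum(row[j] for row in profile) / length for j in range(NUM_AA)]


def tpc(profile: Profile) -> List[float]:
    """TPC (assumed row-normalized form), returned as 400 values row by row.

    trans[i][j] accumulates profile[k][i] * profile[k + 1][j] over adjacent
    positions; each row is then divided by its sum so it reads as a
    distribution over the column observed at the next position.
    """
    trans = [[0.0] * NUM_AA for _ in range(NUM_AA)]
    for cur, nxt in zip(profile, profile[1:]):
        for i in range(NUM_AA):
            for j in range(NUM_AA):
                trans[i][j] += cur[i] * nxt[j]
    features: List[float] = []
    for row in trans:
        total = sum(row)
        # A row with no mass (column never populated) is left as zeros.
        features.extend(v / total if total > 0 else 0.0 for v in row)
    return features


def encode(profile: Profile) -> List[float]:
    """Concatenate AAC (20 values) and TPC (400 values) into one vector."""
    if len(profile) < 2 or any(len(row) != NUM_AA for row in profile):
        raise ValueError("expected an L x 20 profile with L >= 2")
    return aac(profile) + tpc(profile)


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sn and Sp with 1 = DBP, 0 = non-DBP; assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy 2-position profile: position 0 puts all mass on column 0,
    # position 1 puts all mass on column 1.
    toy = [[1.0] + [0.0] * 19, [0.0, 1.0] + [0.0] * 18]
    vec = encode(toy)
    print(len(vec))      # 420
    print(vec[:2])       # AAC for columns 0 and 1: [0.5, 0.5]
    print(vec[20 + 1])   # TPC entry for column 0 -> column 1: 1.0
    print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

Under these assumptions each protein maps to 20 + 400 = 420 values. In the described pipeline such vectors would train the four base classifiers, whose predicted probabilities become the meta-classifier's inputs; that stacking stage and the jackknife (leave-one-out) loop are not sketched here. Nested Python lists keep the example dependency-free; with NumPy, the TPC accumulation for an L × 20 array `H` is `H[:-1].T @ H[1:]`, followed by the same row normalization.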