PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method

| Main Authors: | Jun Wang, Huiwen Zheng, Yang Yang, Wanyue Xiao, Taigang Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Hindawi Limited, 2020-01-01 |
| Series: | BioMed Research International |
| Online Access: | http://dx.doi.org/10.1155/2020/7297631 |

| id | doaj-bbcc80e84b784c8a8d84f4fb233a1f1f |
|---|---|
| record_format | Article |
| affiliations | Jun Wang: College of Information, Shanghai Ocean University, Shanghai 201306, China; Huiwen Zheng: School of Engineering, University of Melbourne, Victoria 3010, Australia; Yang Yang: School of Information Management, Nanjing University, Nanjing 210023, China; Wanyue Xiao: School of Information, Syracuse University, Syracuse, NY 13244, USA; Taigang Liu: College of Information, Shanghai Ocean University, Shanghai 201306, China |
| volume | 2020, article 7297631 |
| collection | DOAJ |
| issn | 2314-6133, 2314-6141 |
| description | DNA-binding proteins (DBPs) play vital roles in all aspects of genetic activity. However, identifying DBPs with wet-lab experimental approaches is often time-consuming and laborious. In this study, we develop a novel computational method, called PredDBP-Stack, to predict DBPs solely from protein sequences. First, amino acid composition (AAC) and transition probability composition (TPC) extracted from the hidden Markov model (HMM) profile are adopted to represent a protein. Next, we establish a stacked ensemble model with two stages of learning to identify DBPs. In the first stage, four base classifiers are trained on the HMM-based composition features. In the second stage, the prediction probabilities of these base classifiers are used as inputs to a meta-classifier, which makes the final prediction of DBPs. On the PDB1075 benchmark dataset, jackknife cross-validation of the proposed PredDBP-Stack predictor yields a balanced sensitivity and specificity of 92.47% and 92.36%, respectively, a result better than that of most existing classifiers. Furthermore, our method also achieves superior performance and model robustness on the PDB186 independent dataset. This demonstrates that PredDBP-Stack is an effective classifier for accurately identifying DBPs from protein sequence information alone. |
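
The description field says each protein is represented by AAC and TPC extracted from its HMM profile. Below is a minimal Python sketch of one common formulation of these two descriptors. It assumes the profile is an L × 20 matrix of per-position emission probabilities, with columns in alphabetical one-letter order as in HH-suite `.hhm` files. AAC is taken as the column means (20 values) and TPC as products of adjacent positions, normalised within each source column (400 values). The normalisation, the helper names (`hhm_score_to_prob`, `aac`, `tpc`, `hmm_features`) and the score conversion (HH-suite stores emissions as -1000*log2(p), with `*` meaning zero) are assumptions for illustration, not the authors' code.

```python
from typing import List, Sequence

# Assumed column order of the 20 emission probabilities (alphabetical one-letter
# codes, the order used in HH-suite .hhm files).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def hhm_score_to_prob(token: str) -> float:
    """Convert one .hhm emission entry to a probability.

    Assumes the HH-suite convention: entries store -1000 * log2(p); '*' means p = 0.
    """
    if token == "*":
        return 0.0
    return 2.0 ** (-int(token) / 1000.0)

def aac(profile: Sequence[Sequence[float]]) -> List[float]:
    """20-D composition: mean of each amino acid column over all L positions."""
    length = len(profile)
    return [sum(row[j] for row in profile) / length for j in range(len(AMINO_ACIDS))]

def tpc(profile: Sequence[Sequence[float]]) -> List[float]:
    """400-D transition probability composition between adjacent positions.

    raw[m][n] = sum over i of p[i][m] * p[i + 1][n]; each block of 20 values that
    shares the same m is then normalised to sum to 1 (assumed normalisation).
    """
    n_aa = len(AMINO_ACIDS)
    features: List[float] = []
    for m in range(n_aa):
        raw = [sum(profile[i][m] * profile[i + 1][n] for i in range(len(profile) - 1))
               for n in range(n_aa)]
        total = sum(raw)
        features.extend(v / total if total > 0 else 0.0 for v in raw)
    return features

def hmm_features(profile: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate AAC (20 values) and TPC (400 values) into one 420-D vector."""
    return aac(profile) + tpc(profile)

if __name__ == "__main__":
    # Hypothetical 3-position profile built from made-up .hhm score tokens.
    scores = [["1000", "*"] + ["4322"] * 18 for _ in range(3)]
    profile = [[hhm_score_to_prob(s) for s in row] for row in scores]
    print(len(hmm_features(profile)))  # → 420
```

Because the feature length depends only on the 20-letter alphabet, proteins of any length L map to a fixed-size vector, which is what allows standard classifiers to be trained on them.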
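
The two-stage learning in the description follows the general stacking pattern: out-of-fold probabilities from several base classifiers become the input features of a meta-classifier. This record does not name PredDBP-Stack's actual base classifiers or meta-classifier. The sketch below therefore uses four hypothetical stand-ins written in plain Python (logistic regression, 3-nearest-neighbours, nearest centroid and Gaussian naive Bayes) plus a logistic-regression meta-classifier. Toy 2-D data (`make_toy_data`, also hypothetical) replaces the real HMM-based feature vectors.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # returns P(label == 1 | x)

def _dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def _sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(X: List[Vector], y: List[int], epochs: int = 300, lr: float = 0.5) -> Model:
    """Batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def knn(k: int) -> Callable[[List[Vector], List[int]], Model]:
    """Probability = fraction of positives among the k nearest training points."""
    def fit(X: List[Vector], y: List[int]) -> Model:
        def predict(x: Vector) -> float:
            nearest = sorted(range(len(X)), key=lambda i: _dist(X[i], x))[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return predict
    return fit

def nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Closer class centroid wins; the distance gap is squashed to a probability."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    return lambda x: _sigmoid(_dist(x, c0) - _dist(x, c1))

def gaussian_nb(X: List[Vector], y: List[int]) -> Model:
    """Per-class diagonal Gaussians; posterior via the log-joint difference."""
    stats = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        stats[label] = (math.log(len(rows) / len(X)), means, variances)
    def log_joint(x: Vector, label: int) -> float:
        prior, means, variances = stats[label]
        return prior - 0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                                 for xi, m, v in zip(x, means, variances))
    return lambda x: _sigmoid(log_joint(x, 1) - log_joint(x, 0))

BASE_LEARNERS = [logistic_regression, knn(3), nearest_centroid, gaussian_nb]

def fit_stack(X: List[Vector], y: List[int], folds: int = 5) -> Model:
    # Stage 1: out-of-fold probabilities of each base learner become meta-features.
    meta_X = [[0.0] * len(BASE_LEARNERS) for _ in X]
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        for j, learner in enumerate(BASE_LEARNERS):
            model = learner([X[i] for i in train], [y[i] for i in train])
            for i in range(f, len(X), folds):  # the held-out samples of fold f
                meta_X[i][j] = model(X[i])
    # Stage 2: the meta-classifier learns how to combine the base probabilities.
    meta = logistic_regression(meta_X, y)
    bases = [learner(X, y) for learner in BASE_LEARNERS]  # refit on all data
    return lambda x: meta([model(x) for model in bases])

def make_toy_data(n_per_class: int = 20, seed: int = 0):
    """Two separated 2-D blobs; labels alternate so every fold holds both classes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)]); y.append(0)
        X.append([rng.gauss(2, 0.5), rng.gauss(2, 0.5)]); y.append(1)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    stack = fit_stack(X, y)
    print(round(stack([2.0, 2.0]), 3), round(stack([-2.0, -2.0]), 3))
```

Training the meta-classifier on out-of-fold probabilities, rather than on probabilities the base models produce for their own training samples, keeps stage two from learning on overly optimistic inputs.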
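
The sensitivity and specificity quoted in the description come from jackknife (leave-one-out) cross-validation: each protein is predicted by a model trained on all the others. Below is a minimal sketch of that protocol, assuming any learner that maps training data to a probability function. The nearest-centroid learner is a hypothetical stand-in for the full PredDBP-Stack model. Sensitivity is TP / (TP + FN) over DBPs (label 1), and specificity is TN / (TN + FP) over non-DBPs (label 0).

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], float]                   # returns P(label == 1 | x)
Learner = Callable[[List[Vector], List[int]], Model]

def jackknife_predictions(X: List[Vector], y: List[int], learner: Learner,
                          threshold: float = 0.5) -> List[int]:
    """Leave-one-out: each sample is predicted by a model trained on all the others."""
    preds = []
    for i in range(len(X)):
        model = learner(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(1 if model(X[i]) >= threshold else 0)
    return preds

def sensitivity_specificity(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Sn = TP / (TP + FN); Sp = TN / (TN + FP). Both classes must be present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def nearest_centroid(X: List[Vector], y: List[int]) -> Model:
    """Hypothetical stand-in learner: the closer class centroid wins."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    def dist(a: Vector, b: Vector) -> float:
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    def sigmoid(z: float) -> float:
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))
    c0, c1 = centroid(0), centroid(1)
    return lambda x: sigmoid(dist(x, c0) - dist(x, c1))

if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(sensitivity_specificity(y, jackknife_predictions(X, y, nearest_centroid)))  # → (1.0, 1.0)
```

For a dataset of N proteins, the jackknife requires N complete training runs. With a two-stage stacked model, each run also includes the inner out-of-fold fitting, so the protocol is deterministic but computationally expensive.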