iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential ingredients for both eukaryotic and prokaryotic proteomes. With the avalanche of protein seque...

Bibliographic Details
Main Authors: Bin Liu, Jinghao Xu, Xun Lan, Ruifeng Xu, Jiyun Zhou, Xiaolong Wang, Kuo-Chen Chou
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4153653?pdf=render
id doaj-8e40c3ead96d4d979846d21e5c2f41c4
record_format Article
spelling doaj-8e40c3ead96d4d979846d21e5c2f41c4; 2020-11-25T02:48:43Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2014-01-01; vol. 9, no. 9, e106691; doi:10.1371/journal.pone.0106691
collection DOAJ
language English
format Article
sources DOAJ
author Bin Liu
Jinghao Xu
Xun Lan
Ruifeng Xu
Jiyun Zhou
Xiaolong Wang
Kuo-Chen Chou
title iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2014-01-01
description Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential components of both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called "iDNA-Prot|dis", was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins and thereby enhance prediction quality, while the latter can reduce the dimension of the PseAAC vector and thereby speed up prediction. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step protocol guide is provided on how to use the web-server to get their desired results without the need to follow the complicated mathematical equations, which are presented in this paper only for the completeness of the method's development. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
url http://europepmc.org/articles/PMC4153653?pdf=render
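
The description above mentions two ingredients: distance-pair coupling between amino acids and a reduced amino acid alphabet. The following Python sketch illustrates the general idea only and is not the authors' implementation. The four-group alphabet in `REDUCED_GROUPS`, the function names, and the `max_distance` parameter are all assumptions made for illustration; the paper's actual clustering, distance range, and normalization may differ.

```python
from itertools import product

# Hypothetical 4-group reduced alphabet (illustrative assumption;
# not the clustering used in the paper).
REDUCED_GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar or small
    "b": "KRH",       # basic
    "a": "DE",        # acidic
}
TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}
GROUPS = sorted(REDUCED_GROUPS)  # fixed feature order: a, b, h, p


def reduce_sequence(seq):
    """Map each standard residue to its group letter; skip unknown symbols."""
    return "".join(TO_GROUP[aa] for aa in seq.upper() if aa in TO_GROUP)


def distance_pair_features(seq, max_distance=3):
    """Return group composition followed by pair frequencies at distances 1..max_distance."""
    reduced = reduce_sequence(seq)
    n = len(reduced)

    # Distance 0: plain composition over the reduced alphabet.
    features = [reduced.count(g) / n if n else 0.0 for g in GROUPS]

    # Distance d >= 1: frequency of each ordered group pair (x, y)
    # where y occurs exactly d positions after x.
    for d in range(1, max_distance + 1):
        total = max(n - d, 0)
        counts = {pair: 0 for pair in product(GROUPS, repeat=2)}
        for i in range(total):
            counts[(reduced[i], reduced[i + d])] += 1
        features.extend(
            counts[pair] / total if total else 0.0
            for pair in product(GROUPS, repeat=2)
        )
    return features


if __name__ == "__main__":
    vector = distance_pair_features("MKRAVDE", max_distance=3)
    print(len(vector))
```

With four groups and three distances, the vector has 4 + 3 × 16 = 52 entries, compared with 20 + 3 × 400 = 1220 for the same construction over the full 20-letter alphabet. This is the kind of dimension reduction the description attributes to the reduced alphabet profile.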