An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences.

As the number of known proteins has expanded, accurately identifying DNA-binding proteins has become a significant biological challenge. Various computational methods, such as SVM, DNABP and CNN-RNN, have been proposed to recognize DNA-binding proteins from amino acid sequences alone...


Bibliographic Details
Main Authors:	Siquan Hu, Ruixiong Ma, Haiou Wang
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2019-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0225317

Description:	As the number of known proteins has expanded, accurately identifying DNA-binding proteins has become a significant biological challenge. Various computational methods, such as SVM, DNABP and CNN-RNN, have been proposed to recognize DNA-binding proteins from amino acid sequences alone. However, these methods do not consider the context within amino acid sequences, which makes it difficult for them to adequately capture sequence features. In this study, a new method called CNN-BiLSTM, which combines a bidirectional long short-term memory (BiLSTM) recurrent neural network with a convolutional neural network (CNN), is proposed to identify DNA-binding proteins. The CNN-BiLSTM model can explore the potential contextual relationships in amino acid sequences and extract more features than traditional models can. The experimental results show that CNN-BiLSTM achieves a validation-set prediction accuracy of 96.5%, which is 7.8% higher than that of SVM, 9.6% higher than that of DNABP and 3.7% higher than that of CNN-RNN. When tested on 20,000 independent samples provided by UniProt that were not involved in model training, CNN-BiLSTM reached an accuracy of 94.5%, which is 12% higher than that of SVM, 4.9% higher than that of DNABP and 4% higher than that of CNN-RNN. The authors visualized and compared the training process of CNN-BiLSTM with that of CNN-RNN and found that the former generalizes better from the training dataset, indicating that CNN-BiLSTM adapts to a wider range of protein sequences. On the test set, CNN-BiLSTM is also more reliable, because its predicted scores are closer to the sample labels than those of CNN-RNN. The proposed CNN-BiLSTM is therefore a more powerful method for identifying DNA-binding proteins.
