A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction

Abstract. Background: Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein fo...

Bibliographic Details
Main Authors:	Lei Deng, Chao Fan, Zhiwen Zeng
Format:	Article
Language:	English
Published:	BMC 2017-12-01
Series:	BMC Bioinformatics
Subjects:	Solvent accessibility; Contact number; Deep neural network; Sequence-derived features
Online Access:	http://link.springer.com/article/10.1186/s12859-017-1971-7

id	doaj-620c0d19f61441e3afb8b3704ee064a3
record_format	Article
spelling	doaj-620c0d19f61441e3afb8b3704ee064a3; 2020-11-24T21:59:46Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2017-12-01; 18; S16; 211; 220; 10.1186/s12859-017-1971-7; A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction; Lei Deng (School of Software, Central South University); Chao Fan (School of Software, Central South University); Zhiwen Zeng (School of Information Science and Engineering, Central South University); http://link.springer.com/article/10.1186/s12859-017-1971-7; Solvent accessibility; Contact number; Deep neural network; Sequence-derived features
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Lei Deng, Chao Fan, Zhiwen Zeng
author_facet	Lei Deng, Chao Fan, Zhiwen Zeng
author_sort	Lei Deng
title	A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction
title_sort	sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2017-12-01
description	Abstract. Background: Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step in building 3D protein structures. Results: In this study, we present DeepSacon, a computational method that predicts protein solvent accessibility and contact number using a deep neural network built on a stacked autoencoder and a dropout method. The results show that DeepSacon achieves a significant improvement in prediction quality over state-of-the-art methods. On the dataset of 5729 monomeric soluble globular proteins, we obtain 0.70 three-state accuracy for solvent accessibility, and 0.33 15-state accuracy and 0.74 Pearson correlation coefficient (PCC) for contact number. On the CASP11 benchmark dataset, DeepSacon achieves 0.68 three-state accuracy for solvent accessibility and 0.69 PCC for contact number. Conclusions: We have shown that DeepSacon can reliably predict solvent accessibility and contact number with a stacked sparse autoencoder and a dropout approach.
topic	Solvent accessibility; Contact number; Deep neural network; Sequence-derived features
url	http://link.springer.com/article/10.1186/s12859-017-1971-7

Similar Items