Classifying Residues in Mechanically Stable and Unstable Substructures Based on a Protein Sequence: The Case Study of the DnaK Hsp70 Chaperone

Artificial proteins can be constructed from stable substructures, whose stability is encoded in their protein sequence. Identifying stable protein substructures experimentally is the only available option at the moment because no suitable method exists to extract this information from a protein sequ...

Bibliographic Details
Main Authors: Michal Gala, Gabriel Žoldák
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: Nanomaterials
Subjects: Hsp70; substructures; physico-chemical features; machine learning
Online Access: https://www.mdpi.com/2079-4991/11/9/2198
id doaj-64cf855c277144afbd167febab8bfbad
record_format Article
spelling doaj-64cf855c277144afbd167febab8bfbad
record_updated 2021-09-26T00:48:03Z
volume 11
article_number 2198
doi 10.3390/nano11092198
authors Michal Gala [0]
Gabriel Žoldák [1]
affiliations Department of Biophysics, Faculty of Science, P. J. Šafárik University, Jesena 5, 040 01 Košice, Slovakia
Center for Interdisciplinary Biosciences, Technology and Innovation Park, P. J. Šafárik University, Trieda SNP 1, 040 11 Košice, Slovakia
collection DOAJ
sources DOAJ
issn 2079-4991
description Artificial proteins can be constructed from stable substructures, whose stability is encoded in their protein sequence. At present, stable protein substructures can only be identified experimentally, because no suitable method exists to extract this information from a protein sequence. In previous research, we examined the mechanics of E. coli Hsp70 and found four mechanically stable substructures (S class) and three unstable substructures (U class). Of the 603 residues in the folded domains of Hsp70, 234 belong to one of the four mechanically stable substructures and 369 belong to one of the three unstable substructures. Here, our goal is to develop a machine learning model that classifies Hsp70 residues using sequence information. We applied three supervised methods: logistic regression (LR), random forest, and support vector machine. The LR method achieved the highest accuracy, 0.925, in predicting the class of a given residue, but only when context-dependent physico-chemical features were included. Cross-validation of the LR model yielded a prediction accuracy of 0.879 and revealed that most misclassified residues lie at the borders between substructures. We foresee machine learning models being used to identify stable substructures as candidate building blocks for engineering new proteins.
topic Hsp70
substructures
physico-chemical features
machine learning