Protein Conformational States—A First Principles Bayesian Method †

Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naïve Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The...

Bibliographic Details
Main Author: David M. Rogers
Format: Article
Language: English
Published: MDPI AG 2020-10-01
Series: Entropy
Subjects: Bernoulli mixture; Bayesian clustering; unsupervised classification
Online Access: https://www.mdpi.com/1099-4300/22/11/1242
id doaj-9c1ec67370054fabbc027a354ad154a5
record_format Article
spelling doaj-9c1ec67370054fabbc027a354ad154a5; 2020-11-25T04:08:11Z; eng; MDPI AG; Entropy; 1099-4300; 2020-10-01; 22; 1242; 1242; 10.3390/e22111242; Protein Conformational States—A First Principles Bayesian Method †; David M. Rogers; 0; National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naïve Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a 'distribution' over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that it identifies the conformational transition with > 95% accuracy in most cases. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. This is confirmed by comparing results as the number of atoms and time-samples are varied over 1.5 orders of magnitude. Further, the method's derivation from Bayesian analysis on the set of inter-atomic contacts makes it easy to understand and extend to more complex cases.; https://www.mdpi.com/1099-4300/22/11/1242; Bernoulli mixture; Bayesian clustering; unsupervised classification
collection DOAJ
language English
format Article
sources DOAJ
author David M. Rogers
title Protein Conformational States—A First Principles Bayesian Method †
publisher MDPI AG
series Entropy
issn 1099-4300
publishDate 2020-10-01
description Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naïve Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a 'distribution' over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that it identifies the conformational transition with > 95% accuracy in most cases. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. This is confirmed by comparing results as the number of atoms and time-samples are varied over 1.5 orders of magnitude. Further, the method's derivation from Bayesian analysis on the set of inter-atomic contacts makes it easy to understand and extend to more complex cases.
topic Bernoulli mixture
Bayesian clustering
unsupervised classification
url https://www.mdpi.com/1099-4300/22/11/1242