Using first passage statistics to extract environmentally dependent amino acid correlations.

Bibliographic Details
Main Authors: Benjamin D Greenbaum, Pradeep Kumar, Albert Libchaber
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4084998?pdf=render
Citation: PLoS ONE 9(7): e101665
DOI: 10.1371/journal.pone.0101665
ISSN: 1932-6203
Collection: DOAJ

Abstract
In this work, we study the first passage statistics of amino acid primary sequences, that is, the probability of observing an amino acid for the first time at a certain number of residues away from a fixed amino acid. By using this rich mathematical framework, we are able to capture the background distribution for an organism, and infer lengths at which the first passage has a probability that differs from what is expected. While many features of an organism's genome are due to natural selection, others are related to amino acid chemistry and the environment in which an organism lives, constraining the randomness of genomes upon which selection can further act. We therefore use this approach to infer amino acid correlations, and then study how these correlations vary across a wide range of organisms under a wide range of optimal growth temperatures. We find a nearly universal exponential background distribution, consistent with the idea that most amino acids are globally uncorrelated with other amino acids in genomes. When we are able to extract significant correlations, these correlations are reliably dependent on optimal growth temperature, across phylogenetic boundaries. Some of the correlations we extract, such as the enhanced probability of finding, for the first time, a cysteine three residues away from a cysteine or glutamic acid two residues away from an arginine, likely relate to thermal stability. However, other correlations, likely appearing on alpha helical surfaces, have a less clear physicochemical interpretation and may relate to thermal stability or unusual metabolic properties of organisms that live in a high temperature environment.
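The abstract describes the method only at a high level, so the following is a minimal Python sketch of the idea, not the authors' code. It rests on several assumptions. First, the first passage from amino acid a to amino acid b is taken as the distance from each a to the next b downstream, ignoring any intervening a. Second, the null model treats residues as independent draws with frequencies pooled from the input. Under that model the first passage distance is geometric, P(k) = (1 - p_b)^(k-1) * p_b, which is the discrete counterpart of an exponential background. Third, deviations are scored with an observed/expected ratio and a Poisson-style z-score. All function names are hypothetical, and the paper's own background fit and significance test may differ.

```python
from collections import Counter


def residue_frequencies(sequences):
    """Pooled fraction of residues that are each amino acid across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def first_passage_counts(sequences, a, b):
    """Histogram of distances k >= 1 from each `a` to the next `b` downstream.

    Occurrences of `a` with no `b` further along the same sequence are skipped,
    since their first passage is cut off by the end of the protein.
    """
    counts = Counter()
    for seq in sequences:
        for i, residue in enumerate(seq):
            if residue == a:
                j = seq.find(b, i + 1)  # first b strictly after position i
                if j != -1:
                    counts[j - i] += 1
    return counts


def expected_first_passage_counts(sequences, a, b, max_k):
    """Expected histogram for k = 1..max_k if residues were independent draws.

    With frequency p_b for `b`, P(first b at distance k) = (1 - p_b)**(k - 1) * p_b.
    An `a` at position i of a length-n sequence can only reach k <= n - 1 - i,
    so each term is summed only over occurrences that leave room for distance k.
    """
    p_b = residue_frequencies(sequences).get(b, 0.0)
    expected = {k: 0.0 for k in range(1, max_k + 1)}
    for seq in sequences:
        n = len(seq)
        for i, residue in enumerate(seq):
            if residue == a:
                for k in range(1, min(max_k, n - 1 - i) + 1):
                    expected[k] += (1 - p_b) ** (k - 1) * p_b
    return expected


def enrichment(sequences, a, b, max_k=10):
    """Per-distance (observed, expected, observed/expected, Poisson z-score)."""
    observed = first_passage_counts(sequences, a, b)
    expected = expected_first_passage_counts(sequences, a, b, max_k)
    table = {}
    for k in range(1, max_k + 1):
        obs_k, exp_k = observed[k], expected[k]  # Counter returns 0 for missing k
        if exp_k > 0:
            table[k] = (obs_k, exp_k, obs_k / exp_k, (obs_k - exp_k) / exp_k ** 0.5)
        else:
            table[k] = (obs_k, exp_k, float("nan"), float("nan"))
    return table


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not data from the study.
    proteins = ["MCAACKLLCGGCEE", "ACDEFGHCIKCLMN"]
    for k, (obs, exp_k, ratio, z) in enrichment(proteins, "C", "C", max_k=5).items():
        print(f"k={k}: observed={obs} expected={exp_k:.2f} ratio={ratio:.2f} z={z:+.2f}")
```

Because the expected count at each k sums only over occurrences that leave room for that distance, the censoring at protein ends affects observed and expected counts in the same way. A short range of k is enough to probe pairs like those named in the abstract: `enrichment(proteins, "C", "C")` at k = 3 corresponds to a cysteine three residues after a cysteine, and `enrichment(proteins, "R", "E")` at k = 2 to a glutamic acid two residues after an arginine (this sketch looks downstream only, which is itself an assumption). Comparing such tables across proteomes grouped by optimal growth temperature would loosely mirror the comparison the abstract describes.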