Misfolding Dominates Protein Evolution

Bibliographic Details
Main Author: Drummond, David Allan
Format: Others
Published: 2006
Online Access: https://thesis.library.caltech.edu/2404/1/drummond-thesis.pdf
Drummond, David Allan (2006) Misfolding Dominates Protein Evolution. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/DH8E-2N10. https://resolver.caltech.edu/CaltechETD:etd-06022006-154329
Description
Summary: The diverse array of protein functions depends upon these molecules' reliable ability to fold into the native structures determined by their amino-acid sequences. Because mutations that alter a protein's sequence frequently disrupt its folding, protein evolution explores protein sequence space conservatively, either by point mutations or recombination between related sequences. Attempts to engineer proteins by co-opting the evolutionary algorithm have also largely proceeded by the stepwise accumulation of beneficial mutations. Other strategies for directed evolution have focused on introducing many mutations at once as a way to increase the likelihood of finding improved variants, attempting to balance higher mutational diversity with lower retention of folding. Using simple models, I explore this tradeoff and find that protein misfolding dominates whether increasing mutation levels increase the number of improved variants. I analyze results of a popular mutagenesis protocol, error-prone PCR, for evidence that coupling between mutations might favor higher mutation levels, as claimed by several groups. A comparison of high-mutation-rate mutagenesis to protein recombination between distantly related proteins reveals qualitative differences in protein tolerance for sequence changes introduced by each method. Mutational tolerance may also be reflected in the rate at which proteins accumulate sequence changes over evolutionary time; why proteins evolve at different rates remains a major open question in biology. An analysis of rate determinants suggests that one major variable, linked to how highly expressed the encoding gene is, dominates the rate of yeast protein evolution. To explain this trend, I hypothesize that proteins are selected to fold properly despite mistranslation, a property I call translational robustness, and test it using genomic data.
To examine protein evolution at a higher level of detail, a large-scale simulation is constructed in which simulated organisms, with genomes containing genes expressing computationally foldable proteins at different levels, evolve over millions of generations with protein misfolding imposing the only fitness cost. The results suggest that protein misfolding suffices to explain many significant trends in genome evolution observed across taxa, predict a novel genomic trend which is then identified in yeast, and create insight into the causes of evolutionary rate variation in proteins.
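
Illustrative note: the tradeoff between mutational diversity and retention of folding mentioned in the summary can be sketched with a toy calculation. The sketch below is not taken from the thesis. It assumes, purely for illustration, that the number of mutations per variant is Poisson-distributed and that each mutation independently preserves folding with a fixed probability `nu`. The function names and the parameter `nu` are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k mutations when counts are Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def folded_fraction(mean_mutations, nu):
    """Fraction of a library that still folds.

    Assumption (illustrative): each mutation independently preserves folding
    with probability nu, so a variant with k mutations folds with probability
    nu**k. Averaging nu**k over a Poisson distribution gives exp(-mean * (1 - nu)).
    """
    return math.exp(-mean_mutations * (1.0 - nu))

def folded_fraction_by_sum(mean_mutations, nu, max_k=100):
    """Same quantity as folded_fraction, computed by explicit summation over k."""
    return sum(poisson_pmf(k, mean_mutations) * nu ** k for k in range(max_k + 1))

def folded_and_mutated_fraction(mean_mutations, nu):
    """Fraction of the library that folds AND carries at least one mutation.

    This subtracts the unmutated (k = 0) variants, which fold but offer no diversity.
    """
    return folded_fraction(mean_mutations, nu) - math.exp(-mean_mutations)

if __name__ == "__main__":
    nu = 0.5  # hypothetical per-mutation probability of retaining the fold
    for mean in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(mean, folded_fraction(mean, nu), folded_and_mutated_fraction(mean, nu))
```

Under these assumptions the folded fraction falls exponentially as the mean mutation level rises, while the fraction that is both folded and mutated peaks at an intermediate mutation level. That is one simple way to see why loss of folding can limit the benefit of heavier mutagenesis.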