Statistical Models of the Protein Fitness Landscape: Applications to Protein Evolution and Engineering

Understanding the protein fitness landscape is important for describing how natural proteins evolve and for engineering new proteins with useful properties. This mapping from protein sequence to protein function involves an extraordinarily complex balance of numerous physical interactions,...


Bibliographic Details
Main Author:	Romero, Philip Anthony
Format:	Others
Published:	2012
Online Access:	https://thesis.library.caltech.edu/6852/1/Romero_dissertation.pdf Romero, Philip Anthony (2012) Statistical Models of the Protein Fitness Landscape: Applications to Protein Evolution and Engineering. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/7W9R-Y338. https://resolver.caltech.edu/CaltechTHESIS:03172012-160452929

id	ndltd-CALTECH-oai-thesis.library.caltech.edu-6852
record_format	oai_dc
spelling	ndltd-CALTECH-oai-thesis.library.caltech.edu-6852 2019-11-26T03:11:21Z Statistical Models of the Protein Fitness Landscape: Applications to Protein Evolution and Engineering Romero, Philip Anthony 2012 Thesis NonPeerReviewed application/pdf https://thesis.library.caltech.edu/6852/1/Romero_dissertation.pdf https://resolver.caltech.edu/CaltechTHESIS:03172012-160452929 https://thesis.library.caltech.edu/6852/
collection	NDLTD
format	Others
sources	NDLTD
description	Understanding the protein fitness landscape is important for describing how natural proteins evolve and for engineering new proteins with useful properties. This mapping from protein sequence to protein function involves an extraordinarily complex balance of numerous physical interactions, many of which are still not well understood. Directed evolution circumvents our ignorance of how a protein’s sequence encodes its function by using iterative rounds of random mutation and artificial selection. The selection criteria are based on experimental measurements, which permits the optimization of protein sequence properties that are not understood. While directed evolution has been useful for exploring protein fitness landscapes, these searches have been relatively local in comparison to the vast space of possible protein sequences. Here, we present several classes of statistical models that map protein sequence space on a larger scale. We use these simple models to interpret data from SCHEMA recombination libraries, understand the evolutionary benefit of intragenic recombination, and design optimized protein sequences. By training directly on experimental data, these models implicitly capture the numerous and possibly unknown factors that shape the protein fitness landscape. This approach provides unrivaled quantitative accuracy across a massive number of protein sequences.
author	Romero, Philip Anthony
title	Statistical Models of the Protein Fitness Landscape: Applications to Protein Evolution and Engineering
publishDate	2012
url	https://thesis.library.caltech.edu/6852/1/Romero_dissertation.pdf Romero, Philip Anthony (2012) Statistical Models of the Protein Fitness Landscape: Applications to Protein Evolution and Engineering. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/7W9R-Y338. https://resolver.caltech.edu/CaltechTHESIS:03172012-160452929
work_keys_str_mv	AT romerophilipanthony statisticalmodelsoftheproteinfitnesslandscapeapplicationstoproteinevolutionandengineering
_version_	1719295646739988480


Similar Items