Detecting contacts in protein folds by solving the inverse Potts problem - a pseudolikelihood approach
Abstract Spatially proximate amino acid positions in a protein tend to co-evolve, so a protein's 3D-structure leaves an echo of correlations in the evolutionary record. Reverse engineering 3D-structures from such correlations is an open problem in structural biology, pursued with increasing vi...
Main Author: | Ekeberg, Magnus |
---|---|
Format: | Others |
Language: | English |
Published: | KTH, Matematisk statistik, 2012 |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-99181 |
id |
ndltd-UPSALLA1-oai-DiVA.org-kth-99181 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-kth-99181
2013-01-08T13:43:58Z
Detecting contacts in protein folds by solving the inverse Potts problem - a pseudolikelihood approach
eng
Ekeberg, Magnus
KTH, Matematisk statistik
2012
Abstract Spatially proximate amino acid positions in a protein tend to co-evolve, so a protein's 3D-structure leaves an echo of correlations in the evolutionary record. Reverse engineering 3D-structures from such correlations is an open problem in structural biology, pursued with increasing vigor as new protein sequences continue to fill the data banks. Within this task lies a statistical stumbling block, rooted in the following: correlation between two amino acid positions can arise from firsthand interaction, but also be network-propagated via intermediate positions; observed correlation is not enough to guarantee proximity. The remedy, and the focus of this thesis, is to mathematically untangle the crisscross of correlations and extract direct interactions, which enables a clean depiction of co-evolution among the positions. Recently, analysts have used maximum-entropy modeling to recast this cause-and-effect puzzle as parameter learning in a Potts model (a kind of Markov random field). Unfortunately, a computationally expensive partition function puts this out of reach of straightforward maximum-likelihood estimation. Mean-field approximations have been used, but an arsenal of other approximate schemes exists. In this work, we re-implement an existing contact-detection procedure and replace its mean-field calculations with pseudo-likelihood maximization. We then feed both routines real protein data and highlight differences between their respective outputs. Our new program seems to offer a systematic boost in detection accuracy.
Student thesis
info:eu-repo/semantics/bachelorThesis
text
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-99181
Trita-MAT, 1401-2286 ; 14
application/pdf
info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
description |
Abstract Spatially proximate amino acid positions in a protein tend to co-evolve, so a protein's 3D-structure leaves an echo of correlations in the evolutionary record. Reverse engineering 3D-structures from such correlations is an open problem in structural biology, pursued with increasing vigor as new protein sequences continue to fill the data banks. Within this task lies a statistical stumbling block, rooted in the following: correlation between two amino acid positions can arise from firsthand interaction, but also be network-propagated via intermediate positions; observed correlation is not enough to guarantee proximity. The remedy, and the focus of this thesis, is to mathematically untangle the crisscross of correlations and extract direct interactions, which enables a clean depiction of co-evolution among the positions. Recently, analysts have used maximum-entropy modeling to recast this cause-and-effect puzzle as parameter learning in a Potts model (a kind of Markov random field). Unfortunately, a computationally expensive partition function puts this out of reach of straightforward maximum-likelihood estimation. Mean-field approximations have been used, but an arsenal of other approximate schemes exists. In this work, we re-implement an existing contact-detection procedure and replace its mean-field calculations with pseudo-likelihood maximization. We then feed both routines real protein data and highlight differences between their respective outputs. Our new program seems to offer a systematic boost in detection accuracy. |
author |
Ekeberg, Magnus |
title |
Detecting contacts in protein folds by solving the inverse Potts problem - a pseudolikelihood approach |
publisher |
KTH, Matematisk statistik |
publishDate |
2012 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-99181 |
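
The abstract describes replacing mean-field calculations with pseudo-likelihood maximization in a Potts model. As an illustration of that idea only, the following is a minimal sketch of the negative log-pseudolikelihood objective in Python with NumPy. It is not the thesis's implementation, which this record does not show. The function name, the array layout of the fields `h` and couplings `J`, and the integer encoding of sequences are all assumptions made for the example, and the regularization and sequence reweighting that a practical contact predictor would likely need are left out.

```python
import numpy as np


def neg_log_pseudolikelihood(seqs, h, J):
    """Average negative log-pseudolikelihood of a Potts model.

    Hypothetical layout, chosen for this sketch:
      seqs : (B, N) integer array, each entry a state in 0..q-1
      h    : (N, q) array of single-site fields
      J    : (N, N, q, q) array; J[r, i, k, l] couples state k at
             position r with state l at position i (J[r, r] is ignored)
    """
    B, N = seqs.shape
    total = 0.0
    for r in range(N):
        # logits[b, k] = h[r, k] + sum over i != r of J[r, i, k, seqs[b, i]]
        logits = np.tile(h[r].astype(float), (B, 1))
        for i in range(N):
            if i == r:
                continue
            logits += J[r, i][:, seqs[:, i]].T
        # Numerically stable log of the per-site normalizer. This sum runs
        # over only q states, which is what avoids the full partition function.
        m = logits.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
        # Log conditional probability of the observed state at position r.
        total += (logits[np.arange(B), seqs[:, r]] - log_norm).sum()
    return -total / B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, N, q = 5, 4, 3
    seqs = rng.integers(0, q, size=(B, N))
    h = np.zeros((N, q))
    J = np.zeros((N, N, q, q))
    print(neg_log_pseudolikelihood(seqs, h, J))
```

With all parameters set to zero, every conditional distribution is uniform over the q states, so the objective equals N·log(q) regardless of the data. Minimizing this function over `h` and `J` with a generic gradient-based optimizer would be one way to fit the model under these assumptions.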