Summary: | The central dogma describes the flow of information within a cell: a DNA sequence is transcribed into RNA, which is in turn translated into a protein. This flow is unidirectional, meaning that, once constructed, a protein should retain no knowledge of either the RNA or the DNA sequence that led to its creation. Because the genetic code is degenerate, multiple synonymous mRNA sequences can encode the same protein. However, a growing body of experimental work shows that, although synonymous sequences produce the same amino acid sequence, the resulting proteins may differ in their physical properties. These results suggest that the mRNA sequence contains information about the structure of the encoded protein beyond the mere specification of the amino acid sequence. This thesis investigates whether the speed at which a codon is translated biases the protein structure produced. The initial chapters focus on determining a suitable metric for translation speed, comparing various theoretical estimates to a new experimental measure. Finding that the theoretical estimates perform poorly, we construct a transcriptome-wide database relating experimentally derived translation speeds directly to a large number of experimentally derived protein structures. Using this database to test our hypothesis, we observe various associations between translation speed and the protein structure produced. Our analysis is the first to investigate the relationship between translation speed and protein structure on a transcriptome-wide scale using purely experimental data. Our findings provide strong support for the cotranslational folding hypothesis, which suggests that a protein folds while it is being produced.
|