An analysis of intrinsically disordered proteins using hidden Markov models and experimental design of stochastic kinetic models

Bibliographic Details
Main Author: Wilkinson, Nina
Published: University of Newcastle upon Tyne 2015
Subjects:
572
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.680316
Description
Summary: An intrinsically disordered protein (IDP) is a protein without a stable secondary or tertiary structure, and just over one third of human proteins can be described as IDPs. Links have been shown between protein misfolding and both neurodegenerative diseases and cancer, with many of these misfolded proteins being intrinsically disordered. These IDPs may be cytotoxic by interacting with, and contributing to, the aggregation process, which is why cells need to regulate these proteins carefully. Research has shown that hydrophobicity and charge may be important in determining whether an amino acid sequence has unstructured areas. We study the sequence structure by first recoding amino acid sequences according to their hydrophobicity and charge, and then fitting a hidden Markov model using Markov chain Monte Carlo methods. A power posterior analysis is used to determine the number of distinct transition structures. The results show distinct segment types within the amino acid sequences of the FET proteins, which may have biological importance. The locations of these segments can be used to guide laboratory work that tests the biological significance of these segment types within cells. One particular segment found in the FET proteins has been linked to oncogenic fusion proteins, and experimental analysis has shown a link between this segment and oncogenic activity.
When conducting an experiment, an experimenter needs to determine when and under what conditions measurements should be taken. Often the choice of optimal design is made with respect to some statistical criterion. The aim of this work is to determine, for a stochastic kinetic model, the optimal location of the timepoints at which observations are taken. Commonly the statistical criterion involves maximising the expectation of a utility function over the prior predictive distribution of possible experimental outcomes. Current methodologies for experimental design for models with intractable likelihoods are very computationally expensive because, within the iterative search for the optimal design, calculating the utility function requires the parameter posterior distribution to be determined at each iteration. We show how to use delta methods and a Gaussian process as an emulator for the utility to reduce the computational cost, and illustrate their application to the simple death process and the Lotka–Volterra model.
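As an illustration of the kind of stochastic kinetic model mentioned in the summary, the following minimal Python sketch simulates a simple death process at a chosen set of observation times. It is not taken from the thesis. The function name `simulate_death_process`, the initial population of 50, the death rate of 1.0 and the three design times are all assumptions made for the example. It relies only on the standard result that, when each individual dies independently at rate theta, the number alive at time t given the number alive at an earlier time s is Binomial with survival probability exp(-theta (t - s)).

```python
import math
import random


def simulate_death_process(x0, theta, times, rng):
    """Simulate a simple death process at the given observation times.

    Each of the x0 individuals dies independently at rate theta, so
    X(t) given X(s) is Binomial(X(s), exp(-theta * (t - s))) for t > s.
    Returns the population counts at the sorted observation times.
    """
    counts = []
    current, previous_time = x0, 0.0
    for t in sorted(times):
        survive_prob = math.exp(-theta * (t - previous_time))
        # Binomial draw: count the individuals that survive this interval.
        current = sum(rng.random() < survive_prob for _ in range(current))
        counts.append(current)
        previous_time = t
    return counts


rng = random.Random(1)           # fixed seed so the run is repeatable
design = [0.5, 1.0, 2.0]         # hypothetical observation times (the design)
observations = simulate_death_process(50, 1.0, design, rng)
print(observations)
```

In a design search of the kind described above, a simulator like this would be called repeatedly to draw possible experimental outcomes for each candidate set of timepoints. The utility calculation and its Gaussian process emulator are not shown here.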