Mathematical and statistical models for the analysis of protein

Interactions of proteins, both amongst themselves and with other molecules, are responsible for much of the work within the cellular machine. As protein interaction data sets grow in number and in size, from experiments such as Yeast 2-Hybrid or Affinity Purification followed by Mass Spect...

Bibliographic Details
Main Author: Chiang, T.
Published: University of Cambridge 2011
Subjects:
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597600
id ndltd-bl.uk-oai-ethos.bl.uk-597600
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-597600 2015-03-20T06:01:32Z Mathematical and statistical models for the analysis of protein Chiang, T. 2011 572.6 University of Cambridge http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597600 Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 572.6
description Interactions of proteins, both amongst themselves and with other molecules, are responsible for much of the work within the cellular machine. As protein interaction data sets grow in number and in size, from experiments such as Yeast 2-Hybrid or Affinity Purification followed by Mass Spectrometry, there is a need to analyse the data both quantitatively and qualitatively. One area of research is determining how reliable a report of a protein interaction is: whether it could be reproduced if the experiment were repeated, or if it were tested using an independent assay. One might aim to score each reported interaction using a quantitative measure of reliability. Ultimately, protein interactions need to be addressed at the systems level, where both the dynamic and functional nature of protein complexes and other types of interactions is ascertained. In this dissertation, I present two methodological developments that are useful for elucidating the nature of protein interaction graphs in the model organism Saccharomyces cerevisiae. The first aims to estimate the sensitivity and specificity of a protein interaction data set and does so, as far as possible, by examining the data set's internal consistency and reproducibility. The second aims to estimate the node degree distribution, using a multinomial model fitted by maximum likelihood. In the development of the methods for the analysis of protein interactions, computational tools were built in the statistical environment R. Such tools are necessary for implementing each analytic step, for rendering visualisations of intermediate and final results, and for constructing optimal workflows that make our research reproducible and extensible. Such a workflow is included in this dissertation, together with the software engineering component of the research.
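As a rough illustration of the two quantities named in the description, the following Python sketch computes sensitivity and specificity of an observed edge list against a reference edge list, and the plain multinomial maximum-likelihood estimate of a degree distribution (the proportion of nodes with each degree). It is not the dissertation's method: the record says the thesis estimates sensitivity and specificity from internal consistency rather than from a reference set, and its tools were written in R. The function names, the toy protein labels, and the use of a reference set are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def degree_distribution_mle(edges, nodes):
    """Plain multinomial MLE of the degree distribution: p_k = n_k / N.

    Assumes undirected edges without self-loops and takes the observed
    graph at face value (no correction for measurement error).
    """
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())  # n_k: number of nodes with degree k
    n = len(nodes)
    return {k: counts[k] / n for k in sorted(counts)}


def sensitivity_specificity(observed, reference, nodes):
    """Sensitivity and specificity of observed edges against a reference set.

    Every unordered pair of nodes is treated as one test. The reference
    set is a hypothetical stand-in for the truth, and both edge sets are
    assumed to be non-trivial so that neither denominator is zero.
    """
    obs = {frozenset(e) for e in observed}
    ref = {frozenset(e) for e in reference}
    pairs = {frozenset(p) for p in combinations(nodes, 2)}
    tp = len(obs & ref)          # reported and in the reference
    fn = len(ref - obs)          # in the reference but not reported
    fp = len(obs - ref)          # reported but not in the reference
    tn = len(pairs - obs - ref)  # neither reported nor in the reference
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data with made-up protein labels.
    nodes = ["A", "B", "C", "D"]
    reference = [("A", "B"), ("B", "C"), ("C", "D")]
    observed = [("A", "B"), ("B", "C"), ("A", "D")]
    print(degree_distribution_mle(observed, nodes))
    print(sensitivity_specificity(observed, reference, nodes))
```

Treating every unordered node pair as a test keeps the true-negative count well defined. A model that allows for false positive and false negative observations would replace the simple proportions used here.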
author Chiang, T.
title Mathematical and statistical models for the analysis of protein
publisher University of Cambridge
publishDate 2011
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597600