Mathematical and statistical models for the analysis of protein

Protein interactions, both amongst themselves and with other molecules, are responsible for much of the work within the cellular machine. As protein interaction data sets grow in number and in size, from experiments such as Yeast 2-Hybrid or Affinity Purification followed by Mass Spect...

Bibliographic Details
Main Author:	Chiang, T.
Published:	University of Cambridge 2011
Subjects:	572.6
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597600

id	ndltd-bl.uk-oai-ethos.bl.uk-597600
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-597600 2015-03-20T06:01:32Z Mathematical and statistical models for the analysis of protein Chiang, T. 2011 572.6 University of Cambridge http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597600 Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	572.6
spellingShingle	572.6 Chiang, T. Mathematical and statistical models for the analysis of protein
description	Protein interactions, both amongst themselves and with other molecules, are responsible for much of the work within the cellular machine. As protein interaction data sets grow in number and in size, from experiments such as Yeast 2-Hybrid or Affinity Purification followed by Mass Spectrometry, there is a need to analyse the data both quantitatively and qualitatively. One area of research is determining how reliable a report of a protein interaction is – whether it could be reproduced if the experiment were repeated, or if it were tested using an independent assay. One might aim to score each reported interaction using a quantitative measure of reliability. Ultimately, protein interactions need to be addressed at the systems level, where both the dynamic and functional nature of protein complexes and other types of interactions are ascertained. In this dissertation, I present two methodological developments that are useful for elucidating the nature of protein interaction graphs in the model organism Saccharomyces cerevisiae. The first aims to estimate the sensitivity and specificity of a protein interaction data set, and does so, as far as possible, by looking at the data set’s internal consistency and reproducibility. The second aims to estimate the node degree distribution, using a multinomial model fitted by maximum likelihood. In developing these methods, computational tools were built in the statistical environment R. Such tools are necessary for implementing each analytic step, for rendering visualisations of intermediate and conclusive results, and for constructing optimal work-flows so as to make our research reproducible and extensible. We have also included such a work-flow in this dissertation, as well as the software engineering component of the research.
author	Chiang, T.
author_facet	Chiang, T.
author_sort	Chiang, T.
title	Mathematical and statistical models for the analysis of protein
title_short	Mathematical and statistical models for the analysis of protein
title_full	Mathematical and statistical models for the analysis of protein
title_fullStr	Mathematical and statistical models for the analysis of protein
title_full_unstemmed	Mathematical and statistical models for the analysis of protein
title_sort	mathematical and statistical models for the analysis of protein
publisher	University of Cambridge
publishDate	2011
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597600
work_keys_str_mv	AT chiangt mathematicalandstatisticalmodelsfortheanalysisofprotein
_version_	1716795478924853248

Similar Items