Summary: | Protein interactions, both amongst themselves and with other molecules, are responsible for much of the work within the cellular machine. As protein interaction data sets from experiments such as yeast two-hybrid or affinity purification followed by mass spectrometry grow in number and in size, there is a need to analyse the data both quantitatively and qualitatively. One area of research is determining how reliable a report of a protein interaction is: whether it could be reproduced if the experiment were repeated, or if it were tested using an independent assay. One might aim to score each reported interaction using a quantitative measure of reliability. Ultimately, protein interactions need to be addressed at the systems level, where both the dynamic and the functional nature of protein complexes and other types of interactions are ascertained. In this dissertation, I present two methodological developments that help elucidate the nature of protein interaction graphs in the model organism <i>Saccharomyces cerevisiae</i>. The first aims to estimate the <i>sensitivity</i> and <i>specificity</i> of a protein interaction data set, relying as far as possible on the data set’s internal consistency and reproducibility. The second aims to estimate the node degree distribution of the interaction graph, using a multinomial model fitted by maximum likelihood. In developing these methods, I built computational tools in the statistical environment R. Such tools are necessary for implementing each analytic step, for rendering visualisations of intermediate and conclusive results, and for constructing work-flows that make the research reproducible and extensible. This dissertation also includes such a work-flow, together with the software engineering component of the research.
|