Summary: | This dissertation presents novel Bayesian Monte Carlo inference techniques to aid the understanding of gene transcription at the level of transcription factor binding and gene expression. The focus is methodological rather than biological. Chapter 2 introduces the topic and provides a theoretical introduction to both the biology and the statistics. The next three chapters present the thesis contributions, which cover two main projects. The first project considers the precursor to transcription: transcription factors and the identification of their DNA binding sites. Chapter 3 introduces this topic, reviews existing techniques, and then motivates, presents and analyses the performance of a new algorithm. The performance of this algorithm raises several questions about how the problem is modelled. This modelling is analysed in Chapter 4, which identifies one reason for the poor performance of this class of techniques: the widely used models are inapplicable. The chapter shows that these models may lead to pathological behaviour of the inference algorithm, and hence that they are inappropriate. The second project considers the product of transcription: messenger ribonucleic acid (mRNA) transcript mixtures. Chapter 5 considers the mixtures of mRNA that evolve over time as a result of transcription. Some mRNA patterns are visibly similar to others. It is of biological interest to group these similar patterns, as they are probably under common transcriptional regulation and hence possibly of related function. This chapter constructs an algorithm to perform blind classification of non-parametric functions, based upon a Dirichlet process mixture of Gaussian processes.
|