Analysis of association studies and inference of haplotypic phase using hidden Markov models


Bibliographic Details
Main Author: Su, Shu Yi
Other Authors: Coin, Lachlan
Published: Imperial College London 2009
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.513494
Description
Summary: In this thesis I focus on the development and application of hidden Markov models (HMMs) to solve problems in statistical genetics. Our method, based on an HMM, models the joint haplotype structure in the samples, with the model parameters estimated by the Baum-Welch EM algorithm. The model does not require pre-defined blocks or a sliding-window scheme to define haplotype boundaries. Thus our method is computationally efficient and applicable to either a whole genome sequence or a candidate gene sequence. The first application of this model is disease association testing using inferred ancestral haplotypes. We employed an HMM to cluster haplotypes, inferred from diploid genotypes, into groups of predicted common ancestral haplotypes. The results from simulation studies show that our method greatly outperforms single-SNP analyses and has greater power than a haplotype-based method, CLADHC, in most simulation scenarios. The second application is inferring haplotypic phase and predicting missing genotypes in polyploid organisms. Using a simulation study, we demonstrate that the method provides accurate estimates of haplotypic phase and missing genotypes for diploids, triploids and tetraploids. The third application is joint CNV/SNP haplotype and missing data inference. The results are very encouraging for this application. With the increasing availability of genotype data in both diploid and polyploid organisms, we believe that our programs can facilitate the investigation of genetic variation in genome-wide studies.
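
For readers unfamiliar with Baum-Welch estimation, the following is a minimal, generic sketch of the algorithm for a discrete-emission HMM fitted to a single observation sequence. It is not the thesis's haplotype model: the function names, the two-state setup, the random initialisation and the toy 0/1 allele sequence are all assumptions made for illustration only.

```python
import math
import random


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass for a discrete HMM (illustrative only)."""
    T, K = len(obs), len(pi)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(K):
            if t == 0:
                prior = pi[j]
            else:
                prior = sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prior * B[j][obs[t]]
        # Normalise each step to avoid numerical underflow.
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K)
            ) / scale[t + 1]
    log_lik = sum(math.log(s) for s in scale)
    return alpha, beta, scale, log_lik


def baum_welch(obs, K, M, n_iter=20, seed=0):
    """Fit a K-state HMM with M emission symbols by EM (hypothetical helper)."""
    rng = random.Random(seed)

    def rand_dist(n):
        w = [rng.random() + 0.1 for _ in range(n)]
        s = sum(w)
        return [x / s for x in w]

    pi = rand_dist(K)
    A = [rand_dist(K) for _ in range(K)]
    B = [rand_dist(M) for _ in range(K)]
    history = []  # log-likelihood under the parameters before each update
    T = len(obs)
    for _ in range(n_iter):
        # E-step: posterior state and transition probabilities.
        alpha, beta, scale, log_lik = forward_backward(obs, pi, A, B)
        history.append(log_lik)
        gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
        new_A = [[0.0] * K for _ in range(K)]
        for t in range(T - 1):
            for i in range(K):
                for j in range(K):
                    new_A[i][j] += (
                        alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                        * beta[t + 1][j] / scale[t + 1]
                    )
        new_B = [[0.0] * M for _ in range(K)]
        for t in range(T):
            for j in range(K):
                new_B[j][obs[t]] += gamma[t][j]
        # M-step: renormalise expected counts into probabilities.
        pi = gamma[0][:]
        A = [[x / sum(row) for x in row] for row in new_A]
        B = [[x / sum(row) for x in row] for row in new_B]
    return pi, A, B, history


# Toy example: a made-up sequence of 0/1 alleles along a chromosome.
alleles = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1]
pi, A, B, history = baum_welch(alleles, K=2, M=2, n_iter=15)
```

Each EM iteration cannot decrease the likelihood of the data, so `history` should be non-decreasing up to floating-point error. A model of the kind described in the summary would additionally handle many samples, unphased genotypes and position-dependent parameters, none of which this sketch attempts.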