Summary: | The genome-wide association study (GWAS) has emerged as an effective method for detecting genetic polymorphisms associated with expressed phenotypes. Over the past decade, GWAS of human traits and diseases has revolutionized the field of complex disease genetics, identifying hundreds of genetic variants associated with diverse phenotypes, ranging from metabolic diseases to cardiovascular and neuropsychiatric conditions. These associations have provided fundamental insights into the genetic architecture of disease susceptibility and led to initial forays into clinical applications, particularly the creation of genetic risk scores for improved disease risk prediction and the identification of new drug targets. Despite this success, for almost all complex traits the identified genetic loci explain only a small proportion, generally less than half, of the estimated heritability. A number of alternative explanations have been offered for this, including undetected genetic effects, unaccounted-for environmental factors, and gene–gene and gene–environment interaction effects. Although there is no consensus on these explanations, it is widely acknowledged that a substantial proportion of trait heritability is attributable to the existence of a large number of undetected genetic variants distributed across the entire allele frequency spectrum, each with a very small to modest effect on the phenotype, and to non-host-DNA factors that contribute to phenotypic variation. In parallel to host GWAS, the advent of next-generation sequencing (NGS) technologies that enable culture-independent profiling of microbial communities has led to the rediscovery of the microbiome (the collective genome of the microorganisms that inhabit the body) and the emergence of microbiome-wide association studies.
These studies have linked the gut microbiome to a variety of human conditions, ranging from neurological conditions, such as Parkinson's disease and autism, to metabolic diseases, such as obesity, diabetes, and cardiovascular disease. Given the importance of the microbiome to host phenotype, a more comprehensive understanding of host phenotypic status requires examining both the host's genotype and its microbiome. This thesis explores the dissection of microbial taxa and host genetic polymorphisms associated with human complex traits and diseases, and the interaction of human host genetic polymorphisms with the microbiome. A Bayesian statistical framework, based on the Dirichlet process random effects model, is then proposed for identifying microbial species associated with host phenotype. The proposed method uses a weighted combination of phylogenetic and radial basis function kernels to model microbial taxa effects, and a non-parametrically defined latent variable to model latent heterogeneity among samples. Philosophically, the non-parametric specification amounts to adding an infinite amount of prior information about all fine details of the parameters being modelled, and thus represents an attractive strategy. The utility of the method is demonstrated through simulation experiments and application to real microbiome datasets for schizophrenia, HIV/AIDS, and atherosclerosis, where the method is shown to be robust and to have high statistical power for association inference, resulting in a framework that can contribute to our understanding of the link between the microbiome and human diseases. Understanding the human genetic predisposing factors in concert with this link will enable human GWAS to fulfil its translational potential, from patient stratification and disease risk prediction to the identification of new biology and drug discovery.
|