Summary: | Thesis: Ph. D., Harvard-MIT Program in Health Sciences and Technology, 2016. === Cataloged from PDF version of thesis. === Includes bibliographical references (pages 225-251). === A central goal in genomics is to understand the genetic variants that underlie molecular changes and lead to disease. Recent studies have identified thousands of genetic loci associated with human phenotypes. These studies have primarily analyzed point mutations, ignoring more complex types of variation. Here we focus on Short Tandem Repeats (STRs) as a model for complex variation. STRs consist of repeating motifs of 1-6 bp and span over 1% of the human genome. The level of STR variation and its effect on phenotypes remain mostly uncharted, mainly due to the difficulty of accurately genotyping STRs on a large scale. To overcome bioinformatic challenges in STR genotyping, we developed lobSTR, an algorithm for profiling STRs from high-throughput sequencing data. lobSTR employs a unique mapping strategy to rapidly align repetitive reads, and uses statistical learning techniques to account for STR-specific noise patterns. We applied lobSTR to generate the largest and highest-quality STR catalog to date. This provided the first characterization of more than a million loci and gave novel insights into population-wide trends of STR variation. We used our catalog to conduct a genome-wide analysis of the contribution of STRs to gene expression in humans. This revealed that STRs explain 10-15% of the cis heritability of expression mediated by common variants and potentially play a role in various clinically relevant conditions. Overall, these studies highlight the contribution of STRs to the genetic architecture of quantitative traits. We anticipate that integrating repetitive elements, specifically STRs, into genome-wide analyses will lead to the discovery of new genetic variants relevant to human conditions. === by Melissa A. Gymrek. === Ph. D.
|