Summary: | Thesis (Ph.D.)--Boston University
PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.
Next-generation sequencing technologies are ushering in the next generation of clinical diagnostics. However, even minute sequencing error rates can produce unwieldy numbers of false positives in single-genome variation analysis, potentially requiring prioritization and validation of hundreds of errors per patient. To interpret accurately the variation in an individual whole human genome, it is essential to fully characterize the quality of the data being interpreted. Here I present methods for improving the accuracy of next-generation sequencing variant calls, as well as for assessing the specificity, sensitivity and thresholding of those calls. In particular, I present an algorithm for detecting heterozygous deletions that has clinical relevance to the most prevalent childhood neurodegenerative disease, neuronal ceroid lipofuscinosis (NCL). I describe a platform-independent method for choosing variant calling thresholds, and I present a toolkit (mkSProC) for calibrating sequencing quality scores by applying this method to genome replicates. I illustrate the specificity and sensitivity of variables influencing phase confidence, both to enable targeted experimental phasing and to quantify confidence in computationally finishing experimental phasing. I combine experimental phasing results with expression data to find allele-specifically expressed (ASE) genes, and I describe a feature that I added to a web server of regulatory-motif binding sites (UniPROBE) that can be used for, among other things, finding motifs that might explain ASE.
Applying the methods I describe to genomic sequence data, expression data and phase data will further our understanding of causal variation and reduce experimental costs through targeted validation.
|