Enabling clinical genomics by reducing false discovery in next-generation sequencing data

Bibliographic Details
Main Author:	Robasky, Kimberly
Language:	en_US
Published:	Boston University 2015
Online Access:	https://hdl.handle.net/2144/12838

id	ndltd-bu.edu-oai-open.bu.edu-2144-12838
record_format	oai_dc
date	2013; 2015-08-07T03:35:55Z
type	Thesis/Dissertation
collection	NDLTD
language	en_US
sources	NDLTD
description	Thesis (Ph.D.)--Boston University
PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at open-help@bu.edu. Thank you.
Next-generation sequencing technologies are ushering in a new generation of clinical diagnostics. However, even minute sequencing error rates can yield unwieldy numbers of false positives in single-genome variation analysis, potentially requiring the prioritization and validation of hundreds of errors per patient. To interpret the variation in an individual's whole genome accurately, it is essential to fully characterize the quality of the underlying data. Here I present methods for improving the accuracy of next-generation sequencing variant calls and for assessing the specificity, sensitivity and thresholding of those calls. In particular, I present an algorithm for detecting heterozygous deletions that is clinically relevant to neuronal ceroid lipofuscinosis (NCL), the most prevalent neurodegenerative disease of childhood. I describe a platform-independent method for choosing variant-calling thresholds, and I present a toolkit (mkSProC) that calibrates sequencing quality scores by applying this method to genome replicates. I characterize the specificity and sensitivity of variables that influence phasing confidence, both to enable targeted experimental phasing and to quantify confidence when experimental phasing is completed computationally. I combine experimental phasing results with expression data to identify allele-specifically expressed (ASE) genes, and I describe a feature I added to UniPROBE, a web server of regulatory-motif binding sites, that can be used, among other things, to find motifs that may explain ASE.
Applying these methods to genomic sequence, expression and phase data will further our understanding of causal variation and reduce experimental costs through targeted validation.
author	Robasky, Kimberly
title	Enabling clinical genomics by reducing false discovery in next-generation sequencing data
publisher	Boston University
publishDate	2015
url	https://hdl.handle.net/2144/12838
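The abstract's point about choosing variant-calling thresholds from genome replicates can be illustrated with a small example. For scale, a false-call rate of 1 in 100,000 bases across the roughly 3 billion bases of a human genome would yield on the order of 30,000 false calls, which is why a principled threshold matters. The Python sketch below is hypothetical and is not the thesis's method or the mkSProC toolkit. It assumes two technical replicates of the same genome, each reduced to a mapping from an illustrative `chrom:pos:ref>alt` key to a quality score, and it treats a call that passes a threshold in only one replicate as a likely error. The names `discordance_rate`, `choose_threshold`, `REP1` and `REP2`, and the toy data, are invented for illustration.

```python
"""Hypothetical sketch: choose a variant-calling quality threshold from
replicate concordance. Not the thesis's method or the mkSProC toolkit.

Assumption: a call that passes the threshold in one technical replicate but
not in the other is treated as a likely error, so the discordance rate
serves as a rough proxy for the false-discovery rate.
"""
from typing import Dict, Optional, Sequence

Calls = Dict[str, float]  # variant key -> quality score (illustrative format)


def discordance_rate(calls_a: Calls, calls_b: Calls, threshold: float) -> float:
    """Fraction of calls passing `threshold` in either replicate that pass it in only one."""
    pass_a = {site for site, qual in calls_a.items() if qual >= threshold}
    pass_b = {site for site, qual in calls_b.items() if qual >= threshold}
    union = pass_a | pass_b
    if not union:
        return 0.0  # nothing passes, so nothing can be discordant
    discordant = pass_a ^ pass_b  # symmetric difference: seen in only one replicate
    return len(discordant) / len(union)


def choose_threshold(
    calls_a: Calls,
    calls_b: Calls,
    candidates: Sequence[float],
    max_discordance: float,
) -> Optional[float]:
    """Return the lowest candidate threshold whose discordance is at most
    `max_discordance`, or None if no candidate qualifies."""
    for threshold in sorted(candidates):
        if discordance_rate(calls_a, calls_b, threshold) <= max_discordance:
            return threshold
    return None


# Toy replicates (invented data for illustration only).
REP1: Calls = {"chr1:100:A>G": 50, "chr1:200:C>T": 45, "chr1:300:G>A": 12, "chr2:50:T>C": 8}
REP2: Calls = {"chr1:100:A>G": 48, "chr1:200:C>T": 40, "chr2:75:A>T": 9, "chr2:90:G>C": 15}

print(choose_threshold(REP1, REP2, [0, 10, 20, 30], max_discordance=0.05))  # → 20
```

Returning the lowest qualifying threshold keeps as many calls as possible; because discordance need not fall monotonically as the threshold rises, the sketch scans candidates in ascending order rather than bisecting. Replicate discordance only exposes errors that fail to reproduce, so systematic errors shared by both replicates would go undetected by this proxy.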
