Expanding the horizons of next generation sequencing with RUFUS

Thesis advisor: Gabor T. Marth === To help improve the analysis of forward genetic screens, we have developed an efficient and automated pipeline for mutational profiling using our reference guided tools including MOSAIK and FREEBAYES. Studies using next generation sequencing technologies currently...


Bibliographic Details
Main Author: Farrell, Andrew R.
Format: Others
Language: English
Published: Boston College 2014
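Subjects: k-mer; next generation sequencing; Reference-Free; RUFUS; Variant detection; Whole Genome Sequencing
Online Access: http://hdl.handle.net/2345/bc-ir:104176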
id ndltd-BOSTON-oai-dlib.bc.edu-bc-ir_104176
record_format oai_dc
spelling ndltd-BOSTON-oai-dlib.bc.edu-bc-ir_104176 2019-05-10T07:37:01Z Expanding the horizons of next generation sequencing with RUFUS Farrell, Andrew R. Thesis advisor: Gabor T. Marth Text thesis 2014 Boston College English electronic application/pdf k-mer next generation sequencing Reference-Free RUFUS Variant detection Whole Genome Sequencing Copyright is held by the author, with all rights reserved, unless otherwise noted. Thesis (PhD), Boston College, 2014. Submitted to: Boston College. Graduate School of Arts and Sciences. Discipline: Biology. http://hdl.handle.net/2345/bc-ir:104176
collection NDLTD
language English
format Others
sources NDLTD
topic k-mer
next generation sequencing
Reference-Free
RUFUS
Variant detection
Whole Genome Sequencing
description Thesis advisor: Gabor T. Marth === To help improve the analysis of forward genetic screens, we have developed an efficient, automated pipeline for mutational profiling using our reference-guided tools, including MOSAIK and FREEBAYES. Studies using next generation sequencing technologies currently employ either reference-guided alignment or de novo assembly to analyze the massive amount of short-read data produced by second generation sequencing technologies; reference-guided alignment is far more common because of the massive computational and sequencing costs associated with de novo assembly. The success of reference-guided alignment depends on three factors: the accuracy of the reference, the ability of the mapper to correctly place a read, and the degree to which a variant allele differs from the reference. Reference assemblies are not perfect, and none is entirely complete. Moreover, read mappers can place reads confidently only in genomic locations that are sufficiently unique; paralogous sections, such as related gene families, cannot be characterized and are often ignored. Further, variant alleles that drastically alter the subject's DNA, such as insertions or deletions (INDELs), will not map to the reference and are either missed entirely or require further downstream analysis to characterize. Most importantly, reference-guided methods are restricted to organisms for which reference genomes have been assembled. The current alternative, de novo assembly of a genome, is prohibitively expensive for most labs, requiring deep read coverage from numerous different library preparations as well as massive computing power.
To address the shortcomings of current methods while eliminating the costs intrinsic to de novo sequence assembly, we developed RUFUS, a novel, completely reference-independent variant discovery tool. RUFUS directly compares raw sequence data from two or more samples and identifies groups of reads unique to one sample or the other. RUFUS has at least the same variant detection sensitivity as mapping methods, with greatly increased specificity for SNP and INDEL variation events. RUFUS is also capable of extremely sensitive copy number detection, without any restriction on event length. By modeling the underlying k-mer distribution, RUFUS produces a specific copy number spectrum for each individual sample. Applying a Bayesian method to detect changes in k-mer content between two samples, RUFUS produces copy number calls that are as sensitive as those of traditional copy number detection methods, with far fewer false positives. Our data suggest that RUFUS's reference-free approach to variant discovery can substantially improve upon existing variant detection methods by reducing reference biases, reducing false positive variants, and detecting copy number variants with excellent sensitivity and specificity. === Thesis (PhD), Boston College, 2014. === Submitted to: Boston College. Graduate School of Arts and Sciences. === Discipline: Biology.
author Farrell, Andrew R.
title Expanding the horizons of next generation sequencing with RUFUS
publisher Boston College
publishDate 2014
url http://hdl.handle.net/2345/bc-ir:104176
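As a rough illustration of the reference-free comparison described in the abstract, the Python sketch below counts k-mers in two sets of reads and reports the subject reads that carry k-mers absent from the control. It is not RUFUS's implementation: the function names, the `min_count` filter (a simple guard against one-off sequencing errors), and the toy reads are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(read, k):
    """Yield every substring of length k (k-mer) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def reads_with_unique_kmers(subject_reads, control_reads, k, min_count=2):
    """Return subject reads holding at least one k-mer that is absent from
    the control and seen at least min_count times in the subject.

    min_count is a hypothetical error filter, not a documented RUFUS parameter.
    """
    subject_counts = count_kmers(subject_reads, k)
    control_counts = count_kmers(control_reads, k)
    unique = {
        kmer
        for kmer, n in subject_counts.items()
        if n >= min_count and kmer not in control_counts
    }
    return [
        read for read in subject_reads
        if any(kmer in unique for kmer in kmers(read, k))
    ]


if __name__ == "__main__":
    # Toy data: two subject reads carry a single-base change (A -> T).
    control = ["ACGTACGTAC", "ACGTACGTAC"]
    subject = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTTCGTAC"]
    for read in reads_with_unique_kmers(subject, control, k=4):
        print(read)
```

No reference genome appears anywhere in the sketch: the only inputs are the two read sets, which is the property the abstract emphasizes. A real tool would also need to handle both strands, realistic k-mer sizes, and memory-efficient counting.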