Methods for Detecting Mutations in Non-model Organisms
abstract: Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific er...
Other Authors: | |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | http://hdl.handle.net/2286/R.I.63039 |
id |
ndltd-asu.edu-item-63039 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-asu.edu-item-63039 2021-01-15T05:01:18Z Methods for Detecting Mutations in Non-model Organisms abstract: Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when there is not a reference genome or other auxiliary information available. I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm. I also use this data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods. 
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm. Dissertation/Thesis Orr, Adam James (Author) Cartwright, Reed (Advisor) Wilson, Melissa (Committee member) Kusumi, Kenro (Committee member) Taylor, Jesse (Committee member) Pfeifer, Susanne (Committee member) Arizona State University (Publisher) Bioinformatics Computer science Biology DNA Sequencing Mutation Quality Scores Sequencing Error Variant Calling eng 270 pages Doctoral Dissertation Molecular and Cellular Biology 2020 Doctoral Dissertation http://hdl.handle.net/2286/R.I.63039 http://rightsstatements.org/vocab/InC/1.0/ 2020 |
collection |
NDLTD |
language |
English |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Bioinformatics Computer science Biology DNA Sequencing Mutation Quality Scores Sequencing Error Variant Calling |
spellingShingle |
Bioinformatics Computer science Biology DNA Sequencing Mutation Quality Scores Sequencing Error Variant Calling Methods for Detecting Mutations in Non-model Organisms |
description |
abstract: Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates.
This can make mutation detection difficult; and while increasing sequencing depth
can often help, sequence-specific errors and other non-random biases cannot be detected
by increased depth. The problem of accurate genotyping is exacerbated when
there is not a reference genome or other auxiliary information available.
I explore several methods for sensitively detecting mutations in non-model organisms
using an example Eucalyptus melliodora individual. I use the structure of
the tree to find bounds on its somatic mutation rate and evaluate several algorithms
for variant calling. I find that conventional methods are suitable if the genome of a
close relative can be adapted to the study organism. However, with structured data,
a likelihood framework that is aware of this structure is more accurate. I use the
techniques developed here to evaluate a reference-free variant calling algorithm.
I also use this data to evaluate a k-mer based base quality score recalibrator
(KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing
data. Base quality scores can help detect errors in sequencing reads, but are often
inaccurate. The most popular method for correcting this issue requires a known
set of variant sites, which is unavailable in most cases. I simulate data and show
that errors in this set of variant sites can cause calibration errors. I then show that
KBBQ accurately recalibrates base quality scores while requiring no reference or other
information and performs as well as other methods.
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration
on the quality of output variant calls and show that improved base quality score
calibration increases the sensitivity and reduces the false positive rate of a variant
calling algorithm. === Dissertation/Thesis === Doctoral Dissertation Molecular and Cellular Biology 2020 |
author2 |
Orr, Adam James (Author) |
author_facet |
Orr, Adam James (Author) |
title |
Methods for Detecting Mutations in Non-model Organisms |
title_short |
Methods for Detecting Mutations in Non-model Organisms |
title_full |
Methods for Detecting Mutations in Non-model Organisms |
title_fullStr |
Methods for Detecting Mutations in Non-model Organisms |
title_full_unstemmed |
Methods for Detecting Mutations in Non-model Organisms |
title_sort |
methods for detecting mutations in non-model organisms |
publishDate |
2020 |
url |
http://hdl.handle.net/2286/R.I.63039 |
_version_ |
1719373023008522240 |