Methods for Detecting Mutations in Non-model Organisms

abstract: Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific er...

Bibliographic Details
Other Authors: Orr, Adam James (Author)
Format: Doctoral Thesis
Language: English
Published: 2020
Subjects:
Online Access: http://hdl.handle.net/2286/R.I.63039
id ndltd-asu.edu-item-63039
record_format oai_dc
spelling ndltd-asu.edu-item-63039 2021-01-15T05:01:18Z
Methods for Detecting Mutations in Non-model Organisms
Dissertation/Thesis
Orr, Adam James (Author)
Cartwright, Reed (Advisor)
Wilson, Melissa (Committee member)
Kusumi, Kenro (Committee member)
Taylor, Jesse (Committee member)
Pfeifer, Susanne (Committee member)
Arizona State University (Publisher)
eng
270 pages
Doctoral Dissertation, Molecular and Cellular Biology, 2020
http://hdl.handle.net/2286/R.I.63039
http://rightsstatements.org/vocab/InC/1.0/
collection NDLTD
language English
format Doctoral Thesis
sources NDLTD
topic Bioinformatics
Computer science
Biology
DNA Sequencing
Mutation
Quality Scores
Sequencing Error
Variant Calling
description abstract: Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when there is not a reference genome or other auxiliary information available. I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm. I also use this data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
=== Dissertation/Thesis === Doctoral Dissertation Molecular and Cellular Biology 2020
author2 Orr, Adam James (Author)
title Methods for Detecting Mutations in Non-model Organisms
publishDate 2020
url http://hdl.handle.net/2286/R.I.63039
_version_ 1719373023008522240