Mapping Bisulfite-Treated Short DNA Reads

Epigenetics are stable heritable traits that are not a result of the DNA sequence. Epigenetic modification of DNA cytosine plays a role in development and disease. The covalent bonding of a methyl group or a hydroxymethyl group to the 5-carbon of cytosine epigenetically modifies cytosine to 5-meth...


Bibliographic Details
Main Author: Porter, Jacob Stuart
Other Authors: Computer Science
Format: Others
Published: Virginia Tech 2018
Online Access: http://hdl.handle.net/10919/82870
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-82870
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-82870; 2020-09-29T05:37:23Z; Mapping Bisulfite-Treated Short DNA Reads; Porter, Jacob Stuart; Computer Science; Zhang, Liqing; Yiu, Siuming; Xie, Hehuang David; Watson, Layne T.; Heath, Lenwood S.; Wu, Xiaowei; DNA read alignment; hairpin whole genome bisulfite; indels; bisulfite Ion Torrent; BisPin; BFAST-Gap; Ph. D.; 2018-04-24T08:00:55Z; 2018-04-24T08:00:55Z; 2018-04-23; Dissertation; vt_gsexam:14990; http://hdl.handle.net/10919/82870; In Copyright; http://rightsstatements.org/vocab/InC/1.0/; ETD; application/pdf; application/pdf; application/x-zip-compressed; Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic DNA read alignment
hairpin whole genome bisulfite
indels
bisulfite Ion Torrent
BisPin
BFAST-Gap
description Epigenetic traits are stable, heritable traits that do not result from the DNA sequence itself. Epigenetic modification of cytosine in DNA plays a role in development and disease. The covalent bonding of a methyl or hydroxymethyl group to the 5-carbon of cytosine yields 5-methylcytosine or 5-hydroxymethylcytosine. Bisulfite treatment of DNA converts unmethylated cytosine to uracil, which is read as thymine after PCR amplification, while 5-methylcytosine, 5-hydroxymethylcytosine, and the other bases remain unchanged. The resulting sequences can be mapped to a reference genome; however, this can be challenging because of sequencing technology complexity, low sequence complexity, and the biases and errors introduced by bisulfite treatment. Once a short read is mapped, the positions of 5-methylcytosine or 5-hydroxymethylcytosine can be determined by comparing the mapped read with the aligned reference genome. Bisulfite DNA read mapping is characterized by mapping performance as low as 40%. This research improves bisulfite short-read mapping quality. First, reads generated with the bisulfite hairpin PCR protocol are used to study mapping failures and possible solutions. A read may fail to map to the genome, map uniquely, or map to multiple locations. Sequence complexity correlates with these mapping categories. The hairpin protocol allows the original untreated read to be recovered in some cases, and mapping this recovered read with the regular read mapper Bowtie2 improved mapper performance by 10%. New bisulfite read mapping software called BisPin was created; it calls BFAST (BLAT-like Fast Accurate Search Tool) for mapping. BisPin resolves ambiguously mapped reads with a rescoring strategy, which yields a statistically significant improvement. BFAST-Gap was developed for Ion Torrent reads, since Ion Torrent machines are less expensive than Illumina machines and Ion Torrent reads are longer. There are few mappers for Ion Torrent data. BFAST-Gap uses homopolymer run length in contextual gap penalty functions, since homopolymer runs cause errors in Ion Torrent reads. In conjunction with BisPin, this software performed well on real and simulated bisulfite Ion Torrent data and on Illumina data. InfoTrim, a read trimmer with an entropy term, was also developed and gave competitive results. Ph. D.
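The conversion and comparison steps in the description can be illustrated with a short Python sketch. It is not taken from BisPin or BFAST-Gap. The function names, the toy sequence, and the assumptions of a forward-strand, gap-free alignment at a known offset are all hypothetical and chosen only for illustration.

```python
def bisulfite_convert(seq, protected_positions=frozenset()):
    """Simulate bisulfite treatment followed by PCR on the forward strand.

    Unprotected C is reported as T; positions in `protected_positions`
    (standing in for 5-methylcytosine or 5-hydroxymethylcytosine) stay C.
    """
    return "".join(
        "T" if base == "C" and i not in protected_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read, offset=0):
    """Compare a mapped, gap-free read with the reference starting at `offset`.

    Returns {reference position: call} for every reference C covered by the
    read. A read C means the cytosine was protected; a read T means it was
    converted. Other read bases at a reference C are ignored as mismatches.
    """
    calls = {}
    for i, read_base in enumerate(read):
        ref_pos = offset + i
        if reference[ref_pos] != "C":
            continue
        if read_base == "C":
            calls[ref_pos] = "methylated"
        elif read_base == "T":
            calls[ref_pos] = "unmethylated"
    return calls


reference = "ACGTCCGA"  # toy reference, not real data
read = bisulfite_convert(reference, protected_positions={1})
print(read)
print(call_methylation(reference, read))
```

A C-to-T difference at a reference cytosine is treated here as evidence of conversion rather than as a mismatch, which is why a bisulfite-aware mapper scores such positions differently from a regular mapper. The sketch cannot distinguish 5-methylcytosine from 5-hydroxymethylcytosine, since both remain C after treatment.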