Overcoming missing data in phylogenetic analysis of shotgun sequencing to detect HIV adaptation to immune response

DNA sequencing gives us insight into how viruses adapt to their host immune systems. Studies of viral populations typically employ deep amplicon sequencing with next-generation reads to capture a detailed sample of genetic variation in a population. The high amount of overlapping sites in a multiple...

Bibliographic Details
Main Author: Nguyen, Thuy
Language: English
Published: University of British Columbia 2016
Online Access: http://hdl.handle.net/2429/59108
id ndltd-UBC-oai-circle.library.ubc.ca-2429-59108
record_format oai_dc
spelling ndltd-UBC-oai-circle.library.ubc.ca-2429-59108 2018-01-05T17:29:17Z Science, Faculty of Graduate 2016-09-07T16:28:42Z 2016-09-08T02:02:33 2016 2016-02 Text Thesis/Dissertation http://hdl.handle.net/2429/59108 eng Attribution-NonCommercial 4.0 International http://creativecommons.org/licenses/by-nc/4.0/ University of British Columbia
collection NDLTD
language English
sources NDLTD
description DNA sequencing gives us insight into how viruses adapt to their host immune systems. Studies of viral populations typically employ deep amplicon sequencing with next-generation reads to capture a detailed sample of genetic variation in a population. The large number of overlapping sites in a multiple sequence alignment of reads from amplicon sequencing forms ideal input for phylogenetic reconstruction, a necessary step for studying evolutionary relationships in a population. However, the short read lengths (typically < 600 bp) of the next-generation sequencing technology with the best sequence error rate severely limit the width of the genomic regions for which evolutionary relationships can be analyzed. Shotgun sequencing, in which DNA is fragmented at random positions, is an efficient alternative to amplicon sequencing for covering wider regions of a genome with sufficient depth. Because shotgun reads fall at random, staggered positions in a genome, a multiple sequence alignment of shotgun reads can contain an extremely high percentage of missing data. The absence of sequence homology across the entire set of short reads makes it impossible to reconstruct a phylogenetic tree, limiting the utility of shotgun data for phylogenetic analysis. We developed the Umberjack software pipeline, which employs a 'sliding window' approach to minimize the effect of missing data during phylogenetic reconstruction and to obtain evolutionary statistics for detecting sites under selection. Using Umberjack to measure a new metric of directional selection, I, we detected significant directional selection in treatment-naive HIV populations at sites with previously documented associations with the cytotoxic T-lymphocyte (CTL) response. Further, substitutions towards wild-type amino acids were found to occur early within the population's history, but rarely occurred at a site after the appearance of a CTL escape mutation. When the same metric I was measured in drug-treated HIV populations, the directional selection due to the constant pressure of drug treatment was much greater than the directional selection from the immune system. === Science, Faculty of === Graduate
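The 'sliding window' idea mentioned in the description can be illustrated with a minimal sketch. This is not Umberjack's actual implementation: the function names, the gap symbol, and the `min_coverage` and `min_reads` thresholds are all assumptions made for illustration. The sketch cuts a gapped read alignment into overlapping windows and, in each window, keeps only the reads that cover enough of it, so that each window holds far less missing data than the full alignment.

```python
from typing import Dict, Iterator, Tuple

GAP = "-"  # assumed symbol for positions a read does not cover


def window_coverage(seq: str, start: int, end: int) -> float:
    """Return the fraction of non-gap characters of `seq` in [start, end)."""
    window = seq[start:end]
    if not window:
        return 0.0
    return sum(ch != GAP for ch in window) / len(window)


def sliding_windows(
    alignment: Dict[str, str],
    width: int,
    step: int,
    min_coverage: float = 0.8,  # hypothetical threshold
    min_reads: int = 2,         # hypothetical threshold
) -> Iterator[Tuple[int, int, Dict[str, str]]]:
    """Yield (start, end, reads) for each window with enough well-covered reads.

    `alignment` maps read names to equal-length gapped sequences.
    """
    if not alignment:
        return
    length = len(next(iter(alignment.values())))
    for start in range(0, length - width + 1, step):
        end = start + width
        # Keep only the reads that cover most of this window.
        kept = {
            name: seq[start:end]
            for name, seq in alignment.items()
            if window_coverage(seq, start, end) >= min_coverage
        }
        if len(kept) >= min_reads:
            yield start, end, kept


# Three staggered reads, padded with gaps to the alignment length.
reads = {
    "r1": "ACGTAC------",
    "r2": "---TACGTA---",
    "r3": "------GTACGT",
}
for start, end, kept in sliding_windows(reads, width=6, step=3, min_coverage=0.5):
    print(start, end, sorted(kept))
```

Each yielded window could then be passed to a tree-building step of the reader's choice. The window width and thresholds trade off the number of reads retained per window against the amount of missing data tolerated.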