Bioinformatic approaches for identifying single nucleotide variants and profiling alternative expression in cancer transcriptomes

Bibliographic Details
Main Author: Goya, Rodrigo
Language: English
Published: University of British Columbia 2017
Online Access: http://hdl.handle.net/2429/64070
Description
Summary: Over the last decade, the advent of high-throughput sequencing (HTS) has given us the ability to study DNA and RNA sequences at nucleotide resolution, at unprecedented speed and at relatively low cost. This has been an invaluable tool in the study of cancer, allowing projects such as The Cancer Genome Atlas and the International Cancer Genome Consortium to sequence thousands of tumours from multiple cancer types. The ever-increasing amounts of data created by these projects demanded new analysis methods: in the first part of this thesis, I focus on method development for mutation calling in genome and transcriptome data. I present SNVMix, a single nucleotide variant (SNV) caller based on a set of probabilistic models designed to adapt to variations in allele representation in a tumour. Differential allele representation can arise in DNA when multiple clones are present in the sequenced tumour, and in RNA from differences in gene expression or allelic bias. These situations are nearly ubiquitous in cancer sequencing studies and thus need to be accounted for. I demonstrate that SNVMix outperformed a contemporary SNV caller that does not account for variations in allele representation. I also present BWA-R, an adaptation of the Burrows-Wheeler Aligner that can properly align paired-end RNA-Seq reads to a genome reference extended with the exon-exon junction sequences formed through splicing. I show that BWA-R provides better alignments for SNV calling in transcriptomes, increasing the proportion of true positive calls obtained.

In the second part of this thesis, I analyze RNA-Seq data from a triple negative breast cancer (TNBC) cohort and describe the alternative splicing profiles of the previously defined Basal and NonBasal subgroups.
TNBC is characterized by the absence of estrogen and progesterone receptors and human epidermal growth factor receptor 2 (HER2), which precludes the use of currently available targeted therapies. TNBC patients are thus treated with chemotherapy, and outcomes are generally poor. I identify alternatively expressed genes that may be relevant to the biology of these two subgroups and that could provide clues for further studies or treatment options.

Affiliation: Science, Faculty of
Degree level: Graduate
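The idea behind an allele-representation-aware SNV caller can be illustrated with a small sketch. The code below is a hypothetical, simplified model, not the SNVMix implementation described in the thesis. It assumes each site is summarised as `a` reads matching the reference allele out of `d` total reads, and treats the genotype (`aa`, `ab`, `bb`) as a latent class whose read counts follow a binomial distribution with a per-genotype reference-allele probability `mu`. Rather than fixing `mu` for heterozygous sites at 0.5, the sketch re-estimates `mu` and the genotype weights `pi` from the data with plain maximum-likelihood expectation-maximisation (EM). As a result, a sample whose variant alleles are under-represented (for example through a minor clone or allelic expression bias) shifts the heterozygous component toward the observed allele fraction. The function names (`genotype_posteriors`, `fit_em`), starting values and example counts in `SITES` are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: per-site counts are (a, d) = (reads matching the
# reference allele, total depth). Genotypes: aa = homozygous reference,
# ab = heterozygous, bb = homozygous variant.
GENOTYPES = ("aa", "ab", "bb")


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log binomial probability of k reference reads out of n."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite at p = 0 or 1
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def genotype_posteriors(a: int, d: int, pi: Sequence[float],
                        mu: Sequence[float]) -> List[float]:
    """P(genotype | a, d) under a binomial mixture with weights pi and
    per-genotype reference-allele probabilities mu."""
    logs = [math.log(max(p_k, 1e-300)) + log_binom_pmf(a, d, m_k)
            for p_k, m_k in zip(pi, mu)]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]  # subtract max before exp
    total = sum(weights)
    return [w / total for w in weights]


def fit_em(sites: Sequence[Tuple[int, int]], pi: Sequence[float],
           mu: Sequence[float], n_iter: int = 50) -> Tuple[List[float], List[float]]:
    """Plain maximum-likelihood EM for pi and mu (no parameter priors)."""
    pi, mu = list(pi), list(mu)
    for _ in range(n_iter):
        # E-step: genotype responsibilities for every site
        resp = [genotype_posteriors(a, d, pi, mu) for a, d in sites]
        # M-step: re-estimate mixture weights and binomial parameters
        for k in range(len(pi)):
            r_k = [r[k] for r in resp]
            pi[k] = sum(r_k) / len(sites)
            num = sum(r * a for r, (a, _) in zip(r_k, sites))
            den = sum(r * d for r, (_, d) in zip(r_k, sites))
            if den > 0:
                mu[k] = num / den
    return pi, mu


# Illustrative counts only: mostly reference sites, some variant sites whose
# variant allele is seen in only ~20% of reads, and two homozygous variants.
SITES = ([(30, 30)] * 20 + [(29, 30)] * 5 + [(24, 30)] * 10
         + [(25, 30)] * 5 + [(0, 30)] * 2)

if __name__ == "__main__":
    pi_hat, mu_hat = fit_em(SITES, pi=(0.6, 0.3, 0.1), mu=(0.99, 0.5, 0.01))
    for g, p, m in zip(GENOTYPES, pi_hat, mu_hat):
        print(f"{g}: weight={p:.3f} ref-allele prob={m:.3f}")
    print(genotype_posteriors(24, 30, pi_hat, mu_hat))
```

With these toy counts, the fitted heterozygous component moves away from 0.5 toward the roughly 80% reference-allele fraction of the variant sites. This is the kind of adaptation to allele representation that the summary describes. A full caller would add parameter priors, base and mapping quality handling, and filtering, none of which this sketch attempts.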