A bioinformatic workflow for analyzing whole genomes in rare Mendelian disease

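The vast majority of the human genome (~98%) is non-coding. A symphony of non-coding sequences resides in the genome, interacting with genes and the environment to tune gene expression. Functional non-coding sequences include enhancers, silencers, promoters, non-coding RNAs and insulators. Variation in these non-coding sequences can cause disease, yet clinical sequencing in patients with rare Mendelian disease currently focuses mostly on variants in the ~2% of the genome that codes for protein. Indeed, whole exome sequencing (WES) identifies protein-coding variants that can explain the phenotype in fewer than half of patients with suspected genetic disease. With the dramatic reduction in the cost of whole genome sequencing (WGS), the development of algorithms to detect variants longer than 50 bp (structural variants, SVs), and improved annotation of the non-coding genome, it is now possible to interrogate the entire spectrum of genetic variation to identify a pathogenic mutation. A comprehensive pipeline is needed to analyze non-coding variation and structural variation from WGS. In this thesis, I developed and benchmarked a bioinformatics workflow to detect pathogenic non-coding single nucleotide variants (SNVs) and insertions/deletions (indels) as well as pathogenic SVs, and applied this workflow to unsolved patients with rare Mendelian disorders. In a simulated genome and the NA12878 genome, the pipeline detected ~80-90% of deletions, ~90% of duplications, ~65% of inversions, and ~50% of insertions. The pipeline captured the majority of known pathogenic non-coding SNVs and indels, and selectively prioritized a spiked-in known pathogenic non-coding SNV. Several interesting candidate variants were detected in patients, but none could be convincingly implicated as pathogenic.

The bioinformatic workflow described in this thesis is complementary to sequencing pipelines that analyze only protein-coding variants from whole genomes. Applying this workflow to larger cohorts of patients with rare Mendelian diseases should identify pathogenic non-coding variants and SVs, increasing the diagnostic yield of clinical sequencing studies, assisting the management of genetic diseases, and contributing knowledge of novel pathogenic variants to the scientific community.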
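The abstract reports, for a simulated genome and NA12878, what fraction of each structural-variant class the pipeline detected, which amounts to per-class recall (sensitivity) against a truth set. The record does not say how calls were matched to truth variants, so this minimal sketch assumes two common conventions: at least 50% reciprocal overlap for deletions, duplications and inversions, and a 100 bp breakpoint window for insertions. `Variant`, `matches` and `recall_by_type` are hypothetical names, and the toy data are invented for illustration; this is not the thesis's benchmarking code.

```python
from collections import defaultdict
from dataclasses import dataclass

# Variants longer than 50 bp are treated as structural variants (SVs), as in the abstract.
SV_MIN_LEN = 50


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record; a real pipeline would parse these from VCF."""
    chrom: str
    start: int   # 0-based, half-open reference interval
    end: int
    svtype: str  # "DEL", "DUP", "INV" or "INS"
    svlen: int   # variant length in bp (inserted length for INS)


def reciprocal_overlap(a: Variant, b: Variant) -> float:
    """Overlap divided by the longer interval, i.e. the smaller of the two overlap fractions."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def matches(truth: Variant, call: Variant, min_ro: float, ins_window: int) -> bool:
    """Assumed matching rule: same type and chromosome, then overlap or breakpoint distance."""
    if truth.svtype != call.svtype or truth.chrom != call.chrom:
        return False
    if truth.svtype == "INS":
        # In this toy representation insertions span 1 bp of reference,
        # so compare breakpoint positions instead of interval overlap.
        return abs(truth.start - call.start) <= ins_window
    return reciprocal_overlap(truth, call) >= min_ro


def recall_by_type(truth, calls, min_ro=0.5, ins_window=100):
    """Fraction of truth-set SVs of each type recovered by at least one call."""
    found = defaultdict(int)
    total = defaultdict(int)
    for t in truth:
        if t.svlen <= SV_MIN_LEN:
            continue  # 50 bp or shorter: an indel, not an SV
        total[t.svtype] += 1
        if any(matches(t, c, min_ro, ins_window) for c in calls):
            found[t.svtype] += 1
    return {svtype: found[svtype] / total[svtype] for svtype in total}


# Toy truth set and call set (invented for illustration, not thesis data).
truth = [
    Variant("chr1", 1000, 2000, "DEL", 1000),
    Variant("chr1", 5000, 5600, "DUP", 600),
    Variant("chr2", 300, 301, "INS", 120),
    Variant("chr2", 8000, 8030, "DEL", 30),  # too short to count as an SV
]
calls = [
    Variant("chr1", 1100, 2050, "DEL", 950),  # 900 bp shared / 1000 bp = 0.9
    Variant("chr2", 350, 351, "INS", 110),    # breakpoint 50 bp from the truth INS
]

if __name__ == "__main__":
    print(recall_by_type(truth, calls))  # {'DEL': 1.0, 'DUP': 0.0, 'INS': 1.0}
```

Tightening `min_ro` or `ins_window` lowers the reported recall, so detection rates from different benchmarks are only comparable when they use the same matching rule.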


Bibliographic Details
Main Author: Couse, Madeline Hazel
Language: English
Published: University of British Columbia 2017
Online Access:http://hdl.handle.net/2429/61332
Faculty: Faculty of Science
Level: Graduate
Type: Text, Thesis/Dissertation
Date available: 2017-04-24
Graduation date: 2017-05
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: NDLTD
Record ID: ndltd-UBC-oai-circle.library.ubc.ca-2429-61332