A bioinformatic workflow for analyzing whole genomes in rare Mendelian disease
Main Author: | Couse, Madeline Hazel |
---|---|
Language: | English |
Published: | University of British Columbia, 2017 |
Online Access: | http://hdl.handle.net/2429/61332 |
id |
ndltd-UBC-oai-circle.library.ubc.ca-2429-61332 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UBC-oai-circle.library.ubc.ca-2429-61332 2018-01-05T17:29:43Z A bioinformatic workflow for analyzing whole genomes in rare Mendelian disease Couse, Madeline Hazel Science, Faculty of Graduate 2017-04-24T17:47:32Z 2017-04-24T17:47:32Z 2017 2017-05 Text Thesis/Dissertation http://hdl.handle.net/2429/61332 eng Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ University of British Columbia |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
description |
The vast majority of the human genome (~98%) is non-coding. A symphony
of non-coding sequences resides in the genome, interacting with genes and
the environment to tune gene expression. Functional non-coding sequences
include enhancers, silencers, promoters, non-coding RNA and insulators.
Variation in these non-coding sequences can cause disease, yet clinical sequencing in patients with rare Mendelian disease currently focuses mostly on variants in the ~2% of the genome that codes for protein. Indeed,
whole exome sequencing (WES) identifies variants in protein-coding genes that can explain a phenotype in fewer than half of patients with suspected genetic disease. With the dramatic reduction in the cost of whole
genome sequencing (WGS), development of algorithms to detect variants
longer than 50 bp (structural variants, SVs), and improved annotation of
the non-coding genome, it is now possible to interrogate the entire spectrum
of genetic variation to identify a pathogenic mutation.
A comprehensive pipeline is needed to analyze non-coding variation and
structural variation from WGS. In this thesis, I developed and benchmarked
a bioinformatics workflow to detect pathogenic non-coding SNVs/indels and
pathogenic SVs, and applied this workflow to unsolved patients with rare
Mendelian disorders. The pipeline detected ~80-90% of deletions, ~90% of
duplications, ~65% of inversions, and ~50% of insertions in a simulated genome
and the NA12878 genome. The pipeline captured the majority of known
pathogenic non-coding single nucleotide variants (SNVs) and insertions/deletions
(indels), and selectively prioritized a spiked-in known pathogenic non-coding
SNV. Several interesting candidate variants were detected in patients,
but none could be convincingly implicated as pathogenic.
The bioinformatic workflow described in this thesis is complementary to
sequencing pipelines that analyze only protein-coding variants from whole
genomes. Application of this workflow to larger cohorts of patients with rare
Mendelian diseases should identify pathogenic non-coding variants and SVs
to increase diagnostic yield of clinical sequencing studies, assist management
of genetic diseases, and contribute knowledge of novel pathogenic variants to the scientific community. === Science, Faculty of === Graduate |
author |
Couse, Madeline Hazel |
title |
A bioinformatic workflow for analyzing whole genomes in rare Mendelian disease |
publisher |
University of British Columbia |
publishDate |
2017 |
url |
http://hdl.handle.net/2429/61332 |