Computational recovery of enzyme haplotypes from a metagenome

Population-level diversity of microbial communities (microbiomes) represents a biotechnological resource for biomining, biorefining and synthetic biology, but industrial exploitation of enzymes responsible for catalyzing reactions of interest requires the recovery of the exact DNA sequences (or "...

Bibliographic Details
Main Author: Nicholls, Samuel
Other Authors: Clare, Amanda ; Creevey, Christopher
Published: Aberystwyth University 2018
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.767382
id ndltd-bl.uk-oai-ethos.bl.uk-767382
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-767382 ; 2019-03-14T03:22:15Z ; Computational recovery of enzyme haplotypes from a metagenome ; Nicholls, Samuel ; Clare, Amanda ; Creevey, Christopher ; 2018 ; Aberystwyth University ; https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.767382 ; http://hdl.handle.net/2160/cfc1d884-cc17-439f-be6e-c3ab6d79ee1d ; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
description Population-level diversity of microbial communities (microbiomes) represents a biotechnological resource for biomining, biorefining and synthetic biology, but industrial exploitation of enzymes responsible for catalyzing reactions of interest requires the recovery of the exact DNA sequences (or "haplotypes") that encode the genes. However, haplotype reconstruction is an extremely difficult computational problem, further complicated by the infancy of techniques for handling environmental sequencing data (metagenomics). Current haplotyping approaches cannot choose between alternative haplotype reconstructions and fail to provide biological evidence of correct predictions. Additionally, there is no philosophical framework under which we can consider the variation of genes within a microbial community, such as those that encode isoforms of enzymes of interest to us. To address this, my thesis proposes the "metahaplome" as a definition for the set of haplotypes for a genomic region of interest within a microbial community. This work will offer the first formalisation of the problem of recovering haplotypes from a metagenomic data set, and present Hansel and Gretel: a novel probabilistic framework that reconstructs the most likely haplotypes from complex microbiomes. The framework is robust to sequencing error and uses all available evidence from aligned reads, without altering or discarding observed variation. The approach is verified with multiple in silico experiments, including two de facto data sets currently used to benchmark algorithms for the recovery of viral quasispecies and for strain identification. With long-read sequencing, this thesis will demonstrate in vitro verification of the approach, presenting the first biologically validated method for the recovery of haplotypes from a microbial community. Finally, I will introduce the "Rumen Landscape" pilot study to demonstrate the sort of research questions and novel biological insight that can be obtained through exploration of the metahaplome.
author2 Clare, Amanda ; Creevey, Christopher
author Nicholls, Samuel
title Computational recovery of enzyme haplotypes from a metagenome
publisher Aberystwyth University
publishDate 2018
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.767382