Probabilistic Modelling of Domain and Gene Evolution
Phylogenetic inference relies heavily on statistical models that have been extended and refined over the past years into complex hierarchical models to capture the intricacies of evolutionary processes. The wealth of information in the form of fully sequenced genomes has led to the development of me...
Main Author: | Muhammad, Sayyed Auwn |
---|---|
Format: | Doctoral Thesis |
Language: | English |
Published: | KTH, Beräkningsvetenskap och beräkningsteknik (CST), 2016 |
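Subjects: | Phylogenetics; Phylogenomics; Evolution; Domain Evolution; Gene tree; Domain tree; Bayesian Inference; Markov Chain Monte Carlo; Homology Inference; Gene families; C2H2 Zinc-Finger; Reelin Protein |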
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191352 http://nbn-resolving.de/urn:isbn:978-91-7729-091-9 |
id |
ndltd-UPSALLA1-oai-DiVA.org-kth-191352 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-kth-191352; 2016-09-05T05:00:28Z; Probabilistic Modelling of Domain and Gene Evolution; eng; Muhammad, Sayyed Auwn; KTH, Beräkningsvetenskap och beräkningsteknik (CST); Stockholm, Sweden; 2016; Phylogenetics; Phylogenomics; Evolution; Domain Evolution; Gene tree; Domain tree; Bayesian Inference; Markov Chain Monte Carlo; Homology Inference; Gene families; C2H2 Zinc-Finger; Reelin Protein; QC 20160904; Doctoral thesis, comprehensive summary; info:eu-repo/semantics/doctoralThesis; text; http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191352; urn:isbn:978-91-7729-091-9; TRITA-CSC-A, 1653-5723 ; 19; application/pdf; info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Phylogenetics; Phylogenomics; Evolution; Domain Evolution; Gene tree; Domain tree; Bayesian Inference; Markov Chain Monte Carlo; Homology Inference; Gene families; C2H2 Zinc-Finger; Reelin Protein |
description |
Phylogenetic inference relies heavily on statistical models that have been extended and refined in recent years into complex hierarchical models that capture the intricacies of evolutionary processes. The wealth of information available as fully sequenced genomes has led to the development of methods that reconstruct gene and species evolutionary histories in greater and more accurate detail. However, genes are composed of evolutionarily conserved sequence segments called domains, and domains can also be affected by duplications, losses, and bifurcations implied by gene or species evolution. This thesis proposes extending the evolutionary models previously used for gene evolution, such as duplication-loss, rate, and substitution models, to domain evolution. It presents DomainDLRS, a comprehensive, hierarchical Bayesian method based on the DLRS model of Åkerborg et al. (2009), which models domain evolution as occurring inside the gene tree and species tree. The method incorporates a birth-death process to model domain duplications and losses, together with a domain sequence evolution model under a relaxed molecular clock assumption. It employs a variant of the Markov chain Monte Carlo technique, called Grouped Independence Metropolis-Hastings, to estimate the posterior distribution over domain and gene trees. Using this method, we analysed the Zinc-Finger and PRDM9 gene families, which provided interesting insights into domain evolution. Finally, a synteny-aware approach to gene homology inference, called GenFamClust, is proposed; it uses similarity and gene neighbourhood conservation to improve homology inference. We evaluated the accuracy of the method on synthetic datasets and on two biological datasets consisting of eukaryotic and fungal species. Our results show that combining synteny with similarity significantly improves homology inference. === QC 20160904 |
author |
Muhammad, Sayyed Auwn |
title |
Probabilistic Modelling of Domain and Gene Evolution |
publisher |
KTH, Beräkningsvetenskap och beräkningsteknik (CST) |
publishDate |
2016 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-191352 http://nbn-resolving.de/urn:isbn:978-91-7729-091-9 |