Bayesian Model-building in Phylogenetics
DNA sequencing costs have decreased dramatically over recent decades, resulting in a flood of phylogenetic information available to researchers. While it is often assumed that additional data will lead to more accurate conclusions, it also raises a number of problems for phylogeneticists, i...
Main Author: | Nelson, Bradley |
---|---|
Other Authors: | Brown, Jeremy |
Format: | Others |
Language: | en |
Published: | LSU 2014 |
Subjects: | Biological Sciences |
Online Access: | http://etd.lsu.edu/docs/available/etd-06112014-143801/ |
id |
ndltd-LSU-oai-etd.lsu.edu-etd-06112014-143801 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LSU-oai-etd.lsu.edu-etd-06112014-143801 2014-06-19T03:56:38Z Bayesian Model-building in Phylogenetics Nelson, Bradley Biological Sciences Brown, Jeremy Elderd, Bret Hellberg, Michael Stevens, Richard LSU 2014-06-18 text application/pdf http://etd.lsu.edu/docs/available/etd-06112014-143801/ en unrestricted |
collection |
NDLTD |
language |
en |
format |
Others |
sources |
NDLTD |
topic |
Biological Sciences |
spellingShingle |
Biological Sciences Nelson, Bradley Bayesian Model-building in Phylogenetics |
description |
<p>DNA sequencing costs have decreased dramatically over recent decades, resulting in a flood of phylogenetic information available to researchers. While it is often assumed that additional data will lead to more accurate conclusions, it also raises a number of problems for phylogeneticists, including mundane computational issues such as data management and complex statistical problems such as obtaining a single species tree from multiple conflicting gene trees. Developing new methods to make better use of existing data and probe the causes of conflicting signal will be necessary to confidently resolve phylogenies in the genomic era.</p>
<p>Here, we examine two current problems in statistical phylogenetics and attempt to address them in a Bayesian framework. The first problem involves inflated tree lengths in Bayesian phylogenies, which can be an order of magnitude longer than maximum likelihood estimates. We developed EmpPrior, a program that queries TreeBASE for datasets similar to the focal data, then estimates parameters from each dataset to inform priors on the focal data. This approach greatly improves the tree length credible intervals in four exemplar datasets and, when combined with other approaches such as the use of a compound Dirichlet prior on tree length, can nearly eliminate the problem of inflated trees.</p>
<p>The second problem involves incongruence between morphological and molecular phylogenies in squamates. Here, we use posterior prediction with inferential test statistics to investigate whether systematic error may be biasing inference in the molecular data. While we detected some model violation in most of the 44 genes, the genes with the most model violation were more distant from the molecular phylogeny. This suggests that model violation is not a major source of error in the molecular data. Hence, the source of incongruence between the molecular and morphological squamate topologies remains unknown.</p>
<p>In both problems, we found that incorporating tools from the Bayesian statistical literature, such as informed priors and posterior prediction, into phylogenetic analyses can improve results and help uncover why different datasets lead to conflicting topologies. As phylogenetic datasets continue to grow, using methodological best practices will only become more important if we want to have confidence in our conclusions.</p> |
author2 |
Brown, Jeremy |
author_facet |
Brown, Jeremy Nelson, Bradley |
author |
Nelson, Bradley |
author_sort |
Nelson, Bradley |
title |
Bayesian Model-building in Phylogenetics |
title_short |
Bayesian Model-building in Phylogenetics |
title_full |
Bayesian Model-building in Phylogenetics |
title_fullStr |
Bayesian Model-building in Phylogenetics |
title_full_unstemmed |
Bayesian Model-building in Phylogenetics |
title_sort |
bayesian model-building in phylogenetics |
publisher |
LSU |
publishDate |
2014 |
url |
http://etd.lsu.edu/docs/available/etd-06112014-143801/ |
work_keys_str_mv |
AT nelsonbradley bayesianmodelbuildinginphylogenetics |
_version_ |
1716704094195810304 |