Bayesian Model-building in Phylogenetics

DNA sequencing costs have decreased dramatically over recent decades, resulting in a flood of phylogenetic information available to researchers. While it is often assumed that additional data will lead to more accurate conclusions, this abundance also raises a number of problems for phylogeneticists, i...

Bibliographic Details
Main Author:	Nelson, Bradley
Other Authors:	Brown, Jeremy
Format:	Others
Language:	en
Published:	LSU 2014
Subjects:	Biological Sciences
Online Access:	http://etd.lsu.edu/docs/available/etd-06112014-143801/

id	ndltd-LSU-oai-etd.lsu.edu-etd-06112014-143801
record_format	oai_dc
spelling	ndltd-LSU-oai-etd.lsu.edu-etd-06112014-143801 2014-06-19T03:56:38Z Bayesian Model-building in Phylogenetics Nelson, Bradley Biological Sciences Brown, Jeremy Elderd, Bret Hellberg, Michael Stevens, Richard LSU 2014-06-18 text application/pdf http://etd.lsu.edu/docs/available/etd-06112014-143801/ en unrestricted I hereby certify that, if appropriate, I have obtained and attached herein a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to LSU or its agents the non-exclusive license to archive and make accessible, under the conditions specified below and in appropriate University policies, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.
collection	NDLTD
language	en
format	Others
sources	NDLTD
topic	Biological Sciences
description	DNA sequencing costs have decreased dramatically over recent decades, resulting in a flood of phylogenetic information available to researchers. While it is often assumed that additional data will lead to more accurate conclusions, this abundance also raises a number of problems for phylogeneticists, including mundane computational issues such as data management and complex statistical problems such as obtaining a single species tree from multiple conflicting gene trees. Developing new methods to make better use of existing data and probe the causes of conflicting signal will be necessary to confidently resolve phylogenies in the genomic era.
Here, we examine two current problems in statistical phylogenetics and attempt to address them in a Bayesian framework. The first problem involves inflated tree lengths in Bayesian phylogenies, which can be an order of magnitude longer than maximum likelihood estimates. We developed EmpPrior, a program that queries TreeBASE for datasets similar to the focal data, then estimates parameters from each dataset to inform priors on the focal data. This approach greatly improves the tree length credible intervals in four exemplar datasets and, when combined with other approaches such as the use of a compound Dirichlet prior on tree length, can nearly eliminate the problem of inflated trees.
The second problem involves incongruence between morphological and molecular phylogenies in squamates. Here, we use posterior prediction with inferential test statistics to investigate whether systematic error may be biasing inference in the molecular data. While we detected some model violation in most of the 44 genes, the genes with the most model violation were more distant from the molecular phylogeny. This suggests that model violation is not a major source of error in the molecular data. Hence, the source of incongruence between the molecular and morphological squamate topologies remains unknown.
In both problems, we found that incorporating tools such as informed priors and posterior prediction from the Bayesian statistical literature into phylogenetic analyses can improve results and help uncover why different datasets lead to conflicting topologies. As phylogenetic datasets continue to grow, using methodological best practices will only become more important if we want to have confidence in our conclusions.
author2	Brown, Jeremy
author_facet	Brown, Jeremy Nelson, Bradley
author	Nelson, Bradley
title	Bayesian Model-building in Phylogenetics
publisher	LSU
publishDate	2014
url	http://etd.lsu.edu/docs/available/etd-06112014-143801/

Similar Items