Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees

Abstract. Background: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several g...


Bibliographic Details
Main Authors:	Izquierdo-Carrasco Fernando, Smith Stephen A, Stamatakis Alexandros
Format:	Article
Language:	English
Published:	BMC 2011-12-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/12/470

id	doaj-27c6c73d896c44c3944763946fa56f34
record_format	Article
spelling	doaj-27c6c73d896c44c3944763946fa56f34 2020-11-24T22:12:50Z eng BMC BMC Bioinformatics 1471-2105 2011-12-01 12 1 470 10.1186/1471-2105-12-470 Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees Izquierdo-Carrasco Fernando Smith Stephen A Stamatakis Alexandros http://www.biomedcentral.com/1471-2105/12/470
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Izquierdo-Carrasco Fernando Smith Stephen A Stamatakis Alexandros
author_facet	Izquierdo-Carrasco Fernando Smith Stephen A Stamatakis Alexandros
author_sort	Izquierdo-Carrasco Fernando
title	Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2011-12-01
description	Abstract. Background: The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood. Results: We introduce methods for solving these three key problems and provide respective proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood) tree inference program. We develop a new search strategy that can reduce the time required for tree inferences by more than 50% while yielding equally good trees (in the statistical sense) for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v728) that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems. Conclusions: We address three major issues pertaining to large scale tree reconstruction under maximum likelihood and propose respective solutions. Respective proof-of-concept/production-level implementations of our ideas are made available as open-source code.
url	http://www.biomedcentral.com/1471-2105/12/470

