Fast computation of distance estimators

<p>Abstract</p> <p>Background</p> <p>Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix, containing estimated pairwise distances between all pairs o...

Bibliographic Details
Main Authors: Lagergren Jens, Elias Isaac
Format: Article
Language: English
Published: BMC 2007-03-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/8/89
id doaj-7d406f210392429faff163443216b85c
record_format Article
spelling doaj-7d406f210392429faff163443216b85c 2020-11-24T23:56:31Z eng BMC BMC Bioinformatics 1471-2105 2007-03-01 8 1 89 10.1186/1471-2105-8-89 Fast computation of distance estimators Lagergren Jens Elias Isaac
collection DOAJ
language English
format Article
sources DOAJ
author Lagergren Jens
Elias Isaac
title Fast computation of distance estimators
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2007-03-01
description <p>Abstract</p> <p>Background</p> <p>Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast, e.g., the popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of <it>n</it> taxa in time <it>O</it>(<it>n</it><sup>3</sup>). Unfortunately, the fastest practical algorithms known for computing the distance matrix from <it>n</it> sequences of length <it>l</it> take time proportional to <it>l</it>·<it>n</it><sup>2</sup>. Since the sequence length is typically much larger than the number of taxa, distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in the reconstruction of large phylogenies and in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications.</p> <p>Results</p> <p>We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences which contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods.</p> <p>Conclusion</p> <p>Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input and, hence, our novel algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.</p>
url http://www.biomedcentral.com/1471-2105/8/89
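
Illustrative note: the abstract describes the conventional cost of building a distance matrix as proportional to l·n² for n sequences of length l. The sketch below shows that straightforward baseline only, not the authors' faster algorithm, whose details are not given in this record. The function names are hypothetical, and the Jukes-Cantor correction is an assumed choice of distance estimator that the record does not name. Ambiguity symbols are not handled; every character is compared literally.

```python
import math
from itertools import combinations


def count_mismatches(a, b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def jc69_distance(p):
    """Jukes-Cantor correction of an observed mismatch proportion p.

    Assumed estimator for illustration; the record does not specify one.
    """
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Naive pairwise distances: n*(n-1)/2 pairs, each scanned in O(l)."""
    n = len(seqs)
    if n and len(seqs[0]) == 0:
        raise ValueError("sequences must be non-empty")
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        p = count_mismatches(seqs[i], seqs[j]) / len(seqs[i])
        dist[i][j] = dist[j][i] = jc69_distance(p)
    return dist


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    for row in distance_matrix(["ACGT", "ACGA", "TCGA"]):
        print(row)
```

Each pair of sequences is scanned once end to end, which is where the l·n² term comes from; with l much larger than n, this step dominates an O(n³) tree-building method such as NJ.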