Combining distance matrices on identical taxon sets for multi-gene analysis with singular value decomposition.

We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest...

Full description

Bibliographic Details
Main Authors: Melanie Abeysundera, Toby Kenney, Chris Field, Hong Gu
Format: Article
Language:English
Published: Public Library of Science (PLoS) 2014-01-01
Series:PLoS ONE
Online Access:http://europepmc.org/articles/PMC3986248?pdf=render
Description
Summary:We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets to obtain a single representative distance matrix from which to derive a combined-gene phylogenetic tree. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right eigenvector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well known data sets and estimate the uncertainty using bootstrap methods. Our results show that this method works well for these three data sets and that the uncertainty in these estimates is small. A simulation study is conducted to compare the performance of our method with several other distance based approaches (namely SDM, SDM* and ACS97), and we find the performances of all these approaches are comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent eigenvalues and eigenvectors may be used to identify if there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
ISSN:1932-6203