Whole genome phylogenies for multiple Drosophila species

Abstract. Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is...


Bibliographic Details
Main Authors: Seetharam Arun, Stuart Gary W
Format: Article
Language: English
Published: BMC 2012-12-01
Series: BMC Research Notes
Subjects: Singular value decomposition; Phylogenomics; Comparative genomics; Drosophila phylogeny
Online Access: http://www.biomedcentral.com/1756-0500/5/670
id doaj-09043eb642e54ca0b7ef4f32e9e839b6
record_format Article
spelling doaj-09043eb642e54ca0b7ef4f32e9e839b6 2020-11-25T02:14:12Z eng BMC BMC Research Notes 1756-0500 2012-12-01 5 1 670 10.1186/1756-0500-5-670 Whole genome phylogenies for multiple Drosophila species Seetharam Arun Stuart Gary W http://www.biomedcentral.com/1756-0500/5/670 Singular value decomposition Phylogenomics Comparative genomics Drosophila phylogeny
collection DOAJ
language English
format Article
sources DOAJ
author Seetharam Arun
Stuart Gary W
author_facet Seetharam Arun
Stuart Gary W
author_sort Seetharam Arun
title Whole genome phylogenies for multiple Drosophila species
title_sort whole genome phylogenies for multiple <it>drosophila</it> species
publisher BMC
series BMC Research Notes
issn 1756-0500
publishDate 2012-12-01
description Abstract
Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees.
Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed.
Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between Drosophila species. Furthermore, protein filtering can be effectively applied to reduce incongruence in the dataset as well as to generate alternative phylogenies.
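The pipeline summarized in the abstract (peptide frequency vectors, SVD, summed protein vectors per species, angles as distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the peptide length `k=2`, the number of retained dimensions `dims=3`, and the toy sequences are all assumptions chosen for illustration, and the record does not state the parameters used in the paper.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_index(k):
    """Map every possible peptide of length k to a row index."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def frequency_vector(seq, index, k):
    """Count overlapping length-k peptides in one protein sequence."""
    vec = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip peptides containing non-standard residues
            vec[j] += 1
    return vec


def species_angles(proteomes, k=2, dims=3):
    """Return species names and the matrix of pairwise angles (radians).

    proteomes: dict mapping a species name to a list of protein sequences.
    k and dims are illustrative assumptions, not values from the paper.
    """
    index = peptide_index(k)
    names = list(proteomes)
    owners, columns = [], []
    for s, name in enumerate(names):
        for seq in proteomes[name]:
            owners.append(s)
            columns.append(frequency_vector(seq, index, k))

    A = np.array(columns).T  # peptides x proteins
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    dims = min(dims, len(S))

    # Each protein as a vector in the reduced space (rows: proteins).
    protein_vecs = (S[:dims, None] * Vt[:dims]).T

    # Sum the protein vectors belonging to each species.
    species_vecs = np.zeros((len(names), dims))
    for owner, vec in zip(owners, protein_vecs):
        species_vecs[owner] += vec

    # Angle between species vectors serves as the distance measure.
    unit = species_vecs / np.linalg.norm(species_vecs, axis=1, keepdims=True)
    cosines = np.clip(unit @ unit.T, -1.0, 1.0)
    return names, np.arccos(cosines)


if __name__ == "__main__":
    toy = {  # made-up sequences, for demonstration only
        "sp_a": ["ACDEFGACDEFG", "KLMNPKLMNP"],
        "sp_b": ["ACDEFGACDEFG", "KLMNPKLMNP"],
        "sp_c": ["WWWWYYYYWWWW"],
    }
    names, angles = species_angles(toy)
    print(names)
    print(np.round(angles, 3))
```

The resulting angle matrix could then be passed to any distance-based tree-building method, such as neighbor joining. The record does not say which tree-building method the authors used, so that step is left out of the sketch.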
topic Singular value decomposition
Phylogenomics
Comparative genomics
Drosophila phylogeny
url http://www.biomedcentral.com/1756-0500/5/670
work_keys_str_mv AT seetharamarun wholegenomephylogeniesformultipleitdrosophilaitspecies
AT stuartgaryw wholegenomephylogeniesformultipleitdrosophilaitspecies
_version_ 1724901167625928704