Conventional Simulation of Biological Sequences Leads to a Biased Assessment of Multi-Loci Phylogenetic Analysis
Phylogenetic analysis based on multi-loci data sets is performed by means of supermatrix (SM) or supertree (ST) approaches. Recently, methods that rely on species tree (SppT) inference by the multi-species coalescence have also been implemented to tackle this problem. Generally, the relative perform...
Main Authors: | Barbara O. Aguiar, Carlos G. Schrago |
---|---|
Format: | Article |
Language: | English |
Published: | SAGE Publishing 2013-01-01 |
Series: | Evolutionary Bioinformatics |
Online Access: | https://doi.org/10.4137/EBO.S12483 |
id |
doaj-143ea4fef671417f9a5f5a71d7acbec7 |
---|---|
record_format |
Article |
spelling |
doaj-143ea4fef671417f9a5f5a71d7acbec7 2020-11-25T03:23:37Z eng SAGE Publishing Evolutionary Bioinformatics 1176-9343 2013-01-01 9 10.4137/EBO.S12483 Conventional Simulation of Biological Sequences Leads to a Biased Assessment of Multi-Loci Phylogenetic Analysis Barbara O. Aguiar (0) Carlos G. Schrago (1) Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil. Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil. Phylogenetic analysis based on multi-loci data sets is performed by means of supermatrix (SM) or supertree (ST) approaches. Recently, methods that rely on species tree (SppT) inference by the multi-species coalescence have also been implemented to tackle this problem. Generally, the relative performance of these three major strategies has been evaluated using simulation of biological sequences. However, sequence simulation may not entirely replicate the complexity of the evolutionary process. Thus, issues regarding the usefulness of in silico sequences in studying the performance of phylogenetic methods have been raised. Here, we used both classical simulation and empirical data to investigate the relative performance of the ST, SM, and SppT methods. SM analyses performed better than the ST and SppT methods in simulations, but not in empirical analyses, where some ST methods significantly outperformed the others. Additionally, SM was the only method that was robust under evolutionary model violations in simulations. These results show that conventional biological sequence simulation cannot adequately resolve which method is most efficient to recover the SppT. In such simulations, the SM approach recovers the established phylogeny in most instances, whereas the performance of the ST and SppT methods is downgraded in simpler cases. When compared, the analyses based on empirical and simulated sequences yielded largely inconsistent results, with the latter showing a bias towards a seeming superiority of SM approaches. https://doi.org/10.4137/EBO.S12483 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Barbara O. Aguiar Carlos G. Schrago |
title |
Conventional Simulation of Biological Sequences Leads to a Biased Assessment of Multi-Loci Phylogenetic Analysis |
publisher |
SAGE Publishing |
series |
Evolutionary Bioinformatics |
issn |
1176-9343 |
publishDate |
2013-01-01 |
description |
Phylogenetic analysis based on multi-loci data sets is performed by means of supermatrix (SM) or supertree (ST) approaches. Recently, methods that rely on species tree (SppT) inference by the multi-species coalescence have also been implemented to tackle this problem. Generally, the relative performance of these three major strategies has been evaluated using simulation of biological sequences. However, sequence simulation may not entirely replicate the complexity of the evolutionary process. Thus, issues regarding the usefulness of in silico sequences in studying the performance of phylogenetic methods have been raised. Here, we used both classical simulation and empirical data to investigate the relative performance of the ST, SM, and SppT methods. SM analyses performed better than the ST and SppT methods in simulations, but not in empirical analyses, where some ST methods significantly outperformed the others. Additionally, SM was the only method that was robust under evolutionary model violations in simulations. These results show that conventional biological sequence simulation cannot adequately resolve which method is most efficient to recover the SppT. In such simulations, the SM approach recovers the established phylogeny in most instances, whereas the performance of the ST and SppT methods is downgraded in simpler cases. When compared, the analyses based on empirical and simulated sequences yielded largely inconsistent results, with the latter showing a bias towards a seeming superiority of SM approaches. |
url |
https://doi.org/10.4137/EBO.S12483 |