LSX: automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference

Abstract. Background: Lineage rate heterogeneity can be a major source of bias, especially in multi-gene phylogeny inference. We had previously tackled this issue by developing LS3, a data subselection algorithm that, by removing fast-evolving sequences in a gene-specific manner, identifies subsets of...


Bibliographic Details
Main Authors: Carlos J. Rivera-Rivera, Juan I. Montoya-Burgos
Format: Article
Language: English
Published: BMC 2019-08-01
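Series: BMC Bioinformatics
Subjects: Long branch attraction, Lineage rate heterogeneity, Phylogenomics, Phylogenetic methods, Sequence subsampling
Online Access: http://link.springer.com/article/10.1186/s12859-019-3020-1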
id doaj-0d5543795f7047b8a191ac139157bee0
record_format Article
spelling doaj-0d5543795f7047b8a191ac139157bee0
2020-11-25T02:42:14Z
eng
BMC
BMC Bioinformatics
1471-2105
2019-08-01
20114
10.1186/s12859-019-3020-1
LSX: automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference
Carlos J. Rivera-Rivera (0)
Juan I. Montoya-Burgos (1)
(0) Department of Genetics and Evolution, University of Geneva
(1) Department of Genetics and Evolution, University of Geneva
Abstract
Background: Lineage rate heterogeneity can be a major source of bias, especially in multi-gene phylogeny inference. We had previously tackled this issue by developing LS3, a data subselection algorithm that, by removing fast-evolving sequences in a gene-specific manner, identifies subsets of sequences that evolve at a relatively homogeneous rate. However, this algorithm had two major shortcomings: (i) it was automated and published as a set of bash scripts, and hence was Linux-specific, and not user friendly, and (ii) it could result in very stringent sequence subselection when extremely slow-evolving sequences were present.
Results: We address these challenges and produce a new, platform-independent program, LSX, written in R, which includes a reprogrammed version of the original LS3 algorithm and has added features to make better lineage rate calculations. In addition, we developed and included an alternative version of the algorithm, LS4, which reduces lineage rate heterogeneity by detecting sequences that evolve too fast and sequences that evolve too slow, resulting in less stringent data subselection when extremely slow-evolving sequences are present. The efficiency of LSX and of LS4 with datasets with extremely slow-evolving sequences is demonstrated with simulated data, and by the resolution of a contentious node in the catfish phylogeny that was affected by an unusually high lineage rate heterogeneity in the dataset.
Conclusions: LSX is a new bioinformatic tool, with an accessible code, and with which the effect of lineage rate heterogeneity can be explored in gene sequence datasets of virtually any size. In addition, the two modalities of the sequence subsampling algorithm included, LS3 and LS4, allow the user to optimize the amount of non-phylogenetic signal removed while keeping a maximum of phylogenetic signal.
http://link.springer.com/article/10.1186/s12859-019-3020-1
Long branch attraction
Lineage rate heterogeneity
Phylogenomics
Phylogenetic methods
Sequence subsampling
collection DOAJ
language English
format Article
sources DOAJ
author Carlos J. Rivera-Rivera
Juan I. Montoya-Burgos
title LSX: automated reduction of gene-specific lineage evolutionary rate heterogeneity for multi-gene phylogeny inference
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-08-01
topic Long branch attraction
Lineage rate heterogeneity
Phylogenomics
Phylogenetic methods
Sequence subsampling
url http://link.springer.com/article/10.1186/s12859-019-3020-1