Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies

Abstract. Background: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expa...

Bibliographic Details
Main Authors: Davide Heller, Damian Szklarczyk, Christian von Mering
Format: Article
Language: English
Published: BMC 2019-05-01
Series: BMC Bioinformatics
Subjects: Tree reconciliation; Consistency; Orthologous groups
Online Access: http://link.springer.com/article/10.1186/s12859-019-2828-z
id doaj-c5150324fd7341f8a75b72375be81620
record_format Article
spelling doaj-c5150324fd7341f8a75b72375be81620; 2020-11-25T02:41:49Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2019-05-01; volume 20, issue 1, pages 1-12; DOI 10.1186/s12859-019-2828-z; Davide Heller, Damian Szklarczyk, Christian von Mering (all: Institute of Molecular Life Sciences, University of Zurich)
collection DOAJ
language English
format Article
sources DOAJ
author Davide Heller
Damian Szklarczyk
Christian von Mering
title Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-05-01
description Abstract. Background: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large-scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. These inconsistencies can be due to confounding genetic signal or algorithmic limitations. Importantly, they limit the potential use of OGs for functional annotation and third-party applications. Results: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sample. Unlike previous approaches, subsampling the protein space avoids the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method in a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. Conclusion: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show performance comparable to or better than that of previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline.
topic Tree reconciliation
Consistency
Orthologous groups
url http://link.springer.com/article/10.1186/s12859-019-2828-z
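
The abstract describes resolving an inconsistency by subsampling the OG members, reconciling a small tree for each sample, and aggregating the outcomes. The sketch below illustrates only that subsample-and-aggregate pattern, using a majority vote. It is not the authors' pipeline: the function names, the `sample_size` and `n_samples` parameters, the "split"/"merge" labels, the `species|protein` naming, and the majority-vote rule are all assumptions made for illustration, and `toy_decide` is a placeholder where a real workflow would build a gene tree and reconcile it with the species tree.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def aggregate_over_subsamples(
    proteins: Sequence[str],
    decide: Callable[[List[str]], str],
    sample_size: int,
    n_samples: int = 25,
    seed: int = 0,
) -> Tuple[str, float]:
    """Draw repeated subsamples, collect one decision per sample,
    and return the majority decision with its share of the votes."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    k = min(sample_size, len(proteins))
    votes: Counter = Counter()
    for _ in range(n_samples):
        # Sample k proteins without replacement (hypothetical sampling scheme).
        sample = rng.sample(list(proteins), k)
        votes[decide(sample)] += 1
    decision, count = votes.most_common(1)[0]
    return decision, count / n_samples


def toy_decide(sample: List[str]) -> str:
    """Placeholder for tree building plus reconciliation (assumption).

    Votes "split" when any species contributes more than one protein,
    a crude proxy for a duplication, and "merge" otherwise.
    """
    species = [name.split("|")[0] for name in sample]
    return "split" if len(set(species)) < len(species) else "merge"


if __name__ == "__main__":
    members = ["human|A1", "human|A2", "mouse|A1", "mouse|A2", "fly|A1", "fly|A2"]
    print(aggregate_over_subsamples(members, toy_decide, sample_size=4))
```

Passing the per-sample decision in as a callable keeps the aggregation step independent of whichever tree-building and reconciliation tools are used. The returned vote share gives a rough indication of how strongly the samples agree.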