AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to d...

Bibliographic Details
Main Authors: Chan Zhou, Fenglou Mao, Yanbin Yin, Jinling Huang, Johann Peter Gogarten, Ying Xu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2014-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC4044049?pdf=render
id doaj-3e3f78873ad34fb9a4ed601ff9a63d58
record_format Article
spelling doaj-3e3f78873ad34fb9a4ed601ff9a63d58 2020-11-25T01:52:45Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2014-01-01 9 6 e98844 10.1371/journal.pone.0098844 AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees. Chan Zhou, Fenglou Mao, Yanbin Yin, Jinling Huang, Johann Peter Gogarten, Ying Xu http://europepmc.org/articles/PMC4044049?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Chan Zhou
Fenglou Mao
Yanbin Yin
Jinling Huang
Johann Peter Gogarten
Ying Xu
title AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2014-01-01
description A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences because of their high demand for computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we applied it to four problems: inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs, and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, making the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.
url http://europepmc.org/articles/PMC4044049?pdf=render