AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.
A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to d...
Main Authors: | Chan Zhou, Fenglou Mao, Yanbin Yin, Jinling Huang, Johann Peter Gogarten, Ying Xu |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2014-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC4044049?pdf=render |
id |
doaj-3e3f78873ad34fb9a4ed601ff9a63d58 |
---|---|
record_format |
Article |
spelling |
doaj-3e3f78873ad34fb9a4ed601ff9a63d58; 2020-11-25T01:52:45Z; eng; Public Library of Science (PLoS); PLoS ONE; ISSN 1932-6203; 2014-01-01; vol. 9, no. 6, e98844; DOI 10.1371/journal.pone.0098844; AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees; Chan Zhou, Fenglou Mao, Yanbin Yin, Jinling Huang, Johann Peter Gogarten, Ying Xu; http://europepmc.org/articles/PMC4044049?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chan Zhou Fenglou Mao Yanbin Yin Jinling Huang Johann Peter Gogarten Ying Xu |
title |
AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2014-01-01 |
description |
A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g., when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we tested it on four problems: inferring the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, of 16S ribosomal RNAs, and of glycosyl-transferase gene family 8, and studying ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, making the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. |
url |
http://europepmc.org/articles/PMC4044049?pdf=render |
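
The description above states the goal of AST (a subsample that maximizes taxonomic diversity) but not its algorithm. The sketch below only illustrates that general idea with a simple round-robin pick across taxonomic groups; it is not the AST method. The function name `sample_diverse`, the `(seq_id, lineage)` record layout, and the `rank` parameter are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import zip_longest


def sample_diverse(records, k, rank=0):
    """Pick up to k sequence IDs, spreading picks across taxonomic groups.

    Illustrative only (not the AST algorithm).
    records: iterable of (seq_id, lineage) pairs, where lineage is a tuple
             of taxon names ordered from broadest to narrowest rank.
    rank:    index into the lineage used to form the groups (hypothetical).
    """
    # Bucket sequence IDs by the taxon found at the chosen rank.
    groups = defaultdict(list)
    for seq_id, lineage in records:
        groups[lineage[rank]].append(seq_id)

    # Round-robin: take one ID from every group before taking a second
    # from any group, so heavily sequenced taxa cannot crowd out rare ones.
    picked = []
    for layer in zip_longest(*groups.values()):
        for seq_id in layer:
            if seq_id is not None and len(picked) < k:
                picked.append(seq_id)
    return picked


records = [
    ("s1", ("Bacteria", "Proteobacteria")),
    ("s2", ("Bacteria", "Proteobacteria")),
    ("s3", ("Bacteria", "Firmicutes")),
    ("s4", ("Archaea", "Euryarchaeota")),
    ("s5", ("Eukaryota", "Viridiplantae")),
]
print(sample_diverse(records, 3))          # one pick per top-level group
print(sample_diverse(records, 4, rank=1))  # grouping one rank lower
```

With a pool biased towards one well-studied group (here, three of five sequences are bacterial), the round-robin order keeps the sample from being dominated by that group, which is the situation the description lists as case (3).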