Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony
Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene...
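Main Author: | Shi, Tao |
---|---|
Other Authors: | Sanderson, Michael J. |
Language: | en |
Published: | The University of Arizona. 2013 |
Subjects: | Gene duplication; Gene tree; Species tree; Ecology & Evolutionary Biology; Domain architecture |
Online Access: | http://hdl.handle.net/10150/301751 |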
id |
ndltd-arizona.edu-oai-arizona.openrepository.com-10150-301751 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-arizona.edu-oai-arizona.openrepository.com-10150-301751 2015-10-23T05:25:39Z Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony Shi, Tao Sanderson, Michael J. Sanderson, Michael J. Tax, Frans Worobey, Michael Gene duplication Gene tree Species tree Ecology & Evolutionary Biology Domain architecture Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with a history of frequent duplication. Their utility and performance in the context of gene families with complex histories of gene duplication and domain reshuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes encompassing a wide range of duplication rates and a broad diversity of domain architecture. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rates and tree construction accuracy, in which intermediate rates have highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications in those gene families. We close with some discussion of potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment. 2013 text Electronic Thesis http://hdl.handle.net/10150/301751 en Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author. The University of Arizona. |
collection |
NDLTD |
language |
en |
sources |
NDLTD |
topic |
Gene duplication Gene tree Species tree Ecology & Evolutionary Biology Domain architecture |
spellingShingle |
Gene duplication Gene tree Species tree Ecology & Evolutionary Biology Domain architecture Shi, Tao Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony |
description |
Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with a history of frequent duplication. Their utility and performance in the context of gene families with complex histories of gene duplication and domain reshuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes encompassing a wide range of duplication rates and a broad diversity of domain architecture. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rates and tree construction accuracy, in which intermediate rates have highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications in those gene families. We close with some discussion of potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment. |
author2 |
Sanderson, Michael J. |
author_facet |
Sanderson, Michael J. Shi, Tao |
author |
Shi, Tao |
author_sort |
Shi, Tao |
title |
Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony |
title_sort |
impact of rates of gene duplication and domain shuffling on species tree inference with gene tree parsimony |
publisher |
The University of Arizona. |
publishDate |
2013 |
url |
http://hdl.handle.net/10150/301751 |
work_keys_str_mv |
AT shitao impactofratesofgeneduplicationanddomainshufflingonspeciestreeinferencewithgenetreeparsimony |
_version_ |
1718105966444019712 |