Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony

Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene...

Bibliographic Details
Main Author: Shi, Tao
Other Authors: Sanderson, Michael J.
Language: en
Published: The University of Arizona. 2013
Subjects: Gene duplication; Gene tree; Species tree; Ecology & Evolutionary Biology; Domain architecture
Online Access: http://hdl.handle.net/10150/301751
id ndltd-arizona.edu-oai-arizona.openrepository.com-10150-301751
record_format oai_dc
spelling ndltd-arizona.edu-oai-arizona.openrepository.com-10150-301751
2015-10-23T05:25:39Z
Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony
Shi, Tao
Sanderson, Michael J.
Tax, Frans
Worobey, Michael
Gene duplication
Gene tree
Species tree
Ecology & Evolutionary Biology
Domain architecture
Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with a history of frequent duplication. Their utility and performance in the context of gene families with complex histories of gene duplication and domain reshuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes encompassing a wide range of duplication rates and a broad diversity of domain architecture. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rates and tree construction accuracy, in which intermediate rates have highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications among those gene families. We close with some discussion of potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment.
2013
text
Electronic Thesis
http://hdl.handle.net/10150/301751
en
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
The University of Arizona.
collection NDLTD
language en
sources NDLTD
topic Gene duplication
Gene tree
Species tree
Ecology & Evolutionary Biology
Domain architecture
description Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with a history of frequent duplication. Their utility and performance in the context of gene families with complex histories of gene duplication and domain reshuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes encompassing a wide range of duplication rates and a broad diversity of domain architecture. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rates and tree construction accuracy, in which intermediate rates have highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications among those gene families. We close with some discussion of potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment.
author2 Sanderson, Michael J.
author Shi, Tao
title Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony
publisher The University of Arizona.
publishDate 2013
url http://hdl.handle.net/10150/301751