Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models


Bibliographic Details
Main Authors: Berezovskaya Faina S, Wolf Yuri I, Karev Georgy P, Koonin Eugene V
Format: Article
Language: English
Published: BMC 2004-09-01
Series: BMC Evolutionary Biology
Online Access: http://www.biomedcentral.com/1471-2148/4/32
id doaj-f08d105564c34e47ad616aa1c402c398
record_format Article
doi 10.1186/1471-2148-4-32
collection DOAJ
sources DOAJ
issn 1471-2148
description Abstract

Background

The size distribution of gene families in a broad range of genomes is well approximated by a generalized Pareto function. Evolution of ensembles of gene families can be described with Birth, Death, and Innovation Models (BDIMs). Analysis of the properties of different versions of BDIMs has the potential to reveal important features of genome evolution.

Results

In this work, we extend our previous analysis of stochastic BDIMs. In addition to the previously examined rational BDIMs, we introduce potentially more realistic logistic BDIMs, in which birth/death rates are limited for the largest families, and show that their properties are similar to those of models that include no such limitation. We show that the mean time required for the formation of the largest gene families detected in eukaryotic genomes is limited by the mean number of duplications per gene and does not increase indefinitely with the model degree. Instead, this time reaches a minimum value, which corresponds to a non-linear rational BDIM of degree approximately 2.7. Even for this BDIM, the mean time of formation of the largest family is orders of magnitude greater than any realistic estimate based on the timescale of life's evolution. We employed the embedding chains technique to estimate the expected number of elementary evolutionary events (gene duplications and deletions) preceding the formation of gene families of the observed size and found that the mean number of events exceeds the family size by orders of magnitude, suggesting a highly dynamic process of genome evolution. The variance of the time required for the formation of the largest families was found to be extremely large, with the coefficient of variation >> 1. This indicates that some gene families might grow much faster than the mean rate, so that the minimal time required for family formation is more relevant for a realistic representation of genome evolution than the mean time. We determined this minimal time using Monte Carlo simulations of family growth from an ensemble of simultaneously evolving singletons. In these simulations, the time elapsed before the formation of the largest family was much shorter than the estimated mean time and was compatible with the timescale of evolution of eukaryotes.

Conclusions

The analysis of stochastic BDIMs presented here shows that non-linear versions of such models can approximate well not only the size distribution of gene families but also the dynamics of their formation during genome evolution. The fact that only higher-degree BDIMs are compatible with the observed characteristics of genome evolution suggests that the growth of gene families is self-accelerating, which might reflect differential selective pressure acting on different genes.