Fractionation Statistics
Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is...
Main Author: | Wang, Baoyong |
---|---|
Language: | en |
Published: | 2014 |
Subjects: | mathematical model; evolution; whole genome doubling; gene loss; theory of runs |
Online Access: | http://hdl.handle.net/10393/31001 |
id |
ndltd-LACETR-oai-collectionscanada.gc.ca-OOU.#10393-31001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-LACETR-oai-collectionscanada.gc.ca-OOU.#10393-31001; 2014-06-14T03:50:36Z; Fractionation Statistics; Wang, Baoyong; mathematical model; evolution; whole genome doubling; gene loss; theory of runs; 2014-05-01T14:43:33Z; 2014-05-01T14:43:33Z; 2014; 2014-05-01; Thèse / Thesis; http://hdl.handle.net/10393/31001; en |
collection |
NDLTD |
language |
en |
sources |
NDLTD |
topic |
mathematical model; evolution; whole genome doubling; gene loss; theory of runs |
description |
Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD),
is a pervasive process. Whether this loss proceeds gene by gene or through deletion
of multi-gene DNA segments is controversial, as is the question of fractionation bias,
namely whether one homeologous chromosome is more vulnerable to gene deletion
than the other. As a null hypothesis, we first assume deletion events, on one homeolog
only, excise a geometrically distributed number of genes with unknown mean mu, and
these events combine to produce deleted runs of length l, distributed approximately
as a negative binomial with unknown parameter r, itself a random variable with
distribution pi(.). A biologically more realistic model requires deletion events on both
homeologs distributed as a truncated geometric. We simulate the distribution of run
lengths l in both models, as well as the underlying pi(r), as a function of mu, and
show how sampling l allows us to estimate mu. We apply this to data on a total of 15
genomes descended from 6 distinct WGD events and show how to correct the bias
towards shorter runs caused by genome rearrangements. Because of the difficulty in
deriving pi(.) analytically, we develop a deterministic recurrence to calculate each pi(r)
as a function of mu and the proportion of unreduced paralog pairs. This is based on a
computing formula containing nested sums. The parameter mu can be estimated based
on run lengths of single-copy regions. We then reduce the computing formulae, at least
in the one-sided case, to closed form. This virtually eliminates computing time due
to highly nested summations. We formulate a continuous version of the fractionation
process, deleting line segments of exponentially distributed lengths in analogy to
geometrically distributed numbers of genes. We derive nested integrals and discover that
the number of previously deleted regions to be skipped by a new deletion event is
exactly geometrically distributed. We undertook a large simulation experiment to
show how to discriminate between the gene-by-gene duplicate deletion model and the
deletion of a geometrically distributed number of genes. This revealed the importance
of the effects of genome size N, the mean of the geometric distribution, the progress
towards completion of the fractionation process, and whether the data are based on
runs of deleted genes or undeleted genes. |
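The one-sided null model in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's code; it rests on several assumptions made only for illustration: a linear chromosome of `n_genes` genes, each deletion event starting at a uniformly chosen surviving gene, an event length drawn from a geometric distribution on {1, 2, ...} with mean `mu`, and events that skip over already deleted genes while moving rightward. The function names (`geometric`, `simulate_one_sided`, `deleted_run_lengths`) are hypothetical. Setting `mu = 1` gives the gene-by-gene model.

```python
import random
from itertools import groupby


def geometric(mu, rng):
    """Draw from a geometric distribution on {1, 2, ...} with mean mu >= 1."""
    p = 1.0 / mu
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def simulate_one_sided(n_genes, mu, retained_fraction, seed=0):
    """Delete genes on one homeolog until only retained_fraction remain.

    Assumed mechanics (illustrative only): each event starts at a random
    surviving gene and removes the next `length` surviving genes to its
    right, skipping genes that earlier events already deleted.
    """
    rng = random.Random(seed)
    present = [True] * n_genes
    n_present = n_genes
    target = int(retained_fraction * n_genes)
    while n_present > target:
        survivors = [i for i, keep in enumerate(present) if keep]
        i = rng.choice(survivors)
        to_delete = geometric(mu, rng)
        # The last event is cut short if it would overshoot the target.
        while to_delete > 0 and i < n_genes and n_present > target:
            if present[i]:
                present[i] = False
                n_present -= 1
                to_delete -= 1
            i += 1  # move right, passing over already deleted genes
    return present


def deleted_run_lengths(present):
    """Lengths of maximal runs of consecutive deleted genes."""
    return [sum(1 for _ in group) for keep, group in groupby(present) if not keep]


if __name__ == "__main__":
    present = simulate_one_sided(n_genes=2000, mu=3.0, retained_fraction=0.5, seed=1)
    runs = deleted_run_lengths(present)
    print("number of deleted runs:", len(runs))
    print("mean deleted run length:", sum(runs) / len(runs))
```

Comparing the run-length output for `mu = 1` against larger values of `mu`, at several values of `n_genes` and `retained_fraction`, gives a rough feel for the discrimination problem described at the end of the abstract.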
author |
Wang, Baoyong |
title |
Fractionation Statistics |
publishDate |
2014 |
url |
http://hdl.handle.net/10393/31001 |