Genome sizes and the Benford distribution.

Background: Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with to...


Bibliographic Details
Main Authors: James L Friar, Terrance Goldman, Juan Pérez-Mercader
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2012-01-01
Series: PLoS ONE
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22629319/?tool=EBI
id doaj-b1be61db3913430ab16705c6c399dc86
record_format Article
spelling doaj-b1be61db3913430ab16705c6c399dc86; 2021-03-04T00:44:17Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2012-01-01; 7; 5; e36624; 10.1371/journal.pone.0036624; Genome sizes and the Benford distribution.; James L Friar; Terrance Goldman; Juan Pérez-Mercader; https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22629319/?tool=EBI
collection DOAJ
language English
format Article
sources DOAJ
author James L Friar
Terrance Goldman
Juan Pérez-Mercader
title Genome sizes and the Benford distribution.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2012-01-01
description Background: Data on the number of Open Reading Frames (ORFs) coded by genomes from the 3 domains of Life show the presence of some notable general features. These include essential differences between the Prokaryotes and Eukaryotes, with the number of ORFs growing linearly with total genome size for the former, but only logarithmically for the latter.
Results: Simply by assuming that the (protein) coding and non-coding fractions of the genome must have different dynamics and that the non-coding fraction must be particularly versatile and therefore be controlled by a variety of (unspecified) probability distribution functions (pdf's), we are able to predict that the number of ORFs for Eukaryotes follows a Benford distribution and must therefore have a specific logarithmic form. Using the data for the 1000+ genomes available to us in early 2010, we find that the Benford distribution provides excellent fits to the data over several orders of magnitude.
Conclusions: In its linear regime the Benford distribution produces excellent fits to the Prokaryote data, while the full non-linear form of the distribution similarly provides an excellent fit to the Eukaryote data. Furthermore, in their region of overlap the salient features are statistically congruent. This allows us to interpret the difference between Prokaryotes and Eukaryotes as the manifestation of the increased demand in the biological functions required for the larger Eukaryotes, to estimate some minimal genome sizes, and to predict a maximal Prokaryote genome size on the order of 8-12 megabasepairs. These results naturally allow a mathematical interpretation in terms of maximal entropy and, therefore, most efficient information transmission.
url https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22629319/?tool=EBI