Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales

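The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.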

Bibliographic Details
Main Authors: Long Qian, Edo Kussell
Format: Article
Language: English
Published: American Physical Society 2016-10-01
Series: Physical Review X
Online Access: http://doi.org/10.1103/PhysRevX.6.041009
id doaj-923353c3b1be47e0bded8f40d4d0d6d4
record_format Article
spelling doaj-923353c3b1be47e0bded8f40d4d0d6d4 | 2020-11-24T23:34:40Z | eng | American Physical Society | Physical Review X | 2160-3308 | 2016-10-01 | 6 | 4 | 041009 | 10.1103/PhysRevX.6.041009 | Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales | Long Qian | Edo Kussell | The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations. | http://doi.org/10.1103/PhysRevX.6.041009
collection DOAJ
language English
format Article
sources DOAJ
author Long Qian
Edo Kussell
title Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales
publisher American Physical Society
series Physical Review X
issn 2160-3308
publishDate 2016-10-01
description The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.
url http://doi.org/10.1103/PhysRevX.6.041009