Variable structure motifs for transcription factor binding sites

Abstract. Background: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this...

Bibliographic Details
Main Authors: Wernisch Lorenz, Dyer Nigel, Evans Kenneth J, Reid John E, Ott Sascha
Format: Article
Language: English
Published: BMC 2010-01-01
Series: BMC Genomics
Online Access: http://www.biomedcentral.com/1471-2164/11/30
id doaj-5a9c6ca4b4724a7dbd35c2bfe70888fa
record_format Article
spelling doaj-5a9c6ca4b4724a7dbd35c2bfe70888fa; 2020-11-25T01:33:57Z; eng; BMC; BMC Genomics; 1471-2164; 2010-01-01; 11; 1; 30; 10.1186/1471-2164-11-30; Variable structure motifs for transcription factor binding sites; Wernisch Lorenz; Dyer Nigel; Evans Kenneth J; Reid John E; Ott Sascha; http://www.biomedcentral.com/1471-2164/11/30
collection DOAJ
language English
format Article
sources DOAJ
author Wernisch Lorenz
Dyer Nigel
Evans Kenneth J
Reid John E
Ott Sascha
title Variable structure motifs for transcription factor binding sites
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2010-01-01
description Abstract
Background: Classically, models of DNA-transcription factor binding sites (TFBSs) have been based on relatively few known instances and have treated them as sites of fixed length using position weight matrices (PWMs). Various extensions to this model have been proposed, most of which take account of dependencies between the bases in the binding sites. However, some transcription factors are known to exhibit some flexibility and bind to DNA in more than one possible physical configuration. In some cases this variation is known to affect the function of binding sites. With the increasing volume of ChIP-seq data available, it is now possible to investigate models that incorporate this flexibility. Previous work on variable length models has been constrained by: a focus on specific zinc finger proteins in yeast using restrictive models; a reliance on hand-crafted models for just one transcription factor at a time; and a lack of evaluation on realistically sized data sets.
Results: We re-analysed binding sites from the TRANSFAC database and found motivating examples where our new variable length model provides a better fit. We analysed several ChIP-seq data sets with a novel motif search algorithm and compared the results to one of the best standard PWM finders and a recently developed alternative method for finding motifs of variable structure. All the methods performed comparably in held-out cross-validation tests. Known motifs of variable structure were recovered for p53, Stat5a and Stat5b. In addition, our method recovered a novel generalised version of an existing PWM for Sp1 that allows for variable length binding. This motif improved classification performance.
Conclusions: We have presented a new gapped PWM model for variable length DNA binding sites that is neither too restrictive nor over-parameterised. Our comparison with existing tools shows that on average it does not have better predictive accuracy than existing methods. However, it does provide more interpretable models of motifs of variable structure that are suitable for follow-up structural studies. To our knowledge, we are the first to apply variable length motif models to eukaryotic ChIP-seq data sets and consequently the first to show their value in this domain. The results include a novel motif for the ubiquitous transcription factor Sp1.
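The abstract contrasts fixed-length PWMs with a gapped PWM that allows variable length sites. As a rough illustration of that idea only, the sketch below scores a candidate site against a PWM with a single optional one-base gap. It is not the authors' model or algorithm: the function names (`to_log_odds`, `score_site`, `best_hit`, `toy_pwm`), the uniform background, the gap prior `p_gap`, and the toy consensus `ACGT` are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def toy_pwm(consensus, p=0.85):
    """Build a hypothetical PWM: probability p for the consensus base, the rest shared equally."""
    rest = (1.0 - p) / 3.0
    return [{b: (p if b == c else rest) for b in BASES} for c in consensus]


def to_log_odds(pwm, background=0.25):
    """Convert per-column base probabilities to log-odds against a uniform background (assumed)."""
    return [{b: math.log(col[b] / background) for b in BASES} for col in pwm]


def score_site(site, weights, gap_after, p_gap):
    """Score a site that is either ungapped or has one extra base after column `gap_after`.

    The gap base is treated as background (log-odds 0); the prior on using
    the gap is p_gap, an illustrative parameter.
    """
    n = len(weights)
    if len(site) == n:
        columns = site
        prior = math.log(1.0 - p_gap)
    elif len(site) == n + 1:
        # Drop the single base that falls into the gap before scoring the columns.
        columns = site[:gap_after] + site[gap_after + 1:]
        prior = math.log(p_gap)
    else:
        raise ValueError("site length must equal the PWM length or exceed it by one")
    return prior + sum(w[b] for w, b in zip(weights, columns))


def best_hit(sequence, weights, gap_after, p_gap):
    """Scan one strand and return (score, start, gapped) for the best-scoring window."""
    n = len(weights)
    hits = []
    for length, gapped in ((n, False), (n + 1, True)):
        for start in range(len(sequence) - length + 1):
            site = sequence[start:start + length]
            hits.append((score_site(site, weights, gap_after, p_gap), start, gapped))
    return max(hits)


WEIGHTS = to_log_odds(toy_pwm("ACGT"))

if __name__ == "__main__":
    print(best_hit("TTACGTTT", WEIGHTS, gap_after=2, p_gap=0.3))   # ungapped site
    print(best_hit("TTACAGTTT", WEIGHTS, gap_after=2, p_gap=0.3))  # site with one gap base
```

A fixed-length PWM would score only the first of the two window lengths; allowing the second length with its own prior is what lets one motif describe both configurations in this toy setting.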
url http://www.biomedcentral.com/1471-2164/11/30