Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks

Abstract. Background: High throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure and patterns...

Bibliographic Details
Main Authors: Courdy Samir J, Nix David A, Boucher Kenneth M
Format: Article
Language: English
Published: BMC 2008-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/523
id doaj-4014758314d545c2b48def68c627404e
record_format Article
spelling doaj-4014758314d545c2b48def68c627404e 2020-11-24T20:54:16Z eng BMC BMC Bioinformatics 1471-2105 2008-12-01 9 1 523 10.1186/1471-2105-9-523 Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks Courdy Samir J Nix David A Boucher Kenneth M http://www.biomedcentral.com/1471-2105/9/523
collection DOAJ
language English
format Article
sources DOAJ
author Courdy Samir J
Nix David A
Boucher Kenneth M
spellingShingle Courdy Samir J
Nix David A
Boucher Kenneth M
Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks
BMC Bioinformatics
author_facet Courdy Samir J
Nix David A
Boucher Kenneth M
author_sort Courdy Samir J
title Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks
title_short Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks
title_full Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks
title_fullStr Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks
title_full_unstemmed Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks
title_sort empirical methods for controlling false positives and estimating confidence in chip-seq peaks
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-12-01
description Abstract. Background: High throughput signature sequencing holds many promises, one of which is the ready identification of in vivo transcription factor binding sites, histone modifications, changes in chromatin structure, and patterns of DNA methylation across entire genomes. In these experiments, chromatin immunoprecipitation is used to enrich for particular DNA sequences of interest, and signature sequencing is used to map the regions to the genome (ChIP-Seq). Elucidation of these sites of DNA-protein binding/modification is proving instrumental in reconstructing networks of gene regulation and chromatin remodelling that direct development, response to cellular perturbation, and neoplastic transformation. Results: Here we present a package of algorithms and software that makes use of control input data to reduce false positives and estimate confidence in ChIP-Seq peaks. Several different methods were compared using two simulated spike-in datasets. Use of control input data and a normalized difference score were found to more than double the recovery of ChIP-Seq peaks at a 5% false discovery rate (FDR). Moreover, both a binomial p-value/q-value and an empirical FDR were found to predict the true FDR within 2–3 fold and are more reliable estimators of confidence than a global Poisson p-value. These methods were then used to reanalyze Johnson et al.'s neuron-restrictive silencer factor (NRSF) ChIP-Seq data without relying on extensive qPCR-validated NRSF sites and the presence of NRSF binding motifs for setting thresholds. Conclusion: The methods developed and tested here show considerable promise for reducing false positives and estimating confidence in ChIP-Seq data without any prior knowledge of the ChIP target. They are part of a larger open source package freely available from http://useq.sourceforge.net/.
url http://www.biomedcentral.com/1471-2105/9/523
work_keys_str_mv AT courdysamirj empiricalmethodsforcontrollingfalsepositivesandestimatingconfidenceinchipseqpeaks
AT nixdavida empiricalmethodsforcontrollingfalsepositivesandestimatingconfidenceinchipseqpeaks
AT boucherkennethm empiricalmethodsforcontrollingfalsepositivesandestimatingconfidenceinchipseqpeaks
_version_ 1716795103159255040