Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models

Bibliographic Details
Main Authors: Hocking, T.D. (Author), Liehrmann, A. (Author), Rigaill, G. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher (https://doi.org/10.1186/s12859-021-04221-5)
LEADER 03110nam a2200469Ia 4500
001 10.1186-s12859-021-04221-5
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04221-5 
520 3 |a Background: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions such as the Poisson distribution to model the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them. Results: Our comparisons on seven reference datasets of histone modifications (H3K36me3 & H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model with alternative noise assumptions and supervised learning of the penalty parameter reduces the over-dispersion exhibited by count data. These models, implemented in the R package CROCS (https://github.com/aLiehrmann/CROCS), detect the peaks more accurately than algorithms that rely on natural assumptions. Conclusion: The segmentation models we propose can benefit researchers in the field of epigenetics by providing new high-quality peak prediction tracks for H3K36me3 and H3K4me3 histone modifications. © 2021, The Author(s). 
650 0 4 |a algorithm 
650 0 4 |a Algorithms 
650 0 4 |a Change point detection 
650 0 4 |a ChIP-seq 
650 0 4 |a chromatin immunoprecipitation 
650 0 4 |a Chromatin immunoprecipitation 
650 0 4 |a Chromatin Immunoprecipitation 
650 0 4 |a Chromatin Immunoprecipitation Sequencing 
650 0 4 |a DNA sequence 
650 0 4 |a Gene expression 
650 0 4 |a high throughput sequencing 
650 0 4 |a High-Throughput Nucleotide Sequencing 
650 0 4 |a High-throughput sequencing 
650 0 4 |a Histone modification 
650 0 4 |a Histone modifications 
650 0 4 |a Likelihood inference 
650 0 4 |a Multiple changepoint detection 
650 0 4 |a Over-dispersion 
650 0 4 |a Peak calling 
650 0 4 |a Penalty parameters 
650 0 4 |a Poisson distribution 
650 0 4 |a Segmentation models 
650 0 4 |a Sequence Analysis, DNA 
650 0 4 |a Statistical algorithm 
650 0 4 |a Supervised learning 
650 0 4 |a Supervised segmentation 
700 1 |a Hocking, T.D.  |e author 
700 1 |a Liehrmann, A.  |e author 
700 1 |a Rigaill, G.  |e author 
773 |t BMC Bioinformatics