About the variability, quality and reproducibility of ChIP-seq data

The emergence of high-throughput technologies with the production of gigabyte-scale omics datasets has led to revolutionary changes in molecular biology and functional genomics. Despite the incorporation of increasingly quantitative technologies, the field suffers from important reprodu...

Bibliographic Details
Main Authors: Hamzavi-Pinon Violaine, Cholley, Marco Mendoza-Parra, Hinrich Gronemeyer
Format: Article
Language: English
Published: ScienceOpen 2016-06-01
Series: ScienceOpen Research
Online Access: https://www.scienceopen.com/document?vid=708fd3cf-11ef-4ea4-bbca-629c343c2cec
id doaj-0d0be730ac6846be8b0d05cb8395c22a
record_format Article
spelling doaj-0d0be730ac6846be8b0d05cb8395c22a
2020-12-15T17:21:37Z
eng
ScienceOpen
ScienceOpen Research
2199-1006
2016-06-01
10.14293/S2199-1006.1.SOR-LIFE.ARGGHM.v1
About the variability, quality and reproducibility of ChIP-seq data
Hamzavi-Pinon Violaine
Cholley
Marco Mendoza-Parra
Hinrich Gronemeyer
https://www.scienceopen.com/document?vid=708fd3cf-11ef-4ea4-bbca-629c343c2cec
collection DOAJ
language English
format Article
sources DOAJ
author Hamzavi-Pinon Violaine
Cholley
Marco Mendoza-Parra
Hinrich Gronemeyer
spellingShingle Hamzavi-Pinon Violaine
Cholley
Marco Mendoza-Parra
Hinrich Gronemeyer
About the variability, quality and reproducibility of ChIP-seq data
ScienceOpen Research
author_facet Hamzavi-Pinon Violaine
Cholley
Marco Mendoza-Parra
Hinrich Gronemeyer
author_sort Hamzavi-Pinon Violaine
title About the variability, quality and reproducibility of ChIP-seq data
title_short About the variability, quality and reproducibility of ChIP-seq data
title_full About the variability, quality and reproducibility of ChIP-seq data
title_fullStr About the variability, quality and reproducibility of ChIP-seq data
title_full_unstemmed About the variability, quality and reproducibility of ChIP-seq data
title_sort about the variability, quality and reproducibility of chip-seq data
publisher ScienceOpen
series ScienceOpen Research
issn 2199-1006
publishDate 2016-06-01
description The emergence of high-throughput technologies with the production of gigabyte-scale omics datasets has led to revolutionary changes in molecular biology and functional genomics. Despite the incorporation of increasingly quantitative technologies, the field suffers from important reproducibility problems. Some causes have been identified: they include poor quality management, competition for publishing, funding and jobs, and problems in the experimental and statistical design of assays. The consequences include, among others, delays in the implementation of efficient and specific anti-cancer treatments, the unnecessary duplication/validation of improperly conducted studies, and the waste of public funding. Here we wish to discuss another cause of poor reproducibility, which will become increasingly important with the advent of personalized medicine: the generation of poor-quality datasets from Next Generation Sequencing (NGS) technologies, specifically those that involve enrichment assays like ChIP-sequencing. Today, NGS-derived applications are becoming increasingly popular, a trend further supported by decreasing sequencing costs, the rapid development of novel sequencing-based technologies, and the power of genome-wide data interpretation by functional genomics and systems biology approaches. However, the complexity and sensitivity of these technologies bear the risk of introducing various types of bias. Thus, it is rather surprising that only very few quality indicators have been developed to date. The public availability of omics data in large repositories, such as GEO, is no doubt an enormously valuable resource. However, by working extensively with such datasets, we realized that the lack of universal quality control indicators in publications and data repositories seriously limits the use of existing data and can contribute to irreproducibility issues. Here we provide examples that illustrate the problems generated by the use of poor-quality datasets and propose solutions that would ultimately enhance reproducibility and encourage scientists to use existing datasets in the design and interpretation of their own research projects. Our goal is to increase awareness in the scientific community of the need to link quality assessment to datasets, and to initiate a discussion on the quality control of big data.
url https://www.scienceopen.com/document?vid=708fd3cf-11ef-4ea4-bbca-629c343c2cec
work_keys_str_mv AT hamzavipinonviolaine aboutthevariabilityqualityandreproducibilityofchipseqdata
AT cholley aboutthevariabilityqualityandreproducibilityofchipseqdata
AT marcomendozaparra aboutthevariabilityqualityandreproducibilityofchipseqdata
AT hinrichgronemeyer aboutthevariabilityqualityandreproducibilityofchipseqdata
_version_ 1724382295451762688