About the variability, quality and reproducibility of ChIP-seq data

The emergence of high-throughput technologies with the production of gigabyte omics datasets has led to revolutionary changes in molecular biology and functional genomics. Despite the incorporation of increasingly quantitative technologies, the field suffers from important reprodu...


Bibliographic Details
Main Authors:	Hamzavi-Pinon Violaine, Cholley, Marco Mendoza-Parra, Hinrich Gronemeyer
Format:	Article
Language:	English
Published:	ScienceOpen 2016-06-01
Series:	ScienceOpen Research
Online Access:	https://www.scienceopen.com/document?vid=708fd3cf-11ef-4ea4-bbca-629c343c2cec

id	doaj-0d0be730ac6846be8b0d05cb8395c22a
record_format	Article
spelling	doaj-0d0be730ac6846be8b0d05cb8395c22a 2020-12-15T17:21:37Z eng ScienceOpen ScienceOpen Research 2199-1006 2016-06-01 10.14293/S2199-1006.1.SOR-LIFE.ARGGHM.v1 About the variability, quality and reproducibility of ChIP-seq data Hamzavi-Pinon Violaine Cholley Marco Mendoza-Parra Hinrich Gronemeyer https://www.scienceopen.com/document?vid=708fd3cf-11ef-4ea4-bbca-629c343c2cec
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hamzavi-Pinon Violaine Cholley Marco Mendoza-Parra Hinrich Gronemeyer
spellingShingle	Hamzavi-Pinon Violaine Cholley Marco Mendoza-Parra Hinrich Gronemeyer About the variability, quality and reproducibility of ChIP-seq data ScienceOpen Research
author_facet	Hamzavi-Pinon Violaine Cholley Marco Mendoza-Parra Hinrich Gronemeyer
author_sort	Hamzavi-Pinon Violaine
title	About the variability, quality and reproducibility of ChIP-seq data
title_short	About the variability, quality and reproducibility of ChIP-seq data
title_full	About the variability, quality and reproducibility of ChIP-seq data
title_fullStr	About the variability, quality and reproducibility of ChIP-seq data
title_full_unstemmed	About the variability, quality and reproducibility of ChIP-seq data
title_sort	about the variability, quality and reproducibility of chip-seq data
publisher	ScienceOpen
series	ScienceOpen Research
issn	2199-1006
publishDate	2016-06-01
description	The emergence of high-throughput technologies with the production of gigabyte omics datasets has led to revolutionary changes in molecular biology and functional genomics. Despite the incorporation of increasingly quantitative technologies, the field suffers from important reproducibility problems. Some causes have been identified: they include poor quality management, competition for publishing, funding and jobs, and problems in the experimental and statistical design of assays. The consequences include, among others, delays in the implementation of efficient and specific anti-cancer treatments, the unnecessary duplication/validation of improperly conducted studies, and the waste of public funding. Here we wish to discuss another cause of poor reproducibility, which will become increasingly important with the advent of personalized medicine: the generation of poor-quality datasets from Next Generation Sequencing (NGS) technologies, specifically those that involve enrichment assays like ChIP-sequencing. Today, NGS-derived applications are becoming increasingly popular, a trend further supported by decreasing sequencing costs, the rapid development of novel sequencing-based technologies, and the power of genome-wide data interpretation by functional genomics and systems biology approaches. However, the complexity and sensitivity of these technologies bear the risk of introducing various types of bias. Thus, it is rather surprising that only very few quality indicators have been developed to date. The public availability of omics data in large repositories, such as GEO, is no doubt an enormously valuable resource. However, by working extensively with such datasets, we realized that the lack of universal quality control indicators in publications and data repositories seriously limits the use of existing data and can contribute to irreproducibility issues. Here we provide examples that illustrate the problems generated by the use of poor-quality datasets and propose solutions that would ultimately enhance reproducibility and encourage scientists to use existing datasets in the design and interpretation of their own research projects. Our goal is to increase awareness in the scientific community of the need to link quality assessment to datasets, and to initiate a discussion on the quality control of big data.
url	https://www.scienceopen.com/document?vid=708fd3cf-11ef-4ea4-bbca-629c343c2cec
work_keys_str_mv	AT hamzavipinonviolaine aboutthevariabilityqualityandreproducibilityofchipseqdata AT cholley aboutthevariabilityqualityandreproducibilityofchipseqdata AT marcomendozaparra aboutthevariabilityqualityandreproducibilityofchipseqdata AT hinrichgronemeyer aboutthevariabilityqualityandreproducibilityofchipseqdata
_version_	1724382295451762688


Similar Items