A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation

Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept...


Bibliographic Details
Main Author: Daniel S Quintana
Format: Article
Language: English
Published: eLife Sciences Publications Ltd 2020-03-01
Series: eLife
Subjects: meta-research; data; statistics
Online Access: https://elifesciences.org/articles/53275
id doaj-70798b7c6e6f4ab4af1c0a3ca5beb430
record_format Article
spelling doaj-70798b7c6e6f4ab4af1c0a3ca5beb430 | 2021-05-05T20:54:23Z | eng | eLife Sciences Publications Ltd | eLife | 2050-084X | 2020-03-01 | 9 | 10.7554/eLife.53275 | A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation | Daniel S Quintana | 0 | https://orcid.org/0000-0003-2876-0004 | Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, University of Oslo, and Oslo University Hospital, Oslo, Norway | Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept of synthetic datasets, which is an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy. | https://elifesciences.org/articles/53275 | meta-research | data | statistics
collection DOAJ
language English
format Article
sources DOAJ
author Daniel S Quintana
spellingShingle Daniel S Quintana
A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation
eLife
meta-research
data
statistics
author_facet Daniel S Quintana
author_sort Daniel S Quintana
title A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation
title_sort synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation
publisher eLife Sciences Publications Ltd
series eLife
issn 2050-084X
publishDate 2020-03-01
description Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept of synthetic datasets, which is an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
topic meta-research
data
statistics
url https://elifesciences.org/articles/53275
work_keys_str_mv AT danielsquintana asyntheticdatasetprimerforthebiobehaviouralsciencestopromotereproducibilityandhypothesisgeneration
AT danielsquintana syntheticdatasetprimerforthebiobehaviouralsciencestopromotereproducibilityandhypothesisgeneration
_version_ 1721458501679579136