A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation
Main Author: | Daniel S Quintana |
---|---|
Format: | Article |
Language: | English |
Published: | eLife Sciences Publications Ltd, 2020-03-01 |
Series: | eLife |
Subjects: | meta-research; data; statistics |
Online Access: | https://elifesciences.org/articles/53275 |
Record ID: | doaj-70798b7c6e6f4ab4af1c0a3ca5beb430 |
---|---|
Record Format: | Article |
Record Timestamp: | 2021-05-05T20:54:23Z |
ISSN: | 2050-084X |
Volume: | 9 |
DOI: | 10.7554/eLife.53275 |
Author: | Daniel S Quintana (ORCID: https://orcid.org/0000-0003-2876-0004) |
Affiliation: | Norwegian Centre for Mental Disorders Research (NORMENT), Division of Mental Health and Addiction, University of Oslo, and Oslo University Hospital, Oslo, Norway |
Abstract: | Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept of synthetic datasets, which is an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy. |
Keywords: | meta-research; data; statistics |
Collection: | DOAJ |
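The abstract describes synthetic datasets that preserve the statistical properties of the original variables and the relationships between them. As a rough illustration only, the Python sketch below shows two-variable sequential synthesis. It is not the article's accompanying R script and not the synthpop package's implementation, which by default uses classification and regression trees (CART). In this simplified version, the first variable is resampled from the observed values and the second is drawn from a normal linear regression fitted to the original data. It then compares means, SDs, and the correlation as a crude utility check. The function names `fit_ols`, `synthesize`, `pearson`, and `summarize` and the toy data are hypothetical.

```python
import random
import statistics


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual SD with n - 2 degrees of freedom (two estimated coefficients).
    sd = (sum(r * r for r in residuals) / (len(x) - 2)) ** 0.5
    return a, b, sd


def synthesize(x, y, n=None, seed=None):
    """Hypothetical, simplified sequential synthesis of two continuous variables.

    Step 1: draw the first variable by resampling observed x values.
    Step 2: draw the second variable from a normal linear model of y on x,
    fitted to the ORIGINAL data but evaluated at the SYNTHETIC x values,
    so the x-y relationship is carried over without copying real (x, y) pairs.
    """
    rng = random.Random(seed)
    n = len(x) if n is None else n
    a, b, sd = fit_ols(x, y)
    x_syn = [rng.choice(x) for _ in range(n)]
    y_syn = [a + b * xs + rng.gauss(0.0, sd) for xs in x_syn]
    return x_syn, y_syn


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def summarize(x, y):
    """Crude utility check: marginal means and SDs plus the x-y correlation."""
    return {
        "mean_x": statistics.fmean(x),
        "sd_x": statistics.stdev(x),
        "mean_y": statistics.fmean(y),
        "sd_y": statistics.stdev(y),
        "r_xy": pearson(x, y),
    }


if __name__ == "__main__":
    # Hypothetical toy data standing in for a confidential dataset.
    gen = random.Random(2020)
    x_obs = [gen.gauss(50.0, 10.0) for _ in range(300)]
    y_obs = [10.0 + 0.8 * xi + gen.gauss(0.0, 5.0) for xi in x_obs]

    x_syn, y_syn = synthesize(x_obs, y_obs, seed=1)
    orig, syn = summarize(x_obs, y_obs), summarize(x_syn, y_syn)
    for key in orig:
        print(f"{key:>7}  original={orig[key]:8.3f}  synthetic={syn[key]:8.3f}")
```

Note that resampling the first variable reuses observed marginal values, while the second variable is model-generated. A real application would use a dedicated tool such as synthpop, with its own model choices and utility measures, rather than this two-variable sketch.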