Fully synthetic neuroimaging data for replication and exploration

Scientific transparency, data exploration, and education are advanced through data sharing. However, risk for disclosure of personal information and institutional data sharing regulations can impede human subject/patient data sharing and thus limit open science initiatives. Sharing fully synthetic d...

Bibliographic Details
Main Authors: Kenneth I. Vaden, Jr., Mulugeta Gebregziabher, Dyslexia Data Consortium, Mark A. Eckert
Format: Article
Language: English
Published: Elsevier 2020-12-01
Series: NeuroImage
Subjects: MRI
Online Access: http://www.sciencedirect.com/science/article/pii/S1053811920307709
id doaj-51aaaf85c211478b8a89c1668590da4e
record_format Article
spelling doaj-51aaaf85c211478b8a89c1668590da4e 2020-11-25T03:40:11Z eng Elsevier NeuroImage 1095-9572 2020-12-01 223 117284
Fully synthetic neuroimaging data for replication and exploration
Kenneth I. Vaden, Jr. (0), Mulugeta Gebregziabher (1), Dyslexia Data Consortium (2), Mark A. Eckert (3)
(0) Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC, United States; Corresponding authors.
(1) Division of Biostatistics and Epidemiology, Medical University of South Carolina, United States
(2) Division of Biostatistics and Epidemiology, Medical University of South Carolina, United States
(3) Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC, United States; Corresponding authors.
http://www.sciencedirect.com/science/article/pii/S1053811920307709
MRI; Synthesis; Data sharing; Multiple imputation; Neuroimaging methods; Open science
collection DOAJ
language English
format Article
sources DOAJ
author Kenneth I. Vaden, Jr.
Mulugeta Gebregziabher
Dyslexia Data Consortium
Mark A. Eckert
title Fully synthetic neuroimaging data for replication and exploration
publisher Elsevier
series NeuroImage
issn 1095-9572
publishDate 2020-12-01
description Scientific transparency, data exploration, and education are advanced through data sharing. However, risk for disclosure of personal information and institutional data sharing regulations can impede human subject/patient data sharing and thus limit open science initiatives. Sharing fully synthetic data is an alternative when it is not possible to share real or observed data. Here we describe a data sharing approach that borrows principles and methods from multiple imputation to replace observed values with synthetic values, thereby creating a fully synthetic neuroimaging dataset that accurately represents the covariance structure of the observed dataset. Predictor tables composed of demographic, site, behavioral and total intracranial volume (ICV) variables from 264 pediatric cases were used to create synthetic predictor tables, which were then used to synthesize gray matter images derived from T1-weighted data. The synthetic predictor tables demonstrated pooled variance and statistical estimates that closely approximated the observed data, as reflected in measures of efficiency and statistical bias. Similarly, the synthetic gray matter data accurately represented the variance and voxel-level associations with predictor variables (age, sex, verbal IQ, and ICV). The magnitude and spatial distribution of gray matter effects in the observed imaging data were replicated in the pooled results from the synthetic datasets. This approach for generating fully synthetic neuroimaging data has widespread potential for data sharing, including replication, new discovery, and education. Fully synthetic neuroimaging datasets can enable data sharing because they accurately represent patterns of variance in the original data, while diminishing the risk of privacy disclosures that can accompany neuroimaging data sharing.
topic MRI
Synthesis
Data sharing
Multiple imputation
Neuroimaging methods
Open science
url http://www.sciencedirect.com/science/article/pii/S1053811920307709
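
The description above outlines replacing observed values with synthetic values that preserve the covariance structure, then pooling estimates across several synthetic datasets. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it assumes a multivariate normal model fitted with plug-in estimates (a proper synthesis would also draw the model parameters), and the toy variables, the function names synthesize_tables, ols_coefficients and pooled_estimate, and all numeric values are hypothetical. Only the sample size of 264 is borrowed from the record.

```python
import numpy as np


def synthesize_tables(observed, m=5, seed=0):
    """Draw m fully synthetic tables from a multivariate normal
    fitted to the observed table (assumed model, plug-in estimates)."""
    rng = np.random.default_rng(seed)
    mean = observed.mean(axis=0)
    cov = np.cov(observed, rowvar=False)
    n_rows = observed.shape[0]
    return [rng.multivariate_normal(mean, cov, size=n_rows) for _ in range(m)]


def ols_coefficients(table):
    """Regress the last column on the other columns, with an intercept."""
    X = np.column_stack([np.ones(len(table)), table[:, :-1]])
    y = table[:, -1]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def pooled_estimate(tables):
    """Average the per-table estimates and report their between-table variance."""
    estimates = np.array([ols_coefficients(t) for t in tables])
    return estimates.mean(axis=0), estimates.var(axis=0, ddof=1)


# Toy "observed" data: hypothetical age, ICV, and a gray matter measure.
rng = np.random.default_rng(42)
n = 264
age = rng.normal(10.0, 2.0, n)
icv = rng.normal(1400.0, 100.0, n)
gray_matter = 0.5 * age + 0.01 * icv + rng.normal(0.0, 1.0, n)
observed = np.column_stack([age, icv, gray_matter])

synthetic = synthesize_tables(observed, m=20, seed=1)
pooled, between_var = pooled_estimate(synthetic)
observed_beta = ols_coefficients(observed)

print("observed coefficients:", observed_beta)
print("pooled synthetic coefficients:", pooled)
```

Under these assumptions no synthetic row corresponds to a real case, yet the pooled regression coefficients should land close to those fitted on the observed table, which is the property the description attributes to the synthetic predictor tables.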