Privacy-preserving data sharing infrastructures for medical research: systematization and comparison

Abstract. Background: Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and...

Bibliographic Details
Main Authors: Felix Nikolaus Wirth, Thierry Meurers, Marco Johns, Fabian Prasser
Format: Article
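Language: English
Published: BMC 2021-08-01
Series: BMC Medical Informatics and Decision Making
Subjects: Biomedical data sharing; Privacy; Usefulness; Systematization; Distributed computing; Secure multi-party computing
Online Access: https://doi.org/10.1186/s12911-021-01602-x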
id doaj-499bd2bac2424def9c69c9a7e3bbe242
record_format Article
spelling doaj-499bd2bac2424def9c69c9a7e3bbe242
2021-08-15T11:34:35Z
eng
BMC
BMC Medical Informatics and Decision Making
1472-6947
2021-08-01
211113
10.1186/s12911-021-01602-x
Privacy-preserving data sharing infrastructures for medical research: systematization and comparison
Felix Nikolaus Wirth, Berlin Institute of Health at Charité – Universitätsmedizin Berlin
Thierry Meurers, Berlin Institute of Health at Charité – Universitätsmedizin Berlin
Marco Johns, Berlin Institute of Health at Charité – Universitätsmedizin Berlin
Fabian Prasser, Berlin Institute of Health at Charité – Universitätsmedizin Berlin
collection DOAJ
language English
format Article
sources DOAJ
author Felix Nikolaus Wirth
Thierry Meurers
Marco Johns
Fabian Prasser
title Privacy-preserving data sharing infrastructures for medical research: systematization and comparison
publisher BMC
series BMC Medical Informatics and Decision Making
issn 1472-6947
publishDate 2021-08-01
description Abstract
Background: Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and research participants remain anonymous when data is shared. However, privacy protection typically comes at a cost, e.g. restrictions regarding the types of analyses that can be performed on shared data. What is lacking is a systematization making the trade-offs taken by different approaches transparent. The aim of the work described in this paper was to develop a systematization for the degree of privacy protection provided and the trade-offs taken by different data sharing methods. Based on this contribution, we categorized popular data sharing approaches and identified research gaps by analyzing combinations of promising properties and features that are not yet supported by existing approaches.
Methods: The systematization consists of different axes. Three axes relate to privacy protection aspects and were adopted from the popular Five Safes Framework: (1) safe data, addressing privacy at the input level, (2) safe settings, addressing privacy during shared processing, and (3) safe outputs, addressing privacy protection of analysis results. Three additional axes address the usefulness of approaches: (4) support for de-duplication, to enable the reconciliation of data belonging to the same individuals, (5) flexibility, to be able to adapt to different data analysis requirements, and (6) scalability, to maintain performance with increasing complexity of shared data or common analysis processes.
Results: Using the systematization, we identified three different categories of approaches: distributed data analyses, which exchange anonymous aggregated data, secure multi-party computation protocols, which exchange encrypted data, and data enclaves, which store pooled individual-level data in secure environments for access for analysis purposes. We identified important research gaps, including a lack of approaches enabling the de-duplication of horizontally distributed data or providing a high degree of flexibility.
Conclusions: There are fundamental differences between different data sharing approaches and several gaps in their functionality that may be interesting to investigate in future work. Our systematization can make the properties of privacy-preserving data sharing infrastructures more transparent and support decision makers and regulatory authorities with a better understanding of the trade-offs taken.
topic Biomedical data sharing
Privacy
Usefulness
Systematization
Distributed computing
Secure multi-party computing
url https://doi.org/10.1186/s12911-021-01602-x