Linking Sensitive Data – Applications, Techniques, and Challenges
Introduction: The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging from health and social science research to national censuses. Various techniques have been developed to facilitate the li...
Main Authors: | Peter Christen, Thilina Ranbaduge, Rainer Schnell |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2020-12-01 |
Series: | International Journal of Population Data Science |
Online Access: | https://ijpds.org/article/view/1475 |
id |
doaj-6b82ebd00160482b85ac9b7ae6d3e2f1 |
---|---|
record_format |
Article |
spelling |
doaj-6b82ebd00160482b85ac9b7ae6d3e2f1
2021-02-10T16:43:03Z
eng
Swansea University
International Journal of Population Data Science
2399-4908
2020-12-01
5
5
10.23889/ijpds.v5i5.1475
Linking Sensitive Data – Applications, Techniques, and Challenges
Peter Christen (0), Thilina Ranbaduge (1), Rainer Schnell (2)
(0) Research School of Computer Science, Australian National University, Canberra, Australia
(1) Research School of Computer Science, Australian National University, Canberra, Australia
(2) Research Methodology Group, University Duisburg-Essen, Duisburg, Germany
Introduction
The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging from health and social science research to national censuses. Various techniques have been developed to facilitate the linking of sensitive databases while at the same time preserving the privacy of individuals represented in these databases.
Objectives and approach
We present several case studies where the privacy-preserving linking of sensitive databases is crucial, and then discuss the advantages and limitations of existing algorithms and techniques to link sensitive databases. We discuss privacy techniques such as Bloom filter encoding, hashing, and secure multi-party computation, from the point of view of a linkage practitioner. We highlight those aspects that are important when selecting or implementing a privacy-preserving linkage technique within practical applications.
Results
Conceptually, linkage techniques can be evaluated across three main dimensions: linkage quality, scalability to linking large or multiple databases, and the privacy protection provided by a technique. From a practical perspective, however, several other dimensions are crucial, including the availability of software or ease of implementation, technical knowledge available in an organisation, and the suitability of techniques for a given linkage scenario. Our analysis of a diverse range of linkage techniques has shown that currently no technique provides an adequate solution along all conceptual as well as all practical dimensions.
Conclusions
More research is required to develop novel techniques that facilitate the privacy-preserving linkage of large sensitive databases across organisations, including new encoding methods and cryptanalysis attacks (where until now most attacks have neglected the attack vectors that likely occur in practice), and novel evaluation measures to assess the privacy provided by linkage techniques. We encourage practitioners to be aware of the identified limitations – as well as the opportunities – of existing privacy-preserving linkage techniques and carefully assess the technical and organisational requirements of such techniques within their institution.
https://ijpds.org/article/view/1475 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Peter Christen, Thilina Ranbaduge, Rainer Schnell |
title |
Linking Sensitive Data – Applications, Techniques, and Challenges |
publisher |
Swansea University |
series |
International Journal of Population Data Science |
issn |
2399-4908 |
publishDate |
2020-12-01 |
description |
Introduction
The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging from health and social science research to national censuses. Various techniques have been developed to facilitate the linking of sensitive databases while at the same time preserving the privacy of individuals represented in these databases.
Objectives and approach
We present several case studies where the privacy-preserving linking of sensitive databases is crucial, and then discuss the advantages and limitations of existing algorithms and techniques to link sensitive databases. We discuss privacy techniques such as Bloom filter encoding, hashing, and secure multi-party computation, from the point of view of a linkage practitioner. We highlight those aspects that are important when selecting or implementing a privacy-preserving linkage technique within practical applications.
Results
Conceptually, linkage techniques can be evaluated across three main dimensions: linkage quality, scalability to linking large or multiple databases, and the privacy protection provided by a technique. From a practical perspective, however, several other dimensions are crucial, including the availability of software or ease of implementation, technical knowledge available in an organisation, and the suitability of techniques for a given linkage scenario. Our analysis of a diverse range of linkage techniques has shown that currently no technique provides an adequate solution along all conceptual as well as all practical dimensions.
Conclusions
More research is required to develop novel techniques that facilitate the privacy-preserving linkage of large sensitive databases across organisations, including new encoding methods and cryptanalysis attacks (where until now most attacks have neglected the attack vectors that likely occur in practice), and novel evaluation measures to assess the privacy provided by linkage techniques. We encourage practitioners to be aware of the identified limitations – as well as the opportunities – of existing privacy-preserving linkage techniques and carefully assess the technical and organisational requirements of such techniques within their institution.
|
url |
https://ijpds.org/article/view/1475 |