Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
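Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.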
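As a rough illustration of the ingredients named in the title and abstract (document similarity, k-NN classification, and differentially private statistics), the following plaintext sketch may help. It is not the authors' protocol: nothing here runs under MPC, the term-frequency representation, the cosine measure, the Laplace mechanism, and the sensitivity of 1 are all assumptions made for the example, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector: only terms that occur are stored."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity that iterates only over the smaller sparse vector."""
    if len(a) > len(b):
        a, b = b, a
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, labelled_docs, k=3):
    """Majority vote among the k training documents most similar to the query."""
    q = tf_vector(query)
    scored = sorted(
        ((cosine_similarity(q, tf_vector(doc)), label) for doc, label in labelled_docs),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Laplace mechanism: release a count with noise of scale sensitivity/epsilon.

    Assumption for this sketch: adding or removing one document changes the
    count by at most `sensitivity`.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    docs = [
        ("secure multi-party computation protocol", "crypto"),
        ("differential privacy noise mechanism", "privacy"),
        ("pasta recipe with tomato sauce", "food"),
    ]
    print(knn_classify("secure computation protocol", docs, k=1))
    # A noisy statistic about the data, e.g. how many documents contain "privacy".
    true_df = sum(1 for doc, _ in docs if "privacy" in tf_vector(doc))
    print(noisy_count(true_df, epsilon=1.0))
```

The similarity loop only touches terms present in the shorter document, which is the kind of sparsity a real system would want to exploit; in a secure setting, the sizes of those sparse supports would themselves be sensitive, which is where a noisy released statistic could plausibly come in.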
Main Authors: | Schoppmann Phillipp, Vogelsang Lennart, Gascón Adrià, Balle Borja |
---|---|
Format: | Article |
Language: | English |
Published: | Sciendo, 2020-04-01 |
Series: | Proceedings on Privacy Enhancing Technologies |
Subjects: | text analysis, document similarity, multi-party computation, differential privacy |
Online Access: | https://doi.org/10.2478/popets-2020-0024 |
id |
doaj-300228444de245af950fb6ede676cede |
---|---|
record_format |
Article |
spelling |
doaj-300228444de245af950fb6ede676cede; 2021-09-05T14:01:10Z; eng; Sciendo; Proceedings on Privacy Enhancing Technologies; 2299-0984; 2020-04-01; 2020; 2; 209; 229; 10.2478/popets-2020-0024; popets-2020-0024; Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue; Schoppmann Phillipp [0]; Vogelsang Lennart [1]; Gascón Adrià [2]; Balle Borja [3]; Humboldt-Universität zu Berlin and Alexander von Humboldt Institute for Internet and Society, Berlin, Germany; Humboldt-Universität zu Berlin and Alexander von Humboldt Institute for Internet and Society, Berlin, Germany; Work done while at the Alan Turing Institute, London, UK. Now at Google, London, UK.; Work done at Amazon Research, Cambridge, UK. Now at DeepMind, London, UK.; Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.; https://doi.org/10.2478/popets-2020-0024; text analysis; document similarity; multi-party computation; differential privacy |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Schoppmann Phillipp; Vogelsang Lennart; Gascón Adrià; Balle Borja |
title |
Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue |
publisher |
Sciendo |
series |
Proceedings on Privacy Enhancing Technologies |
issn |
2299-0984 |
publishDate |
2020-04-01 |
description |
Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications. |
topic |
text analysis; document similarity; multi-party computation; differential privacy |
url |
https://doi.org/10.2478/popets-2020-0024 |