Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue

Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine...

Bibliographic Details
Main Authors: Schoppmann Phillipp, Vogelsang Lennart, Gascón Adrià, Balle Borja
Format: Article
Language: English
Published: Sciendo 2020-04-01
Series: Proceedings on Privacy Enhancing Technologies
Subjects: text analysis, document similarity, multi-party computation, differential privacy
Online Access: https://doi.org/10.2478/popets-2020-0024
id doaj-300228444de245af950fb6ede676cede
record_format Article
spelling doaj-300228444de245af950fb6ede676cede
2021-09-05T14:01:10Z
eng
Sciendo
Proceedings on Privacy Enhancing Technologies
2299-0984
2020-04-01
volume 2020, issue 2, pages 209-229
10.2478/popets-2020-0024
Schoppmann Phillipp (Humboldt-Universität zu Berlin and Alexander von Humboldt Institute for Internet and Society, Berlin, Germany)
Vogelsang Lennart (Humboldt-Universität zu Berlin and Alexander von Humboldt Institute for Internet and Society, Berlin, Germany)
Gascón Adrià (Work done while at the Alan Turing Institute, London, UK. Now at Google, London, UK.)
Balle Borja (Work done at Amazon Research, Cambridge, UK. Now at DeepMind, London, UK.)
collection DOAJ
language English
format Article
sources DOAJ
author Schoppmann Phillipp
Vogelsang Lennart
Gascón Adrià
Balle Borja
title Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
publisher Sciendo
series Proceedings on Privacy Enhancing Technologies
issn 2299-0984
publishDate 2020-04-01
description Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.
topic text analysis
document similarity
multi-party computation
differential privacy
url https://doi.org/10.2478/popets-2020-0024