Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue

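Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.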

Bibliographic Details
Main Authors:	Schoppmann, Phillipp; Vogelsang, Lennart; Gascón, Adrià; Balle, Borja
Format:	Article
Language:	English
Published:	Sciendo 2020-04-01
Series:	Proceedings on Privacy Enhancing Technologies
Subjects:	text analysis; document similarity; multi-party computation; differential privacy
Online Access:	https://doi.org/10.2478/popets-2020-0024

id	doaj-300228444de245af950fb6ede676cede
record_format	Article
spelling	doaj-300228444de245af950fb6ede676cede | 2021-09-05T14:01:10Z | eng | Sciendo | Proceedings on Privacy Enhancing Technologies | 2299-0984 | 2020-04-01 | 2020 | 2 | 209 | 229 | 10.2478/popets-2020-0024 | popets-2020-0024 | Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue | Schoppmann Phillipp (0); Vogelsang Lennart (1); Gascón Adrià (2); Balle Borja (3) | (0) Humboldt-Universität zu Berlin and Alexander von Humboldt Institute for Internet and Society, Berlin, Germany; (1) Humboldt-Universität zu Berlin and Alexander von Humboldt Institute for Internet and Society, Berlin, Germany; (2) Work done while at the Alan Turing Institute, London, UK. Now at Google, London, UK.; (3) Work done at Amazon Research, Cambridge, UK. Now at DeepMind, London, UK. | Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications. | https://doi.org/10.2478/popets-2020-0024 | text analysis; document similarity; multi-party computation; differential privacy
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Schoppmann Phillipp Vogelsang Lennart Gascón Adrià Balle Borja
spellingShingle	Schoppmann Phillipp Vogelsang Lennart Gascón Adrià Balle Borja Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue Proceedings on Privacy Enhancing Technologies text analysis document similarity multi-party computation differential privacy
author_facet	Schoppmann Phillipp Vogelsang Lennart Gascón Adrià Balle Borja
author_sort	Schoppmann Phillipp
title	Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
title_short	Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
title_full	Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
title_fullStr	Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
title_full_unstemmed	Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
title_sort	secure and scalable document similarity on distributed databases: differential privacy to the rescue
publisher	Sciendo
series	Proceedings on Privacy Enhancing Technologies
issn	2299-0984
publishDate	2020-04-01
description	Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to k-nearest neighbors (k-NN) classification. Due to its non-parametric nature, k-NN presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications.
topic	text analysis; document similarity; multi-party computation; differential privacy
url	https://doi.org/10.2478/popets-2020-0024
work_keys_str_mv	AT schoppmannphillipp secureandscalabledocumentsimilarityondistributeddatabasesdifferentialprivacytotherescue AT vogelsanglennart secureandscalabledocumentsimilarityondistributeddatabasesdifferentialprivacytotherescue AT gasconadria secureandscalabledocumentsimilarityondistributeddatabasesdifferentialprivacytotherescue AT balleborja secureandscalabledocumentsimilarityondistributeddatabasesdifferentialprivacytotherescue
_version_	1717810682324320256

Similar Items