Privacy-Preserving Data Sharing in High Dimensional Regression and Classification Settings

We focus on the problem of multi-party data sharing in high dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent s...

Bibliographic Details
Main Authors: Stephen E. Fienberg, Jiashun Jin
Format: Article
Language: English
Published: Labor Dynamics Institute 2012-07-01
Series: The Journal of Privacy and Confidentiality
Subjects: Hamming distance; Higher Criticism; LASSO; Marginal Regression; Noise Addition; Phase Diagram
Online Access: https://journalprivacyconfidentiality.org/index.php/jpc/article/view/618
id doaj-56a1b3afc97b4d35a004519bf05b238a
record_format Article
spelling doaj-56a1b3afc97b4d35a004519bf05b238a
2020-11-25T01:32:35Z
eng
Labor Dynamics Institute
The Journal of Privacy and Confidentiality
2575-8527
2012-07-01
4
1
10.29012/jpc.v4i1.618
Privacy-Preserving Data Sharing in High Dimensional Regression and Classification Settings
Stephen E. Fienberg (Department of Statistics, Machine Learning Department, Living Analytics Research Center, CyLab, Carnegie Mellon University, Pittsburgh, PA)
Jiashun Jin (Department of Statistics, Carnegie Mellon University, Pittsburgh, PA)
We focus on the problem of multi-party data sharing in high dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent statistical research. Here, we consider data sharing for two interconnected problems in high dimensional data analysis, namely feature selection and classification. We characterize the notions of "cautious", "regular", and "generous" data sharing in terms of their privacy-preserving implications for the parties and their share of data, with a focus on "feature privacy" rather than "sample privacy", though a violation of the former may lead to a violation of the latter. We evaluate the data sharing methods using the phase diagram from the statistical literature on multiplicity and Higher Criticism thresholding. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, a phase diagram is a partition of the phase space into three distinct regions, where we have no (feature) privacy violation, relatively rare privacy violations, and an overwhelming amount of privacy violation.
https://journalprivacyconfidentiality.org/index.php/jpc/article/view/618
Hamming distance
Higher Criticism
LASSO
Marginal Regression
Noise Addition
Phase Diagram
collection DOAJ
language English
format Article
sources DOAJ
author Stephen E. Fienberg
Jiashun Jin
title Privacy-Preserving Data Sharing in High Dimensional Regression and Classification Settings
publisher Labor Dynamics Institute
series The Journal of Privacy and Confidentiality
issn 2575-8527
publishDate 2012-07-01
description We focus on the problem of multi-party data sharing in high dimensional data settings where the number of measured features (or the dimension) p is frequently much larger than the number of subjects (or the sample size) n, the so-called p >> n scenario that has been the focus of much recent statistical research. Here, we consider data sharing for two interconnected problems in high dimensional data analysis, namely feature selection and classification. We characterize the notions of "cautious", "regular", and "generous" data sharing in terms of their privacy-preserving implications for the parties and their share of data, with a focus on "feature privacy" rather than "sample privacy", though a violation of the former may lead to a violation of the latter. We evaluate the data sharing methods using the phase diagram from the statistical literature on multiplicity and Higher Criticism thresholding. In the two-dimensional phase space calibrated by the signal sparsity and signal strength, a phase diagram is a partition of the phase space into three distinct regions, where we have no (feature) privacy violation, relatively rare privacy violations, and an overwhelming amount of privacy violation.
topic Hamming distance
Higher Criticism
LASSO
Marginal Regression
Noise Addition
Phase Diagram
url https://journalprivacyconfidentiality.org/index.php/jpc/article/view/618