On the utility of differentially private synthetic data generation and differentially private model release
Title (Chinese): 論差分隱私合成資料與差分隱私機器學習模型釋出之資料可用性
Author: Yen-Ting Chen (陳彥廷)
Advisor: Chia-Mu Yu (游家牧)
Degree: Master's, Department of Computer Science and Engineering, National Chung Hsing University, academic year 107 (2018-2019)
Type: Thesis (學位論文)
Extent: 33 pages
Format: Others
Language: en_US
Published: 2019
Record ID: ndltd-TW-107NCHU5394031
Online Access: http://ndltd.ncl.edu.tw/cgi-bin/gs32/gsweb.cgi/login?o=dnclcdr&s=id=%22107NCHU5394031%22.&searchmode=basic
Abstract:

In recent years, with the widespread use of neural networks, large amounts of personal data are being collected. To prevent a trained model from leaking private information, training is combined with differential privacy. In this work, we use the following two methods to strike a balance between utility and privacy.

The first method, which we call “clean features with sloppy training”, adds noise during training to protect the sensitive data and finally produces a classifier. We use DP-SGD, proposed by Abadi et al. [1], and PATE, proposed by Papernot et al. [2], for the experimental analysis. The second method is DP-GAN, proposed by Zhang et al. [3], who also introduce a series of strategies for adaptively tuning the gradient clipping threshold. In DP-GAN, a generative adversarial network (GAN) adds noise during training and finally releases a generative model that can produce differentially private synthetic data endlessly.

Because both methods provide privacy protection, we want to know how much the accuracy of a classifier trained on synthetic data differs from that of a classifier trained on real data. If the accuracy of the model trained on synthetic data is close to, or higher than, that of the model trained on real data, then we can rely on synthetic data to build models that meet our needs while enjoying the same privacy protection. Therefore, in this thesis, after training on the synthetic data produced by our proposed method, we compare the resulting classifier with the classifier produced by “clean features with sloppy training” under the same privacy budget on MNIST and FASHION-MNIST. Before the experiments, we conjectured that the classifier produced by PATE would have the highest accuracy. The results show that the PATE classifier is more accurate than the classifiers trained on synthetic data by about 0.31% to 7.66%.
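The abstract names three mechanisms. As a concrete illustration of the first, DP-SGD [1], here is a minimal NumPy sketch of its core update: per-example gradients are clipped to an L2 norm of at most C, summed, and perturbed with Gaussian noise of scale sigma * C before the averaged step. The logistic-regression model, the toy data, and the hyperparameter values are illustrative stand-ins, not the thesis's actual setup; a real deployment would also track the cumulative privacy budget with a moments accountant.

```python
# Minimal DP-SGD sketch (clip per-example gradients, add Gaussian noise).
# Model and data are toy stand-ins, not the thesis's networks or MNIST.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (a stand-in for MNIST features).
n, d = 256, 20
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) > 0).astype(float)

w = np.zeros(d)
C = 1.0      # per-example gradient clipping norm (assumed value)
sigma = 1.1  # noise multiplier (assumed value)
lr = 0.5
batch = 64

for step in range(200):
    idx = rng.choice(n, size=batch, replace=False)
    # Per-example gradients of the logistic loss: (p - y) * x.
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    per_ex = (p - y[idx])[:, None] * X[idx]          # shape (batch, d)
    # Clip each example's gradient to L2 norm at most C.
    norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
    per_ex = per_ex / np.maximum(1.0, norms / C)
    # Sum, add Gaussian noise calibrated to C, then average and step.
    noisy_sum = per_ex.sum(axis=0) + rng.normal(scale=sigma * C, size=d)
    w -= lr * noisy_sum / batch

acc = np.mean(((X @ w) > 0) == (y > 0.5))
print(f"training accuracy under DP-SGD: {acc:.3f}")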
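PATE [2] takes a different route: an ensemble of teachers, each trained on a disjoint partition of the sensitive data, labels public data through a noisy vote, and a student model is trained on those labels. Below is a minimal sketch of the noisy aggregation step only; the teacher votes are simulated at random here, whereas in the actual mechanism each vote comes from a separately trained model, and the privacy cost of every answered query is accounted for.

```python
# Minimal sketch of PATE's noisy label aggregation: add Laplace noise to
# the teachers' vote histogram, then take the argmax. Teacher votes are
# simulated; the ensemble size and noise scale are assumed values.
import numpy as np

rng = np.random.default_rng(0)
num_teachers, num_classes = 250, 10
gamma = 0.05  # inverse Laplace scale; larger gamma means less noise

def noisy_aggregate(teacher_votes):
    """Return the noisy-argmax label for one student query."""
    counts = np.bincount(teacher_votes, minlength=num_classes)
    noisy_counts = counts + rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_counts))

# Simulate teachers that mostly agree on the true label 3.
true_label = 3
votes = np.where(rng.random(num_teachers) < 0.8,
                 true_label,
                 rng.integers(0, num_classes, num_teachers))
print("aggregated label:", noisy_aggregate(votes))
```

Because the noisy argmax usually matches the majority vote when the teachers agree strongly, the student learns accurate labels while each individual training record has only bounded influence on any answer.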
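DP-GAN [3] moves the DP-SGD machinery inside GAN training: only the discriminator touches the real (sensitive) data, so only its update is clipped and noised, while the generator learns from the already-privatized discriminator and can be released afterwards. The sketch below shrinks this to a 1-D toy: a linear generator learning a Gaussian, a logistic discriminator, and a fixed clipping threshold C. All of these choices, including clipping the paired real/fake gradients together, are simplifications for illustration; [3] uses deep networks and adaptively decaying clipping thresholds.

```python
# Toy DP-GAN sketch: DP-SGD on the discriminator, a plain update for the
# generator, which is the artifact that would be released. All models,
# data, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Sensitive "real" data: samples from N(4, 1).
real = rng.normal(4.0, 1.0, size=2000)

# Generator g(z) = a*z + b; discriminator D(x) = sigmoid(u*x + v).
a, b = 1.0, 0.0
u, v = 0.1, 0.0
C, sigma, lr, batch = 1.0, 1.0, 0.05, 64

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for step in range(3000):
    # --- Discriminator step (DP: it is the only part seeing real data) ---
    xr = rng.choice(real, batch)
    xf = a * rng.normal(size=batch) + b
    # Per-example gradients of -log D(real) - log(1 - D(fake)) wrt (u, v).
    gr = (sigmoid(u * xr + v) - 1.0)[:, None] * np.stack([xr, np.ones(batch)], 1)
    gf = sigmoid(u * xf + v)[:, None] * np.stack([xf, np.ones(batch)], 1)
    per_ex = gr + gf
    # Clip each paired gradient to norm C, then add Gaussian noise.
    per_ex /= np.maximum(1.0, np.linalg.norm(per_ex, axis=1, keepdims=True) / C)
    noisy = per_ex.sum(0) + rng.normal(scale=sigma * C, size=2)
    u -= lr * noisy[0] / batch
    v -= lr * noisy[1] / batch

    # --- Generator step (no extra noise: it never sees real data) ---
    z = rng.normal(size=batch)
    xf = a * z + b
    # Gradient of -log D(g(z)) wrt (a, b), via the chain rule through D.
    dx = (sigmoid(u * xf + v) - 1.0) * u
    a -= lr * np.mean(dx * z)
    b -= lr * np.mean(dx)

# Since E[z] = 0, the generator's mean is b; it should drift toward 4.
print(f"learned generator: x = {a:.2f} * z + {b:.2f}  (real data ~ N(4, 1))")
```

The thesis's question then reduces to: train a classifier on samples from such a released generator and compare it, at a matched privacy budget, with the classifiers obtained directly from DP-SGD and PATE.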