Privacy Preservation for Cloud-Based Data Sharing and Data Analytics

Data privacy is a globally recognized human right for individuals to control the access to their personal information, and bar the negative consequences from the use of this information. As communication technologies progress, the means to protect data privacy must also evolve to address new challen...

Bibliographic Details
Main Author: Zheng, Yao
Other Authors: Computer Science
Format: Others
Published: Virginia Tech 2016
Subjects:
Online Access: http://hdl.handle.net/10919/73796
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-73796
record_format oai_dc
collection NDLTD
format Others
sources NDLTD
topic information privacy
cryptography
machine learning
spellingShingle information privacy
cryptography
machine learning
Zheng, Yao
Privacy Preservation for Cloud-Based Data Sharing and Data Analytics
description Data privacy is a globally recognized human right that lets individuals control access to their personal information and bar negative consequences from the use of this information. As communication technologies progress, the means to protect data privacy must also evolve to address the new challenges that come into view. Our research goal in this dissertation is to develop privacy protection frameworks and techniques suitable for emerging cloud-based data services, in particular privacy-preserving algorithms and protocols for cloud-based data sharing and data analytics services. Cloud computing has enabled users to store, process, and communicate their personal information through third-party services. It has also raised privacy issues regarding loss of control over data, mass harvesting of information, and unconsented disclosure of personal content. Above all, the main concern is the lack of understanding about data privacy in cloud environments. Currently, cloud service providers either invoke the third-party doctrine and deny users' rights to protect their data stored in the cloud, or rely on the notice-and-choice framework and present users with ambiguous, incomprehensible privacy statements without any meaningful privacy guarantee. In this regard, our research has three main contributions. First, to capture users' privacy expectations in cloud environments, we conceptually divide personal data into two categories, i.e., visible data and invisible data. The visible data refer to information users intentionally create, upload to, and share through the cloud; the invisible data refer to users' information retained in the cloud that is aggregated, analyzed, and repurposed without their knowledge or understanding. Second, to address users' privacy concerns raised by cloud computing, we propose two privacy protection frameworks, namely individual control and use limitation. The individual control framework emphasizes users' capability to govern access to the visible data stored in the cloud. The use limitation framework emphasizes users' expectation to remain anonymous when the invisible data are aggregated and analyzed by cloud-based data services. Finally, we investigate various techniques to accommodate the new privacy protection frameworks, in the context of four cloud-based data services: personal health record sharing, location-based proximity testing, link recommendation for social networks, and face tagging in photo management applications. For the first case, we develop a key-based protection technique to enforce fine-grained access control over users' digital health records. For the second case, we develop a key-less protection technique to achieve location-specific user selection. For the latter two cases, we develop distributed learning algorithms to prevent large-scale data harvesting. We further combine these algorithms with query regulation techniques to achieve user anonymity. The picture that emerges from the above work is a bleak one. Regarding personal data, the reality is that we can no longer control them all. As communication technologies evolve, the scope of personal data has expanded beyond local, discrete silos and become integrated into the Internet. The traditional understanding of privacy must be updated to reflect these changes. In addition, because privacy is a particularly nuanced problem that is governed by context, there is no one-size-fits-all solution. While some cases can be salvaged either by cryptography or by other means, in others a rethinking of the trade-offs between utility and privacy appears to be necessary. === Ph. D.
author2 Computer Science
author_facet Computer Science
Zheng, Yao
author Zheng, Yao
author_sort Zheng, Yao
title Privacy Preservation for Cloud-Based Data Sharing and Data Analytics
publisher Virginia Tech
publishDate 2016
url http://hdl.handle.net/10919/73796
work_keys_str_mv AT zhengyao privacypreservationforcloudbaseddatasharinganddataanalytics
_version_ 1719356458475192320
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-73796 2020-11-12T05:42:52Z Privacy Preservation for Cloud-Based Data Sharing and Data Analytics Zheng, Yao Computer Science Lou, Wenjing Yu, Guoqiang Jajodia, Sushil Ramakrishnan, Naren Hou, Yiwei Thomas Chen, Ing Ray information privacy cryptography machine learning Ph. D. 2016-12-22T09:00:44Z 2016-12-22T09:00:44Z 2016-12-21 Dissertation vt_gsexam:9216 http://hdl.handle.net/10919/73796 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf Virginia Tech