Integrity and privacy in adversarial machine learning

Bibliographic Details
Published:
Online Access: http://hdl.handle.net/2047/D20413920
id ndltd-NEU--neu-bz613971c
record_format oai_dc
spelling ndltd-NEU--neu-bz613971c 2021-08-20T05:11:13Z Integrity and privacy in adversarial machine learning http://hdl.handle.net/2047/D20413920
collection NDLTD
sources NDLTD
description Machine learning is being used for an increasing number of applications with societal impact. In such settings, models must be trusted to be fair, useful, and robust. In many applications, a large amount of training data is collected from a variety of sources, including from private or untrusted individuals. Manually sanitizing datasets is difficult as they increase in size, which allows a motivated adversary to corrupt the training data with points crafted to impact the model at test time, an approach referred to as a poisoning attack. These poisoning attacks can cause specific users' data to be misclassified, which can be harmful if models are applied to sensitive tasks such as security applications. At the same time, models trained on datasets collected from real people must protect their privacy, preventing unscrupulous onlookers from learning more than they should; in sensitive domains such as personalized medicine, privacy is of utmost importance. In this thesis, we describe integrity and privacy vulnerabilities in these critical settings. We explore the variety of adversarial goals that can be accomplished with poisoning, and how to construct defenses against these attacks (if this is possible at all). Our work will highlight the difficulty of developing generic poisoning defenses; dependence on the adversarial objective appears to be necessary for large enough attacks. Next, we discuss the connection between differential privacy and poisoning attacks, showing that poisoning can be useful for interpreting privacy guarantees, and that differential privacy may not serve as a defense against poisoning attacks. Finally, we discuss privacy leakage in the realistic training setting where models are updated repeatedly over time. Our privacy attacks highlight and tackle the current challenges in deploying private algorithms in real-world settings. Overall, this thesis will demonstrate the diversity of both poisoning attacks and privacy attacks, and the challenges in defending against these attacks and securing machine learning in critical settings.--Author's abstract
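The abstract's point that poisoning can be used to interpret privacy guarantees can be made concrete with the standard definition of differential privacy; the sketch below uses generic notation (mechanism M, neighboring datasets D and D') as background and is not taken from the thesis itself.

\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\qquad \text{for every measurable set } S,
\]

where D and D' differ in a single training record. By the group-privacy property, a poisoning adversary who inserts or modifies k records is still covered: for pure \(\varepsilon\)-differential privacy the guarantee degrades to \(k\varepsilon\), which is one way a privacy guarantee can be read as a (loose, rapidly weakening) bound on the influence of a bounded poisoning attack.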
title Integrity and privacy in adversarial machine learning
spellingShingle Integrity and privacy in adversarial machine learning
title_short Integrity and privacy in adversarial machine learning
title_full Integrity and privacy in adversarial machine learning
title_fullStr Integrity and privacy in adversarial machine learning
title_full_unstemmed Integrity and privacy in adversarial machine learning
title_sort integrity and privacy in adversarial machine learning
publishDate
url http://hdl.handle.net/2047/D20413920
_version_ 1719460731055767552