Summary: Machine learning is being used for an increasing number of applications with societal impact. In such settings, models must be trusted to be fair, useful, and robust. In many applications, a large amount of training data is collected from a variety of sources, including from private or untrusted individuals. Manually sanitizing datasets becomes difficult as they grow in size, which allows a motivated adversary to corrupt training data with examples crafted to impact the model at test time, an attack referred to as a poisoning attack. These poisoning attacks can cause specific users' data to be misclassified, which can be harmful when models are applied to sensitive tasks such as security applications. At the same time, models trained on datasets collected from real people must protect their privacy, preventing unscrupulous onlookers from learning more than they should; in sensitive domains such as personalized medicine, privacy is of utmost importance. In this thesis, we describe integrity and privacy vulnerabilities in these critical settings. We explore the variety of adversarial goals that can be accomplished with poisoning, and how to construct defenses against these attacks (if this is possible at all). Our work will highlight the difficulty of developing generic poisoning defenses; for sufficiently large attacks, defenses appear to require knowledge of the adversarial objective. Next, we discuss the connection between differential privacy and poisoning attacks, showing that poisoning can be useful for interpreting privacy guarantees, and that differential privacy may not serve as a defense against poisoning attacks. Finally, we discuss privacy leakage in the realistic training setting where models are updated repeatedly over time. Our privacy attacks highlight and tackle the current challenges in deploying private algorithms in real-world settings. Overall, this thesis will demonstrate the diversity of both poisoning attacks and privacy attacks, and the challenges in defending against these attacks and securing machine learning in critical settings. --Author's abstract