Bandit algorithms with graphical feedback models and privacy awareness

This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Different from the basic MAB setting, where the learning algorithm receives only one observation per round, for a bandit problem under a graphical feedback model,...

Full description

Bibliographic Details
Main Author: Hu, Bingshan
Other Authors: Mehta, Nishant A.
Format: Others
Language: English
en
Published: 2021
Online Access: http://hdl.handle.net/1828/13411
id ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-13411
record_format oai_dc
spelling ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-134112021-09-28T17:29:43Z Bandit algorithms with graphical feedback models and privacy awareness Hu, Bingshan Mehta, Nishant A. This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Different from the basic MAB setting, where the learning algorithm receives only one observation per round, for a bandit problem under a graphical feedback model the learning algorithm may obtain more than one observation every time it interacts with the environment. Meanwhile, the learning algorithm suffers regret only from the pulled arm when it is not the optimal one, just as in the basic MAB setting. The first theme of this thesis is to derive instance-dependent regret bounds for stochastic bandits under graphical feedback models. In a basic MAB problem, the learning algorithm can always use the learnt information to make future decisions. If each reward vector encodes information about an individual, this kind of non-private learning algorithm may “leak” sensitive information associated with individuals. In an MAB problem with privacy awareness, the learning algorithm cannot rely directly on the true learnt information to make future decisions, in order to comply with privacy. What a private learning algorithm promises is that even if an adversary sees the output of the learning algorithm, the adversary can infer almost no information associated with any single individual. The second theme of this thesis covers three variants of private online learning: the private bandit setting, the private full information setting, and the private graphical bandit setting. Graduate 2021-09-27T16:50:05Z 2021-09-27T16:50:05Z 2021 2021-09-27 Thesis http://hdl.handle.net/1828/13411 English en Available to the World Wide Web application/pdf
collection NDLTD
language English
en
format Others
sources NDLTD
description This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Different from the basic MAB setting, where the learning algorithm receives only one observation per round, for a bandit problem under a graphical feedback model the learning algorithm may obtain more than one observation every time it interacts with the environment. Meanwhile, the learning algorithm suffers regret only from the pulled arm when it is not the optimal one, just as in the basic MAB setting. The first theme of this thesis is to derive instance-dependent regret bounds for stochastic bandits under graphical feedback models. In a basic MAB problem, the learning algorithm can always use the learnt information to make future decisions. If each reward vector encodes information about an individual, this kind of non-private learning algorithm may “leak” sensitive information associated with individuals. In an MAB problem with privacy awareness, the learning algorithm cannot rely directly on the true learnt information to make future decisions, in order to comply with privacy. What a private learning algorithm promises is that even if an adversary sees the output of the learning algorithm, the adversary can infer almost no information associated with any single individual. The second theme of this thesis covers three variants of private online learning: the private bandit setting, the private full information setting, and the private graphical bandit setting. === Graduate
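To make the graphical feedback model concrete, the following is a minimal illustrative sketch (not the thesis's own algorithm): a simple UCB-style learner where pulling an arm also reveals the rewards of that arm's neighbors in a feedback graph, while regret is charged only for the pulled arm. All function and variable names here are assumptions introduced for illustration, and the Gaussian rewards are just a stand-in for a generic stochastic environment.

```python
import math
import random

def ucb_graph_bandit(means, graph, horizon, seed=0):
    """Run a UCB-style learner under a graphical feedback model.

    means   -- true mean reward of each arm (unknown to the learner)
    graph   -- dict: arm -> list of neighbor arms also observed when pulled
    horizon -- number of rounds
    Returns the cumulative (pseudo-)regret of the pulled arms.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # observation counts (can exceed pull counts)
    sums = [0.0] * k      # sums of observed rewards
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Upper confidence bound using all observations, including
        # the "free" side observations from the feedback graph.
        def ucb(i):
            if counts[i] == 0:
                return float("inf")
            return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
        arm = max(range(k), key=ucb)
        # Regret comes only from the pulled arm, as in the basic MAB setting.
        regret += best - means[arm]
        # Graphical feedback: observe the pulled arm and all its neighbors.
        for j in {arm} | set(graph[arm]):
            counts[j] += 1
            sums[j] += rng.gauss(means[j], 0.1)
    return regret

# Example: 3 arms; pulling arm 0 also reveals rewards of arms 1 and 2.
graph = {0: [1, 2], 1: [], 2: []}
r = ucb_graph_bandit([0.5, 0.6, 0.9], graph, horizon=2000)
```

Note how the extra observations speed up learning: even pulls of a suboptimal arm can shrink the confidence intervals of other arms, which is what makes instance-dependent regret bounds under graphical feedback finer than those for the basic MAB setting.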
author2 Mehta, Nishant A.
author_facet Mehta, Nishant A.
Hu, Bingshan
author Hu, Bingshan
spellingShingle Hu, Bingshan
Bandit algorithms with graphical feedback models and privacy awareness
author_sort Hu, Bingshan
title Bandit algorithms with graphical feedback models and privacy awareness
title_short Bandit algorithms with graphical feedback models and privacy awareness
title_full Bandit algorithms with graphical feedback models and privacy awareness
title_fullStr Bandit algorithms with graphical feedback models and privacy awareness
title_full_unstemmed Bandit algorithms with graphical feedback models and privacy awareness
title_sort bandit algorithms with graphical feedback models and privacy awareness
publishDate 2021
url http://hdl.handle.net/1828/13411
work_keys_str_mv AT hubingshan banditalgorithmswithgraphicalfeedbackmodelsandprivacyawareness
_version_ 1719486325481013248