Bandit algorithms with graphical feedback models and privacy awareness

This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Different from the basic MAB setting, where the learning algorithm receives only one observation per round, for a bandit problem under a graphical feedback model,...

Full description

Bibliographic Details
Main Author: Hu, Bingshan
Other Authors: Mehta, Nishant A.
Format: Others
Language: English
en
Published: 2021
Online Access: http://hdl.handle.net/1828/13411
id ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-13411
record_format oai_dc
spelling ndltd-uvic.ca-oai-dspace.library.uvic.ca-1828-134112021-09-28T17:29:43Z Bandit algorithms with graphical feedback models and privacy awareness Hu, Bingshan Mehta, Nishant A. This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Different from the basic MAB setting, where the learning algorithm receives only one observation per round, for a bandit problem under a graphical feedback model the learning algorithm may obtain more than one observation every time it interacts with the environment. Meanwhile, the learning algorithm suffers regret only from the pulled arm when it is not the optimal one, just as in the basic MAB setting. The first theme of this thesis is to derive instance-dependent regret bounds for stochastic bandits under graphical feedback models. In a basic MAB problem, the learning algorithm can always use the learnt information to make future decisions. If each reward vector encodes information about an individual, this kind of non-private learning algorithm may “leak” sensitive information associated with individuals. In an MAB problem with privacy awareness, the learning algorithm cannot rely directly on the true learnt information to make future decisions, in order to comply with privacy. What a private learning algorithm promises is that even if an adversary sees the output of the learning algorithm, the adversary can infer almost no information associated with any single individual. The second theme of this thesis covers three variants of private online learning: the private bandit setting, the private full information setting, and the private graphical bandit setting. Graduate 2021-09-27T16:50:05Z 2021-09-27T16:50:05Z 2021 2021-09-27 Thesis http://hdl.handle.net/1828/13411 English en Available to the World Wide Web application/pdf
collection NDLTD
language English
en
format Others
sources NDLTD
description This thesis focuses on two classes of learning problems in stochastic multi-armed bandits (MAB): graphical bandits and private bandits. Different from the basic MAB setting, where the learning algorithm receives only one observation per round, for a bandit problem under a graphical feedback model the learning algorithm may obtain more than one observation every time it interacts with the environment. Meanwhile, the learning algorithm suffers regret only from the pulled arm when it is not the optimal one, just as in the basic MAB setting. The first theme of this thesis is to derive instance-dependent regret bounds for stochastic bandits under graphical feedback models. In a basic MAB problem, the learning algorithm can always use the learnt information to make future decisions. If each reward vector encodes information about an individual, this kind of non-private learning algorithm may “leak” sensitive information associated with individuals. In an MAB problem with privacy awareness, the learning algorithm cannot rely directly on the true learnt information to make future decisions, in order to comply with privacy. What a private learning algorithm promises is that even if an adversary sees the output of the learning algorithm, the adversary can infer almost no information associated with any single individual. The second theme of this thesis covers three variants of private online learning: the private bandit setting, the private full information setting, and the private graphical bandit setting. === Graduate
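To make the graphical feedback model concrete, the following is a minimal illustrative sketch (not the thesis's own algorithm): a simple UCB-style learner where pulling an arm also reveals the rewards of that arm's neighbors in a feedback graph, while regret is charged only for the pulled arm. All function and variable names here are assumptions introduced for illustration, and the Gaussian rewards are just a stand-in for a generic stochastic environment.

```python
import math
import random

def ucb_graph_bandit(means, graph, horizon, seed=0):
    """Run a UCB-style learner under a graphical feedback model.

    means   -- true mean reward of each arm (unknown to the learner)
    graph   -- dict: arm -> list of neighbor arms also observed when pulled
    horizon -- number of rounds
    Returns the cumulative (pseudo-)regret of the pulled arms.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # observation counts (can exceed pull counts)
    sums = [0.0] * k      # sums of observed rewards
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Upper confidence bound using all observations, including
        # the "free" side observations from the feedback graph.
        def ucb(i):
            if counts[i] == 0:
                return float("inf")
            return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
        arm = max(range(k), key=ucb)
        # Regret comes only from the pulled arm, as in the basic MAB setting.
        regret += best - means[arm]
        # Graphical feedback: observe the pulled arm and all its neighbors.
        for j in {arm} | set(graph[arm]):
            counts[j] += 1
            sums[j] += rng.gauss(means[j], 0.1)
    return regret

# Example: 3 arms; pulling arm 0 also reveals rewards of arms 1 and 2.
graph = {0: [1, 2], 1: [], 2: []}
r = ucb_graph_bandit([0.5, 0.6, 0.9], graph, horizon=2000)
```

Note how the extra observations speed up learning: even pulls of a suboptimal arm can shrink the confidence intervals of other arms, which is what makes instance-dependent regret bounds under graphical feedback finer than those for the basic MAB setting.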
author2 Mehta, Nishant A.
author_facet Mehta, Nishant A.
Hu, Bingshan
author Hu, Bingshan
spellingShingle Hu, Bingshan
Bandit algorithms with graphical feedback models and privacy awareness
author_sort Hu, Bingshan
title Bandit algorithms with graphical feedback models and privacy awareness
title_short Bandit algorithms with graphical feedback models and privacy awareness
title_full Bandit algorithms with graphical feedback models and privacy awareness
title_fullStr Bandit algorithms with graphical feedback models and privacy awareness
title_full_unstemmed Bandit algorithms with graphical feedback models and privacy awareness
title_sort bandit algorithms with graphical feedback models and privacy awareness
publishDate 2021
url http://hdl.handle.net/1828/13411
work_keys_str_mv AT hubingshan banditalgorithmswithgraphicalfeedbackmodelsandprivacyawareness
_version_ 1719486325481013248