Online Combinatorial Optimization under Bandit Feedback
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each...
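The abstract describes the basic stochastic MAB protocol: in each round the learner pulls one arm and observes a sample from that arm's unknown reward distribution. The sketch below illustrates that round structure with the standard UCB1 index; the Bernoulli arm means, the horizon, and the function name are assumptions chosen for illustration and are not taken from the thesis.

```python
import math
import random

# Minimal sketch of the stochastic multi-armed bandit loop described above,
# using the classic UCB1 index. The Bernoulli arm means are illustrative
# assumptions, not values from the thesis.

def ucb1(arm_means, horizon):
    n_arms = len(arm_means)
    counts = [0] * n_arms          # number of pulls per arm
    sums = [0.0] * n_arms          # cumulative observed reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # pull each arm once to initialize
        else:
            # select the arm maximizing empirical mean + exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # observe a realization of the chosen arm's unknown reward distribution
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts, sums

if __name__ == "__main__":
    random.seed(0)
    counts, sums = ucb1(arm_means=[0.3, 0.5, 0.7], horizon=10_000)
    print("pulls per arm:", counts)  # the arm with mean 0.7 dominates
```

The exploration bonus shrinks as an arm is pulled more often, so the loop concentrates its pulls on the empirically best arm while still occasionally revisiting the others, which is the exploration vs. exploitation trade-off mentioned in the abstract.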
Main Author: | Talebi Mazraeh Shahi, Mohammad Sadegh
---|---
Format: | Others
Language: | English
Published: | KTH, Reglerteknik, 2016
Subjects: |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-181321 http://nbn-resolving.de/urn:isbn:978-91-7595-836-1
Similar Items
- Minimizing Regret in Combinatorial Bandits and Reinforcement Learning
  by: Talebi Mazraeh Shahi, Mohammad Sadegh
  Published: (2017)
- StreamingBandit: Experimenting with Bandit Policies
  by: Jules Kruijswijk, et al.
  Published: (2020-08-01)
- Efficient Online Learning with Bandit Feedback
  by: Liu, Fang
  Published: (2020)
- Multi-armed bandits with unconventional feedback
  by: Gajane, Pratik
  Published: (2017)
- Structured Stochastic Bandits
  by: Magureanu, Stefan
  Published: (2016)