LEADER 01360 am a22001813u 4500
001     137064
042     |a dc
100 1 0 |a Cheung, Wang Chi |e author
700 1 0 |a Simchi-Levi, David |e author
700 1 0 |a Zhu, Ruihao |e author
245 0 0 |a Learning to Optimize Under Non-Stationarity
260     |b Elsevier BV, |c 2021-11-02T12:19:41Z.
856     |z Get fulltext |u https://hdl.handle.net/1721.1/137064
520     |a © 2019 by the author(s). We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining d, B_T, and T as the problem dimension, the variation budget, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal Õ(d^{2/3}(B_T + 1)^{1/3} T^{2/3}) dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework, built on top of the SW-UCB algorithm, with the best Õ(d^{2/3}(B_T + 1)^{1/4} T^{3/4}) dynamic regret.
546     |a en
655 7   |a Article
773     |t 10.2139/ssrn.3261050
773     |t AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics
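
The SW-UCB algorithm named in the abstract counters drift by estimating the unknown parameter from only a sliding window of recent observations and then acting optimistically. A minimal sketch of that idea, assuming a toy action set, drift model, and placeholder constants (window_size, beta, lam) rather than the tuned choices analyzed in the paper:

# Minimal sliding-window UCB sketch for a drifting linear bandit.
# window_size, beta, lam, the action set, and the drift/noise model are
# illustrative assumptions, not the paper's tuned specification.
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 500
window_size = 50      # w: keep only the most recent w observations
beta = 1.0            # exploration bonus scale
lam = 1.0             # ridge regularization
actions = rng.normal(size=(20, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)

history = []                              # (action, reward) pairs, oldest first
theta_true = rng.normal(size=d)           # unknown, slowly drifting parameter

for t in range(T):
    theta_true += 0.01 * rng.normal(size=d)   # non-stationarity: small drift each round
    V = lam * np.eye(d)
    b = np.zeros(d)
    for x, y in history[-window_size:]:       # sliding window: discard stale data
        V += np.outer(x, x)
        b += y * x
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                      # windowed ridge-regression estimate
    # optimistic index: predicted reward plus exploration bonus ||x||_{V^{-1}}
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', actions, V_inv, actions))
    x_t = actions[np.argmax(actions @ theta_hat + beta * bonus)]
    reward = float(x_t @ theta_true) + 0.1 * rng.normal()
    history.append((x_t, reward))

The window length trades off bias from stale data against variance from using few samples; the paper's BOB framework sidesteps tuning it by running a bandit over candidate window sizes.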