Learning to Optimize Under Non-Stationarity

© 2019 by the author(s). We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non...

Full description

Bibliographic Details
Main Authors: Cheung, Wang Chi (Author), Simchi-Levi, David (Author), Zhu, Ruihao (Author)
Format: Article
Language: English
Published: Elsevier BV, 2021-11-02T12:19:41Z.
Subjects:
Online Access: Get fulltext
LEADER 01360 am a22001813u 4500
001 137064
042 |a dc 
100 1 0 |a Cheung, Wang Chi  |e author 
700 1 0 |a Simchi-Levi, David  |e author 
700 1 0 |a Zhu, Ruihao  |e author 
245 0 0 |a Learning to Optimize Under Non-Stationarity 
260 |b Elsevier BV,   |c 2021-11-02T12:19:41Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/137064 
520 |a © 2019 by the author(s). We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining d, B_T, and T as the problem dimension, the variation budget, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal Õ(d^(2/3) (B_T + 1)^(1/3) T^(2/3)) dynamic regret, and the tuning-free bandit-over-bandit (BOB) framework built on top of the SW-UCB algorithm with best Õ(d^(2/3) (B_T + 1)^(1/4) T^(3/4)) dynamic regret. 
546 |a en 
655 7 |a Article 
773 |t 10.2139/ssrn.3261050 
773 |t AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics
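The abstract in field 520 names the Sliding Window UCB (SW-UCB) algorithm, which tracks a drifting parameter by fitting only recent observations. The sketch below is a minimal, illustrative sliding-window variant of linear UCB; the window length, the confidence width alpha, the ridge regularizer lam, and all class and method names are assumptions made for this example and are not taken from the record or the paper's tuned constants.

```python
import numpy as np


class SlidingWindowLinUCB:
    """Illustrative sliding-window linear UCB policy (not the paper's exact algorithm).

    Keeps only the most recent `window` (context, reward) pairs so the
    least-squares estimate can follow a slowly drifting parameter vector.
    """

    def __init__(self, dim, window, alpha=1.0, lam=1.0):
        self.dim = dim          # problem dimension d
        self.window = window    # sliding-window length (assumed tuning parameter)
        self.alpha = alpha      # exploration width (assumed, not the paper's constant)
        self.lam = lam          # ridge regularizer
        self.history = []       # list of (context, reward) pairs, newest last

    def _fit(self):
        # Ridge regression restricted to observations inside the window.
        V = self.lam * np.eye(self.dim)
        b = np.zeros(self.dim)
        for x, r in self.history[-self.window:]:
            V += np.outer(x, x)
            b += r * x
        V_inv = np.linalg.inv(V)
        return V_inv, V_inv @ b

    def select(self, arms):
        # Choose the arm with the largest optimistic (mean + width) index.
        V_inv, theta_hat = self._fit()
        scores = [x @ theta_hat + self.alpha * np.sqrt(x @ V_inv @ x) for x in arms]
        return int(np.argmax(scores))

    def update(self, x, reward):
        # Record the observed context and reward for future window fits.
        self.history.append((np.asarray(x, dtype=float), float(reward)))
```

As a usage sketch, one would call select() on the current action set each round, play the returned arm, and then call update() with the observed context and reward; the tuning-free BOB framework described in the abstract would, roughly, run several such instances with different window lengths and choose among them online.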