Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not ex...

Bibliographic Details
Main Authors: Cheung, Wang Chi (Author), Simchi-Levi, David (Author), Zhu, Ruihao (Author)
Format: Article
Language: English
Published: 2021-11-03T17:29:34Z.