Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not ex...
Format: Article
Language: English
Published: 2021-11-03T17:29:34Z