Q-learning with nearest neighbors

© 2018 Curran Associates Inc. All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is avai...
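The abstract above is cut off before it describes the method, so what follows is only a minimal sketch of the general setting it names. It is not the authors' algorithm. It shows plain Q-learning on a continuous state space by nearest-neighbor state aggregation: Q-values live at a fixed, hand-picked set of anchor states, every observed state is mapped to its single closest anchor, and updates come from one sample path generated by an arbitrary (here uniformly random) behaviour policy. The 1-D state space, the anchor grid, the constant step size, the toy MDP in sample_path, and all function names are hypothetical choices made for illustration.

```python
import random


def nearest_index(anchors, s):
    """Index of the anchor state closest to the continuous state s (1-D, absolute distance)."""
    return min(range(len(anchors)), key=lambda i: abs(anchors[i] - s))


def nn_q_learning(path, anchors, n_actions, gamma=0.9, alpha=0.1):
    """Q-learning over anchor states, fed by a single sample path.

    path: list of (state, action, reward, next_state) tuples from one trajectory.
    Returns a table q[anchor_index][action].
    """
    q = [[0.0] * n_actions for _ in anchors]
    for s, a, r, s_next in path:
        i = nearest_index(anchors, s)       # aggregate current state to nearest anchor
        j = nearest_index(anchors, s_next)  # aggregate next state to nearest anchor
        target = r + gamma * max(q[j])      # one-step Bellman optimality target
        q[i][a] += alpha * (target - q[i][a])
    return q


def sample_path(n_steps, seed=0):
    """Hypothetical toy MDP on [0, 1]: action 0 moves left, action 1 moves right,
    with small Gaussian noise; reward 1 when the next state exceeds 0.9."""
    rng = random.Random(seed)
    s = 0.5
    path = []
    for _ in range(n_steps):
        a = rng.randrange(2)  # arbitrary behaviour policy: uniform random actions
        step = 0.1 if a == 1 else -0.1
        s_next = min(1.0, max(0.0, s + step + rng.gauss(0.0, 0.02)))
        r = 1.0 if s_next > 0.9 else 0.0
        path.append((s, a, r, s_next))
        s = s_next
    return path


if __name__ == "__main__":
    anchors = [i / 10 for i in range(11)]
    q = nn_q_learning(sample_path(5000), anchors, n_actions=2)
    for x, row in zip(anchors, q):
        print(f"anchor {x:.1f}: Q(left)={row[0]:.3f}  Q(right)={row[1]:.3f}")
```

Calling nn_q_learning(sample_path(5000), [i / 10 for i in range(11)], n_actions=2) returns an 11 x 2 table. Because rewards lie in [0, 1], the table starts at zero, and each update is a convex combination with the target, every entry stays within [0, 1 / (1 - gamma)]. Mapping to a single nearest anchor is the simplest form of nearest-neighbor aggregation; the record does not say how the paper weights neighbors or chooses the anchor set.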


Bibliographic Details
Main Authors: Shah, Devavrat (Author), Xie, Qiaomin (Author)
Format: Article
Language: English
Published: 2021-11-09T16:08:56Z.