Q-learning with nearest neighbors
© 2018 Curran Associates Inc. All rights reserved. We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is avai...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | 2021-11-09T16:08:56Z |