Off-policy reinforcement learning with Gaussian processes
An off-policy Bayesian nonparametric approximate reinforcement learning framework, termed GPQ, that employs a Gaussian process (GP) model of the value (Q) function is presented in both the batch and online settings. Sufficient conditions on GP hyperparameter selection are established to guaran...
Main Authors: | , , , , , |
---|---|
Other Authors: | , |
Format: | Article |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers (IEEE), 2015-05-11T19:13:37Z. |
Online Access: | Get fulltext |