A Markov chain Monte Carlo algorithm for Bayesian policy search
Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient, and thus may suffer from slow convergence or from becoming trapped in local optima. In this paper, we take a Bayesi...
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Taylor & Francis Group, 2018-01-01 |
Series: | Systems Science & Control Engineering |
Subjects: | |
Online Access: | http://dx.doi.org/10.1080/21642583.2018.1528483 |