A Markov chain Monte Carlo algorithm for Bayesian policy search

Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient and thus may suffer from slow convergence or from convergence to local optima. In this paper, we take a Bayesi...

Bibliographic Details
Main Authors: Vahid Tavakol Aghaei, Ahmet Onat, Sinan Yıldırım
Format: Article
Language: English
Published: Taylor & Francis Group 2018-01-01
Series: Systems Science & Control Engineering
Online Access: http://dx.doi.org/10.1080/21642583.2018.1528483