PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates

As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicabl...

Bibliographic Details
Main Author: Pazis, Jason
Other Authors: Parr, Ronald
Published: 2015
Subjects:
MDP
Online Access: http://hdl.handle.net/10161/11334