PAC-optimal, Non-parametric Algorithms and Bounds for Exploration in Concurrent MDPs with Delayed Updates
As the reinforcement learning community has shifted its focus from heuristic methods to methods that have performance guarantees, PAC-optimal exploration algorithms have received significant attention. Unfortunately, the majority of current PAC-optimal exploration algorithms are inapplicabl...
Main Author: | |
---|---|
Other Authors: | |
Published: | 2015 |
Subjects: | |
Online Access: | http://hdl.handle.net/10161/11334 |