Universal Reinforcement Learning

Bibliographic Details
Main Authors: Farias, Vivek F., Moallemi, Ciamac C., Van Roy, Benjamin, Weissman, Tsachy
Other Authors: Sloan School of Management (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers, 2010.
Summary: We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate the merits of the algorithm.
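The abstract describes the active LZ idea only at a high level: use Lempel-Ziv incremental parsing to build a context tree over the history, and use the counts accumulated at each context for prediction and control. As a rough, hypothetical illustration of the prediction half of that idea (not the authors' algorithm, which additionally incorporates costs, actions, and value estimates into the tree), the Python sketch below builds an LZ78-style context tree over an opponent's moves in Rock-Paper-Scissors and plays the counter to the predicted next move. All names here (LZContextTree, play, BEATS) are invented for this sketch.

import random
from collections import defaultdict

MOVES = ["R", "P", "S"]
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

class LZContextTree:
    """LZ78-style incremental parsing: walk down the tree along the
    observed sequence; when a symbol leaves the tree, add a leaf and
    restart at the root. Counts stored at each node give empirical
    next-symbol frequencies conditioned on the current context."""
    def __init__(self):
        self.children = defaultdict(dict)                    # node id -> {symbol: child id}
        self.counts = defaultdict(lambda: defaultdict(int))  # node id -> {symbol: count}
        self.next_id = 1
        self.node = 0  # current parsing position (root = 0)

    def update(self, symbol):
        self.counts[self.node][symbol] += 1
        if symbol in self.children[self.node]:
            self.node = self.children[self.node][symbol]  # extend the current phrase
        else:
            self.children[self.node][symbol] = self.next_id  # current phrase ends here
            self.next_id += 1
            self.node = 0  # restart parsing at the root

    def predict(self):
        counts = self.counts[self.node]
        if not counts:
            return random.choice(MOVES)  # no data at this context: guess
        return max(counts, key=counts.get)  # most frequent continuation

def play(opponent_moves):
    """Play RPS against a fixed sequence of opponent moves; return win count."""
    tree, wins = LZContextTree(), 0
    for opp in opponent_moves:
        action = BEATS[tree.predict()]  # counter the predicted opponent move
        if action == BEATS[opp]:
            wins += 1
        tree.update(opp)  # learn from the revealed opponent move
    return wins

if __name__ == "__main__":
    random.seed(0)
    # An opponent that cycles R, P, S: an order-1 Markov pattern (K = 1).
    history = ["R", "P", "S"] * 200
    print("wins:", play(history), "out of", len(history))

Against a periodic opponent like the cycling sequence above, the deeper contexts of the tree quickly become deterministic, so the win rate should climb well above the 1/3 chance level; this loosely mirrors the paper's condition that the future be conditionally independent of the past given a window of K consecutive actions and observations.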
National Science Foundation (U.S.) (MKIDS Program grant ECS-9985229)
Benchmark Stanford Graduate Fellowship