A unified framework for temporal difference methods

We propose a unified framework for a broad class of methods to solve projected equations that approximate the solution of a high-dimensional fixed point problem within a subspace S spanned by a small number of basis functions or features. These methods originated in approximate dynamic programming (...

Bibliographic Details
Main Author: Bertsekas, Dimitri P. (Contributor)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor), Massachusetts Institute of Technology. Laboratory for Information and Decision Systems (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers, 2010-10-01T18:17:46Z.
Online Access: Get fulltext
LEADER 01692 am a22002053u 4500
001 58831
042 |a dc 
100 1 0 |a Bertsekas, Dimitri P.  |e author 
100 1 0 |a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science  |e contributor 
100 1 0 |a Massachusetts Institute of Technology. Laboratory for Information and Decision Systems  |e contributor 
100 1 0 |a Bertsekas, Dimitri P.  |e contributor 
245 0 0 |a A unified framework for temporal difference methods 
260 |b Institute of Electrical and Electronics Engineers,   |c 2010-10-01T18:17:46Z. 
856 |z Get fulltext  |u http://hdl.handle.net/1721.1/58831 
520 |a We propose a unified framework for a broad class of methods to solve projected equations that approximate the solution of a high-dimensional fixed point problem within a subspace S spanned by a small number of basis functions or features. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Our framework is based on a connection with projection methods for monotone variational inequalities, which involve alternative representations of the subspace S (feature scaling). Our methods admit simulation-based implementations, and even when specialized to DP problems, include extensions/new versions of the standard TD algorithms, which offer some special implementation advantages and reduced overhead. 
520 |a National Science Foundation (U.S.) (NSF grant ECCS-0801549) 
546 |a en_US 
655 7 |a Article 
773 |t IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning