|
|
|
|
LEADER |
01692 am a22002053u 4500 |
001 |
58831 |
042 |
|
|
|a dc
|
100 |
1 |
0 |
|a Bertsekas, Dimitri P.
|e author
|
100 |
1 |
0 |
|a Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
|e contributor
|
100 |
1 |
0 |
|a Massachusetts Institute of Technology. Laboratory for Information and Decision Systems
|e contributor
|
100 |
1 |
0 |
|a Bertsekas, Dimitri P.
|e contributor
|
245 |
0 |
0 |
|a A unified framework for temporal difference methods
|
260 |
|
|
|b Institute of Electrical and Electronics Engineers,
|c 2010-10-01T18:17:46Z.
|
856 |
|
|
|z Get fulltext
|u http://hdl.handle.net/1721.1/58831
|
520 |
|
|
|a We propose a unified framework for a broad class of methods to solve projected equations that approximate the solution of a high-dimensional fixed point problem within a subspace S spanned by a small number of basis functions or features. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Our framework is based on a connection with projection methods for monotone variational inequalities, which involve alternative representations of the subspace S (feature scaling). Our methods admit simulation-based implementations, and even when specialized to DP problems, include extensions/new versions of the standard TD algorithms, which offer some special implementation advantages and reduced overhead.
|
520 |
|
|
|a National Science Foundation (U.S.) (NSF grant ECCS-0801549)
|
546 |
|
|
|a en_US
|
655 |
7 |
|
|a Article
|
773 |
|
|
|t IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
|