Pathologies of Temporal Difference Methods in Approximate Dynamic Programming
Approximate policy iteration methods based on temporal differences are popular in practice and have been tested extensively since the early nineties, but their convergence behavior is complex and not well understood at present. An important question is whether the policy iteration pr...
Main Author: | |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers, 2011-06-21T19:35:53Z. |
Subjects: | |
Online Access: | Get fulltext |