Temporal-difference reinforcement learning with distributed representations.

Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to be distributed across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially discounting "microAgents", each of which has a separate discounting factor (gamma). Each microAgent maintains an independent hypothesis about the state of the world and a separate value estimate for taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (delta) signal within the model matches dopamine signals recorded from animals in standard conditioning reward paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each microAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
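
The discounting mechanism described in the abstract can be illustrated with a minimal sketch, given below in Python. It assumes a tabular TD(0) ensemble in which each microAgent carries its own exponential discount factor; the class name MicroAgent, the chain task, the spread of gamma values, and the learning rate are illustrative assumptions rather than the authors' implementation, and the distributed state-belief component of the model is not represented.

```python
# Illustrative sketch only (not the authors' code): a population of tabular
# TD(0) "microAgents", each carrying its own exponential discount factor.
# The class name, chain task, gamma spread, and learning rate are assumptions
# chosen for demonstration; the distributed state-belief part of the model
# is not represented here.
import numpy as np

class MicroAgent:
    def __init__(self, gamma, n_states, alpha=0.1):
        self.gamma = gamma            # this microAgent's private discount factor
        self.alpha = alpha            # learning rate
        self.V = np.zeros(n_states)   # tabular state-value estimates

    def update(self, s, r, s_next, terminal):
        """One TD(0) update; returns the value-error (the 'delta' signal)."""
        target = r + (0.0 if terminal else self.gamma * self.V[s_next])
        delta = target - self.V[s]
        self.V[s] += self.alpha * delta
        return delta

# A simple chain task: states 0..n_states-1, stepping right each time step,
# with a reward of 1 delivered on the transition out of the last state.
n_states = 10
gammas = np.linspace(0.55, 0.95, 20)   # one gamma per microAgent (assumed spread)
agents = [MicroAgent(g, n_states) for g in gammas]

for episode in range(2000):
    for s in range(n_states):
        s_next = s + 1
        terminal = (s_next == n_states)
        r = 1.0 if terminal else 0.0
        for a in agents:
            a.update(s, r, s_next, terminal)

# The overall agent's value is taken as the mean over microAgents.  After
# convergence each microAgent values state s at gamma**d (d = steps to reward),
# so the ensemble value is a mixture of exponentials in d.
ensemble_V = np.mean([a.V for a in agents], axis=0)
k = 0.5  # arbitrary hyperbolic rate, shown only for qualitative comparison
for s in reversed(range(n_states)):
    d = (n_states - 1) - s            # remaining delay to reward, in steps
    print(f"delay {d}: ensemble value {ensemble_V[s]:.3f}   "
          f"hyperbolic 1/(1+{k}*d) = {1.0/(1.0 + k*d):.3f}")
```

The printed ensemble values fall off more slowly than any single exponential curve and qualitatively track the hyperbolic curve; this is the sense in which a population of exponential discounters with distributed gammas behaves hyperbolically in aggregate.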

Bibliographic Details
Main Authors: Zeb Kurth-Nelson, A. David Redish
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2009-10-01
Series: PLoS ONE
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0007362
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19841749/pdf/?tool=EBI