Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail.

Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous...

Bibliographic Details
Main Authors:	Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, Walter Senn, Wulfram Gerstner
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2009-12-01
Series:	PLoS Computational Biology
Online Access:	http://europepmc.org/articles/PMC2778872?pdf=render

id	doaj-7dda502a87f2451eb9abf5cd43f59674
record_format	Article
spelling	doaj-7dda502a87f2451eb9abf5cd43f59674 2020-11-25T02:31:46Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2009-12-01 5 12 e1000586 10.1371/journal.pcbi.1000586 Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail. Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, Walter Senn, Wulfram Gerstner Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties. http://europepmc.org/articles/PMC2778872?pdf=render
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, Walter Senn, Wulfram Gerstner
author_sort	Eleni Vasilaki
title	Spike-based reinforcement learning in continuous state and action space: when policy gradient methods fail.
publisher	Public Library of Science (PLoS)
series	PLoS Computational Biology
issn	1553-734X 1553-7358
publishDate	2009-12-01
description	Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
url	http://europepmc.org/articles/PMC2778872?pdf=render

Similar Items