Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI.

Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models, namely "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations, we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.
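
As a reading aid only (not the authors' code or experimental paradigm), the following minimal Python sketch contrasts the two error terms the abstract dissociates: the SVPE, computed from state values alone and used to train the critic of an actor/critic learner, and the AVPE, tied to the chosen action as in Q-learning. The two-state/two-action task, the parameter values, and the function names svpe and avpe are all illustrative assumptions.

```python
# Minimal sketch (assumed task and parameters, not the paper's paradigm)
# contrasting the state-value-prediction error (SVPE) of an actor/critic's
# critic with the action-value-prediction error (AVPE) of Q-learning.

import random

GAMMA = 0.9      # discount factor (assumed)
ALPHA = 0.1      # learning rate (assumed)
N_STATES = 2
N_ACTIONS = 2

V = [0.0] * N_STATES                                  # critic's state values
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]      # action values

def svpe(state, reward, next_state):
    """State-value-prediction error: independent of the action taken."""
    return reward + GAMMA * V[next_state] - V[state]

def avpe(state, action, reward, next_state):
    """Action-value-prediction error (Q-learning form): tied to the chosen action."""
    return reward + GAMMA * max(Q[next_state]) - Q[state][action]

# One illustrative learning step with a random transition and reward.
s = 0
a = random.randrange(N_ACTIONS)
s_next = random.randrange(N_STATES)
r = 1.0 if random.random() < 0.5 else 0.0

delta_v = svpe(s, r, s_next)
delta_q = avpe(s, a, r, s_next)

V[s] += ALPHA * delta_v          # critic update driven by the SVPE
Q[s][a] += ALPHA * delta_q       # action-value update driven by the AVPE

print(f"SVPE = {delta_v:.3f}, AVPE = {delta_q:.3f}")
```

In an actor/critic learner the same SVPE would also train the actor's action propensities, whereas a pure action-value learner updates Q directly from the AVPE; the study's fMRI evidence suggests both kinds of signal are present in mesostriatal circuits.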


Bibliographic Details
Main Authors: Jaron T Colas, Wolfgang M Pauli, Tobias Larsen, J Michael Tyszka, John P O'Doherty
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2017-10-01
Series: PLoS Computational Biology, 13(10): e1005810
ISSN: 1553-734X, 1553-7358
DOI: 10.1371/journal.pcbi.1005810
Source: DOAJ (Directory of Open Access Journals)
Online Access: http://europepmc.org/articles/PMC5673235?pdf=render