Altered statistical learning and decision-making in methamphetamine dependence: Evidence from a two-armed bandit task

Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high versus the long-term goal of sobriety. We use a computational model to identify learning and decis...

Bibliographic Details
Main Authors: Katia M Harlé, Shunan Zhang, Max Schiff, Scott Mackey, Martin P Paulus, Angela J Yu
Format: Article
Language: English
Published: Frontiers Media S.A. 2015-12-01
Series: Frontiers in Psychology
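Subjects: Addiction; decision-making; reward processing; Bayesian model; methamphetamine stimulant; multi-armed bandit task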
Online Access: http://journal.frontiersin.org/Journal/10.3389/fpsyg.2015.01910/full
id doaj-51e52be5c35049dd8f26ebeaaa5cc07e
record_format Article
spelling doaj-51e52be5c35049dd8f26ebeaaa5cc07e
2020-11-25T00:34:20Z
eng
Frontiers Media S.A.
Frontiers in Psychology
1664-1078
2015-12-01
6
10.3389/fpsyg.2015.01910
163996
Altered statistical learning and decision-making in methamphetamine dependence: Evidence from a two-armed bandit task
Katia M Harlé (University of California, San Diego)
Shunan Zhang (University of California, San Diego)
Max Schiff (Vanderbilt University)
Scott Mackey (University of Vermont)
Martin P Paulus (University of California, San Diego)
Angela J Yu (University of California, San Diego)
http://journal.frontiersin.org/Journal/10.3389/fpsyg.2015.01910/full
Addiction
decision-making
reward processing
Bayesian model
methamphetamine stimulant
multi-armed bandit task
collection DOAJ
language English
format Article
sources DOAJ
author Katia M Harlé
Shunan Zhang
Max Schiff
Scott Mackey
Martin P Paulus
Angela J Yu
author_sort Katia M Harlé
title Altered statistical learning and decision-making in methamphetamine dependence: Evidence from a two-armed bandit task
publisher Frontiers Media S.A.
series Frontiers in Psychology
issn 1664-1078
publishDate 2015-12-01
description Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high against the long-term goal of sobriety. We use a computational model to identify learning and decision-making abnormalities in methamphetamine-dependent individuals (MDI, n=16) versus healthy control subjects (HCS, n=16) in a two-armed bandit task. In this task, subjects repeatedly choose between two arms with fixed but unknown reward rates. Each choice yields not only a potential immediate reward but also information useful for long-term reward accumulation, thus pitting exploration against exploitation. We formalize the task as comprising a learning component (the updating of estimated reward rates based on ongoing observations) and a decision-making component (the choice among options based on current beliefs and uncertainties about reward rates). We model the learning component as iterative Bayesian inference (the Dynamic Belief Model), and the decision component using five competing decision policies: Win-stay/Lose-shift (WSLS), ε-Greedy, τ-Switch, Softmax, and Knowledge Gradient. HCS and MDI differ significantly in how they learn about reward rates and use them to make decisions. HCS learn from past observations but weigh recent data more, and their decision policy is best fit by Softmax. MDI are more likely to follow the simple learning-independent policy of WSLS, and those MDI best fit by Softmax have more pessimistic prior beliefs about reward rates and are less likely to choose the option estimated to be most rewarding. Neurally, MDI’s tendency to avoid the most rewarding option is associated with a lower grey matter volume of the thalamic dorsal lateral nucleus. More broadly, our work illustrates the ability of our computational framework to help reveal subtle learning and decision-making abnormalities in substance use.
topic Addiction
decision-making
reward processing
Bayesian model
methamphetamine stimulant
multi-armed bandit task
url http://journal.frontiersin.org/Journal/10.3389/fpsyg.2015.01910/full
_version_ 1725314045987258368
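
The learning and decision components summarized in the description field can be illustrated with a minimal Python sketch. This is not the authors' code. The grid discretization, the Beta(2, 2) prior, the exponential form of Softmax, and all parameter values (gamma, beta, the reward rates) are assumptions chosen only for illustration, and only two of the five policies (Softmax and WSLS) are shown.

```python
import math
import random

# Candidate reward rates on a grid in (0, 1); the grid size is an assumption.
GRID = [(i + 0.5) / 50 for i in range(50)]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def beta_prior(a, b):
    """Discretized Beta(a, b) prior over the reward-rate grid."""
    return normalize([t ** (a - 1) * (1 - t) ** (b - 1) for t in GRID])


def dbm_step(belief, prior, gamma, reward=None):
    """One Dynamic-Belief-Model-style update for a single arm.

    If the arm was chosen, apply Bayes' rule with a Bernoulli likelihood.
    Then mix with the prior: with probability gamma the rate is assumed
    unchanged, otherwise it is assumed to be redrawn from the prior.
    """
    if reward is not None:
        belief = normalize(
            [b * (t if reward else 1 - t) for b, t in zip(belief, GRID)]
        )
    return [gamma * b + (1 - gamma) * q for b, q in zip(belief, prior)]


def mean(belief):
    return sum(b * t for b, t in zip(belief, GRID))


def softmax_choice(values, beta, rng):
    """Two-arm softmax: P(arm i) is proportional to exp(beta * values[i])."""
    weights = [math.exp(beta * v) for v in values]
    return 0 if rng.random() * sum(weights) < weights[0] else 1


def wsls_choice(prev_arm, prev_reward, rng):
    """Win-stay/Lose-shift: repeat after a reward, switch otherwise."""
    if prev_arm is None:
        return rng.randrange(2)
    return prev_arm if prev_reward else 1 - prev_arm


def simulate(policy, rates=(0.7, 0.3), trials=200, gamma=0.9, beta=5.0, seed=0):
    """Return total reward earned by the given policy on a two-armed bandit."""
    rng = random.Random(seed)
    prior = beta_prior(2.0, 2.0)
    beliefs = [prior[:], prior[:]]
    prev_arm, prev_reward, total = None, None, 0
    for _ in range(trials):
        if policy == "softmax":
            arm = softmax_choice([mean(b) for b in beliefs], beta, rng)
        else:
            arm = wsls_choice(prev_arm, prev_reward, rng)
        reward = rng.random() < rates[arm]
        for i in range(2):
            obs = reward if i == arm else None  # unchosen arm gets no data
            beliefs[i] = dbm_step(beliefs[i], prior, gamma, obs)
        prev_arm, prev_reward = arm, reward
        total += reward
    return total
```

Calling `simulate("softmax")` or `simulate("wsls")` returns the total reward over the simulated trials. In this sketch WSLS ignores the learned beliefs entirely, which mirrors the "learning-independent" character attributed to it in the description, while a smaller `gamma` weights recent observations more heavily.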