Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model.
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species. Despite these robust behavioral effects, reinforcement learning models reliant on reward prediction error or associative strength routinely fail to perform these inferences. We propose an algorithm called betasort, inspired by cognitive processes, which performs transitive inference at low computational cost. This is accomplished by (1) representing stimulus positions along a unit span using beta distributions, (2) treating positive and negative feedback asymmetrically, and (3) updating the position of every stimulus during every trial, whether that stimulus was visible or not. Performance was compared for rhesus macaques, humans, and the betasort algorithm, as well as Q-learning, an established reward-prediction error (RPE) model. Of these, only Q-learning failed to respond above chance during critical test trials. Betasort's success (when compared to RPE models) and its computational efficiency (when compared to full Markov decision process implementations) suggest that the study of reinforcement learning in organisms will be best served by a feature-driven approach to comparing formal models.
| Field | Value |
|---|---|
| Main Authors | Greg Jensen, Fabian Muñoz, Yelda Alkan, Vincent P Ferrera, Herbert S Terrace |
| Format | Article |
| Language | English |
| Published | Public Library of Science (PLoS), 2015-01-01 |
| Series | PLoS Computational Biology |
| Citation | PLoS Computational Biology 11(9): e1004523 |
| ISSN | 1553-734X, 1553-7358 |
| DOI | 10.1371/journal.pcbi.1004523 |
| Online Access | http://europepmc.org/articles/PMC4583549?pdf=render |
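The abstract attributes betasort's inference ability to three concrete mechanisms: beta-distributed position estimates on the unit span, asymmetric handling of positive versus negative feedback, and implicit updating of every stimulus on every trial. The sketch below shows one way those three features could fit together in code. It is a minimal illustration, not the published algorithm: the mean-preserving consolidation rule, the midpoint-based nudging of unpresented stimuli, and the class name `BetasortSketch` are all assumptions chosen to match the abstract's description.

```python
import numpy as np

class BetasortSketch:
    """A betasort-style learner (illustrative sketch, not the published rules).

    Stimulus i's position on the unit span is represented by a
    Beta(upper[i] + 1, lower[i] + 1) distribution; higher positions
    rank earlier in the implied list.
    """

    def __init__(self, n_stimuli, seed=0):
        self.upper = np.zeros(n_stimuli)  # pseudo-counts pulling estimates up
        self.lower = np.zeros(n_stimuli)  # pseudo-counts pulling estimates down
        self.rng = np.random.default_rng(seed)

    def means(self):
        # Posterior mean position of every stimulus.
        return (self.upper + 1.0) / (self.upper + self.lower + 2.0)

    def choose(self, a, b):
        # Feature 1: positions are *sampled* from beta distributions,
        # so choices stay stochastic wherever the distributions overlap.
        pos = self.rng.beta(self.upper + 1.0, self.lower + 1.0)
        return a if pos[a] > pos[b] else b

    def update(self, a, b, chosen, correct):
        other = b if chosen == a else a
        m = self.means()
        if correct:
            # Feature 2 (assumed form of the asymmetry): positive feedback
            # only consolidates -- pseudo-counts grow in proportion to the
            # current mean, narrowing each presented distribution in place.
            for s in (a, b):
                self.upper[s] += m[s]
                self.lower[s] += 1.0 - m[s]
        else:
            # Negative feedback repositions: the wrongly chosen stimulus
            # gains "low" evidence, the rejected one gains "high" evidence.
            self.lower[chosen] += 1.0
            self.upper[other] += 1.0
            # Feature 3 (assumed form of implicit updating): every
            # unpresented stimulus is nudged toward whichever side of the
            # presented pair its current estimate already occupies, so
            # inferred ranks shift coherently even for stimuli off screen.
            midpoint = 0.5 * (m[a] + m[b])
            for s in range(len(self.upper)):
                if s != a and s != b:
                    if m[s] > midpoint:
                        self.upper[s] += 1.0
                    else:
                        self.lower[s] += 1.0
```

Trained only on adjacent pairs of a five-item list, a learner of this shape tends to rank B above D on the untrained probe, because error trials propagate rank information even to stimuli that were not presented; the exact update equations should be taken from Jensen et al. rather than from this sketch.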
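The abstract also reports that Q-learning, the RPE baseline, responded at chance on critical test trials. A self-contained sketch of such a baseline makes the reason visible: a tabular RPE learner updates only the chosen stimulus's value, and the interior items B and D play symmetric roles during adjacent-pair training (each is rewarded against one neighbour and unrewarded against the other), so their learned values converge and the untrained B-vs-D probe offers no basis for choice. The training loop, epsilon-greedy policy, and parameter values here are illustrative assumptions, not the study's fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
adjacent = [(0, 1), (1, 2), (2, 3), (3, 4)]  # list A>B>C>D>E as indices 0..4
q = np.zeros(5)                              # one learned value per stimulus
alpha, eps = 0.1, 0.1                        # illustrative parameters

for _ in range(5000):
    a, b = adjacent[rng.integers(len(adjacent))]
    # Epsilon-greedy choice between the two presented stimuli.
    if rng.random() < eps or q[a] == q[b]:
        chosen = a if rng.random() < 0.5 else b
    else:
        chosen = a if q[a] > q[b] else b
    reward = 1.0 if chosen == min(a, b) else 0.0  # earlier item is correct
    q[chosen] += alpha * (reward - q[chosen])     # RPE update, chosen item only

# B (index 1) and D (index 3) occupy symmetric roles during training, so
# their values end up close and the B-vs-D probe is answered at chance.
print(q, "B-vs-D value gap:", q[1] - q[3])
```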