Time-in-action RL

The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in their control-theoretical formalism.

Full description

The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in their control-theoretical formalism. The key insight enabling this integration is to model an explicit time function that maps a state-action pair to the time the underlying controller takes to accomplish the action. In this framework, an action is described by its value (action value) and by the time it takes to perform (action time). The action value results from the RL policy at a given state; the action time is estimated by an explicit time model learnt from measured activities of the underlying controller. The RL value network is then trained with the embedded time model to predict action time. The approach is tested on a variant of Atari Pong and shown to be convergent.

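The core mechanism described above, learning an explicit time model t(s, a) from measured controller activity and discounting returns by the predicted action time, can be sketched in a minimal tabular form. This is an illustrative sketch only, not the paper's implementation: the class, the update rules, and all parameter names are assumptions, and the paper itself uses a value network rather than a table.

```python
import numpy as np

class TimeInActionQ:
    """Toy tabular sketch: a Q-table paired with an explicit time model.

    The time model t(s, a) predicts how long the underlying controller
    takes to complete action a from state s; value updates discount by
    gamma ** t(s, a) instead of a fixed single step (a semi-Markov-style
    update). All names here are illustrative.
    """

    def __init__(self, n_states, n_actions, gamma=0.99, alpha=0.1, beta=0.1):
        self.q = np.zeros((n_states, n_actions))  # action values
        self.t = np.ones((n_states, n_actions))   # predicted action times
        self.gamma, self.alpha, self.beta = gamma, alpha, beta

    def update_time_model(self, s, a, measured_time):
        # Fit the explicit time model to times measured from the controller.
        self.t[s, a] += self.beta * (measured_time - self.t[s, a])

    def update_value(self, s, a, reward, s_next):
        # Discount by the predicted action time, not a single unit step.
        discount = self.gamma ** self.t[s, a]
        target = reward + discount * self.q[s_next].max()
        self.q[s, a] += self.alpha * (target - self.q[s, a])

    def act(self, s):
        # Greedy action from the current action values.
        return int(self.q[s].argmax())
```

The design point is the coupling: the controller's measured completion times train the time model, and the time model in turn shapes the discounting used to train the values, which is what lets RL sit on top of a known control-theoretic layer.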
Bibliographic Details
Main Authors: Jiangcheng Zhu, Zhepei Wang, Douglas Mcilwraith, Chao Wu, Chao Xu, Yike Guo
Format: Article
Language: English
Published: Wiley, 2019-02-01
Series: IET Cyber-systems and Robotics
Subjects: learning (artificial intelligence); reinforcement learning framework; control theoretical formalism; explicit time function; action value; action time; RL value network; embedded time model; time-in-action RL
Online Access: https://digital-library.theiet.org/content/journals/10.1049/iet-csr.2018.0001