Time-in-action RL
The authors propose a novel reinforcement learning (RL) framework in which agent behaviour is governed by traditional control theory. This integrated approach, called time-in-action RL, makes RL applicable to many real-world systems whose underlying dynamics are known in their control-theoretical formalism. The key insight enabling this integration is to model an explicit time function that maps a state-action pair to the time the underlying controller needs to accomplish the action. In this framework, an action is described by its value (action value) and by the time it takes to perform (action time). The action value results from the RL policy for a given state, while the action time is estimated by an explicit time model learnt from measured activity of the underlying controller. The RL value network is then trained together with the embedded time model, which predicts the action time. The approach is tested on a variant of Atari Pong and shown to converge.
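For concreteness, the following is a minimal, hypothetical sketch of the idea as described in the abstract: an explicit time model is fitted from measured controller times and then used to discount the value update over each action's duration (a semi-Markov-style update). The tabular representation, names, and exact update rule are assumptions for illustration; the paper itself uses a value network with an embedded time model, and its precise algorithm is not reproduced here.

```python
# Hypothetical sketch (not the authors' code): couples a learned "action time"
# model with a semi-Markov-style Q-value update, one plausible reading of the abstract.
import numpy as np

class TimeInActionSketch:
    def __init__(self, n_states, n_actions, gamma=0.99, lr=0.1):
        self.q = np.zeros((n_states, n_actions))    # action values
        self.tau = np.ones((n_states, n_actions))   # estimated action times
        self.gamma, self.lr = gamma, lr

    def update_time_model(self, s, a, measured_time, alpha=0.1):
        # Fit the explicit time model from the measured activity of the controller.
        self.tau[s, a] += alpha * (measured_time - self.tau[s, a])

    def update_value(self, s, a, reward, s_next):
        # Discount the bootstrap target over the predicted action time,
        # i.e. "embed" the time model in the value update.
        g = self.gamma ** self.tau[s, a]
        target = reward + g * self.q[s_next].max()
        self.q[s, a] += self.lr * (target - self.q[s, a])
```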
Main Authors: | Jiangcheng Zhu, Zhepei Wang, Douglas Mcilwraith, Chao Wu, Chao Xu, Yike Guo |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2019-02-01 |
Series: | IET Cyber-systems and Robotics |
ISSN: | 2631-6315 |
Subjects: | learning (artificial intelligence); reinforcement learning framework; control theoretical formalism; explicit time function; action value; action time; RL value network; embedded time model; time-in-action RL |
Online Access: | https://digital-library.theiet.org/content/journals/10.1049/iet-csr.2018.0001 |
Record ID: doaj-3e310de9eb4c496b8d147c520664550f (DOAJ)

Authors and affiliations:

- Jiangcheng Zhu: Institute of Cyber-Systems and Control, Department of Control Science and Engineering, Zhejiang University
- Zhepei Wang: Institute of Cyber-Systems and Control, Department of Control Science and Engineering, Zhejiang University
- Douglas Mcilwraith: Data Science Institute, Department of Computing, Imperial College London
- Chao Wu: Zhejiang University
- Chao Xu: Institute of Cyber-Systems and Control, Department of Control Science and Engineering, Zhejiang University
- Yike Guo: Data Science Institute, Department of Computing, Imperial College London