Temporal Graph Traversals Using Reinforcement Learning With Proximal Policy Optimization
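Main Authors: | Samuel Henrique Silva, Adel Alaeddini, Peyman Najafirad |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Machine learning; graphs; Markov-chain; deep reinforcement learning; path-planning |
Online Access: | https://ieeexplore.ieee.org/document/9055369/ |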
id | doaj-a4430e285c9848deb0a781043a53ee8e |
---|---|
record_format | Article |
spelling | doaj-a4430e285c9848deb0a781043a53ee8e; 2021-03-30T01:32:10Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; vol. 8; pp. 63910-63922; doi:10.1109/ACCESS.2020.2985295; 9055369; Temporal Graph Traversals Using Reinforcement Learning With Proximal Policy Optimization; Samuel Henrique Silva [0] (https://orcid.org/0000-0003-0368-181X); Adel Alaeddini [1] (https://orcid.org/0000-0003-4451-3150); Peyman Najafirad [2] (https://orcid.org/0000-0001-9671-577X); [0] Secure AI and Autonomy Laboratory, The University of Texas at San Antonio, San Antonio, TX, USA; [1] Department of Information Systems and Cyber Security, The University of Texas at San Antonio, San Antonio, TX, USA; [2] Secure AI and Autonomy Laboratory, The University of Texas at San Antonio, San Antonio, TX, USA; https://ieeexplore.ieee.org/document/9055369/; Machine learning; graphs; Markov-chain; deep reinforcement learning; path-planning |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Samuel Henrique Silva, Adel Alaeddini, Peyman Najafirad |
title | Temporal Graph Traversals Using Reinforcement Learning With Proximal Policy Optimization |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2020-01-01 |
description | Graphs in real-world applications are dynamic in both structure and input. Information discovery in such networks, which are locally dense and deeply connected but globally sparse, can be time-consuming and computationally costly. In this paper we address the shortest path query in spatio-temporal graphs, a fundamental graph problem with numerous applications. In spatio-temporal graphs, classical shortest path algorithms are insufficient or even flawed, because information consistency cannot be guaranteed between two timestamps and recalculating paths is computationally costly. In this work, we address the complexity and dynamicity of the shortest path query in spatio-temporal graphs with a simple yet effective model based on Reinforcement Learning with Proximal Policy Optimization. Our solution simplifies the problem by decomposing the spatio-temporal graph into two components: a static sub-graph and a dynamic sub-graph. The static graph, which is known and immutable, is solved efficiently with the A* algorithm. The sub-graphs interconnecting the static graph have unknown dynamics; we address this by estimating the unknown dynamic portion of the graph as a Markov chain that correlates the agents' observations of the environment with the path to be followed. We then derive an action policy through Proximal Policy Optimization that selects the locally optimal actions in the Markov process leading to the shortest path, given the estimated system dynamics. We evaluate the system in a simulation environment built in Unity3D. In partially structured and unknown environments with variable environment parameters, we obtained an efficiency 75% greater than that of the comparable deterministic solution. |
topic | Machine learning; graphs; Markov-chain; deep reinforcement learning; path-planning |
url | https://ieeexplore.ieee.org/document/9055369/ |
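The description field states that the static, fully known sub-graph is solved with the A* algorithm. As a minimal sketch of that step only, assuming the static sub-graph is given as an adjacency list with non-negative edge costs and that an admissible heuristic is available, A* can be written as below. The names `a_star`, `Graph`, and `heuristic` are hypothetical and are not taken from the paper.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# Hypothetical representation: node -> list of (neighbor, non-negative edge cost).
Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def a_star(
    graph: Graph,
    start: Hashable,
    goal: Hashable,
    heuristic: Callable[[Hashable], float],
) -> Optional[Tuple[float, List[Hashable]]]:
    """Return (cost, path) from start to goal, or None if goal is unreachable.

    heuristic(n) is assumed admissible (never overestimates the remaining
    cost from n to goal); otherwise the returned path may not be shortest.
    """
    tie = itertools.count()  # tie-breaker so nodes themselves are never compared
    # Heap entries: (f = g + h, tie, g, node)
    open_heap = [(heuristic(start), next(tie), 0.0, start)]
    g_score: Dict[Hashable, float] = {start: 0.0}
    came_from: Dict[Hashable, Hashable] = {}

    while open_heap:
        _, _, g, node = heapq.heappop(open_heap)
        if node == goal:
            # Walk parent pointers back to the start to rebuild the path.
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return g, path[::-1]
        if g > g_score[node]:
            continue  # stale heap entry: a cheaper route to node was found later
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < g_score.get(neighbor, math.inf):
                g_score[neighbor] = new_g
                came_from[neighbor] = node
                heapq.heappush(
                    open_heap, (new_g + heuristic(neighbor), next(tie), new_g, neighbor)
                )
    return None


if __name__ == "__main__":
    demo = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)], "D": []}
    print(a_star(demo, "A", "D", lambda n: 0.0))  # prints (3.0, ['A', 'B', 'C', 'D'])
```

With a heuristic that always returns 0 this reduces to Dijkstra's algorithm; a tighter admissible heuristic, such as straight-line distance on a spatial graph, lets A* expand fewer nodes.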
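For the dynamic sub-graphs, the description field says the unknown dynamics are estimated as a Markov chain built from the agents' observations, but the record does not specify the estimator. One simple option, shown here purely as an assumption and not as the authors' method, is the maximum-likelihood estimate of each transition probability: count the observed transitions s to s' and normalize per source state. The function name `estimate_transition_probabilities` is hypothetical, and observations are assumed to be already discretized into hashable states.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Sequence


def estimate_transition_probabilities(
    trajectories: Iterable[Sequence[Hashable]],
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Maximum-likelihood estimate of P(next_state | state) from observed state sequences."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for trajectory in trajectories:
        # Count every consecutive pair (state, next_state) in the trajectory.
        for state, next_state in zip(trajectory, trajectory[1:]):
            counts[state][next_state] += 1

    probabilities: Dict[Hashable, Dict[Hashable, float]] = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize the outgoing counts of each state into a probability distribution.
        probabilities[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return probabilities


if __name__ == "__main__":
    observed = [["a", "b", "a", "c"], ["a", "b"]]
    print(estimate_transition_probabilities(observed))
```

States observed only at the end of a trajectory get no outgoing distribution under this estimator; adding pseudo-counts (Laplace smoothing) is one common way to handle rarely observed states.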
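The description field states that the action policy is derived with Proximal Policy Optimization, without giving the objective or hyperparameters used. The sketch below computes the standard clipped surrogate objective from the original PPO formulation, the mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A) with r = pi_new(a|s) / pi_old(a|s), for a batch of log-probabilities and advantage estimates. It is an illustration of PPO in general, not the authors' training code; the function name is hypothetical and eps = 0.2 is an assumed, commonly used default.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (maximize it, or negate it for a loss)."""
    if not advantages or not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # r = pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # so the policy gains nothing from moving the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

Clipping the probability ratio keeps each update close to the policy that collected the data, which is the property that distinguishes PPO from an unconstrained policy-gradient step.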