Temporal Graph Traversals Using Reinforcement Learning With Proximal Policy Optimization

Graphs in real-world applications are dynamic both in terms of structure and inputs. Information discovery in such networks, which present dense, deeply connected patterns locally and sparsity globally, can be time-consuming and computationally costly. In this paper we address the shortest-path query in spatio-temporal graphs, a fundamental graph problem with numerous applications. In spatio-temporal graphs, classical shortest-path algorithms are insufficient or even flawed, because information consistency cannot be guaranteed between two timestamps and path recalculation is computationally costly. In this work, we address the complexity and dynamicity of the shortest-path query in spatio-temporal graphs with a simple yet effective model based on reinforcement learning with Proximal Policy Optimization. Our solution simplifies the problem by decomposing the spatio-temporal graph into two components: a static and a dynamic sub-graph. The static graph, known and immutable, is solved efficiently with the A* algorithm. The sub-graphs interconnecting the static graph have unknown dynamics; we address this by modeling the unknown dynamic portion of the graph as a Markov chain that correlates the agent's observations of the environment with the path to be followed. We then derive an action policy through Proximal Policy Optimization to select the locally optimal actions in the Markov process that lead to the shortest path, given the estimated system dynamics. We evaluate the system in a simulation environment constructed in Unity3D. In partially structured and unknown environments with variable environment parameters, we obtain an efficiency 75% greater than that of the comparable deterministic solution.
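The policy-learning step named in the abstract relies on PPO's clipped surrogate objective, which keeps each policy update close to the policy that collected the data. The following is a minimal, self-contained sketch of that objective; the function name, clipping constant, and toy transition numbers are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code) of the PPO clipped surrogate objective.
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    # Probability ratio r_t(theta) between the updated policy and the data-collecting policy.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push the ratio outside [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the mean of the element-wise minimum of the two terms.
    return np.mean(np.minimum(unclipped, clipped))

# Toy usage with three hypothetical transitions from a graph-traversal episode.
old_logp = np.log(np.array([0.25, 0.10, 0.40]))
new_logp = np.log(np.array([0.30, 0.25, 0.35]))
advantages = np.array([1.5, -0.5, 0.8])
print(ppo_clip_objective(new_logp, old_logp, advantages))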

Bibliographic Details
Main Authors: Samuel Henrique Silva, Adel Alaeddini, Peyman Najafirad
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Machine learning; graphs; Markov-chain; deep reinforcement learning; path-planning
Online Access: https://ieeexplore.ieee.org/document/9055369/
id doaj-a4430e285c9848deb0a781043a53ee8e
record_format Article
spelling doaj-a4430e285c9848deb0a781043a53ee8e, updated 2021-03-30T01:32:10Z: Samuel Henrique Silva (ORCID 0000-0003-0368-181X, Secure AI and Autonomy Laboratory, The University of Texas at San Antonio, San Antonio, TX, USA), Adel Alaeddini (ORCID 0000-0003-4451-3150, Department of Information Systems and Cyber Security, The University of Texas at San Antonio, San Antonio, TX, USA), and Peyman Najafirad (ORCID 0000-0001-9671-577X, Secure AI and Autonomy Laboratory, The University of Texas at San Antonio, San Antonio, TX, USA), "Temporal Graph Traversals Using Reinforcement Learning With Proximal Policy Optimization," IEEE Access, vol. 8, pp. 63910-63922, 2020-01-01, ISSN 2169-3536, DOI 10.1109/ACCESS.2020.2985295, IEEE document 9055369, https://ieeexplore.ieee.org/document/9055369/
collection DOAJ
language English
format Article
sources DOAJ
author Samuel Henrique Silva
Adel Alaeddini
Peyman Najafirad
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
topic Machine learning
graphs
Markov-chain
deep reinforcement learning
path-planning
url https://ieeexplore.ieee.org/document/9055369/