Path Planning of Coastal Ships Based on Optimized DQN Reward Function

Path planning is a key problem for coastal ships and a core foundation of intelligent ship development. To better realize ship path planning during navigation, this paper proposes a coastal ship path planning model based on an optimized deep Q-network (DQN) algorithm. The model consists mainly of environment status information and the DQN algorithm. The environment status information provides the training space for the DQN algorithm and is quantified according to the actual navigation environment and the international rules for collision avoidance at sea. The DQN algorithm comprises four components: the ship state space, the action space, the action exploration strategy, and the reward function. The traditional DQN reward function can lead to low learning efficiency and slow model convergence. This paper optimizes the traditional reward function in three ways: (a) a potential-energy reward from the target point to the ship is set; (b) a reward area is added near the target point; and (c) a danger area is added near each obstacle. With these optimizations, the ship avoids obstacles and reaches the target point faster, and the convergence of the model is accelerated. The traditional DQN algorithm, the A* algorithm, the BUG2 algorithm, and the artificial potential field (APF) algorithm are selected for experimental comparison, and the experimental data are analyzed in terms of path length, planning time, and number of path corners. The experimental results show that the optimized DQN algorithm has better stability and convergence and greatly reduces computation time. It can plan an optimal path in line with actual navigation rules and improves the safety, economy, and autonomous decision-making ability of ship navigation.
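The three reward-function optimizations described in the abstract can be sketched as a simple shaped-reward function. This is an illustrative assumption of how such shaping might look, not the paper's implementation; all constants, radii, and function names are hypothetical:

```python
import math

# Illustrative constants -- not taken from the paper.
GOAL_RADIUS = 1.0       # distance at which the target is considered reached
REWARD_RADIUS = 3.0     # (b) extra reward area near the target point
DANGER_RADIUS = 2.0     # (c) danger area added around each obstacle
OBSTACLE_RADIUS = 0.5   # distance at which contact counts as a collision

def shaped_reward(prev_pos, pos, goal, obstacles):
    """Reward combining the three optimizations sketched in the abstract:
    (a) a potential-energy reward toward the target,
    (b) a bonus area near the target point,
    (c) a graded penalty area near each obstacle."""
    d_prev = math.dist(prev_pos, goal)
    d_now = math.dist(pos, goal)

    # (a) potential-energy term: positive when the ship moves closer to the goal
    reward = 1.0 * (d_prev - d_now)

    # Terminal case: the target point is reached
    if d_now <= GOAL_RADIUS:
        return reward + 100.0

    for obs in obstacles:
        d_obs = math.dist(pos, obs)
        if d_obs <= OBSTACLE_RADIUS:
            return reward - 100.0      # collision with an obstacle
        # (c) danger area: penalty grows as the ship nears the obstacle
        if d_obs <= DANGER_RADIUS:
            reward -= 5.0 * (DANGER_RADIUS - d_obs) / DANGER_RADIUS

    # (b) reward area: small bonus once the ship is near the target point
    if d_now <= REWARD_RADIUS:
        reward += 2.0

    return reward
```

Shaping of this kind gives the agent a dense learning signal at every step instead of only at the goal, which is the mechanism the abstract credits for faster convergence.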


Bibliographic Details
Main Authors: Siyu Guo, Xiuguo Zhang, Yiquan Du, Yisong Zheng, Zhiying Cao
Format: Article
Language: English
Published: MDPI AG, 2021-02-01
Series: Journal of Marine Science and Engineering
Subjects: path planning; deep reinforcement learning; decision-making; obstacle avoidance
Online Access: https://www.mdpi.com/2077-1312/9/2/210
Author Affiliation: School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China (all authors)
DOI: 10.3390/jmse9020210
ISSN: 2077-1312 (Journal of Marine Science and Engineering, 2021, Vol. 9, Issue 2, Article 210)