Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation

In cognitive electronic warfare, when a typical combat vehicle, such as an unmanned combat air vehicle (UCAV), uses radar sensors to explore an unknown space, the target-searching fails due to an inefficient servoing/tracking system. Thus, to solve this problem, we developed an autonomous reasoning...

Bibliographic Details
Main Authors:	Shixun You, Ming Diao, Lipeng Gao
Format:	Article
Language:	English
Published:	MDPI AG 2019-05-01
Series:	Electronics
Subjects:	target-searching; cognitive electronic warfare; deep reinforcement learning; continuous control-based navigation optimization; behavior angle
Online Access:	https://www.mdpi.com/2079-9292/8/5/576

id	doaj-85f2206bb0374d59b0b9660bc87844da
record_format	Article
spelling	doaj-85f2206bb0374d59b0b9660bc87844da; 2020-11-24T21:32:33Z; eng; MDPI AG; Electronics; 2079-9292; 2019-05-01; vol. 8, no. 5, 576; 10.3390/electronics8050576; electronics8050576; Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation; Shixun You, Ming Diao, Lipeng Gao (all: College of Information and Communication, Harbin Engineering University, Harbin 150001, China); https://www.mdpi.com/2079-9292/8/5/576
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Shixun You Ming Diao Lipeng Gao
spellingShingle	Shixun You Ming Diao Lipeng Gao Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation Electronics target-searching cognitive electronic warfare deep reinforcement learning continuous control-based navigation optimization behavior angle
author_facet	Shixun You Ming Diao Lipeng Gao
author_sort	Shixun You
title	Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation
title_short	Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation
title_full	Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation
title_fullStr	Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation
title_full_unstemmed	Completing Explorer Games with a Deep Reinforcement Learning Framework Based on Behavior Angle Navigation
title_sort	completing explorer games with a deep reinforcement learning framework based on behavior angle navigation
publisher	MDPI AG
series	Electronics
issn	2079-9292
publishDate	2019-05-01
description	In cognitive electronic warfare, when a typical combat vehicle, such as an unmanned combat air vehicle (UCAV), uses radar sensors to explore an unknown space, the target-searching fails due to an inefficient servoing/tracking system. Thus, to solve this problem, we developed an autonomous reasoning search method that can generate efficient decision-making actions and guide the UCAV as early as possible to the target area. For high-dimensional continuous action space, the UCAV’s maneuvering strategies are subject to certain physical constraints. We first record the path histories of the UCAV as a sample set of supervised experiments and then construct a grid cell network using long short-term memory (LSTM) to generate a new displacement prediction to replace the target location estimation. Finally, we enable a variety of continuous-control-based deep reinforcement learning algorithms to output optimal/sub-optimal decision-making actions. All these tasks are performed in a three-dimensional target-searching simulator, i.e., the Explorer game. Please note that we use the behavior angle (BHA) for the first time as the main factor of the reward-shaping of the deep reinforcement learning framework and successfully make the trained UCAV achieve a 99.96% target destruction rate, i.e., the game win rate, in a 0.1 s operating cycle.
topic	target-searching; cognitive electronic warfare; deep reinforcement learning; continuous control-based navigation optimization; behavior angle
url	https://www.mdpi.com/2079-9292/8/5/576
work_keys_str_mv	AT shixunyou completingexplorergameswithadeepreinforcementlearningframeworkbasedonbehavioranglenavigation AT mingdiao completingexplorergameswithadeepreinforcementlearningframeworkbasedonbehavioranglenavigation AT lipenggao completingexplorergameswithadeepreinforcementlearningframeworkbasedonbehavioranglenavigation
_version_	1725957016664408064
