Generalization strategy design of UAVs pursuit evasion game based on DDPG

Xiaowei FU; Zhe XU; Hui WANG

doi:10.1051/jnwpu/20224010047


Open Access

Issue: JNWPU Volume 40, Number 1, February 2022
Page(s): 47-55
DOI: https://doi.org/10.1051/jnwpu/20224010047
Published online: 02 May 2022

JNWPU 2022, 40(1): 47–55

Xiaowei FU (符小卫) (fxw@nwpu.edu.cn), Zhe XU (徐哲) and Hui WANG (王辉)

School of Electronics and Information, Northwestern Polytechnical University, Xi′an 710129, China

Received: 11 June 2021

Abstract

The UAV pursuit-evasion game is a research hotspot in the field of air combat. Traditional solutions to this problem have many limitations: the models struggle to make fast decisions in complex, dynamic environments, and they generalize poorly across mission scenarios. This paper establishes a mathematical model of UAV pursuit-evasion confrontation based on the DDPG (deep deterministic policy gradient) algorithm. On this basis, it designs several counter-maneuver strategies for the escaping UAV and trains with a curriculum learning approach: during training, the intelligence of the escaping UAV is raised step by step, so that the confrontation strategy of the pursuing UAV is trained progressively. Simulation results show that, compared with direct training, the pursuit strategy trained with curriculum learning converges faster, performs the pursuit of the enemy aircraft better, and remains applicable to enemy aircraft that use a variety of maneuvering strategies, which effectively improves the generalization of the UAV pursuit-evasion decision model.
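The progressive training idea in the abstract can be illustrated with a minimal scheduling sketch. It is not the authors' implementation: the stage names, difficulty values, success threshold, and the `StubTrainer` class are all hypothetical, and the stub replaces the actual DDPG update with a scalar "skill" so that only the curriculum logic is shown.

```python
from typing import Callable, List, Tuple

# Hypothetical curriculum: evader strategies ordered from easiest to hardest.
# The names and difficulty values are illustrative, not taken from the paper.
STAGES: List[Tuple[str, float]] = [
    ("static", 0.0),
    ("straight_line", 0.25),
    ("fleeing", 0.5),
]


class StubTrainer:
    """Stand-in for one round of DDPG training plus evaluation.

    A real trainer would update actor and critic networks; this stub only
    raises a scalar 'skill' so that the scheduling logic can be executed.
    """

    def __init__(self, step: float = 0.25) -> None:
        self.skill = 0.0
        self.step = step

    def __call__(self, difficulty: float) -> float:
        self.skill += self.step
        # Success rate against this evader, clipped to [0, 1].
        return max(0.0, min(1.0, self.skill - difficulty))


def run_curriculum(
    stages: List[Tuple[str, float]],
    train_round: Callable[[float], float],
    threshold: float = 0.75,
    max_rounds: int = 20,
) -> List[Tuple[str, float]]:
    """Train against each evader in turn, advancing once the pursuer's
    success rate reaches `threshold` (or after `max_rounds` rounds)."""
    history = []
    for name, difficulty in stages:
        for _ in range(max_rounds):
            success = train_round(difficulty)
            history.append((name, success))
            if success >= threshold:
                break  # pursuer is good enough; move to a harder evader
    return history


if __name__ == "__main__":
    for stage, rate in run_curriculum(STAGES, StubTrainer()):
        print(f"{stage}: success rate {rate:.2f}")
```

The same trainer object is reused across stages, so whatever the pursuer learned against an easier evader carries over to the next one. In this sketch that carry-over is the only difference from direct training, which would start against the hardest evader from scratch.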

Key words: UAV / pursuit-evasion game / deep reinforcement learning / DDPG / curriculum learning

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
