UAV's air combat decision-making based on deep deterministic policy gradient and prediction

Yongfeng LI; Yongxi LYU; Jingping SHI; Weihua LI

doi:10.1051/jnwpu/20234110056

Volume 41 / No 1 (February 2023)

JNWPU, 41(1) (2023) 56-64

Open Access

Issue: JNWPU Volume 41, Number 1, February 2023
Page(s): 56-64
DOI: https://doi.org/10.1051/jnwpu/20234110056
Published online: 02 June 2023

JNWPU 2023, 41(1): 56–64

深度确定性策略梯度和预测相结合的无人机空战决策研究

Yongfeng LI (李永丰)¹, Yongxi LYU (吕永玺)¹,², Jingping SHI (史静平)¹,² and Weihua LI (李卫华)¹

¹ School of Automation, Northwestern Polytechnical University, Xi'an 710129, China
² Shaanxi Provincial Key Laboratory of Flight Control and Simulation Technology, Xi'an 710129, China

Received: 25 April 2022

Abstract

To address the uncertainty of enemy maneuvers in a UAV's autonomous air combat maneuver decision-making, this paper proposes a decision-making method that combines target maneuver command prediction with the deep deterministic policy gradient (DDPG) algorithm. The situation data of both sides in air combat are fused and processed, and the UAV's six-degree-of-freedom model and maneuver library are built. During air combat, the target generates commands from its maneuver library through the deep Q network (DQN) algorithm; at the same time, our UAV predicts the target's maneuver with a probabilistic neural network (PNN). A DDPG reinforcement learning method that considers both the situation information of the two aircraft and the predicted maneuver of the enemy aircraft is proposed, so that the UAV can choose an appropriate maneuver according to the current air combat situation. The simulation results show that the method makes effective use of the air combat situation information and the target maneuver prediction information, and that it improves the effectiveness of reinforcement learning for the UAV's autonomous air combat decision-making while ensuring convergence.
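As a rough illustration of the idea in the abstract, the sketch below shows how a probabilistic neural network (here a plain Parzen-window classifier) could turn observed target features into a probability vector over maneuver classes, and how that vector could be appended to the situation features to form the input of a DDPG actor. The abstract does not specify the feature definitions, network sizes, or kernel width, so the function names `pnn_predict` and `build_actor_input`, the smoothing parameter `sigma`, and the toy data are all assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def pnn_predict(x, train_x, train_y, n_classes, sigma=0.5):
    """Parzen-window probabilistic neural network (illustrative sketch).

    Returns a probability vector over maneuver classes for feature vector x.
    sigma is an assumed Gaussian kernel width, not a value from the paper.
    """
    # Pattern layer: Gaussian kernel between x and every stored sample.
    sq_dist = np.sum((train_x - x) ** 2, axis=1)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Summation layer: average kernel response per maneuver class.
    scores = np.array([
        kernel[train_y == c].mean() if np.any(train_y == c) else 0.0
        for c in range(n_classes)
    ])
    total = scores.sum()
    # Fall back to a uniform distribution if every kernel underflows to zero.
    if total > 0:
        return scores / total
    return np.full(n_classes, 1.0 / n_classes)


def build_actor_input(situation, maneuver_probs):
    """Concatenate situation features with the predicted maneuver distribution."""
    return np.concatenate([situation, maneuver_probs])


# Hypothetical toy data: two maneuver classes, two observed target features.
train_x = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]])
train_y = np.array([0, 0, 1, 1])

observed = np.array([0.05, 0.0])       # current target observation (assumed)
situation = np.array([0.3, -0.2, 0.8])  # fused situation features (assumed)

probs = pnn_predict(observed, train_x, train_y, n_classes=2)
actor_input = build_actor_input(situation, probs)
print(probs.argmax(), actor_input.shape)
```

Feeding the full probability vector to the actor, rather than only the most likely class, lets the policy account for how confident the prediction is; whether the paper uses the distribution or a single predicted command is not stated in the abstract.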

摘要

针对无人机自主空战机动决策过程中遇到的敌方不确定性操纵问题, 提出了一种目标机动指令预测和深度确定性策略梯度算法相结合的无人机空战自主机动决策方法。对空战双方的态势数据进行有效的融合和处理, 搭建无人机六自由度模型和机动动作库, 在空战中目标通过深度Q网络算法生成相应机动动作库指令, 同时我方无人机通过概率神经网络给出目标机动的预测结果。提出了一种同时考虑了两机态势信息和敌机预测结果的深度确定性策略梯度强化学习方法, 使得无人机能够根据当前空战态势选择合适的机动决策。仿真结果表明, 该算法可以有效利用空战态势信息和目标机动预测信息, 在保证收敛性的前提下提高无人机自主空战决策强化学习算法的有效性。

Key words: UAV / air combat maneuver decision-making / prediction / deep deterministic policy gradient

关键字 : 无人机 / 空战机动决策 / 预测 / 深度确定性策略梯度

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
