A prediction and correction reentry guidance method based on BP network and deep Q-learning network

Kuan WANG; Xunliang YAN; Bei HONG; Wenjiang NAN; Peichen WANG

doi:10.1051/jnwpu/20254320201

Open Access

Issue: JNWPU Volume 43, Number 2, April 2025
Page(s): 201 - 211
DOI: https://doi.org/10.1051/jnwpu/20254320201
Published online: 04 June 2025

JNWPU 2025, 43(2): 201–211

基于BP网络和DQN的预测-校正再入制导方法

Kuan WANG (王宽)¹, Xunliang YAN (闫循良)¹, Bei HONG (洪蓓)², Wenjiang NAN (南汶江)¹ and Peichen WANG (王培臣)¹

¹ School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
² Beijing Institute of Astronautical Systems Engineering, Beijing 100076, China

Received: 21 March 2024

Abstract

A prediction and correction reentry guidance method based on the BP network and the deep Q-learning network (DQN) is proposed to address the low computational efficiency of traditional numerical prediction and correction guidance algorithms and the resulting difficulty of applying them online. The method decouples longitudinal and lateral guidance. For longitudinal guidance, a BP network that predicts the residual range is constructed and trained, and the predicted range deviation is used to correct the parameters of the bank angle magnitude profile. For lateral guidance, the state and action spaces required by reinforcement learning are first constructed for the reentry guidance problem. The decision points are then determined, and a reward function that accounts for overall performance is designed. Finally, a reinforcement learning training network is constructed, and the learned network makes the bank reversal decisions. Simulations are carried out with CAV-H reentry glide as an example. The results show that, compared with the traditional numerical prediction and correction method, the longitudinal guidance method based on the BP network achieves comparable terminal accuracy with higher computational efficiency. Compared with the traditional lateral guidance method based on the heading angle corridor, the lateral guidance method based on the DQN achieves comparable accuracy with fewer bank reversals.
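The decoupled scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: `predict_range` is a hypothetical closed-form stand-in for the trained BP network (a real network would also take the flight state as input), the secant update is one common way to drive a range deviation to zero and is not confirmed by the abstract, and `choose_bank_sign` assumes a two-action space (keep or reverse the bank sign) with Q-values supplied by an already trained DQN.

```python
import math


def predict_range(sigma_deg):
    """Hypothetical stand-in for the trained BP network.

    Returns a predicted range-to-go (km) for a bank-angle magnitude
    parameter (deg). The cosine law and the 8000 km scale are made up.
    """
    return 8000.0 * math.cos(math.radians(sigma_deg))


def correct_bank_magnitude(required_range, sigma0=40.0, sigma1=45.0,
                           tol=1e-3, max_iter=20):
    """Secant iteration on the range deviation (assumed correction rule).

    The deviation is f(sigma) = predicted range - required range; the
    parameter sigma is updated until |f| falls below tol (km).
    """
    f0 = predict_range(sigma0) - required_range
    f1 = predict_range(sigma1) - required_range
    for _ in range(max_iter):
        if abs(f1) < tol or f1 == f0:
            break
        # Secant step; the right-hand side uses the old sigma0 and sigma1.
        sigma0, sigma1 = sigma1, sigma1 - f1 * (sigma1 - sigma0) / (f1 - f0)
        f0, f1 = f1, predict_range(sigma1) - required_range
    return sigma1


def choose_bank_sign(q_values, current_sign):
    """Greedy decision at a decision point (assumed two-action space).

    Action 0 keeps the current bank sign; action 1 reverses it.
    """
    action = max(range(len(q_values)), key=lambda a: q_values[a])
    return -current_sign if action == 1 else current_sign


# Longitudinal channel: correct the magnitude parameter for a 5000 km target.
sigma = correct_bank_magnitude(required_range=5000.0)
# Lateral channel: hypothetical Q-values in which reversing scores higher.
sign = choose_bank_sign(q_values=[0.2, 0.7], current_sign=+1)
print(f"bank magnitude = {sigma:.3f} deg, bank sign = {sign:+d}")
```

Replacing the numerical trajectory integration with a single network evaluation inside the correction loop is what the abstract credits for the gain in computational efficiency; the sketch mirrors that structure by calling the predictor once per iteration.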

摘要

针对传统数值预测-校正制导算法计算效率低、难以在线应用等问题, 提出了一种基于BP网络和深度Q学习网络(DQN)的预测-校正制导方法。该方法采用纵、侧向制导解耦设计思想, 在纵向制导方面, 构建并训练了剩余航程预测BP网络, 利用预测航程偏差校正倾侧角幅值剖面参数; 在侧向制导方面, 针对再入制导问题构建强化学习所需的状态、动作空间; 确定决策点并设计考虑综合性能的奖励函数; 构建强化学习训练网络, 进而通过学习网络实现倾侧反转决策。以CAV-H再入滑翔为例进行仿真, 结果表明: 与传统数值预测-校正方法相比, 所提基于BP网络的纵向制导方法具有相当的终端精度和较高的计算效率; 与传统基于航向角走廊的侧向制导方法相比, 所提基于DQN的侧向制导方法具有相当的计算精度以及更少的反转次数。

Key words: reentry guidance / prediction and correction / BP network / reinforcement learning / deep Q-learning network

关键字 : 再入滑翔制导 / 预测-校正 / BP网络 / 强化学习 / 深度Q学习网络

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
