Research on game strategy of spacecraft chase and escape based on adaptive augmented random search

Jie JIAO; Yongjie GOU; Wenbo WU; Binfeng PAN

doi:10.1051/jnwpu/20244210117

Volume 42 / No 1 (February 2024)

JNWPU, 42 1 (2024) 117-128

Open Access

Issue: JNWPU Volume 42, Number 1, February 2024
Page(s): 117-128
DOI: https://doi.org/10.1051/jnwpu/20244210117
Published online: 29 March 2024

Research on game strategy of spacecraft chase and escape based on adaptive augmented random search

基于自适应增强随机搜索的航天器追逃博弈策略研究

Jie JIAO (焦杰)¹,², Yongjie GOU (苟永杰)³, Wenbo WU (吴文博)¹,² and Binfeng PAN (泮斌峰)¹,²

¹ School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
² National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, China
³ Shanghai Aerospace Systems Engineering Institute, Shanghai 201108, China

Received: 29 December 2022

Abstract

To address the survival-type differential game interception problem that arises in the pursuit-evasion game between a spacecraft and a non-cooperative target, the pursuit-evasion game policy is studied based on reinforcement learning, and the adaptive-augmented random search (A-ARS) algorithm is proposed. First, to overcome the sparse reward problem in sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which accelerates policy convergence. Second, to avoid premature convergence to a local optimum, a novelty function is designed to guide the policy update, which improves data utilization efficiency. Finally, numerical simulations verify the effectiveness and advancement of the method through comparison with the augmented random search (ARS), proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms.
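As a rough illustration of what exploration by perturbing policy parameters means, the sketch below implements one update of the basic ARS baseline on a toy problem. It is not the A-ARS algorithm of this paper: the adaptive exploration and the novelty function are not specified in the abstract and are therefore not reproduced. The names `ars_step` and `run_demo`, the quadratic toy reward, and all hyperparameter values are assumptions made for this example only.

```python
import numpy as np


def ars_step(theta, reward_fn, rng, n_dirs=8, top_b=4, step=0.02, noise=0.05):
    """One basic ARS update (illustrative baseline, not the paper's A-ARS).

    theta     : 1-D array of policy parameters
    reward_fn : maps a parameter vector to a scalar episode return
    """
    # Sample random directions in parameter space.
    deltas = rng.standard_normal((n_dirs, theta.size))
    # Evaluate the return for a positive and a negative perturbation of each.
    r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
    # Keep the top_b directions, ranked by the better of their two returns.
    best = np.argsort(-np.maximum(r_plus, r_minus))[:top_b]
    # Scale the step by the standard deviation of the returns actually used.
    sigma_r = np.concatenate([r_plus[best], r_minus[best]]).std() + 1e-8
    # Finite-difference estimate of the ascent direction.
    grad = ((r_plus[best] - r_minus[best])[:, None] * deltas[best]).sum(axis=0)
    return theta + step / (top_b * sigma_r) * grad


def run_demo(n_iters=400, seed=0):
    """Run ARS on a toy quadratic reward; return (initial, final) reward."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical optimum

    def reward_fn(th):
        return -float(np.sum((th - target) ** 2))

    theta = np.zeros_like(target)
    initial = reward_fn(theta)
    for _ in range(n_iters):
        theta = ars_step(theta, reward_fn, rng)
    return initial, reward_fn(theta)


if __name__ == "__main__":
    start, end = run_demo()
    print(f"reward before: {start:.3f}, after: {end:.3f}")
```

Because the update uses only whole-episode returns, this kind of search needs no per-step gradient signal, which is why parameter-space exploration is a natural fit for sparse-reward settings like the one described above.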

摘要

针对航天器与非合作目标追逃博弈的生存型微分对策拦截问题, 基于强化学习研究了追逃博弈策略, 提出了自适应增强随机搜索(adaptive-augmented random search, A-ARS)算法。针对序贯决策的稀疏奖励难题, 设计了基于策略参数空间扰动的探索方法, 加快策略收敛速度; 针对可能过早陷入局部最优问题设计了新颖度函数并引导策略更新, 可提升数据利用效率; 通过数值仿真验证并与增强随机搜索(augmented random search, ARS)、近端策略优化算法(proximal policy optimization, PPO)以及深度确定性策略梯度下降算法(deep deterministic policy gradient, DDPG)进行对比, 验证了此方法的有效性和先进性。

Key words: non-cooperative target / pursuit game / differential game theory / reinforcement learning / sparse reward

关键字 : 非合作目标 / 追逃博弈 / 微分对策 / 强化学习 / 稀疏奖励

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
