Multi energy dynamic soaring trajectory optimization method based on reinforcement learning

Yunfei ZHANG; Honglun WANG; Menghua ZHANG; Yinan GONG

doi:10.1051/jnwpu/20254310128

Open Access

Issue: JNWPU Volume 43, Number 1, February 2025
Page(s): 128 - 139
DOI: https://doi.org/10.1051/jnwpu/20254310128
Published online: 18 April 2025

JNWPU 2025, 43(1): 128–139

基于强化学习的多能源动态滑翔航迹优化方法

Yunfei ZHANG (张云飞)¹,², Honglun WANG (王宏伦)¹,², Menghua ZHANG (张梦华)¹,² and Yinan GONG (巩轶男)³

¹ School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
² The Science and Technology on Aircraft Control Laboratory, Beihang University, Beijing 100191, China
³ Hiwing Aviation General Equipment Co., Ltd., Beijing 100074, China

Received: 6 March 2024

Abstract

To address dynamic soaring for unmanned aerial vehicles (UAVs), a trajectory optimization approach based on deep reinforcement learning is proposed. The method jointly exploits wind-gradient energy and solar energy and incorporates obstacle constraints to simulate complex obstacle environments. A neural network is first trained to approximate the trajectory policy obtained by solving with the Gaussian pseudospectral method. Starting from this trained policy, the twin delayed deep deterministic policy gradient (TD3) algorithm is then used for policy improvement. This greatly improves real-time inference while addressing the difficulty that traditional optimal control algorithms have in coping with varying wind fields in dynamic soaring. The approach is first validated by simulating two classic dynamic soaring modes, followed by Monte Carlo simulations that consider multiple energy sources. The results indicate that, within a single soaring cycle, the method gains energy comparable to the optimal result while reducing real-time inference decision time by 91%. Moreover, in changing wind field environments, the method adapts better than traditional approaches.
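The policy-improvement stage named in the abstract relies on the twin delayed deep deterministic policy gradient algorithm. As a rough illustration only, the sketch below computes the standard critic target of that algorithm for a single transition with a scalar action: the target action is smoothed with clipped noise, and the smaller of two target critic estimates is used. The function name `td3_target`, the toy actor and critics, and the hyperparameter defaults are assumptions made for this example; they are not the networks, state definition, or settings used in the paper.

```python
import random


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Critic target for one transition (scalar action), standard TD3 form.

    All callables and default values are illustrative assumptions.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = target_actor(next_state) + noise
    # Keep the smoothed action inside the admissible range.
    next_action = max(action_low, min(action_high, next_action))
    # Clipped double-Q: use the smaller of the two target critic estimates.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # Bootstrapped target; terminal transitions do not bootstrap.
    return reward + gamma * (0.0 if done else 1.0) * q_next


# Hypothetical stand-ins for the target actor and the two target critics.
def toy_actor(state):
    return 0.5 * state


def toy_q1(state, action):
    return state + action


def toy_q2(state, action):
    return state + action - 0.1


# With the noise switched off, the target is deterministic.
y = td3_target(1.0, 0.4, False, toy_actor, toy_q1, toy_q2, noise_std=0.0)
y_terminal = td3_target(1.0, 0.4, True, toy_actor, toy_q1, toy_q2, noise_std=0.0)
```

In a full training loop this target would be regressed by both critics, with the actor and target networks updated less frequently than the critics; those parts are omitted here.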

摘要

针对无人机动态滑翔问题, 提出了一种基于深度强化学习的航迹优化方法。该方法综合利用梯度风能和太阳能, 引入了障碍物约束以模拟复杂障碍环境。使用神经网络近似逼近高斯伪谱方法求解航迹的策略, 在训练得到的策略基础上利用双延迟深度确定性策略梯度算法进行策略改进, 在大幅度提升推理实时性的同时解决了传统最优控制算法在动态滑翔领域难以应对变化风场的问题。实验针对动态滑翔2种经典模式进行仿真验证, 之后在考虑多种能量源的情况下进行蒙特卡洛仿真。结果表明, 基于深度强化学习的动态滑翔航迹优化方法在单个滑翔周期内获能与最优结果相近, 而实时推理决策时间减少了91%。在变化风场环境下, 文中方法相较于传统方法具有更强的适应性。

Key words: dynamic soaring / reinforcement learning / Gaussian pseudospectral method / trajectory optimization

关键字 : 动态滑翔 / 强化学习 / 高斯伪谱 / 航迹优化

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
