End-to-end UAV obstacle avoidance decision based on deep reinforcement learning

Yunyan ZHANG; Yao WEI; Hao LIU; Yao YANG

doi:10.1051/jnwpu/20224051055


JNWPU, Volume 40, No. 5 (October 2022), pp. 1055–1064


Open Access

Published online: 28 November 2022



Yunyan ZHANG (张云燕)¹, Yao WEI (魏瑶)², Hao LIU (刘昊)² and Yao YANG (杨尧)³

¹ School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
² School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
³ Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China

Received: 14 December 2021

Abstract

Traditional UAV obstacle avoidance algorithms need an offline three-dimensional map, produce discontinuous speed control, and restrict the choice of velocity direction. To address these problems, we study an end-to-end obstacle avoidance decision method with continuous action output for UAVs, based on the DDPG (deep deterministic policy gradient) deep reinforcement learning algorithm. First, an end-to-end decision and control model based on DDPG is established; from the continuous state information it perceives, the model outputs continuous control variables, namely the UAV's obstacle avoidance actions. Second, the model is trained and validated on a UE4 + AirSim platform, and the results show that it achieves end-to-end UAV obstacle avoidance decisions. Finally, the model is compared with a 3DVFH (three-dimensional vector field histogram) obstacle avoidance model that uses the same data source. The experiments show that the DDPG algorithm produces better-optimized UAV obstacle avoidance trajectories.
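The abstract's central idea is an actor that maps continuous perceived state to continuous obstacle avoidance actions. The sketch below illustrates the generic DDPG building blocks (a deterministic actor, the critic's TD target, soft target-network updates, and Ornstein-Uhlenbeck exploration noise) on a toy linear model in plain Python. It is not the authors' implementation: `STATE_DIM`, `ACTION_DIM`, `MAX_SPEED`, the linear function approximators standing in for neural networks, and every hyperparameter are hypothetical placeholders chosen only for illustration.

```python
import math
import random

# Illustrative assumptions only: the abstract does not give the UAV state vector,
# the action layout, the network architecture, or any hyperparameters.
STATE_DIM = 4    # hypothetical: e.g. a few depth readings plus a goal offset
ACTION_DIM = 3   # hypothetical: continuous velocity command (vx, vy, vz)
MAX_SPEED = 2.0  # hypothetical bound on each velocity component, in m/s


def actor(w, state):
    """Deterministic policy mu(s): linear layer + tanh, scaled to +/- MAX_SPEED."""
    return [MAX_SPEED * math.tanh(sum(wi * si for wi, si in zip(row, state)))
            for row in w]


def critic(q, state, action):
    """Linear action-value estimate Q(s, a) over the concatenated state and action."""
    return sum(qi * xi for qi, xi in zip(q, list(state) + list(action)))


def td_target(r, s2, done, gamma, w_t, q_t):
    """DDPG critic target y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    return r + gamma * (1.0 - done) * critic(q_t, s2, actor(w_t, s2))


def soft_update(target, source, tau):
    """Polyak averaging theta' <- tau * theta + (1 - tau) * theta', on nested lists."""
    if isinstance(target, list):
        return [soft_update(t, s, tau) for t, s in zip(target, source)]
    return tau * source + (1.0 - tau) * target


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (time step 1) for continuous actions."""

    def __init__(self, dim, theta=0.15, sigma=0.2, seed=0):
        self.theta, self.sigma = theta, sigma
        self.x = [0.0] * dim
        self.rng = random.Random(seed)

    def sample(self):
        self.x = [xi - self.theta * xi + self.sigma * self.rng.gauss(0.0, 1.0)
                  for xi in self.x]
        return list(self.x)


def explore(w, state, noise):
    """Noisy action for data collection, clipped back into the velocity bounds."""
    return [max(-MAX_SPEED, min(MAX_SPEED, a + n))
            for a, n in zip(actor(w, state), noise.sample())]


def ddpg_step(w, q, w_t, q_t, batch, gamma=0.99, tau=0.005, lr_q=1e-2, lr_mu=1e-3):
    """One pass of DDPG updates over (s, a, r, s2, done) transitions."""
    for s, a, r, s2, done in batch:
        # Critic: one SGD step on (Q(s, a) - y)^2, with y from the target networks.
        y = td_target(r, s2, done, gamma, w_t, q_t)
        err = critic(q, s, a) - y
        q = [qi - lr_q * 2.0 * err * fi for qi, fi in zip(q, list(s) + list(a))]
        # Actor: deterministic policy gradient dQ/dW = dQ/da * da/dW at a = mu(s);
        # for this linear critic, dQ/da is just the action part of q.
        dq_da = q[len(s):]
        new_w = []
        for row, g in zip(w, dq_da):
            z = sum(wi * si for wi, si in zip(row, s))
            scale = g * MAX_SPEED * (1.0 - math.tanh(z) ** 2)
            new_w.append([wi + lr_mu * scale * si for wi, si in zip(row, s)])
        w = new_w
    # Target networks slowly track the learned networks.
    return w, q, soft_update(w_t, w, tau), soft_update(q_t, q, tau)


if __name__ == "__main__":
    rng = random.Random(1)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(STATE_DIM)] for _ in range(ACTION_DIM)]
    q = [rng.uniform(-0.1, 0.1) for _ in range(STATE_DIM + ACTION_DIM)]
    w_t, q_t = [row[:] for row in w], q[:]
    noise = OUNoise(ACTION_DIM)
    s, s2 = [0.5, -0.2, 0.1, 1.0], [0.4, -0.1, 0.1, 0.9]
    a = explore(w, s, noise)
    w, q, w_t, q_t = ddpg_step(w, q, w_t, q_t, [(s, a, -0.1, s2, 0.0)])
    print("exploratory velocity command:", a)
    print("greedy velocity command after one update:", actor(w, s))
```

Because the actor ends in a scaled tanh, every velocity component is a real number in a bounded interval rather than a choice from a fixed set of moves, which is the property the abstract contrasts with discontinuous speed control and limited direction selection. In the usual DDPG setup, the linear functions would be replaced by neural networks and the per-sample loop by minibatches drawn from a replay buffer.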


Key words: UAV / obstacle avoidance / deep deterministic policy gradient (DDPG) / reinforcement learning


This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
