Study on UAV obstacle avoidance algorithm based on deep recurrent double Q network

Yao WEI; Zhicheng LIU; Bin CAI; Jiaxin CHEN; Yao YANG; Kai ZHANG

doi:10.1051/jnwpu/20224050970

Volume 40 / No 5 (October 2022)

JNWPU, 40 5 (2022) 970-979

Open Access

Published online		28 November 2022

Study on UAV obstacle avoidance algorithm based on deep recurrent double Q network

基于深度循环双Q网络的无人机避障算法研究

Yao WEI (魏瑶)¹, Zhicheng LIU (刘志成)², Bin CAI (蔡彬)³,⁴, Jiaxin CHEN (陈家新)³,⁴, Yao YANG (杨尧)⁵ and Kai ZHANG (张凯)⁵

¹ School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
² The Third Military Representative Office of Beijing Military Representative Office of Air Force Equipment Department in Tianjin, Tianjin 300000, China
³ Shanghai Aerospace Control Technology Institute, Shanghai 201109, China
⁴ Infrared Detection Technology R & D Center of China Aerospace Science and Technology Corporation, Shanghai 201109, China
⁵ Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China

Received: 17 December 2021

Abstract

Traditional reinforcement learning methods applied to machine motion planning, and to UAV obstacle avoidance in particular, suffer from overestimation of the value function and from partial observability, which lead to long training times and difficult convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q network. The single-network structure is transformed into a dual-network structure that decouples optimal action selection from action value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced at the fully connected layer of the dual-network module; the GRU processes information along the time dimension, enhances the analyzability of the real neural network, and improves the performance of the algorithm in partially observable environments. On this basis, a prioritized experience replay mechanism is incorporated to accelerate network convergence. Finally, the original algorithm and the improved algorithm are tested in a simulation environment. The experimental results show that the improved algorithm performs better in training time, obstacle avoidance success rate, and robustness.
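The decoupling of action selection from action value estimation described in the abstract can be illustrated with the standard double Q-learning target. The sketch below is a minimal, framework-free illustration and not the authors' implementation; the function names, the argument names, and the use of plain Python lists in place of network outputs are all assumptions made for this example.

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """Single-estimator target: one network both selects and scores the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float, done: bool) -> float:
    """Double-Q target: the online network selects, the target network scores."""
    if done:
        return reward
    # Action selection uses the online network's estimates (hypothetical values).
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Action evaluation uses the target network's estimate of that same action.
    return reward + gamma * next_q_target[best_action]


# Illustrative Q-values for three actions in the next state (made-up numbers).
q_online = [1.0, 2.0, 0.5]
q_target = [1.5, 1.0, 3.0]

single = dqn_target(1.0, q_target, gamma=0.5, done=False)
double = double_dqn_target(1.0, q_online, q_target, gamma=0.5, done=False)
print(single, double)
```

With these made-up values the single-estimator target takes the maximum of the target network's estimates, while the double-Q target scores only the action the online network prefers, so it is not inflated by an unusually high estimate on a different action.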

Key words: deep reinforcement learning / UAV / obstacle avoidance / recurrent neural network / DDQN

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
