Study on learning algorithm of transfer reinforcement for multi-agent formation control

Penglin HU; Quan PAN; Yaning GUO; Chunhui ZHAO

doi:10.1051/jnwpu/20234120389


JNWPU, 41 2 (2023) 389-399


Open Access

Issue		JNWPU Volume 41, Number 2, April 2023


Page(s)		389 - 399
DOI		https://doi.org/10.1051/jnwpu/20234120389
Published online		07 June 2023


多智能体编队控制中的迁移强化学习算法研究

Penglin HU (胡鹏林), Quan PAN (潘泉), Yaning GUO (郭亚宁) and Chunhui ZHAO (赵春晖)

School of Automation, Northwestern Polytechnical University, Xi'an 710129, China

Received: 15 June 2022

Abstract

To address obstacle avoidance and collision avoidance for multi-agent cooperative formation in multi-obstacle environments, a formation control algorithm combining transfer learning and reinforcement learning is proposed. First, in the source task learning stage, value function approximation is used in place of a Q-table, which avoids the large storage space a Q-table requires and improves the solving speed of the algorithm. Second, in the target task learning stage, a Gaussian clustering algorithm is used to classify the source tasks. According to the distance between each cluster center and the target task, the optimal source task class is selected for target task learning, which effectively avoids negative transfer and improves the generalization ability and convergence speed of the reinforcement learning algorithm. Finally, simulation results show that the method enables a multi-agent system to form and maintain a formation configuration in complex environments with obstacles while achieving obstacle avoidance and collision avoidance.
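The source task selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how tasks are represented, which Gaussian clustering variant is used, or which distance is applied, so everything concrete here is an assumption: each task is summarized by a hypothetical numeric feature vector, the clustering is a mixture of spherical Gaussians fitted with EM, and the distance is squared Euclidean. The function names are illustrative only and are not taken from the paper.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initial centers: start at points[0], then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def fit_spherical_gmm(points, k, iters=30):
    """EM for a mixture of k spherical Gaussians (an assumed stand-in for the
    paper's Gaussian clustering). Returns (centers, labels)."""
    n, dim = len(points), len(points[0])
    centers = farthest_point_init(points, k)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibilities from log-densities, shifted by the max
        # log-density so the exponentials stay numerically stable.
        resp = []
        for p in points:
            logs = [
                math.log(weights[j])
                - 0.5 * dim * math.log(2.0 * math.pi * variances[j])
                - sq_dist(p, centers[j]) / (2.0 * variances[j])
                for j in range(k)
            ]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weight, center and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty component: keep its previous parameters
            centers[j] = [
                sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                for d in range(dim)
            ]
            spread = sum(r[j] * sq_dist(p, centers[j]) for r, p in zip(resp, points))
            variances[j] = max(spread / (nj * dim), 1e-6)
            weights[j] = nj / n
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return centers, labels


def select_source_class(source_features, target_feature, k):
    """Cluster the source tasks, then pick the class whose center is
    closest to the target task. Returns (class index, center, member indices)."""
    centers, labels = fit_spherical_gmm(source_features, k)
    best = min(range(k), key=lambda j: sq_dist(centers[j], target_feature))
    members = [i for i, lab in enumerate(labels) if lab == best]
    return best, centers[best], members


# Hypothetical task features, e.g. (number of obstacles, number of agents).
sources = [(2.0, 3.0), (2.5, 3.2), (1.8, 2.7), (2.2, 3.1),
           (10.0, 8.0), (9.5, 8.3), (10.4, 7.8), (9.9, 8.1)]
target = (9.0, 7.5)
best, center, members = select_source_class(sources, target, k=2)
print(best, center, members)
```

Only the source tasks listed in `members` would then be used to initialize learning on the target task. Restricting transfer to the nearest class is the mechanism the abstract credits with avoiding negative transfer. How the transferred knowledge is actually reused is not specified in the abstract and is left out of this sketch.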

摘要

针对多障碍环境下的多智能体系统协同编队避障与防撞问题, 提出一种迁移学习与强化学习相结合的编队控制算法。在源任务学习阶段, 利用值函数近似方法避免Q-表格求解法所需的大规模存储空间问题, 有效降低对存储空间的需求, 提升算法求解速度; 在目标任务学习阶段, 采用高斯聚类算法对源任务进行分类, 根据聚类中心和目标任务之间的距离, 选择最优的源任务类进行目标任务学习, 有效避免了负迁移现象, 进而提升了强化学习算法的泛化能力及收敛速度。仿真实验结果表明, 所提方法能使多智能体系统在复杂的障碍环境下有效地形成并保持编队构型, 同时实现避障与防撞。

Key words: multi-agent system / transfer reinforcement learning / value function approximation / formation control / Gaussian clustering

关键字 : 多智能体系统 / 迁移强化学习 / 值函数近似 / 编队控制 / 高斯聚类

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
