JNWPU, Volume 39, Number 5, October 2021
Page(s): 1077-1086
DOI: https://doi.org/10.1051/jnwpu/20213951077
Published online: 14 December 2021
Allocation method of communication interference resource based on deep reinforcement learning of maximum policy entropy
College of Information and Navigation, Air Force Engineering University, Xi'an 710077, China
Received: 20 January 2021
To solve the optimization problem of interference resource allocation in communication network countermeasures, an interference resource allocation method based on maximum policy entropy deep reinforcement learning (MPEDRL) is proposed. The method introduces deep reinforcement learning into communication countermeasure resource allocation and, by adding a maximum policy entropy criterion with an adaptively adjusted entropy coefficient, strengthens policy exploration and accelerates convergence to the global optimum. Interference resource allocation is modeled as a Markov decision process: an interference policy network outputs the allocation scheme, an interference effect evaluation network with a clipped twin structure assesses the scheme's efficacy, and both networks are trained to maximize policy entropy and cumulative interference efficacy, from which the optimal interference resource allocation scheme is decided. Simulation results show that the algorithm effectively solves the resource allocation problem in communication network confrontation; compared with existing deep reinforcement learning methods, it learns faster and fluctuates less during training, and it achieves 15% higher jamming efficacy than the DDPG-based method.
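The clipped twin evaluation network and adaptively adjusted entropy coefficient described in the abstract follow the soft actor-critic family of methods. Below is a minimal numeric sketch of the two ingredients, assuming standard soft actor-critic conventions; function and variable names are illustrative and not taken from the paper:

```python
import numpy as np

def clipped_twin_target(reward, gamma, q1_next, q2_next, log_prob_next, alpha):
    """Entropy-regularized TD target using the minimum of two critic
    estimates (the 'clipped twin' structure), which curbs value
    overestimation during training."""
    min_q = np.minimum(q1_next, q2_next)
    return reward + gamma * (min_q - alpha * log_prob_next)

def update_alpha(alpha, log_prob, target_entropy, lr=0.1):
    """One gradient step on the entropy coefficient: alpha grows when the
    policy's entropy (-log_prob) falls below the target, pushing the
    policy to explore more, and shrinks otherwise."""
    grad = -(log_prob + target_entropy)  # d/d(alpha) of alpha*(-log_prob - H_target)
    return max(alpha - lr * grad, 0.0)
```

For example, with reward 1.0, discount 0.99, twin critic values 2.0 and 3.0, next-action log-probability -1.0, and alpha 0.2, the target becomes 1.0 + 0.99 * (2.0 + 0.2), and an alpha of 0.2 increases after `update_alpha` when the current entropy 0.5 is below a target entropy of 1.0.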
Key words: interference resource allocation / deep reinforcement learning / maximum policy entropy / deep neural network
© 2021 Journal of Northwestern Polytechnical University. All rights reserved.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.