Supervised pyramid network based on semantic consistency for object detection

Rui DAI; Pengyue XU; Jie LI; Lihuo HE

doi:10.1051/jnwpu/20244250959

Volume 42 / No 5 (October 2024)

JNWPU, 42(5) (2024) 959–968

Open Access

Issue: JNWPU Volume 42, Number 5, October 2024
Page(s): 959–968
DOI: https://doi.org/10.1051/jnwpu/20244250959
Published online: 06 December 2024

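基于语义一致性监督金字塔网络的目标检测方法 (Chinese title: Object detection method based on a semantic-consistency supervised pyramid network)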

Rui DAI (代睿), Pengyue XU (徐鹏越), Jie LI (李洁) and Lihuo HE (何立火)

School of Electronic Engineering, Xidian University, Xi'an 710071, China

Received: 25 September 2023

Abstract

Feature pyramid networks (FPNs) are widely used in image understanding tasks that rely on multi-scale feature learning. Recent work on multi-scale feature learning focuses on the interactive fusion of semantic features and detail features. An FPN combines the semantic and detail information of different scales by interpolating features and summing those of adjacent levels. However, because of nonlinear operations and convolution layers with different output dimensions, the relationship between levels is much more complex, and pixel-wise summation is a suboptimal way to fuse them. This paper proposes a supervised feature pyramid network based on semantic consistency for object detection. The method consists of an asymmetric convolution lateral connection and a multi-scale semantic feature augmentation network. The asymmetric convolution lateral connection learns feature maps with different receptive fields, which improves how well the features generalize to objects in various poses. The multi-scale semantic feature augmentation network supplements high-level feature maps with low-level information, which improves the ability of high-level features to express detail. Moreover, the method provides a better trade-off between accuracy and detection performance. Experiments on the MSCOCO dataset show that the proposed method improves detection accuracy (average precision) by 2.6% without adding FLOPs.
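The fusion that the abstract argues against is the standard FPN top-down merge: each coarser output is upsampled and added pixel by pixel to a 1×1 lateral projection of the finer backbone map. The NumPy sketch below illustrates only that baseline, not the proposed network. It assumes nearest-neighbour 2× upsampling, omits the 3×3 smoothing convolution FPN applies after each merge, and uses hypothetical function names and feature shapes.

```python
import numpy as np

def upsample_nearest_2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral_1x1(x, weight):
    """1x1 convolution written as a channel-mixing matrix product.

    x: (C_in, H, W); weight: (C_out, C_in) -> returns (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", weight, x)

def fpn_top_down(features, lateral_weights):
    """Baseline FPN top-down pathway (illustrative sketch).

    features: backbone maps ordered fine -> coarse, each (C_i, H_i, W_i),
              where every level has twice the resolution of the next.
    lateral_weights: (D, C_i) matrices projecting each level to D channels.
    Returns the merged maps, ordered fine -> coarse.
    """
    laterals = [lateral_1x1(f, w) for f, w in zip(features, lateral_weights)]
    outputs = [laterals[-1]]  # the coarsest level passes through unchanged
    for lat in reversed(laterals[:-1]):
        top_down = upsample_nearest_2x(outputs[0])
        # Fixed, unweighted element-wise sum of the two inputs.
        outputs.insert(0, lat + top_down)
    return outputs

# Usage with hypothetical C3-C5-like backbone maps projected to 64 channels.
rng = np.random.default_rng(0)
shapes = [(256, 32, 32), (512, 16, 16), (1024, 8, 8)]
feats = [rng.standard_normal(s) for s in shapes]
weights = [rng.standard_normal((64, s[0])) * 0.01 for s in shapes]
outs = fpn_top_down(feats, weights)
print([o.shape for o in outs])  # [(64, 32, 32), (64, 16, 16), (64, 8, 8)]
```

The unweighted sum inside the loop is the pixel-wise summation that the abstract describes as suboptimal when the two levels carry semantically inconsistent features.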

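Chinese abstract (摘要), in English translation

Feature pyramids are widely used in image understanding tasks based on multi-scale feature learning. Recent multi-scale feature learning emphasizes the interactive fusion of semantic features and detail features. A feature pyramid supplements multi-scale semantic and detail information by interpolating and summing the features of adjacent levels. Because of nonlinear operations and convolution layers with different output dimensions, the relationships between levels are complex, and pixel-wise summation is not the most effective approach. Therefore, an object detection method based on a semantic-consistency supervised pyramid network is proposed. The network consists of a multi-semantic feature enhancement module and an asymmetric convolution lateral connection module. The asymmetric convolution lateral connection module learns feature maps with different receptive fields to improve the generalization of features to objects in various poses. The multi-semantic feature enhancement module supplements high-level feature maps with low-level information to improve the detail expression of high-level features, while achieving a better trade-off between accuracy and detection performance. Experiments on the MSCOCO benchmark show that the proposed method improves detection average precision by 2.6% without increasing FLOPs, significantly improving object detection performance.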
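The abstract does not specify the layout of the asymmetric convolution lateral connection, or how it avoids extra FLOPs. One common asymmetric-convolution design (ACNet-style) runs parallel k×k, 1×k and k×1 convolutions on the same input and sums them. Because convolution is linear, the three kernels can be folded into a single k×k kernel for inference. The sketch below assumes that design. The branch layout, the hypothetical functions `asymmetric_lateral` and `fuse_asymmetric`, and the omission of per-branch batch normalization are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conv2d_same(x, w):
    """Naive stride-1 cross-correlation with zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, kh, kw) with odd kh and kw.
    Returns a (C_out, H, W) map.
    """
    c_out, _, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((c_out, h, wd))
    for i in range(kh):
        for j in range(kw):
            # Add the contribution of kernel tap (i, j) at every output pixel.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def asymmetric_lateral(x, w_sq, w_h, w_v):
    """Hypothetical asymmetric lateral branch: the sum of a k x k, a 1 x k
    and a k x 1 convolution, giving square, wide and tall receptive fields."""
    return conv2d_same(x, w_sq) + conv2d_same(x, w_h) + conv2d_same(x, w_v)

def fuse_asymmetric(w_sq, w_h, w_v):
    """Fold the 1 x k and k x 1 kernels into the k x k kernel. This is exact
    because convolution is linear and all branches share stride and centre."""
    k = w_sq.shape[-1]
    c = k // 2
    fused = w_sq.copy()
    fused[:, :, c:c + 1, :] += w_h  # the 1 x k kernel lies on the centre row
    fused[:, :, :, c:c + 1] += w_v  # the k x 1 kernel lies on the centre column
    return fused

# Usage: the three-branch block and the single fused 3x3 kernel agree.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))  # hypothetical backbone feature map
w_sq = rng.standard_normal((4, 8, 3, 3))
w_h = rng.standard_normal((4, 8, 1, 3))
w_v = rng.standard_normal((4, 8, 3, 1))
multi_branch = asymmetric_lateral(x, w_sq, w_h, w_v)
single_kernel = conv2d_same(x, fuse_asymmetric(w_sq, w_h, w_v))
print(np.allclose(multi_branch, single_kernel))  # True
```

Under this design, only the fused k×k kernel needs to be evaluated at inference. The extra branches therefore add training cost but no inference FLOPs. If each branch has its own batch normalization, those statistics would have to be folded into the kernels before fusing. The abstract does not say whether its no-extra-FLOPs result depends on this mechanism.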

Key words: object detection / semantic consistency / feature pyramid network

关键字 (Key words): 目标检测 (object detection) / 语义一致性 (semantic consistency) / 特征金字塔网络 (feature pyramid network)

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
