JNWPU (Journal of Northwestern Polytechnical University)
Volume 43, Number 1, February 2025
Page(s): 119-127
DOI: https://doi.org/10.1051/jnwpu/20254310119
Published online: 18 April 2025
Exploring non-zero position constraints: algorithm-hardware co-designed DNN sparse training method
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Received: 9 October 2024
On-device learning enables edge devices to continuously adapt to new data in AI applications. Exploiting sparsity to eliminate redundant computation and storage during training is a key approach to improving the learning efficiency of edge deep neural networks (DNNs). However, because no assumptions can be made about non-zero positions, existing sparse training methods must identify and allocate zero positions at runtime and load-balance the resulting irregular computations, and this expensive processing keeps them far from the ideal speedup. This paper shows that if the non-zero position constraints on operands can be predicted in advance during training, these processing overheads can be skipped, improving the energy efficiency of sparse training. To this end, this paper explores the position-constraint rules between operands during sparse training for three activation functions typical of edge scenarios. Based on these rules, it proposes a hardware-friendly sparse training algorithm that reduces the computation and storage pressure of all three training phases, and an energy-efficient sparse training accelerator that estimates non-zero positions in parallel with the forward-propagation computation so that the runtime processing cost is hidden. Experiments show that the proposed method improves energy efficiency by 2.2, 1.38 and 1.46 times over a dense accelerator and two other sparse training works, respectively.
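To illustrate the kind of position constraint the abstract refers to (a minimal stdlib-Python sketch, not the authors' algorithm or hardware): for ReLU, the derivative is 1 where the input is positive and 0 elsewhere, so the non-zero positions of the input gradient in backpropagation are fully determined by the non-zero positions of the forward activation and can be recorded during the forward pass instead of being identified at runtime in the backward pass.

```python
import random

random.seed(0)
n = 8
x = [random.gauss(0, 1) for _ in range(n)]       # pre-activation values
y = [max(v, 0.0) for v in x]                     # ReLU forward pass

# Position constraint: dL/dx = dL/dy * 1[x > 0], so the non-zero
# positions of grad_x are known as soon as the forward pass runs.
mask = [v > 0 for v in x]                        # recorded at forward time
upstream = [random.gauss(0, 1) for _ in range(n)]  # dL/dy from the next layer
grad_x = [g if m else 0.0 for g, m in zip(upstream, mask)]

# Every non-zero gradient entry lies where the activation was non-zero,
# so zero positions never need to be searched for in the backward pass.
assert all(g == 0.0 or a != 0.0 for g, a in zip(grad_x, y))
```

Hardtanh and other piecewise-linear activations common on edge devices admit analogous forward-time masks; the paper's contribution is exploiting such rules jointly in the algorithm and the accelerator.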
Key words: sparse training / non-zero position constraint / DNN / sparse accelerator
© 2025 Journal of Northwestern Polytechnical University. All rights reserved.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.