Design of Deep Learning VLIW Processor for Image Recognition

Lin Li; Shengbing Zhang; Juan Wu

doi:10.1051/jnwpu/20203810216

Volume 38 / No 1 (February 2020)

JNWPU, 38 1 (2020) 216-224

Open Access

Issue: JNWPU Volume 38, Number 1, February 2020
Page(s): 216-224
DOI: https://doi.org/10.1051/jnwpu/20203810216
Published online: 12 May 2020

Lin Li (李林)¹,², Shengbing Zhang (张盛兵)¹ and Juan Wu (吴鹃)³

¹ School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
² Fourth Design Department, Beijing Institute of Microelectronics Technology, Beijing 100076, China
³ School of Animation and Software, Xi'an Vocational and Technical College, Xi'an 710077, China

Received: 8 January 2019

Abstract

To meet the demand in the aviation and aerospace fields for high-resolution image recognition with efficient local processing, and to address the insufficient computational parallelism of existing research, an extensible multiprocessor-cluster deep learning processor architecture based on very long instruction word (VLIW) is designed, building on optimizations of the computation in each layer of deep convolutional neural network models. The design combines parallel processing of feature maps and neurons, instruction-level parallelism based on VLIW, data-level parallelism across multiprocessor clusters, and pipelining. Test results on an FPGA prototype system show that the processor can effectively complete image classification and object detection applications. The peak performance of the processor reaches 128 GOP/s at an operating frequency of 200 MHz. On the selected benchmarks, the processor is at least 12 times faster than the CPU and 7 times faster than the GPU. Compared with the results of the software framework, the average error in the test accuracy of the processor is less than 1%.
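As a rough sanity check on the peak figure, the stated 128 GOP/s at 200 MHz can be converted into operations per clock cycle. The sketch below is only illustrative arithmetic: the function name `ops_per_cycle` is hypothetical, and counting one multiply-accumulate (MAC) as two operations is a common convention assumed here, not something the abstract states.

```python
def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations that must complete each clock cycle to reach the stated peak."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


# Figures taken from the abstract.
PEAK_GOPS = 128.0   # peak performance in GOP/s
CLOCK_MHZ = 200.0   # operating frequency in MHz

total_ops = ops_per_cycle(PEAK_GOPS, CLOCK_MHZ)

# Assumption (not from the source): one MAC is counted as two operations.
macs_per_cycle = total_ops / 2

print(f"operations per cycle: {total_ops:.0f}")
print(f"MACs per cycle (assuming 1 MAC = 2 ops): {macs_per_cycle:.0f}")
```

Under that assumption, the peak would correspond to several hundred MAC units working in parallel across all clusters; how they are actually distributed among clusters and VLIW issue slots is not given in the abstract.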

Key words: image recognition / deep learning / convolutional neural networks / very long instruction word (VLIW) / processor / extensible

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
