Image Caption Description of Traffic Scene Based on Deep Learning

Shiru Qu; Yuling Xi; Songtao Ding

doi:10.1051/jnwpu/20183630522

Open Access

Issue: JNWPU Volume 36, Number 3, June 2018
Page(s): 522 - 527
DOI: https://doi.org/10.1051/jnwpu/20183630522
Published online: 08 October 2018

JNWPU 2018, 36(3):522-527

Chinese title (translated): Semantic Description of Traffic Scenes Based on Deep Learning

Shiru Qu (曲仕茹), Yuling Xi (席玉玲) and Songtao Ding (丁松涛)

School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

Received: 2 April 2017

Abstract

Accurately describing complex traffic scenes is a hard problem in computer vision. Traffic scenes change constantly, so image captioning is easily disturbed by lighting changes and object occlusion. To address this, we propose an image caption generation model based on an attention mechanism. The model combines a convolutional neural network (CNN) with a recurrent neural network (RNN) to generate end-to-end descriptions of traffic images. To produce semantic descriptions that clearly discriminate between objects, the attention mechanism is applied to the language model. We validate the method on the Flickr8K, Flickr30K and MS COCO benchmark datasets. Accuracy improves by up to 8.6%, 12.4%, 19.3% and 21.5% under different evaluation metrics. Experiments show that the algorithm is robust in four kinds of complex traffic scenes: lighting changes, abnormal weather, salient road targets, and various means of transport.

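Abstract (translated from the Chinese)

Accurate semantic description of complex traffic scenes has long been a difficult problem in computer vision. Traffic scenes are complex and changeable, and scene understanding is easily disturbed by factors such as lighting changes and object occlusion. To address this problem, a semantic description method for traffic scenes based on an attention mechanism is proposed. A convolutional neural network (CNN) is combined with a recurrent neural network (RNN) to produce end-to-end descriptions of traffic scenes. Because traffic targets are numerous and varied, an attention mechanism is introduced into the language model to produce scene descriptions with clear discrimination. To verify the effectiveness of the new algorithm, experiments were carried out on three benchmark datasets: Flickr8K, Flickr30K and MS COCO. The results show that, under different evaluation metrics, the accuracy of the algorithm improves by 8.6%, 12.4%, 19.3% and 21.5%, respectively. In addition, qualitative analysis verifies that the algorithm is robust in four kinds of complex traffic scenes: lighting changes, abnormal weather, salient road targets, and multiple types of vehicles.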
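The abstract describes applying an attention mechanism in the language model that sits on top of CNN image features. As a rough illustration only, the sketch below shows one soft-attention step in plain Python: region features are scored against the decoder state, the scores are normalized with a softmax, and a weighted context vector is returned. The function names `softmax` and `attend`, the dot-product scoring, and the toy numbers are all assumptions made for this example; the abstract does not specify the scoring function or the feature dimensions used by the authors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: one feature vector per image region (e.g. from a CNN).
    hidden_state: the RNN decoder state, same length as each feature vector.
    Dot-product scoring is an assumption made for this sketch.
    """
    scores = [
        sum(f * h for f, h in zip(feat, hidden_state))
        for feat in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-D features (made-up numbers).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In a captioning model of this general kind, the context vector would be fed to the RNN together with the previous word at each step, so that different words can draw on different image regions.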

Key words: intelligent transportation / deep learning / neural network / image captioning / attention mechanism / design of experiments / reliability analysis

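Key words (translated from the Chinese): intelligent transportation / deep learning / neural network / semantic description of traffic scenes / attention mechanism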

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
