A Visual Object Tracking Algorithm Based on Dynamics Pattern and Convolutional Feature

Boyan Zhang; Yong Zhong; Zhendong Li

doi:10.1051/jnwpu/20193761310

JNWPU, 37 6 (2019) 1310-1319

Open Access

Issue: JNWPU Volume 37, Number 6, December 2019
Page(s): 1310-1319
DOI: https://doi.org/10.1051/jnwpu/20193761310
Published online: 11 February 2020

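Chinese title: 基于动态模式和卷积特征的单目标跟踪算法 (A Single-Object Tracking Algorithm Based on Dynamic Patterns and Convolutional Features)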

Boyan Zhang (张博言)¹,², Yong Zhong (钟勇)¹,² and Zhendong Li (李振东)¹,²

¹ Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610041, China
² University of Chinese Academy of Sciences, Beijing 100049, China

Received: 24 December 2018

Abstract

Deep visual feature-based methods have demonstrated impressive performance in visual tracking owing to their powerful capability for visual feature representation. However, in complex environments involving dramatic appearance change, illumination variation or rotation, the extracted deep visual features are insufficient to characterize the target accurately. To solve this problem, we present an integrated tracking framework that combines a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). First, the LSTM extracts the dynamics features of the target over the time sequence, yielding the state of the target at the current time step. From that state, an accurate preprocessed bounding box is obtained. Then, based on the preprocessed bounding box, a CNN extracts the deep convolutional features of the target. Finally, the position of the target is determined from the score of these features. During tracking, to improve the adaptability of the network, its parameters are updated using target samples captured during successful tracking. Experiments show that the proposed method achieves outstanding tracking performance and robustness under partial occlusion, out-of-view, motion blur and fast motion.
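
The control flow described in the abstract (temporal prediction, preprocessed box, feature scoring, update on success) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `predict_box` uses constant-velocity extrapolation as a stand-in for the LSTM, `score_fn` stands in for the CNN scoring stage, and `update_fn`, `candidates_around`, `success_threshold` and `buffer_size` are hypothetical names and values that do not come from the paper.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (centre x, centre y, width, height)


def predict_box(history: Sequence[Box]) -> Box:
    """Stand-in for the LSTM stage (assumption): constant-velocity extrapolation."""
    if len(history) < 2:
        return history[-1]
    (px, py, _, _), (cx, cy, w, h) = history[-2], history[-1]
    return (2 * cx - px, 2 * cy - py, w, h)


def candidates_around(box: Box, step: float = 2.0) -> List[Box]:
    """Hypothetical sampler: a 3x3 grid of candidate boxes around the predicted box."""
    cx, cy, w, h = box
    return [(cx + dx * step, cy + dy * step, w, h)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def track(first_box: Box,
          frames: Sequence[Any],
          score_fn: Callable[[Any, Box], float],
          update_fn: Callable[[List[Tuple[Any, Box]]], None],
          success_threshold: float = 0.5,
          buffer_size: int = 50) -> List[Box]:
    """Run the predict -> score -> update loop and return one box per frame."""
    history: List[Box] = [first_box]
    positives: Deque[Tuple[Any, Box]] = deque(maxlen=buffer_size)
    for frame in frames:
        prior = predict_box(history)            # temporal stage (LSTM in the paper)
        cands = candidates_around(prior)        # boxes near the preprocessed box
        scores = [score_fn(frame, c) for c in cands]  # appearance stage (CNN in the paper)
        best = max(range(len(cands)), key=scores.__getitem__)
        if scores[best] >= success_threshold:   # treat as a successful track (assumption)
            positives.append((frame, cands[best]))
            update_fn(list(positives))          # online update with collected samples
        history.append(cands[best])
    return history[1:]
```

In this sketch the best candidate is kept even when its score falls below the threshold; the abstract does not say how failed frames are handled, so that choice is an assumption as well.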

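Abstract (Chinese version)

Object tracking networks based on deep features have achieved impressive performance thanks to their powerful ability to represent the visual features of the target. However, complex tracking scenes often involve fast target motion, illumination change, rotation and similar conditions, and deep visual features alone can hardly characterize the target accurately. To address these problems, a single-object video tracking network based on fused features is proposed. The network combines two deep learning models: a convolutional neural network (CNN) and a long short-term memory (LSTM) network. First, the LSTM extracts the dynamic features of the target over the time sequence and produces the target state at the current time step, from which an accurate preprocessed bounding box is obtained. Then, based on this preprocessed bounding box, the CNN extracts the deep convolutional features of the target and the target position is determined. During tracking, target samples collected during successful tracking are used for short-term and long-term updates of the network parameters, which strengthens the adaptability of the network. Comparative experiments show that the proposed method achieves excellent tracking performance and robustness when the target is partially occluded, motion-blurred or moving fast.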

Key words: visual object tracking / convolutional neural network / long short-term memory network

Key words (Chinese version): object tracking / convolutional neural network / long short-term memory network

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
