Environment Sound Classification System Based on Hybrid Feature and Convolutional Neural Network

Ke Zhang; Yu Su; Jingyu Wang; Sanyu Wang; Yanhua Zhang

doi:10.1051/jnwpu/20203810162

Volume 38 / No 1 (February 2020)

JNWPU, 38 1 (2020) 162-169

Open Access

Issue: JNWPU Volume 38, Number 1, February 2020
Page(s): 162-169
DOI: https://doi.org/10.1051/jnwpu/20203810162
Published online: 12 May 2020

Research on an Environmental Sound Classification System Based on Fused Features and a Convolutional Neural Network

Ke Zhang (张科)¹,², Yu Su (苏雨)¹,²,³, Jingyu Wang (王靖宇)¹,², Sanyu Wang (王霰宇)¹,² and Yanhua Zhang (张彦华)¹,²

¹ National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, China
² School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China
³ Signals, Images, and Intelligent Systems Laboratory (LISSI/EA 3956), University Paris-Est Creteil, Senart-FB Institute of Technology, 36-37 rue Charpak, 77127 Lieusaint, France

Received: 16 January 2019

Abstract

At present, environment sound recognition systems mainly identify environment sounds using deep neural networks and a wide variety of auditory features. It is therefore necessary to analyze which auditory features are more suitable for deep neural network based environment sound classification and recognition (ESCR) systems. In this paper, we choose three sound features based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid feature MGCC, which fuses MFCC and GFCC, is presented. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for ESCR tasks. The experimental results show that signal processing features perform better than spectrogram features in a deep neural network based environmental sound recognition system and that, among all the acoustic features tested, MGCC achieves the best performance. The MGCC-CNN model proposed in this paper is also compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset. The results show that the proposed model achieves the highest classification accuracy among the compared models.
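
The abstract does not state how the Mel-based and Gammatone-based cepstral features are combined into MGCC. As a minimal sketch only, assuming the fusion is a frame-by-frame concatenation of the two coefficient vectors, and using random numbers in place of real Mel and Gammatone filter-bank energies, the idea might look like this in Python. The function names `dct_ii`, `cepstral_coefficients`, and `fuse`, the band counts, and the choice of 13 coefficients are illustrative assumptions, not taken from the paper.

```python
import math
import random


def dct_ii(values, n_out):
    """Unnormalized DCT-II of one frame, keeping the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
            for m, v in enumerate(values))
        for k in range(n_out)
    ]


def cepstral_coefficients(frames, n_coeff=13):
    """Log-compress each frame of filter-bank energies, then apply a DCT."""
    return [
        dct_ii([math.log(e + 1e-10) for e in frame], n_coeff)
        for frame in frames
    ]


def fuse(mfcc, gfcc):
    """Assumed fusion rule: concatenate the two coefficient vectors per frame."""
    if len(mfcc) != len(gfcc):
        raise ValueError("both features must cover the same number of frames")
    return [m + g for m, g in zip(mfcc, gfcc)]


# Stand-in data: random positive energies instead of real filter-bank outputs.
random.seed(0)
n_frames = 100
mel_energies = [[random.random() + 0.1 for _ in range(40)]
                for _ in range(n_frames)]
gammatone_energies = [[random.random() + 0.1 for _ in range(64)]
                      for _ in range(n_frames)]

mfcc = cepstral_coefficients(mel_energies)        # Mel-based coefficients
gfcc = cepstral_coefficients(gammatone_energies)  # Gammatone-based coefficients
mgcc = fuse(mfcc, gfcc)                           # hybrid feature (assumed rule)

print(len(mgcc), len(mgcc[0]))  # number of frames, fused coefficients per frame
```

Under these assumptions each frame carries 26 values (13 from each filter bank), and the resulting frames-by-coefficients matrix is the kind of two-dimensional input a convolutional network could consume. Other fusion rules, such as stacking the two features as separate input channels, would fit the abstract's description equally well.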

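Abstract (translated from the Chinese version)

Environment sound recognition systems mainly classify and recognize environment sounds using deep neural networks and a wide variety of auditory features. It is therefore necessary to analyze which auditory features are better suited to environment sound recognition systems in deep neural network based environment classification tasks. Three sound features are selected, extracted with two widely used filter banks: the Mel and Gammatone filter banks. A feature named MGCC, which fuses MFCC and GFCC, is then proposed. Finally, the deep convolutional neural network proposed in the paper is used to verify which feature is better suited to environment sound classification and recognition. The experimental results show that, in neural network based environment sound classification systems, signal processing features outperform spectrogram features, and that the MGCC feature performs better than the other features. Lastly, the proposed MGCC-CNN model is compared with other environment sound classification models on the UrbanSound8K dataset. The experimental results show that the proposed model achieves the best classification accuracy.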

Key words: environment sound / hybrid feature / sound classification / convolutional neural network / filter

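Key words (translated from the Chinese version): environment sound / feature fusion / sound classification / convolutional neural network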

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
