Volume 39, Number 5, October 2021
Page(s): 1122 - 1129
Published online: 14 December 2021
The manifold embedded selective pseudo-labeling algorithm and transfer learning of small sample dataset
College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
2 School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
Samples for special-scene classification and recognition tasks are hard to obtain, which leads to a shortage of training data. Current research focuses on how to use source domain (or auxiliary domain) data to build domain adaptation transfer learning models that improve the classification accuracy and performance of small-sample machine learning in these difficult scenes. This paper proposes a deep convolution and Grassmann manifold embedded selective pseudo-labeling (DC-GMESPL) model to enable transfer learning classification among multiple small sample datasets. First, for special target-domain scenes that lack local sample data, such as forest fire smoke video images, DC-GMESPL uses satellite remote sensing image samples as the source domain and extracts smoke features from the source and target domains simultaneously with a Resnet50 deep transfer network. Second, DC-GMESPL aligns the source domain feature distribution with the target domain feature distribution: it removes the correlation among the source domain features and re-correlates them with the target domain, which minimizes the distance between the two feature distributions. Third, the target domain data are pseudo-labeled by a selective pseudo-labeling algorithm in Grassmann manifold space. Finally, a trainable model is constructed to complete the transfer classification between small sample datasets. The model is evaluated by transfer learning between satellite remote sensing image and video image datasets. Experiments show that the transfer accuracy of DC-GMESPL is higher than that of DC-CMEDA, Easy TL, CMMS and SPL. Compared with our earlier DC-CMEDA algorithm, DC-GMESPL improves the transfer accuracy from satellite remote sensing images to video images by 0.50% and from video images to satellite remote sensing images by 8.50%, and its performance is also greatly improved.
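The alignment step described in the abstract (removing the correlation among source features, then re-correlating them with the target domain) has the general form of correlation alignment (CORAL). The formulas below are a sketch under that assumption and are not necessarily the exact formulation used in the paper. The symbols are introduced here for illustration: $X_s$ and $X_t$ are the source and target feature matrices with one sample per row, $\lambda > 0$ is a small regularizer that keeps the covariance matrices invertible, and $A$ is a linear map applied to the source features.

```latex
\begin{aligned}
C_s &= \operatorname{cov}(X_s) + \lambda I, \qquad C_t = \operatorname{cov}(X_t) + \lambda I, \\
A^{*} &= \arg\min_{A} \left\lVert A^{\top} C_s A - C_t \right\rVert_F^{2}, \\
A^{*} &= C_s^{-1/2} \, C_t^{1/2}, \qquad X_s^{*} = X_s A^{*}.
\end{aligned}
```

With this choice, $A^{*\top} C_s A^{*} = C_t^{1/2} C_s^{-1/2} C_s C_s^{-1/2} C_t^{1/2} = C_t$, so the transformed source features carry the (regularized) target covariance: multiplying by $C_s^{-1/2}$ whitens the source features, and multiplying by $C_t^{1/2}$ re-colors them with the target statistics.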
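The selective pseudo-labeling step can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' implementation: distances are plain Euclidean rather than computed in Grassmann manifold space, each target sample is labeled by its nearest source class mean, and only the most confident fraction of each class is kept. The function names, the `fraction` parameter, the class names `smoke` and `no_smoke`, and the toy feature vectors are all hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Mean feature vector of each class in the labeled (source) data."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def pseudo_label(target, prototypes):
    """Give each target sample its nearest prototype and a softmax confidence."""
    results = []
    for x in target:
        dists = {y: math.dist(x, p) for y, p in prototypes.items()}
        d_min = min(dists.values())
        # Softmax over negative distances, shifted by d_min for stability.
        weights = {y: math.exp(-(d - d_min)) for y, d in dists.items()}
        total = sum(weights.values())
        label = min(dists, key=dists.get)
        results.append((label, weights[label] / total))
    return results


def select_confident(pseudo, fraction):
    """Keep the most confident `fraction` of pseudo-labeled samples per class."""
    by_class = defaultdict(list)
    for idx, (label, conf) in enumerate(pseudo):
        by_class[label].append((conf, idx))
    selected = []
    for label, items in by_class.items():
        items.sort(reverse=True)  # highest confidence first
        keep = max(1, math.ceil(fraction * len(items)))
        selected.extend((idx, label) for _, idx in items[:keep])
    return sorted(selected)


# Toy 2-D features standing in for deep features (hypothetical values).
source = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
source_labels = ["no_smoke", "no_smoke", "smoke", "smoke"]
target = [[0.1, 0.1], [1.9, 1.9], [4.1, 3.9], [2.3, 2.3]]

protos = class_prototypes(source, source_labels)
pseudo = pseudo_label(target, protos)
selected = select_confident(pseudo, fraction=0.5)
print(selected)  # [(0, 'no_smoke'), (2, 'smoke')]
```

Target samples 1 and 3 lie between the two class means, so their confidence is low and they are left unlabeled in this round. In an iterative scheme, `fraction` would typically grow from round to round so that more target samples are admitted as the model improves.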
Key words: transfer learning / domain adaptation / deep convolutional neural networks / small sample dataset / forest fire smoke features
© 2021 Journal of Northwestern Polytechnical University. All rights reserved.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.