Open Access
JNWPU, Volume 43, Number 2, April 2025
Page(s): 410–417
DOI: https://doi.org/10.1051/jnwpu/20254320410
Published online: 04 June 2025