Open Access
Issue: JNWPU, Volume 40, Number 6, December 2022, pages 1261-1268, https://doi.org/10.1051/jnwpu/20224061261, 10 February 2023

© 2022 Journal of Northwestern Polytechnical University. All rights reserved.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## 1 OTE-GWRFFS algorithm

Fig. 1 Flow chart of the OTE-GWRFFS algorithm

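### 1.1 OTE-GWRFFS algorithm flow

The specific steps of the OTE-GWRFFS algorithm are as follows.

Step 1  Initialize the original dataset S and set the number n of decision trees used to build the random forest.

Step 2  Apply the GAN algorithm to augment the original dataset, obtaining the generated dataset S′.

Step 3  Apply the improved NL-Means to the generated dataset S′ to fit individual outliers, obtaining the dataset S.

Step 4  Use bagging sampling to form L training datasets S1i (i=1, 2, …, L) and one test dataset S2; each training dataset contains N′ samples and m features.

Step 5  Build a decision tree on each dataset S1i.

Step 6  for (i=1 to n)

if (A decreases)

break

Step 7  for (i=1 to n-n′)

Step 8  Obtain the final feature importance measure Fj.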
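Steps 4 to 8 can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: one-split stumps stand in for full decision trees, trees are ranked by out-of-bag accuracy, the quantity A in step 6 is read as the majority-vote accuracy on a held-out set, and Fj is approximated by the accuracy-weighted share of selected trees that split on feature j. The GAN augmentation and NL-Means correction of steps 2 and 3 are not covered, and all function names (`bootstrap`, `fit_stump`, `ote_feature_ranking`, `make_data`) are hypothetical.

```python
import random
from collections import Counter

def bootstrap(n_rows, rng):
    """Bagging: sample rows with replacement; the unused rows are out-of-bag."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    seen = set(in_bag)
    return in_bag, [i for i in range(n_rows) if i not in seen]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, rows, features):
    """One-split tree (assumed stand-in for a full decision tree)."""
    best = None
    for j in features:
        for t in sorted({X[i][j] for i in rows}):
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            hits = left.count(l_lab) + right.count(r_lab)
            if best is None or hits > best[0]:
                best = (hits, j, t, l_lab, r_lab)
    if best is None:  # no usable split: fall back to a constant prediction
        lab = majority([y[i] for i in rows])
        return (features[0], float("inf"), lab, lab)
    return best[1:]

def predict(stump, x):
    j, t, l_lab, r_lab = stump
    return l_lab if x[j] <= t else r_lab

def accuracy(stumps, X, y, rows):
    """Majority-vote accuracy of a set of stumps on the given rows."""
    hits = sum(majority([predict(s, X[i]) for s in stumps]) == y[i] for i in rows)
    return hits / len(rows)

def ote_feature_ranking(X, y, X_val, y_val, n_trees=30, seed=0):
    rng = random.Random(seed)
    m = len(X[0])
    k = max(1, int(m ** 0.5))  # features tried per tree (assumed sqrt rule)
    scored = []
    for _ in range(n_trees):  # steps 4-5: bagging, then one tree per sample
        in_bag, oob = bootstrap(len(X), rng)
        stump = fit_stump(X, y, in_bag, rng.sample(range(m), k))
        scored.append((accuracy([stump], X, y, oob) if oob else 0.0, stump))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best trees first
    chosen, best_acc = [], -1.0
    val_rows = range(len(X_val))
    for weight, stump in scored:  # step 6: add trees until A decreases
        acc = accuracy([s for _, s in chosen] + [stump], X_val, y_val, val_rows)
        if acc < best_acc:
            break
        chosen.append((weight, stump))
        best_acc = acc
    importance = [0.0] * m  # steps 7-8 (assumed form): weighted split counts
    for weight, stump in chosen:
        importance[stump[0]] += weight
    total = sum(importance) or 1.0
    return [s for _, s in chosen], [v / total for v in importance], best_acc

def make_data(n, rng):
    """Toy data: feature 0 equals the label, features 1-3 are noise."""
    y = [i % 2 for i in range(n)]
    X = [[float(lab)] + [rng.random() for _ in range(3)] for lab in y]
    return X, y

data_rng = random.Random(1)
X_train, y_train = make_data(40, data_rng)
X_val, y_val = make_data(20, data_rng)
trees, importance, val_acc = ote_feature_ranking(X_train, y_train, X_val, y_val)
print(len(trees), [round(v, 3) for v in importance], val_acc)
```

On this toy data the informative feature should receive the largest importance share. With real data, a full tree learner and the paper's own definitions of A and Fj would replace the stand-ins used here.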

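## 2 Small-sample data augmentation

1) Data generation

2) Data distribution correction

Fig. 2 Data generation model

1) Classification error rate

2) Feature importance measure

Decision matrix for n=7, T=5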

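## 4 Experimental verification

Summary of the small-sample experimental datasets from UCI

Fig. 3 Classification accuracy of the optimal tree set

Fig. 4 Comparison of accuracy after dimensionality reduction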

