SMOTE-WENN: Solving class imbalance and small sample problems by oversampling and distance scaling
Authors: Hongjiao Guan, Yingtao Zhang, Min Xian, H. D. Cheng, Xianglong Tang
Abstract
Many practical applications suffer from imbalanced data classification, in which the minority class has a degraded recognition rate. The primary causes are the scarcity of minority class samples and the intrinsically complex distribution characteristics of imbalanced datasets. The imbalanced classification problem is more severe on small sample datasets. To address both the small sample and class imbalance problems, a hybrid resampling method is proposed. It combines an oversampling approach (synthetic minority oversampling technique, SMOTE) with a novel data cleaning approach (weighted edited nearest neighbor rule, WENN). First, SMOTE generates synthetic minority class examples using linear interpolation. Then, WENN detects and deletes unsafe majority and minority class examples using a weighted distance function and the k-nearest neighbor (kNN) rule. The weighted distance function scales up a commonly used distance by considering local imbalance and spatial sparsity. Extensive experiments on synthetic and real datasets validate the superiority of the proposed SMOTE-WENN over three state-of-the-art resampling methods.
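The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the abstract does not give the WENN distance formula, so the `weight` argument below is a hypothetical per-label scaling factor standing in for the paper's local-imbalance and sparsity terms, and the function names `smote` and `edited_nn` are assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by linear interpolation
    between a random minority example and one of its k nearest
    minority neighbors. Requires at least two minority examples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # Keep the k minority examples closest to the base point.
        others.sort(key=lambda p: euclidean(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


def edited_nn(X, y, k=3, weight=None):
    """Drop every example whose label loses the vote among its k
    nearest neighbors under a scaled distance.

    ASSUMPTION: weight(label) is a placeholder multiplier applied to
    the distance toward a neighbor with that label. It is NOT the
    weighting function defined in the paper."""
    weight = weight or (lambda label: 1.0)
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: weight(y[j]) * euclidean(xi, X[j]))
        votes = [y[j] for j in others[:k]]
        # Keep the example only if a strict majority agrees with it.
        if votes.count(yi) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]
```

A typical use under these assumptions would be to call `smote` on the minority examples, append the synthetic points to the training set with the minority label, and then pass the enlarged set through `edited_nn`, which can remove examples from either class. Passing a `weight` that returns a value greater than 1 for majority labels makes majority neighbors appear farther away, which is one simple way to mimic the "scale up the distance" idea.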
Keywords: Imbalanced data classification, Small sample datasets, Oversampling, Data cleaning
Review process:
Paper URL: https://doi.org/10.1007/s10489-020-01852-8