Noise-robust oversampling for imbalanced data classification
Authors:
Highlights:
• Propose three noise-robust mechanisms to address the noise generation problem in classic oversampling algorithms: adopting an advanced clustering algorithm, designing adaptive embedding to generate samples, and implementing a safe boundary to enlarge class boundaries.
• Propose a heterogeneous distance metric to better cluster mixed-type data, along with dedicated approaches to avoid generating groundless samples with categorical variables.
• An adapted decomposition strategy extends the solution for binary imbalanced data to the multi-class setting. Moreover, better placement of new samples is provided.
• Experiments on standard datasets validate the effectiveness of the proposed method.
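
As a rough illustration of the mixed-type idea in the highlights, the sketch below uses the well-known Heterogeneous Euclidean-Overlap Metric (HEOM) as a stand-in; the highlights do not specify the paper's actual metric, so this is an assumption, not the authors' method. The names `heom_distance`, `synthesize`, `categorical`, and `ranges` are hypothetical. Numeric features are compared by range-normalised difference and categorical features by simple mismatch. A synthetic sample interpolates numeric features and copies each categorical value from one of its two parents, so no unseen category is ever invented.

```python
import math
import random


def heom_distance(a, b, categorical, ranges):
    """HEOM-style distance between two mixed-type records.

    categorical: set of feature indices holding categorical values.
    ranges: dict mapping each numeric feature index to (max - min).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in categorical:
            # Overlap distance: 0 if the categories match, 1 otherwise.
            d = 0.0 if x == y else 1.0
        else:
            # Range-normalised absolute difference for numeric features.
            d = abs(x - y) / ranges[i] if ranges[i] > 0 else 0.0
        total += d * d
    return math.sqrt(total)


def synthesize(a, b, categorical, rng):
    """Create one synthetic record between parents a and b.

    Numeric features are linearly interpolated; categorical features
    are copied from one parent, so only observed values can appear.
    """
    t = rng.random()  # interpolation weight in [0, 1)
    sample = []
    for i, (x, y) in enumerate(zip(a, b)):
        if i in categorical:
            sample.append(rng.choice([x, y]))
        else:
            sample.append(x + t * (y - x))
    return sample


if __name__ == "__main__":
    p = [1.0, "red"]
    q = [3.0, "blue"]
    print(heom_distance(p, q, categorical={1}, ranges={0: 4.0}))
    print(synthesize(p, q, categorical={1}, rng=random.Random(0)))
```

Copying categorical values from a parent is only one simple way to avoid groundless categories; a majority vote over nearest neighbours would be another option.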
Keywords: Imbalanced learning, Classification, Clustering
Review history: Received 26 May 2021, Revised 13 August 2022, Accepted 27 August 2022, Available online 6 September 2022, Version of Record 16 September 2022.
Official paper URL: https://doi.org/10.1016/j.patcog.2022.109008