Grouping-based Oversampling in Kernel Space for Imbalanced Data Classification

Authors:

Highlights:

• We design a new grouping scheme. It can provide not only a theoretical basis for selecting the minority class samples in an oversampling method but also a new explanation for the poor performance of SVM on imbalanced data sets.

• We design a new oversampling algorithm for generating the minority class samples, which can effectively reduce the bias of the decision hyperplane obtained on the imbalanced data sets toward the minority class. At the same time, it makes full use of the repeated sample pairs and reduces the risk of overfitting of the classifier trained on the balanced data set.

• Extensive experimental results show that the proposed oversampling method outperforms the compared benchmark algorithms.
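
The highlights describe generating minority class samples in kernel space but do not give the procedure. As a rough illustration of what oversampling in kernel space can look like, the sketch below interpolates between the feature-space images of two minority samples and extends the kernel matrix accordingly. It relies only on the linearity of the inner product: if z = λφ(a) + (1 − λ)φ(b), then k(z, x) = λk(a, x) + (1 − λ)k(b, x). This is a generic interpolation scheme and not the paper's grouping-based method; the RBF kernel, the `gamma` value, the random choice of pairs, and the names `rbf` and `augment_kernel` are all assumptions made for the example.

```python
import math
import random


def rbf(x, y, gamma=0.5):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def augment_kernel(X, minority_idx, n_new, gamma=0.5, seed=0):
    """Hypothetical helper: add n_new synthetic minority points in kernel space.

    Each synthetic point is a convex combination of the feature-space images
    of two randomly chosen minority samples (an assumption, not the paper's
    selection rule). Returns the augmented kernel matrix and the weights W.
    """
    rng = random.Random(seed)
    n = len(X)
    # Row p of W writes point p as a linear combination of phi(x_0..x_{n-1}).
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_new):
        a, b = rng.sample(minority_idx, 2)
        lam = rng.random()
        row = [0.0] * n
        row[a] = lam
        row[b] = 1.0 - lam
        W.append(row)
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    # K_aug = W K W^T, computed with plain lists to stay dependency-free.
    WK = [[sum(w[k] * K[k][j] for k in range(n)) for j in range(n)] for w in W]
    K_aug = [[sum(wk[j] * w2[j] for j in range(n)) for w2 in W] for wk in WK]
    return K_aug, W


# Toy data: two minority points near the origin, four majority points near (3, 3).
X = [(0.0, 0.0), (0.1, 0.2), (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
minority_idx = [0, 1]
K_aug, W = augment_kernel(X, minority_idx, n_new=4)
```

The augmented matrix could then be handed to any SVM implementation that accepts a precomputed kernel, with the synthetic rows labeled as minority class. How the pairs are chosen is exactly where a grouping scheme such as the one the paper proposes would come in; the random pairing here is only a placeholder.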

Keywords: Imbalanced data classification, Kernel method, Support vector machine, Oversampling

Review history: Received 16 April 2022, Revised 24 July 2022, Accepted 20 August 2022, Available online 24 August 2022, Version of Record 30 August 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2022.108992