The ensemble of density-sensitive SVDD classifier based on maximum soft margin for imbalanced datasets

Authors:

Highlights:

Abstract

Imbalanced classification problems have recently attracted much attention because they arise in numerous domains of great importance to the data mining community. However, conventional two-class classification approaches, e.g., the support vector machine (SVM), generally perform poorly on imbalanced datasets because they are designed to generalize from the training data as a whole and pay little attention to the minority class. In this paper, we extend the traditional support vector domain description (SVDD) and propose a novel density-sensitive SVDD classifier based on maximum soft margin (DSMSM-SVDD) for imbalanced datasets. In the proposed approach, relative density-based penalty weights are incorporated into the optimization objective function to represent the importance of the data samples. By optimizing the objective function with these weights, majority-class training samples with high relative densities are more likely to lie inside the hypersphere, which reduces the effect of noise that degrades traditional SVDD. In addition, to make full use of the minority-class samples when refining the boundary during training, a maximum soft margin regularization term is introduced, inspired by the soft-margin maximization of the traditional SVM. This term allows the optimal domain description boundary to skew further toward the minority class than in traditional SVDD and thus improves classification accuracy. Finally, an AdaBoost ensemble version of DSMSM-SVDD is developed to further improve generalization performance and stability on imbalanced datasets. Extensive experimental results on various datasets demonstrate that the proposed approach significantly outperforms other existing algorithms on imbalanced classification problems in terms of the G-Mean, F-Measure and AUC performance measures.
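
To illustrate why the abstract reports G-Mean and F-Measure rather than plain accuracy, the following is a minimal sketch of how these two measures can be computed from predicted labels. It is not taken from the paper: the function name `gmean_and_f1` is hypothetical, the minority class is assumed to be the positive class, and the F-Measure is assumed to be the balanced F1 variant (beta = 1).

```python
import math


def gmean_and_f1(y_true, y_pred, positive=1):
    """Return (G-Mean, F1) for a binary problem.

    Assumptions (not from the paper): `positive` marks the minority class,
    and F-Measure means the balanced F1 score (beta = 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Recall on the minority class (sensitivity) and on the majority class (specificity).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # G-Mean is the geometric mean of the two per-class recalls.
    g_mean = math.sqrt(recall * specificity)
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return g_mean, f1


# A classifier that always predicts the majority class reaches 80% accuracy here,
# yet both measures collapse to zero because no minority sample is recovered.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10
print(gmean_and_f1(y_true, always_majority))
```

Both measures depend on minority-class recall, so a boundary that ignores the minority class is penalized even when overall accuracy looks high.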

Keywords: Imbalanced datasets, Support vector machine, Support vector domain description, Relative density, Maximum soft margin

Review history: Received 22 December 2020, Revised 15 February 2021, Accepted 23 February 2021, Available online 26 February 2021, Version of Record 3 March 2021.

Official paper URL: https://doi.org/10.1016/j.knosys.2021.106897