A novel SMOTE-based resampling technique through noise detection and the boosting procedure

Authors:

Highlights:

• The presence of noise in a data set misleads classifiers when the data set is resampled by SMOTE, because more noise is generated.

• The number of links in SMOTE is chosen without a clear criterion and is the same for every observation.

• We propose a new noise detection method, applied before SMOTE, to prevent noise generation.

• We also propose a new approach that selects the number of links in SMOTE automatically.

• The proposed SMOTEWB method outperforms SMOTE with linear and nonlinear classifiers in the presence of noise.
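
To make the first two highlights concrete, here is a minimal sketch of plain SMOTE-style interpolation. It is not the proposed SMOTEWB method, whose details are not given on this page. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration. The sketch uses one fixed `k` for every observation, and any minority point, including a noisy one, can serve as the base or the neighbor of a synthetic point.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by plain SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class points).
    k: number of nearest neighbors ("links"), identical for every
       observation. This is an illustrative assumption, not SMOTEWB.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # from the base point to the chosen neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy data (hypothetical): the last point stands in for a noisy sample.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]
new_points = smote_sketch(minority, n_new=6, k=3)
```

With `k=3` and four minority points, every other point counts as a neighbor, so synthetic points may fall on segments that end at the noisy sample. This is the kind of noise propagation that a noise detection step before resampling is meant to prevent.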

Keywords: Oversampling, SMOTE, Class imbalance, Noisy data

Review history: Received 1 April 2020, Revised 5 March 2022, Accepted 27 March 2022, Available online 30 March 2022, Version of Record 4 April 2022.

Official paper URL: https://doi.org/10.1016/j.eswa.2022.117023