A novel SMOTE-based resampling technique through noise detection and the boosting procedure
Authors:
Highlights:
• Noise in a data set misleads classifiers when the data set is resampled with SMOTE, because SMOTE generates additional noise.
• The number of links in SMOTE is selected without a clear criterion and is the same for every observation.
• We propose a new noise detection method, applied before SMOTE, to prevent noise generation.
• We also propose a new approach that selects the number of links in SMOTE automatically.
• The proposed SMOTEWB method outperforms SMOTE with both linear and nonlinear classifiers in the presence of noise.
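To make the first two highlights concrete, the following is a minimal sketch of plain SMOTE, not of the proposed SMOTEWB method. It assumes that the "number of links" corresponds to the number of nearest minority neighbors `k` used for interpolation; the function name `smote_sketch` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Plain SMOTE sketch (illustrative only, not SMOTEWB).

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority neighbors. Note that k is
    a single fixed value shared by every observation.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # For each minority point, find the indices of its k nearest
    # minority neighbors (Euclidean distance, excluding itself).
    neighbors = []
    for i, p in enumerate(minority):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(p, minority[j]),
        )
        neighbors.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)          # base sample, chosen uniformly
        j = rng.choice(neighbors[i])  # one of its k "links"
        gap = rng.random()            # interpolation weight in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Four clustered minority points plus one far-away (assumed noisy) point.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
new_points = smote_sketch(minority, n_new=20, k=2, seed=1)
```

Because the far-away point is treated like any other base sample, some synthetic points can fall on the segments between it and the cluster. This is the noise-generation effect that the highlights attribute to applying SMOTE without prior noise detection.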
Keywords: Oversampling, SMOTE, Class imbalance, Noisy data
Review history: Received 1 April 2020, Revised 5 March 2022, Accepted 27 March 2022, Available online 30 March 2022, Version of Record 4 April 2022.
Official URL: https://doi.org/10.1016/j.eswa.2022.117023