Weighted oversampling algorithms for imbalanced problems and application in prediction of streamflow
Authors:
Abstract
Synthetic Minority Oversampling Technique (SMOTE) is one of the most prevalent oversampling methods in imbalanced classification. Because SMOTE generates new samples blindly, without distinguishing noisy samples, many variants have been developed that either avoid selecting label-noise samples as seed samples or remove label-noise samples after oversampling. However, these variants still interpolate linearly between each minority sample and its neighbors at a random location, without considering the relative chaotic level of the minority sample and each neighbor. In this paper, we propose a general weighting framework that deliberately chooses the interpolation location of each synthetic sample by computing the chaotic levels of the seed sample and its neighbors, placing the synthetic sample closer to a safe, clean sample and farther from a chaotic one. Because this framework can be easily combined with a wide range of SMOTE variants, we call the resulting methods W-SMOTEs. Extensive experiments on synthetic, UCI, and industrial datasets with different levels of label noise demonstrate that W-SMOTEs effectively reduce the number of noisy synthetic samples and enhance the separability between classes.
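The idea of a weighted interpolation location can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names `chaos_level` and `weighted_interpolate` are hypothetical, the chaotic level is taken to be the fraction of a sample's k nearest neighbors that carry a different label, and the interpolation position is set deterministically from the two "cleanliness" scores rather than drawn at random as in plain SMOTE.

```python
import numpy as np


def chaos_level(X, y, idx, k=5):
    """Assumed definition: fraction of the k nearest neighbors of X[idx]
    whose label differs from y[idx] (0.0 = clean, 1.0 = fully chaotic)."""
    dists = np.linalg.norm(X - X[idx], axis=1)
    dists[idx] = np.inf  # exclude the sample itself
    nearest = np.argsort(dists)[:k]
    return float(np.mean(y[nearest] != y[idx]))


def weighted_interpolate(seed, neighbor, chaos_seed, chaos_neighbor, eps=1e-9):
    """Place a synthetic point on the segment seed -> neighbor,
    closer to whichever endpoint has the lower chaotic level.

    Illustrative weighting only, not the paper's formula.
    """
    clean_seed = 1.0 - chaos_seed + eps          # eps avoids division by zero
    clean_neighbor = 1.0 - chaos_neighbor + eps
    # t = 0 lands on the seed, t = 1 on the neighbor, t = 0.5 is the midpoint.
    t = clean_neighbor / (clean_seed + clean_neighbor)
    return seed + t * (neighbor - seed)


# Toy usage: a clean seed and a chaotic neighbor.
seed = np.array([0.0, 0.0])
neighbor = np.array([1.0, 0.0])
point = weighted_interpolate(seed, neighbor, chaos_seed=0.0, chaos_neighbor=0.8)
```

In this toy case the synthetic point falls on the seed's side of the midpoint, because the seed is the cleaner endpoint. Plain SMOTE would instead draw the position uniformly along the segment, regardless of how noisy either endpoint's neighborhood is.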
Keywords: Oversampling, Interpolation location, Weight, W-SMOTEs, Label noise
Review history: Received 3 March 2021, Revised 9 June 2021, Accepted 12 July 2021, Available online 15 July 2021, Version of Record 3 August 2021.
Official paper URL: https://doi.org/10.1016/j.knosys.2021.107306