An efficient method to determine sample size in oversampling based on classification complexity for imbalanced data

Authors:

Highlights:

• A method to determine the oversampling size based on classification complexity.

• Modified complexity measures that focus on the minority class for oversampling.

• The global oversampling size for typical oversampling methods.

• The local oversampling size in each cluster for cluster-based oversampling methods.


Keywords: Class imbalance, Oversampling, Sampling size, Adaptive boosting, Ensemble learning

Review history: Received 2 November 2020, Revised 4 May 2021, Accepted 12 June 2021, Available online 30 June 2021, Version of Record 3 July 2021.

Paper URL: https://doi.org/10.1016/j.eswa.2021.115442