A novel Random Forest integrated model for imbalanced data classification problem

Authors:

Highlights:

• A novel classification algorithm based on an integrated over-sampling technique is proposed for a variety of imbalanced data sets.

• New instances are generated dynamically and with variety, according to the number of majority-class samples and the sample center of the minority class.

• An improved Sparrow Search Algorithm (SSA) is adopted to adaptively optimize the Random Forest (RF) parameters for each imbalanced data set.

• The most recently used assessment metrics and optimization algorithms for imbalanced data classification are compared broadly and critically.
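
As a rough illustration of the second highlight, the sketch below generates synthetic minority samples from two inputs: the number of majority-class samples and the sample center of the minority class. It is not the paper's algorithm. The function name `oversample_toward_center`, the uniform interpolation rule, and the toy data are all assumptions made for this example.

```python
import numpy as np

def oversample_toward_center(X_min, n_majority, seed=0):
    """Hypothetical helper (not from the paper): balance the minority class
    by interpolating between minority samples and the minority centroid."""
    X_min = np.asarray(X_min, dtype=float)
    # Number of synthetic points needed to match the majority class size.
    n_new = max(n_majority - len(X_min), 0)
    rng = np.random.default_rng(seed)
    # Sample center (mean vector) of the minority class.
    center = X_min.mean(axis=0)
    # Pick a random minority sample for each synthetic point.
    idx = rng.integers(0, len(X_min), size=n_new)
    # Assumed rule: move a random fraction of the way toward the center.
    steps = rng.random((n_new, 1))
    synthetic = X_min[idx] + steps * (center - X_min[idx])
    return np.vstack([X_min, synthetic])

# Toy data: 3 minority samples, balanced against 10 majority samples.
X_min = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
X_bal = oversample_toward_center(X_min, n_majority=10)
```

The balanced minority set could then be combined with the majority class and passed to any Random Forest implementation. Tuning the forest's parameters with SSA, as the third highlight describes, is outside the scope of this sketch.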

Keywords: Imbalanced data classification, Random Forest, Sparrow Search Algorithm, Oversampling

Review history: Received 29 November 2021, Revised 12 May 2022, Accepted 12 May 2022, Available online 21 May 2022, Version of Record 27 May 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.109050