A new classifier for imbalanced data with iterative learning process and ensemble operating process

Authors:

Highlights:

Abstract

Existing methods for imbalanced classification operate at two levels: the sampling level and the algorithmic level. However, different sampling methods present classifiers with different data distributions, which may lead to unreliable decisions, and algorithmic-level methods are often tied to a particular classifier. In this paper, we propose a new iterative ensemble classifier for imbalanced data that combines an iterative learning process with an ensemble operating process (C-ILEO). Each learning iteration trains an ensemble base classifier, and each of its elementary classifiers learns the data in a low-dimensional space made up of a small number of features, which are selected with a new rule based on mutual information. Samples that the ensemble base classifier assigns to the negative class in an iteration are removed, because they can no longer provide significant information for subsequent learning. In addition, the class weight of each elementary classifier is optimized with a new strategy similar to the half-interval search algorithm. The ensemble operating process is carried out over what all ensemble base classifiers have learned from all the training samples, including the samples removed during the iterations. Compared with 16 commonly used imbalanced learning methods on 15 real-world imbalanced datasets, C-ILEO outperforms most algorithmic-level, data-level, and ensemble approaches on the Gmean, F1, Acc, and Precision metrics.
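The abstract does not spell out the paper's new selection rule, so the following is only a minimal sketch of the underlying idea: ranking discrete features by their empirical mutual information with the class label and keeping a small number of them. The function names `mutual_information` and `top_k_features`, the toy data, and the plain top-k ranking are all assumptions made for illustration; they are not the C-ILEO rule itself.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def top_k_features(rows, labels, k):
    """Hypothetical helper: indices of the k features with the highest MI score."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy imbalanced dataset (assumed): 6 negative samples (0) and 2 positive samples (1).
# Feature 0 coincides with the label; features 1 and 2 are independent of it.
rows = [
    (0, 0, 1),
    (0, 1, 0),
    (0, 0, 1),
    (0, 1, 0),
    (0, 0, 1),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 0),
]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

selected, scores = top_k_features(rows, labels, k=1)
print(selected, [round(s, 4) for s in scores])
```

In this toy case only feature 0 carries information about the label, so it is the one retained; an elementary classifier would then be trained on that reduced feature space.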

Keywords: Imbalanced learning, Feature selection, Mutual information, Class weight

Review history: Received 13 October 2021, Revised 28 April 2022, Accepted 30 April 2022, Available online 10 May 2022, Version of Record 19 May 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108966