An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme
Authors:
Abstract
Class-imbalance learning is one of the most challenging problems in machine learning. As a new and important direction in this field, multi-class imbalanced data classification has attracted considerable research attention in recent years. In this paper, we first present a comprehensive review of state-of-the-art classification algorithms for multi-class imbalanced data. We then propose a new multi-class imbalance classification algorithm, hereafter referred to as the Diversified Error Correcting Output Codes (DECOC) method. The main idea of DECOC is to combine an improved ECOC (Error Correcting Output Codes) method for tackling class imbalance with a diversified ensemble learning framework, which selects the best classification algorithm (out of many heterogeneous classification algorithms) for each individual sub-dataset resampled from the original data. We conduct experiments on 19 public datasets to empirically compare the performance of DECOC with 17 state-of-the-art multi-class imbalance learning algorithms, using four accuracy measures: overall accuracy, geometric mean (G-mean), F-measure, and Area Under the Curve (AUC). Experimental results demonstrate that DECOC achieves significantly better performance than the other 17 algorithms on these metrics. To advance research in this field, we have made the source code of DECOC and of the 17 state-of-the-art algorithms for imbalanced data classification available on GitHub: https://github.com/chongshengzhang/Multi_Imbalance.
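The abstract describes DECOC only at a high level, so what follows is a minimal, hypothetical sketch of the general idea, not the authors' implementation (their code is in the GitHub repository above). It assumes a one-vs-one ECOC code matrix, random oversampling of the minority class in each binary sub-dataset as the imbalance treatment, and three toy pure-Python learners (nearest centroid, 1-nearest-neighbour, decision stump) as the heterogeneous pool; for each sub-dataset it keeps the learner with the best stratified 3-fold G-mean. The coding scheme, resampling method, learner pool and selection criterion are all assumptions, and every name (fit_decoc, select_learner, LEARNERS and so on) is illustrative. It also computes a common definition of the multi-class G-mean, the geometric mean of per-class recalls.

```python
import math
import random
from itertools import combinations

# Toy heterogeneous base learners (hypothetical); binary labels are +1/-1, each returns predict(x).
def nearest_centroid(X, y):
    cents = {}
    for lab in (1, -1):
        pts = [x for x, t in zip(X, y) if t == lab]
        cents[lab] = [sum(col) / len(pts) for col in zip(*pts)]
    return lambda x: min(cents, key=lambda lab: math.dist(x, cents[lab]))

def one_nn(X, y):
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: math.dist(x, p[0]))[1]

def decision_stump(X, y):
    best = None  # (training errors, feature, threshold, sign)
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = sum((s if x[f] > thr else -s) != t for x, t in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, thr, s)
    _, f, thr, s = best
    return lambda x: s if x[f] > thr else -s

LEARNERS = {"centroid": nearest_centroid, "1nn": one_nn, "stump": decision_stump}

def oversample(X, y, rng):
    """Randomly duplicate minority-class samples until both classes have equal size."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    small, large = sorted((pos, neg), key=len)
    idx = pos + neg + [rng.choice(small) for _ in range(len(large) - len(small))]
    return [X[i] for i in idx], [y[i] for i in idx]

def binary_gmean(y_true, y_pred):
    """sqrt(TPR * TNR) with +1 as the positive class."""
    tpr = sum(p == 1 for t, p in zip(y_true, y_pred) if t == 1) / y_true.count(1)
    tnr = sum(p == -1 for t, p in zip(y_true, y_pred) if t == -1) / y_true.count(-1)
    return math.sqrt(tpr * tnr)

def select_learner(X, y, rng, k=3):
    """Return the name of the learner with the best stratified k-fold G-mean."""
    folds = [0] * len(y)
    for lab in (1, -1):
        idx = [i for i, t in enumerate(y) if t == lab]
        assert len(idx) >= 2, "this sketch needs at least 2 samples per class"
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k
    scores = {}
    for name, learner in LEARNERS.items():
        truth, preds = [], []
        for f in range(k):
            train = [i for i in range(len(y)) if folds[i] != f]
            test = [i for i in range(len(y)) if folds[i] == f]
            # Oversample inside the training fold only, so duplicates never leak into the test fold.
            Xb, yb = oversample([X[i] for i in train], [y[i] for i in train], rng)
            model = learner(Xb, yb)
            truth += [y[i] for i in test]
            preds += [model(X[i]) for i in test]
        scores[name] = binary_gmean(truth, preds)
    return max(scores, key=scores.get)

def fit_decoc(X, y, seed=0):
    """One-vs-one ECOC: one balanced binary sub-dataset and one selected learner per column."""
    rng = random.Random(seed)
    classes = sorted(set(y))
    columns = []
    for a, b in combinations(classes, 2):
        idx = [i for i, t in enumerate(y) if t in (a, b)]
        Xs = [X[i] for i in idx]
        ys = [1 if y[i] == a else -1 for i in idx]
        name = select_learner(Xs, ys, rng)
        Xb, yb = oversample(Xs, ys, rng)
        columns.append((a, b, name, LEARNERS[name](Xb, yb)))
    def predict(x):
        # Hamming decoding that ignores the 0 entries of one-vs-one codewords reduces to voting.
        votes = {c: 0 for c in classes}
        for a, b, _, model in columns:
            votes[a if model(x) == 1 else b] += 1
        return max(classes, key=votes.get)
    return predict, columns

def multiclass_gmean(y_true, y_pred):
    """Geometric mean of per-class recalls."""
    classes = sorted(set(y_true))
    recalls = [sum(p == c for t, p in zip(y_true, y_pred) if t == c) / y_true.count(c) for c in classes]
    return math.prod(recalls) ** (1 / len(recalls))

if __name__ == "__main__":
    rng = random.Random(1)
    # Imbalanced toy data: 40 / 8 / 4 samples scattered around three 2-D centres.
    spec = [("A", (0, 0), 40), ("B", (5, 5), 8), ("C", (0, 6), 4)]
    X = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for label, (cx, cy), n in spec for _ in range(n)]
    y = [label for label, _, n in spec for _ in range(n)]
    predict, columns = fit_decoc(X, y)
    print([(a, b, name) for a, b, name, _ in columns])
    print("training G-mean:", round(multiclass_gmean(y, [predict(x) for x in X]), 3))
```

Selecting a learner per ECOC column lets each binary sub-problem use whichever inductive bias suits it; the cost is one cross-validation run per candidate learner per column.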
Keywords: Multi-class imbalance learning, Classification algorithms, Decomposition methods for multi-class data, Multi-class imbalanced data classification
Article history: Received 15 October 2017, Revised 25 May 2018, Accepted 27 May 2018, Available online 4 June 2018, Version of Record 6 July 2018.
Official URL: https://doi.org/10.1016/j.knosys.2018.05.037