Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches

Authors:

Highlights:

Abstract

The class imbalance problem arises in real-world applications of classification in engineering. It is characterised by a very uneven distribution of examples among the classes. The presence of multiple imbalanced classes is more restrictive when the aim of the final system is to achieve the highest possible accuracy for each of the concepts of the problem.

The goal of this work is to provide a thorough experimental analysis that will allow us to determine the behaviour of the different approaches proposed in the specialised literature. First, we will make use of binarization schemes, i.e., one versus one and one versus all, in order to apply the standard approaches for solving binary class imbalanced problems. Second, we will apply several ad hoc procedures which have been designed for the scenario of imbalanced data-sets with multiple classes.

This experimental study will include several well-known algorithms from the literature, such as decision trees, support vector machines and instance-based learning, with the intention of obtaining global conclusions from different classification paradigms. The extracted findings will be supported by a statistical comparative analysis using more than 20 data-sets from the KEEL repository.
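As a rough illustration of the first strategy, the sketch below splits a multi-class data-set into one-versus-one binary subproblems and rebalances each of them with random oversampling. This is not the authors' implementation: the function names `ovo_subproblems` and `random_oversample`, the toy data, and the choice of random oversampling as the binary-level technique are all assumptions made for the example, since the abstract does not name the specific preprocessing methods.

```python
import random
from collections import Counter
from itertools import combinations


def ovo_subproblems(X, y):
    """Yield one binary subproblem per pair of classes (one versus one)."""
    classes = sorted(set(y))
    for a, b in combinations(classes, 2):
        # Keep only the examples that belong to one of the two classes.
        pairs = [(x, label) for x, label in zip(X, y) if label in (a, b)]
        yield (a, b), pairs


def random_oversample(pairs, seed=0):
    """Duplicate randomly chosen examples of the smaller class until
    both classes of the binary subproblem have the same size."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in pairs)
    target = max(counts.values())
    balanced = list(pairs)
    for label, n in counts.items():
        pool = [p for p in pairs if p[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced


# Hypothetical toy data-set: three classes with 6, 2 and 1 examples.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [1.0], [1.1], [2.0]]
y = ["a"] * 6 + ["b"] * 2 + ["c"]

balanced_counts = {}
for pair, sub in ovo_subproblems(X, y):
    balanced = random_oversample(sub)
    balanced_counts[pair] = Counter(label for _, label in balanced)
```

In a full pipeline, a binary classifier would be trained on each balanced subproblem and the pairwise outputs aggregated (for example by voting) to obtain the final class. A one-versus-all variant would instead build one subproblem per class against the union of the remaining classes.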

Keywords: Imbalanced data-sets, Multi-classification, Pairwise learning, Preprocessing, Cost-sensitive learning

Review history: Received 24 July 2012, Revised 15 January 2013, Accepted 18 January 2013, Available online 29 January 2013.

Official paper URL: https://doi.org/10.1016/j.knosys.2013.01.018