A dissimilarity-based imbalance data classification algorithm

Authors: Xueying Zhang, Qinbao Song, Guangtao Wang, Kaiyuan Zhang, Liang He, Xiaolin Jia

Abstract

Class imbalance has been reported to compromise the performance of most standard classifiers, such as Naive Bayes, Decision Trees and Neural Networks. To address this problem, various solutions have been explored, mainly by balancing the skewed class distribution or by improving existing classification algorithms. However, these methods focus on the imbalanced distribution and ignore the discriminative ability of features in class-imbalanced data. From this perspective, a dissimilarity-based method is proposed for the classification of imbalanced data. The proposed method first removes useless and redundant features from the given data set by feature selection; it then extracts representative instances from the reduced data as prototypes; finally, it projects the reduced data into a dissimilarity space by constructing new features, and builds the classification model on the data in that dissimilarity space. Extensive experiments on 24 benchmark class-imbalanced data sets show that, compared with seven other solutions for handling imbalanced data, the proposed method greatly improves the performance of imbalanced learning and outperforms the other solutions with all of the given classification algorithms.
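
The projection step described in the abstract can be illustrated with a minimal sketch. It assumes that feature selection and prototype selection have already been done, and it uses Euclidean distance as the dissimilarity measure; the abstract does not state which measure the authors use, so that choice, along with the function names `euclidean` and `to_dissimilarity_space` and the toy data, is hypothetical. Each instance is re-described by its distances to the prototypes, so the number of new features equals the number of prototypes.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_dissimilarity_space(instances, prototypes, dist=euclidean):
    """Map each instance to a vector of its dissimilarities to every prototype.

    The i-th new feature of an instance is its dissimilarity to the i-th prototype.
    """
    return [[dist(x, p) for p in prototypes] for x in instances]


# Toy data (hypothetical): three instances in the reduced feature space
# and two instances already chosen as prototypes.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
prototypes = [[0.0, 0.0], [6.0, 8.0]]

D = to_dissimilarity_space(X, prototypes)
print(D)  # [[0.0, 10.0], [5.0, 5.0], [10.0, 0.0]]
```

Any standard classifier could then be trained on `D` in place of `X`; passing a different `dist` function swaps the dissimilarity measure without changing the rest of the pipeline.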

Keywords: Dissimilarity-based classification, Class imbalance, Software defect prediction, Feature selection, Prototype selection

Paper URL: https://doi.org/10.1007/s10489-014-0610-5