An effective distance based feature selection approach for imbalanced data

作者:Shaukat Ali Shahee, Usha Ananthakumar

摘要

Class imbalance is one of the critical areas in classification. The challenges become more severe when the data set has a large number of features. Traditional classifiers generally favour the majority class because of skewed class distributions. In recent years, feature selection is being used to select the appropriate features for better classification of minority class. However, these studies are limited to imbalance that arise between the classes. In addition to between class imbalance, within class imbalance, along with large number of features, adds additional complexity and results in poor performance of the classifier. In the current study, we propose an effective distance based feature selection method (ED-Relief) that uses a sophisticated distance measure, in order to tackle simultaneous occurrence of between and within class imbalance. This method has been tested on a variety of simulated experiments and real life data sets and the results are compared with the traditional Relief method and some of the well known recent distance based feature selection methods. The results clearly show the superiority of the proposed effective distance based feature selection method.

论文关键词:Imbalanced data, Feature selection, Effective distance, Classification, Jeffreys divergence

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10489-019-01543-z