Class imbalance and the curse of minority hubs

作者:

Highlights:

摘要

Most machine learning tasks involve learning from high-dimensional data, which is often quite difficult to handle. Hubness is an aspect of the curse of dimensionality that was shown to be highly detrimental to k-nearest neighbor methods in high-dimensional feature spaces. Hubs, very frequent nearest neighbors, emerge as centers of influence within the data and often act as semantic singularities. This paper deals with evaluating the impact of hubness on learning under class imbalance with k-nearest neighbor methods. Our results suggest that, contrary to the common belief, minority class hubs might be responsible for most misclassification in many high-dimensional datasets. The standard approaches to learning under class imbalance usually clearly favor the instances of the minority class and are not well suited for handling such highly detrimental minority points. In our experiments, we have evaluated several state-of-the-art hubness-aware kNN classifiers that are based on learning from the neighbor occurrence models calculated from the training data. The experiments included learning under severe class imbalance, class overlap and mislabeling and the results suggest that the hubness-aware methods usually achieve promising results on the examined high-dimensional datasets. The improvements seem to be most pronounced when handling the difficult point types: borderline points, rare points and outliers. On most examined datasets, the hubness-aware approaches improve the classification precision of the minority classes and the recall of the majority class, which helps with reducing the negative impact of minority hubs. We argue that it might prove beneficial to combine the extensible hubness-aware voting frameworks with the existing class imbalanced kNN classifiers, in order to properly handle class imbalanced data in high-dimensional feature spaces.

论文关键词:Class imbalance,Class overlap,Classification,k-Nearest neighbor,Hubness,Curse of dimensionality

论文评审过程:Received 6 May 2013, Revised 28 August 2013, Accepted 29 August 2013, Available online 11 September 2013.

论文官网地址:https://doi.org/10.1016/j.knosys.2013.08.031