Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning
Abstract
Datasets with skewed class distributions pose difficulties for learning algorithms in pattern classification, and many methods for dealing with this problem have been developed in recent years. In particular, synthetic oversampling techniques balance the distribution between the majority-class and minority-class training instances by generating additional artificial minority-class instances. Unfortunately, few of them can be extended to handle imbalanced data with missing values. Moreover, in most cases, existing oversampling methods do not make full use of the correlation between attributes. To address these issues, this paper proposes a fuzzy rule-based oversampling technique (FRO) for the class imbalance problem. FRO first creates fuzzy rules from the training data and assigns each rule a weight, which represents the degree of certainty that an instance belongs to the rule's fuzzy subspace. It then synthesizes new minority instances under the guidance of these fuzzy rules, with the number of minority instances generated under each rule determined by its rule weight. In a similar way, FRO can also recover missing values in the imbalanced dataset. Extensive experiments on 55 real-world imbalanced datasets evaluate the performance of FRO. The results show that it performs better than or comparably to a set of state-of-the-art imbalanced classification algorithms in terms of various assessment metrics.
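The abstract describes FRO only at a high level: build fuzzy rules, weight them, and generate minority instances per rule in proportion to the rule weight. As an illustration of that general idea, here is a minimal Python sketch of rule-guided oversampling. It is not the authors' FRO implementation, and the following choices are assumptions made for the example, not details taken from the paper: a uniform triangular fuzzy partition with three labels per attribute, one rule per minority instance (taking the highest-membership label for each attribute), the certainty factor (minority compatibility divided by total compatibility) as the rule weight, and sampling each attribute from a triangular distribution centred on the rule's label. All function names, such as `fro_like_oversample` and `allocate`, are hypothetical, and missing-value recovery is not covered.

```python
import random

def triangular_membership(x, left, peak, right):
    """Membership of x in the triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def make_partition(lo, hi, n_labels):
    """Assumed uniform triangular partition of [lo, hi] into n_labels (>= 2) sets."""
    if hi == lo:  # constant attribute: widen the range to avoid a zero step
        hi = lo + 1.0
    step = (hi - lo) / (n_labels - 1)
    return [(lo + (k - 1) * step, lo + k * step, lo + (k + 1) * step)
            for k in range(n_labels)]

def best_labels(x, partitions):
    """Rule antecedent built from x: the highest-membership label per attribute."""
    return tuple(max(range(len(p)), key=lambda k: triangular_membership(v, *p[k]))
                 for v, p in zip(x, partitions))

def compatibility(x, antecedent, partitions):
    """Product t-norm of the memberships of x in the rule's labels."""
    mu = 1.0
    for v, k, p in zip(x, antecedent, partitions):
        mu *= triangular_membership(v, *p[k])
    return mu

def minority_rule_weights(X, y, minority, partitions):
    """Assumed certainty-factor weight for each rule built from a minority instance."""
    antecedents = {best_labels(x, partitions) for x, c in zip(X, y) if c == minority}
    weights = {}
    for ant in antecedents:
        total = sum(compatibility(x, ant, partitions) for x in X)
        pos = sum(compatibility(x, ant, partitions)
                  for x, c in zip(X, y) if c == minority)
        weights[ant] = pos / total if total > 0 else 0.0
    return weights

def allocate(weights, n_new):
    """Split n_new synthetic instances across rules in proportion to rule weight."""
    total = sum(weights.values())
    if n_new <= 0 or total <= 0:
        return {r: 0 for r in weights}
    raw = {r: n_new * w / total for r, w in weights.items()}
    counts = {r: int(v) for r, v in raw.items()}
    leftover = n_new - sum(counts.values())
    # Largest-remainder rounding so the counts sum exactly to n_new.
    for r in sorted(raw, key=lambda r: raw[r] - counts[r], reverse=True)[:leftover]:
        counts[r] += 1
    return counts

def fro_like_oversample(X, y, minority, n_labels=3, seed=0):
    """Generate synthetic minority instances inside minority fuzzy-rule subspaces."""
    rng = random.Random(seed)
    bounds = [(min(col), max(col)) for col in zip(*X)]
    partitions = [make_partition(lo, hi, n_labels) for lo, hi in bounds]
    weights = minority_rule_weights(X, y, minority, partitions)
    n_min = sum(1 for c in y if c == minority)
    n_new = max(0, (len(y) - n_min) - n_min)  # binary case: match the majority count
    synthetic = []
    for ant, count in allocate(weights, n_new).items():
        for _ in range(count):
            point = []
            for k, p, (lo, hi) in zip(ant, partitions, bounds):
                left, peak, right = p[k]
                v = rng.triangular(left, right, peak)  # sample around the label's peak
                point.append(min(max(v, lo), hi))      # clip to the observed range
            synthetic.append(point)
    return synthetic

X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.5, 0.4],
     [0.4, 0.6], [0.6, 0.5], [0.7, 0.7], [0.9, 0.9], [1.0, 0.8]]
y = [0] * 8 + [1] * 2
new = fro_like_oversample(X, y, minority=1)
print(len(new))  # 6: enough to bring the minority class up to 8 instances
```

Allocating instances by rule weight means that subspaces where the minority class is dominant (high certainty) receive more synthetic points, while rules that overlap heavily with the majority class receive fewer. Because every synthetic value is drawn per attribute from the same rule, the generated points stay inside one fuzzy subspace rather than being interpolated along a line between two neighbours, as SMOTE-style methods do.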
Keywords: Fuzzy rules, Imbalanced data, Missing values, Attribute correlation, Synthesize minority instances
Article history: Received 11 January 2018, Revised 30 May 2018, Accepted 31 May 2018, Available online 1 June 2018, Version of Record 6 July 2018.
Official URL: https://doi.org/10.1016/j.knosys.2018.05.044