A noise-aware fuzzy rough set approach for feature selection

作者:

Highlights:

摘要

Feature selection has aroused extensive attention and aims at selecting features that are highly relevant to classification from raw datasets to improve the performance of a learning model. Fuzzy rough set theory is a powerful mathematical method for feature selection. The classical fuzzy rough set model is very sensitive to the noise while the noise samples in classification data often appear. In addition, fuzzy rough set theory does not fit well when the density distribution of the samples in the dataset varies greatly. Thus, it is of great significance to improve the robustness of fuzzy rough set models and its adaptability to data for feature selection. Inspired by these issues, we focus on the robust fuzzy rough set approach for feature selection. We first propose a robust fuzzy rough set model based on data distribution to achieve the purpose of anti-noise i.e., Noise-aware Fuzzy Rough Sets (NFRS) model. This model proposes a novel search mechanism, which weakens the sensitivity of the approximation operator to noise by considering the distribution of samples in the decision classes to weight the samples, further obtains three kinds of samples, i.e., intra-class samples, boundary samples, and outlier samples. Then, the degrees of relevance of the feature for class is defined by the dependency function based on the NFRS model to evaluate the significance of the feature subset. On this basis, an evaluation function about feature significance is constructed, which simultaneously considers the relevance and redundancy of a candidate feature provided for the selected subset and the remaining feature subset. A novel forward greedy search algorithm is presented to select a feature sequence. The selected features are subsequently evaluated with downstream classification tasks. Experimental using real-world datasets demonstrate the effectiveness of the proposed model and its superiority against comparison baseline methods.

论文关键词:Fuzzy rough set,Robust feature selection,Noisy data,Density distribution,Dependency function

论文评审过程:Received 22 December 2021, Revised 16 May 2022, Accepted 17 May 2022, Available online 26 May 2022, Version of Record 9 June 2022.

论文官网地址:https://doi.org/10.1016/j.knosys.2022.109092