Rough set based semi-supervised feature selection via ensemble selector

Authors:

Abstract

As with feature selection over completely labeled data, the aim of feature selection over partially labeled data (semi-supervised feature selection) is to find a feature subset that satisfies an intended constraint. Two difficulties arise in the semi-supervised setting, however: (1) the labels are incomplete, since labeled and unlabeled samples coexist in the data; (2) the selected feature subset lacks a clear interpretation. Our research addresses these two problems. First, labels for the unlabeled samples are predicted by various semi-supervised learning methods. Second, the Local Neighborhood Decision Error Rate is proposed and used to construct multiple fitness functions for evaluating the significance of candidate features. This mechanism not only realizes an ensemble selector within the feature selection process, but also yields a qualified feature subset with lower decision errors. A heuristic algorithm is then redesigned to carry out feature selection. Finally, experiments over nine ratios of labeled samples (10%, 20%, …, 90%) demonstrate that our approach outperforms previous work, mainly because (1) the qualified feature subset derived by our approach provides better classification performance, and (2) the feature selection process requires less time.
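The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses a plain neighborhood decision error rate (the share of samples whose label disagrees with the majority label of their neighbors) rather than the paper's Local variant, and it assumes that each entry of `label_sets` is a full label vector completed by a different semi-supervised learner. The names `neighborhood_decision_error_rate`, `ensemble_forward_selection`, `radius`, and `max_features` are hypothetical, as are the handling of isolated samples and the stopping rule.

```python
from collections import Counter
from math import dist


def neighborhood_decision_error_rate(X, y, features, radius):
    """Fraction of samples whose label differs from the majority label of
    their radius-neighborhood, measured only on the given features."""
    if not features:
        return 1.0  # assumption: an empty feature set is the worst case
    projected = [[row[f] for f in features] for row in X]
    errors = 0
    for i, p in enumerate(projected):
        neighbor_labels = [
            y[j] for j, q in enumerate(projected)
            if j != i and dist(p, q) <= radius
        ]
        if not neighbor_labels:
            continue  # assumption: an isolated sample is not counted as an error
        majority = Counter(neighbor_labels).most_common(1)[0][0]
        if majority != y[i]:
            errors += 1
    return errors / len(X)


def ensemble_forward_selection(X, label_sets, radius, max_features=None):
    """Greedy forward selection in which every label vector acts as one
    fitness function and votes for the candidate feature it prefers."""
    n_features = len(X[0])
    limit = max_features or n_features
    selected = []
    current = [
        neighborhood_decision_error_rate(X, y, selected, radius)
        for y in label_sets
    ]
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        votes = Counter()
        for y in label_sets:
            scores = {
                f: neighborhood_decision_error_rate(X, y, selected + [f], radius)
                for f in candidates
            }
            # Each fitness function votes for its lowest-error candidate;
            # ties are broken by the smaller feature index.
            votes[min(candidates, key=lambda f: (scores[f], f))] += 1
        chosen = min(votes, key=lambda f: (-votes[f], f))
        updated = [
            neighborhood_decision_error_rate(X, y, selected + [chosen], radius)
            for y in label_sets
        ]
        # Assumed stopping rule: stop when the mean error no longer decreases.
        if sum(updated) / len(updated) >= sum(current) / len(current):
            break
        selected.append(chosen)
        current = updated
    return selected
```

Calling `ensemble_forward_selection(X, [y_a, y_b], radius=0.25)` with two pseudo-labeled versions of the same data returns the indices of the selected features in the order they were added. Majority voting across fitness functions is one simple way to combine them; the paper may combine them differently.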

Keywords: Ensemble selector, Feature selection, Neighborhood rough set, Partially labeled data, Semi-supervised learning

Review history: Received 24 June 2018, Revised 31 October 2018, Accepted 27 November 2018, Available online 30 November 2018, Version of Record 7 January 2019.

Paper URL: https://doi.org/10.1016/j.knosys.2018.11.034