Semi-supervised neighborhood discrimination index for feature selection

Abstract

Neighborhood discriminant index (NDI) is an effective feature selection method for supervised learning. In practice, unlabeled data are easy to obtain, whereas labeling all of them is costly. A given dataset therefore often contains only a small number of labeled samples and a large number of unlabeled ones, a situation that supervised learning methods cannot handle. For this situation, we propose a semi-supervised feature selection method, the semi-supervised neighborhood discriminant index (SSNDI), which combines NDI with the Laplacian score method to make effective use of both labeled and unlabeled samples. The goal of SSNDI is to find an optimal feature subset that both preserves the local geometric structure of the data and distinguishes samples belonging to different classes. In SSNDI, the classical Laplacian score is modified to work with the iterative form of NDI. In each iteration, SSNDI selects an important feature according to a new criterion that mixes NDI and the modified Laplacian score. Extensive experiments are conducted on UCI and microarray gene datasets. The experimental results confirm that SSNDI achieves better performance than NDI and other state-of-the-art semi-supervised methods.
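
The abstract gives no formulas, so the following Python sketch only illustrates the general shape of the idea under stated assumptions; it is not the authors' SSNDI. It computes the classical per-feature Laplacian score on all samples (kNN graph with heat-kernel weights, smaller is better), an NDI-style term on the labeled samples only, and runs greedy forward selection on a mixed criterion. The NDI form log(|N_B| / |N_B ∩ R_D|) is an assumption, the paper's modified Laplacian score is not reproduced, and the function names and the parameters `alpha`, `delta`, `k` and `t`, as well as the normalization of the Laplacian score, are all hypothetical.

```python
import math
from typing import Hashable, List, Optional, Sequence

Matrix = List[List[float]]


def _sq_dist(x: Sequence[float], y: Sequence[float], feats: Sequence[int]) -> float:
    """Squared Euclidean distance restricted to the feature indices in `feats`."""
    return sum((x[j] - y[j]) ** 2 for j in feats)


def laplacian_scores(X: Matrix, k: int = 3, t: float = 1.0) -> List[float]:
    """Classical per-feature Laplacian score on ALL samples (no labels needed).
    Smaller scores mean the feature better preserves local structure."""
    n, m = len(X), len(X[0])
    all_feats = range(m)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # k nearest neighbours of sample i, joined by symmetric heat-kernel edges
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: _sq_dist(X[i], X[j], all_feats))[:k]
        for j in nbrs:
            S[i][j] = S[j][i] = math.exp(-_sq_dist(X[i], X[j], all_feats) / t)
    d = [sum(row) for row in S]  # degree vector (diagonal of D)
    scores = []
    for r in range(m):
        f = [row[r] for row in X]
        if max(f) == min(f):  # a constant feature carries no information: worst score
            scores.append(math.inf)
            continue
        mean = sum(fi * di for fi, di in zip(f, d)) / sum(d)
        g = [fi - mean for fi in f]  # remove the D-weighted mean
        # g^T L g with L = D - S, written as a sum over edges
        num = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n))
        den = sum(di * gi * gi for di, gi in zip(d, g))  # g^T D g
        scores.append(num / den)
    return scores


def ndi(X_lab: Matrix, y_lab: Sequence[Hashable], feats: Sequence[int],
        delta: float = 0.5) -> float:
    """ASSUMED NDI form: log(|N_B| / |N_B ∩ R_D|) over ordered pairs of labeled
    samples. N_B links samples within distance delta on subset B, R_D links samples
    with the same label; 0 means every neighbourhood is label-pure.
    Requires at least one labeled sample (self-pairs keep the ratio finite)."""
    n_b = n_both = 0
    for xi, yi in zip(X_lab, y_lab):
        for xj, yj in zip(X_lab, y_lab):
            if math.sqrt(_sq_dist(xi, xj, feats)) <= delta:
                n_b += 1
                n_both += yi == yj
    return math.log(n_b / n_both)


def select_features(X: Matrix, y: Sequence[Optional[Hashable]], n_select: int,
                    alpha: float = 0.5, delta: float = 0.5, k: int = 3) -> List[int]:
    """Greedy forward selection with a HYPOTHETICAL mixed criterion
    alpha * NDI(selected + [a]) + (1 - alpha) * normalised LS(a).
    y[i] is None for unlabeled samples; those only enter the Laplacian graph."""
    X_lab = [x for x, yi in zip(X, y) if yi is not None]
    y_lab = [yi for yi in y if yi is not None]
    ls = laplacian_scores(X, k)
    ls_max = max((s for s in ls if s != math.inf), default=0.0) or 1.0
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        def criterion(a: int) -> float:
            ls_a = min(ls[a] / ls_max, 1.0)  # squash LS into [0, 1]
            return alpha * ndi(X_lab, y_lab, selected + [a], delta) + (1 - alpha) * ls_a
        best = min(remaining, key=criterion)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch the unlabeled samples influence the result only through the neighborhood graph behind the Laplacian score, while the discrimination term sees only labeled samples; this split is one plausible way to let both kinds of data contribute, and the paper may combine them differently.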

Keywords: Semi-supervised, Feature selection, Neighborhood discriminant index

Article history: Received 21 January 2020, Revised 26 May 2020, Accepted 6 July 2020, Available online 8 July 2020, Version of Record 10 July 2020.

Paper URL: https://doi.org/10.1016/j.knosys.2020.106224