Weighted nearest neighbors feature selection
Abstract
Huge amounts of data are pervasive in many domains and applications. Unfortunately, high-dimensional data are tightly associated with the curse of dimensionality, a phenomenon that adversely affects many data mining algorithms. It is therefore desirable to reduce the dimensionality of the data through preprocessing techniques such as feature selection (FS). Although FS is often regarded merely as a preprocessing step, in some domains, such as bioinformatics, it is of paramount importance in its own right: identifying the relevant attributes directly answers the research question under investigation. In this paper, we propose a novel supervised FS method based on the k-nearest neighbors algorithm. In particular, we use distance- and attribute-weighted k-nearest neighbors, with gradient descent as the iterative optimization algorithm for finding the minimum of the objective function. The new method is compared with state-of-the-art FS algorithms on eight artificial and twelve high-dimensional real-world datasets. The experimental results indicate that the proposed algorithm is able to identify the relevant features and achieves the highest prediction performance with all four prediction algorithms considered.
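The abstract describes the method only in outline, so the following is a minimal, hypothetical Python sketch of the general idea, not the authors' algorithm: each attribute gets a weight inside the Euclidean distance, the k nearest neighbors vote with distance-based weights, and the attribute weights are fitted by gradient descent so that irrelevant features end up with small weights. Everything specific here is an assumption: the leave-one-out squared-error loss, the softmax-over-negative-distance neighbor weighting, treating the neighbor set as fixed when differentiating, full-batch rather than stochastic updates, the values of `k`, `lr` and `epochs`, the toy data, and the helper names `weighted_sq_dist` and `fit_feature_weights`.

```python
import math
import random


def weighted_sq_dist(x, z, w):
    """Attribute-weighted squared Euclidean distance: sum_m (w_m * (x_m - z_m))^2."""
    return sum((wm * (xm - zm)) ** 2 for wm, xm, zm in zip(w, x, z))


def fit_feature_weights(X, y, k=5, lr=2.0, epochs=100):
    """Hypothetical sketch: fit one weight per attribute by full-batch gradient
    descent on the leave-one-out squared error of a distance-weighted kNN
    predictor. Labels y must be 0/1. Each point's k-neighbour set is recomputed
    every epoch but treated as fixed when the gradient is taken.
    """
    n, d = len(X), len(X[0])
    w = [1.0] * d  # start with all attributes equally weighted
    for _ in range(epochs):
        grad = [0.0] * d
        for i in range(n):
            # k nearest neighbours of X[i] under the current weights (self excluded).
            cand = sorted((weighted_sq_dist(X[i], X[j], w), j) for j in range(n) if j != i)
            nbrs = cand[:k]
            d_min = nbrs[0][0]
            # Distance weighting: softmax over -distance, shifted by d_min for stability.
            e = [math.exp(-(dist - d_min)) for dist, _ in nbrs]
            total = sum(e)
            a = [ej / total for ej in e]
            idx = [j for _, j in nbrs]
            p = sum(aj * y[j] for aj, j in zip(a, idx))  # predicted P(y_i = 1)
            err = p - y[i]
            for m in range(d):
                # g_j = derivative of -dist(i, j) with respect to w[m].
                g = [-2.0 * w[m] * (X[i][m] - X[j][m]) ** 2 for j in idx]
                g_bar = sum(aj * gj for aj, gj in zip(a, g))
                # Chain rule through the softmax: dp/dw_m = sum_j a_j y_j (g_j - g_bar).
                dp = sum(aj * y[j] * (gj - g_bar) for aj, j, gj in zip(a, idx, g))
                grad[m] += 2.0 * err * dp / n  # d/dw_m of mean (p - y_i)^2
        w = [wm - lr * gm for wm, gm in zip(w, grad)]
    return w


# Toy data (an assumption for illustration): only feature 0 determines the label.
random.seed(0)
X = [[random.random() for _ in range(4)] for _ in range(60)]
y = [1 if row[0] > 0.5 else 0 for row in X]

w = fit_feature_weights(X, y)
ranking = sorted(range(4), key=lambda m: -abs(w[m]))  # most relevant first
print("weights:", [round(v, 3) for v in w])
print("ranking:", ranking)
```

Because each weight enters the distance squared, its sign carries no meaning, so features are ranked by |w_m|; on this toy data the label-determining feature 0 would be expected to rank first. A stochastic gradient descent variant, as the keywords suggest, would update `w` after each point or mini-batch instead of once per pass over the data.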
Keywords: Feature selection, k-nearest neighbors, Stochastic gradient descent, Euclidean distance, High-dimensional data
Article history: Received 15 March 2018, Revised 17 September 2018, Accepted 3 October 2018, Available online 9 October 2018, Version of Record 21 November 2018.
Official article URL: https://doi.org/10.1016/j.knosys.2018.10.004