NearCount: Selecting critical instances based on the cited counts of nearest neighbors

Authors:

Highlights:

Abstract

Traditional instance selection algorithms handle imbalanced problems poorly. Moreover, most of them are sensitive to noisy instances and rely on complex selection rules. To address these problems, this paper proposes a concise learning framework named NearCount that determines the importance of each instance without editing noise. In NearCount, the importance of an instance corresponds to its cited count: the number of times the instance is selected as a nearest neighbor of instances from different classes. Among instances with nonzero cited counts, importance is inversely proportional to the cited count. To handle classification problems with different data distributions, two NearCount-based algorithms are introduced: NearCount-IM and NearCount-IS. For imbalanced problems, NearCount-IM selects as many important majority instances as there are minority instances, thus balancing the data distribution. For balanced scenarios, NearCount-IS selects, in every class, the instances whose cited counts are greater than zero and less than or equal to the number of nearest neighbors as critical instances. The proposed NearCount-IM and NearCount-IS algorithms are evaluated against classical instance selection algorithms on benchmark data sets, and the experiments validate their effectiveness.
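The cited-count idea and the two selection rules can be illustrated with a minimal sketch. This is not the authors' reference implementation; it rests on several assumptions not stated in the abstract: for each instance, its k nearest neighbors are searched only among instances of other classes, distance is Euclidean, ties are broken by index, and NearCount-IM is shown for the binary (one majority, one minority class) case. The function names `cited_counts`, `nearcount_is`, and `nearcount_im` are hypothetical.

```python
import math
from collections import Counter


def cited_counts(X, y, k):
    """Count how often each instance is cited as one of the k nearest
    neighbors of an instance from a different class.
    Assumption: neighbors are searched only among other-class instances."""
    n = len(X)
    counts = [0] * n
    for i in range(n):
        # Candidate neighbors of instance i: all instances with another label.
        others = [j for j in range(n) if y[j] != y[i]]
        # Euclidean distance, ties broken by index (assumed).
        others.sort(key=lambda j: (math.dist(X[i], X[j]), j))
        for j in others[:k]:
            counts[j] += 1  # instance j is "cited" once more
    return counts


def nearcount_is(X, y, k):
    """Balanced case: keep instances whose cited count c satisfies 0 < c <= k."""
    counts = cited_counts(X, y, k)
    return [i for i, c in enumerate(counts) if 0 < c <= k]


def nearcount_im(X, y, k):
    """Imbalanced (binary, assumed) case: keep every minority instance plus an
    equal number of majority instances, preferring the smallest nonzero cited
    counts, since importance is inversely proportional to the count."""
    ranked = Counter(y).most_common()
    majority, minority = ranked[0][0], ranked[-1][0]
    counts = cited_counts(X, y, k)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    candidates = [i for i, label in enumerate(y)
                  if label == majority and counts[i] > 0]
    candidates.sort(key=lambda i: (counts[i], i))
    # If fewer nonzero candidates exist than minority instances, keep them all.
    return sorted(minority_idx + candidates[:len(minority_idx)])


# Toy 1-D data: five majority instances (label 0), two minority (label 1).
X = [(0.0,), (1.0,), (2.0,), (3.0,), (7.5,), (4.0,), (5.0,)]
y = [0, 0, 0, 0, 0, 1, 1]
print(cited_counts(X, y, 2))  # [0, 0, 1, 2, 1, 5, 5]
print(nearcount_is(X, y, 2))  # [2, 3, 4]
print(nearcount_im(X, y, 2))  # [2, 4, 5, 6]
```

In this toy run, instances far from the class boundary (indices 0 and 1) are never cited and are discarded. The heavily cited minority instances exceed k and are dropped by the NearCount-IS rule, while NearCount-IM keeps all minority instances and picks the two majority instances with the lowest nonzero counts.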

Keywords: Critical instance, Nearest neighbor, Cited counts, Imbalanced problem, Instance selection

Article history: Received 6 March 2019, Revised 3 November 2019, Accepted 5 November 2019, Available online 7 November 2019, Version of Record 7 February 2020.

Official paper URL: https://doi.org/10.1016/j.knosys.2019.105196