A split–merge clustering algorithm based on the k-nearest neighbor graph

作者:

Highlights:

摘要

Numerous graph-based clustering algorithms relying on k-nearest neighbor (KNN) have been proposed. However, the performance of these algorithms tends to be affected by many factors such as cluster shape, cluster density and outliers. To address these issues, we present a split–merge clustering algorithm based on the KNN graph (SMKNN), which is based on the idea that two adjacent clusters can be merged if the data points located in the connection layers of the two clusters tend to be consistent in distribution. In Stage 1, a KNN graph is constructed. In Stage 2, the subgraphs are obtained by removing the pivot points from the KNN graph, in which the pivot points are determined by the size of local distance ratio of data points. In Stage 3, the adjacent cluster pairs satisfying the maximum similarity are merged, in which the similarity measure of two clusters is designed with two concepts including external connection edges and internal connection edges. By the experiments on ten synthetic data sets and eight real data sets, we compared SMKNN with two traditional algorithms, two density-based algorithms, nine graph-based algorithms and four neural network based algorithms in accuracy. The experimental results demonstrate a good performance of the proposed clustering method.

论文关键词:K-nearest neighbor,Clustering algorithm,Split,Merge,Similarity measure

论文评审过程:Received 31 March 2022, Revised 13 August 2022, Accepted 6 September 2022, Available online 14 September 2022, Version of Record 30 September 2022.

论文官网地址:https://doi.org/10.1016/j.is.2022.102124