Benchmarking graph-based clustering algorithms

作者:

Highlights:

摘要

Among all the different clustering approaches proposed so far, graph-based algorithms are particularly suited for dealing with data that does not come from a Gaussian or a spherical distribution. They can be used for detecting clusters of any size and shape without the need of specifying the actual number of clusters; moreover, they can be profitably used in cluster detection problems.Despite of the fact that graph-based methods are gaining more and more popularity in different scientific areas, the choice of an appropriate algorithm for a given application is still the most crucial task. In this paper, we then present a detailed performance evaluation of five different graph-based clustering approaches on a database of synthetically generated graphs. The main findings of such an analysis were that algorithms based on the Minimum Spanning Tree perform better than other approaches.Four of the algorithms selected for comparison have been chosen from the open literature. While these algorithms do not require the setting of the number of clusters, they need, however, some parameters to be provided by the user. So, as the fifth algorithm under comparison, we propose an approach that overcomes this limitation, proving to be an effective solution in real applications where a completely unsupervised method for cluster detection is desirable. This was confirmed by a further comparative analysis carried out on four datasets coming from the UCI Machine Learning Repository.

论文关键词:Benchmarking,Graph-based clustering,Cluster detection

论文评审过程:Received 24 October 2007, Revised 28 April 2008, Accepted 5 May 2008, Available online 20 May 2008.

论文官网地址:https://doi.org/10.1016/j.imavis.2008.05.002