DBHC: A DBSCAN-based hierarchical clustering algorithm

Authors:

Highlights:

Abstract

Clustering is the process of partitioning the objects of a dataset into groups according to the similarities and dissimilarities between them. DBSCAN is one of the most important algorithms in the density-based approach to clustering. Despite its numerous advantages, DBSCAN has two important input parameters, MinPts and Eps, whose values remain very difficult to determine. The problem arises because suitable values of these parameters depend heavily on the data distribution. To overcome this challenge, the properties of these parameters are first investigated and the data distribution is analyzed. A DBSCAN-based hierarchical clustering (DBHC) method is then proposed. DBHC first determines the values of these parameters using the notion of k nearest neighbors and the k-dist plot. Because most real-world data is not uniformly distributed, several values of the Eps parameter need to be produced. DBHC then executes the DBSCAN algorithm several times, according to the number of Eps values produced earlier. Finally, DBHC merges the obtained clusters if the number of clusters produced is larger than the number estimated by the user. To evaluate the performance of DBHC, several experiments were performed on benchmark datasets from the UCI database. The results were compared with those of previous works and consistently showed that DBHC led to better results.
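
The k-dist idea mentioned in the abstract can be illustrated with a minimal sketch. It computes each point's distance to its k-th nearest neighbor, sorts those distances (the values a k-dist plot would show), and then picks one candidate Eps per density level. The abstract does not say how DBHC derives its Eps values from the plot, so the function `candidate_eps`, its `jump_ratio` parameter, and the toy data are hypothetical and only show the general principle.

```python
import math


def k_dist(points, k):
    """Return every point's distance to its k-th nearest neighbor, sorted ascending.

    These sorted values are what a k-dist plot displays.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def candidate_eps(sorted_kdist, jump_ratio=2.0):
    """Hypothetical heuristic, not taken from the paper.

    Treat a jump of more than jump_ratio between consecutive k-dist values
    as the boundary between two density levels, and return the largest
    k-dist value of each level as a candidate Eps.
    """
    levels = []
    for prev, cur in zip(sorted_kdist, sorted_kdist[1:]):
        if prev > 0 and cur / prev > jump_ratio:
            levels.append(prev)
    levels.append(sorted_kdist[-1])
    return levels


# Toy data (assumed): one dense 3x3 grid and one sparse 3x3 grid.
dense = [(x, y) for x in range(3) for y in range(3)]
sparse = [(100 + 10 * x, 10 * y) for x in range(3) for y in range(3)]
points = dense + sparse

kd = k_dist(points, k=2)
eps_values = candidate_eps(kd)
print(eps_values)  # one candidate Eps per density level
```

With non-uniform data like this, a single Eps cannot suit both grids, which is the reason the abstract gives for producing several Eps values and running DBSCAN once for each.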

Keywords: Clustering, Density-based clustering, DBSCAN, Hierarchical clustering

Review history: Received 30 September 2020, Revised 27 April 2021, Accepted 19 August 2021, Available online 25 August 2021, Version of Record 3 September 2021.

Official paper URL: https://doi.org/10.1016/j.datak.2021.101922