A Fast Parallel Clustering Algorithm for Large Spatial Databases

Authors: Xiaowei Xu, Jochen Jäger, Hans-Peter Kriegel

Abstract

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
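For readers unfamiliar with the density-based notion of clusters mentioned above, the following is a minimal sequential sketch of DBSCAN-style clustering. It is not the PDBSCAN implementation from the paper: it uses a brute-force neighborhood search in place of the dR*-tree, runs on a single machine, and the names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative assumptions.

```python
from math import dist

NOISE = -1           # label for points that belong to no cluster
UNCLASSIFIED = None  # label for points not yet visited


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself).

    Brute-force O(n) scan; a spatial index would replace this in practice.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Assign a cluster id (0, 1, ...) or NOISE to every point."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNCLASSIFIED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may be relabeled later as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNCLASSIFIED:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


# Two dense 2x2 squares and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

In this sketch the two squares receive cluster ids 0 and 1 and the isolated point is labeled as noise. In a parallel setting the cost is dominated by `region_query`, which is why a distributed index structure matters.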

Keywords: clustering algorithms, parallel algorithms, distributed algorithms, scalable data mining, distributed index structures, spatial databases

Paper URL: https://doi.org/10.1023/A:1009884809343