A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data

作者:

Highlights:

• The underlying idea is: point p and point q should have similar neighbors, provided p and q are close to each other; given a certain eps, the closer they are, the more similar their neighbors are.

• NQ-DBSCAN is an exact algorithm that may return the same result as DBSCAN if the parameters are same. While ρ-Approximate DBSCAN is an approximate algorithm.

• The best complexity of NQ-DBSCAN can be O(n), and the average complexity of NQ-DBSCAN is proved to be O(n log(n)) provided the parameters are properly chosen. While ρ-Approximate DBSCAN runs only in O(n2) in high dimension.

• NQ-DBSCAN is suitable for clustering data with a lot of noise.

摘要

•The underlying idea is: point p and point q should have similar neighbors, provided p and q are close to each other; given a certain eps, the closer they are, the more similar their neighbors are.•NQ-DBSCAN is an exact algorithm that may return the same result as DBSCAN if the parameters are same. While ρ-Approximate DBSCAN is an approximate algorithm.•The best complexity of NQ-DBSCAN can be O(n), and the average complexity of NQ-DBSCAN is proved to be O(n log(n)) provided the parameters are properly chosen. While ρ-Approximate DBSCAN runs only in O(n2) in high dimension.•NQ-DBSCAN is suitable for clustering data with a lot of noise.

论文关键词:DBSCAN,ρ-Approximate DBSCAN,NQ-DBSCAN

论文评审过程:Received 28 April 2017, Revised 1 February 2018, Accepted 31 May 2018, Available online 5 June 2018, Version of Record 22 June 2018.

论文官网地址:https://doi.org/10.1016/j.patcog.2018.05.030