Fast density peak clustering for large scale data based on kNN

Authors:

Highlights:

Abstract

The Density Peak (DPeak) clustering algorithm is not applicable to large-scale data, because its two quantities, ρ and δ, are both obtained by a brute-force algorithm with complexity O(n^2). Thus, a simple but fast variant of DPeak, namely FastDPeak, is proposed, which runs in about O(n log n) expected time in the intrinsic dimensionality. It replaces density with kNN-density, which is computed by a fast kNN algorithm such as the cover tree, yielding a large speedup in the density computation. Based on kNN-density, local density peaks and non-local density peaks are identified, and a fast algorithm with complexity O(n), which uses two different strategies to compute δ for these two kinds of points, is also proposed. Experimental results show that FastDPeak is effective and outperforms other variants of DPeak.
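The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several things here are assumptions not stated in the abstract: the kNN-density is taken to be the inverse of the distance to the k-th nearest neighbor, a "local density peak" is taken to be a point with no denser point among its k nearest neighbors, and the function names `knn_brute` and `knn_density_and_delta` are hypothetical. The kNN search is a brute-force stand-in for a fast index such as a cover tree, and the δ computation for local density peaks is a plain scan rather than the paper's fast strategy, so this sketch does not reach the stated complexity.

```python
import math


def knn_brute(points, k):
    """For each point, return its k nearest neighbors as (distance, index) pairs.

    Brute-force O(n^2) stand-in; a cover tree or similar index would replace it.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])
    return result


def knn_density_and_delta(points, k):
    """Compute an assumed kNN-density rho, delta, and the nearest denser point."""
    n = len(points)
    neighbors = knn_brute(points, k)

    # Assumption: kNN-density = 1 / (distance to the k-th nearest neighbor).
    rho = [1.0 / max(nbrs[-1][0], 1e-12) for nbrs in neighbors]

    delta = [0.0] * n
    parent = [-1] * n  # index of the nearest denser point, -1 if none
    local_peaks = []

    # Strategy 1 (non-local peaks): the nearest denser point lies in the kNN
    # list, which is sorted by ascending distance, so the first hit is nearest.
    for i in range(n):
        for d, j in neighbors[i]:
            if rho[j] > rho[i]:
                delta[i], parent[i] = d, j
                break
        else:
            local_peaks.append(i)  # no denser point among the k neighbors

    # Strategy 2 (local peaks): simplified here to a scan over all denser
    # points; the paper describes a faster procedure for this step.
    for i in local_peaks:
        denser = [
            (math.dist(points[i], points[j]), j)
            for j in range(n)
            if rho[j] > rho[i]
        ]
        if denser:
            delta[i], parent[i] = min(denser)
        else:
            # Usual DPeak convention for the densest point: largest distance.
            delta[i] = max(math.dist(points[i], q) for q in points)

    return rho, delta, parent, local_peaks
```

Points with both large ρ and large δ would then be chosen as cluster centers, and every other point would inherit the label of its `parent`, as in the original DPeak procedure. Ties in density are not handled specially in this sketch.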

Keywords: Density peak, FastDPeak, kNN-density

Article history: Received 4 January 2019, Revised 24 June 2019, Accepted 27 June 2019, Available online 3 July 2019, Version of Record 18 November 2019.

Paper URL: https://doi.org/10.1016/j.knosys.2019.06.032