BIRCHSCAN: A sampling method for applying DBSCAN to large datasets

Authors:

Highlights:

• A sampling-based method for running DBSCAN on large datasets.

• The BIRCH algorithm is used to build a biased sample.

• It is driven by a single parameter: a multiplier factor that defines the Threshold used by BIRCH.

• The proposed method offers a good trade-off between result quality and running time on larger datasets.
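The idea in the highlights can be sketched roughly as follows. This is a minimal illustration using scikit-learn's `Birch` and `DBSCAN`, not the authors' implementation. The function name `birch_sample_then_dbscan` is hypothetical. Setting the BIRCH threshold to `factor * eps`, weighting each sample point by the size of its subcluster, and propagating labels back through the subcluster assignment are all assumptions made here for illustration; the highlights only state that a multiplier factor defines the Threshold.

```python
import numpy as np
from sklearn.cluster import Birch, DBSCAN


def birch_sample_then_dbscan(X, eps, min_samples, factor):
    """Cluster X by running DBSCAN on a BIRCH-derived sample (illustrative sketch)."""
    # Assumption: the BIRCH threshold is the multiplier factor times DBSCAN's eps.
    # n_clusters=None skips BIRCH's global clustering step, so the leaf
    # subclusters themselves are the output.
    birch = Birch(threshold=factor * eps, n_clusters=None)
    birch.fit(X)

    # The subcluster centroids act as the biased sample of the data.
    centroids = birch.subcluster_centers_

    # Index of the nearest subcluster for every original point.
    point_to_sub = birch.labels_

    # Assumption: weight each centroid by the number of points it represents,
    # so that min_samples still refers to original point counts.
    weights = np.bincount(point_to_sub, minlength=len(centroids))

    # Run DBSCAN on the (much smaller) sample.
    sample_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
        centroids, sample_weight=weights
    )

    # Assumption: every original point inherits the label of its subcluster.
    return sample_labels[point_to_sub], centroids
```

In this sketch a smaller `factor` produces more, tighter subclusters (a larger sample that is closer to plain DBSCAN but slower), while a larger `factor` shrinks the sample and speeds up the DBSCAN step at the cost of detail.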

Keywords: Clustering, Sampling, DBSCAN, BIRCH

Review history: Received 9 March 2021, Revised 29 June 2021, Accepted 29 June 2021, Available online 10 July 2021, Version of Record 14 July 2021.

Official paper URL: https://doi.org/10.1016/j.eswa.2021.115518