K-means properties on six clustering benchmark datasets
Authors: Pasi Fränti, Sami Sieranoja
Abstract
This paper makes two contributions. First, we introduce a basic clustering benchmark. Second, we use this benchmark to study the performance of k-means. Specifically, we measure how the performance depends on four factors: (1) overlap of clusters, (2) number of clusters, (3) dimensionality, and (4) unbalance of cluster sizes. The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.
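The abstract does not spell out which k-means variant was used. As a rough illustration only, here is a minimal sketch of standard Lloyd-style k-means. It assumes Euclidean distance, random initial centroids drawn from the data points, and that an empty cluster keeps its previous centroid. The function names and the toy data are hypothetical; they are not taken from the paper or its benchmark datasets.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(data: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Hypothetical Lloyd-style k-means sketch.

    Initial centroids are k distinct data points picked at random.
    Returns (centroids, labels, sse), where sse is the sum of squared
    errors, i.e. the objective that k-means tries to minimize.
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        # (ties go to the lowest centroid index).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[j])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Toy data: two well-separated (non-overlapping) groups on the x-axis.
    data = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (10.0, 0.0), (10.1, 0.0), (10.2, 0.0)]
    centroids, labels, sse = kmeans(data, k=2, seed=1)
    print(sorted(centroids), labels, sse)
```

On this toy input the two groups do not overlap, so every choice of initial centroids converges to centroids near x = 0.1 and x = 10.1. The paper's benchmark instead varies overlap, the number of clusters, dimensionality, and unbalance, which is where plain k-means can get stuck in poor local optima.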
Keywords: Clustering algorithms, Clustering quality, k-means, Benchmark
Paper link: https://doi.org/10.1007/s10489-018-1238-7