How much can k-means be improved by using better initialization and repeats?
作者:
Highlights:
• K-means clustering algorithm can be significantly improved by using a better initialization technique, and by repeating (re-starting) the algorithm.
• When the data has overlapping clusters, k-means can improve the results of the initialization technique.
• When the data has well separated clusters, the performance of k-means depends completely on the goodness of the initialization.
• Initialization using simple furthest point heuristic (Maxmin) reduces the clustering error of k-means from 15% to 6%, on average.
摘要
•K-means clustering algorithm can be significantly improved by using a better initialization technique, and by repeating (re-starting) the algorithm.•When the data has overlapping clusters, k-means can improve the results of the initialization technique.•When the data has well separated clusters, the performance of k-means depends completely on the goodness of the initialization.•Initialization using simple furthest point heuristic (Maxmin) reduces the clustering error of k-means from 15% to 6%, on average.
论文关键词:Clustering algorithms,K-means,Initialization,Clustering accuracy,Prototype selection
论文评审过程:Received 13 August 2018, Revised 11 March 2019, Accepted 13 April 2019, Available online 15 April 2019, Version of Record 22 April 2019.
论文官网地址:https://doi.org/10.1016/j.patcog.2019.04.014