An efficient approximation to the K-means clustering for massive data

Authors:

Highlights:

• An approximation to the K-means algorithm for massive data problems is proposed.

• RPKM reduces the number of computations by several orders of magnitude while still obtaining good approximations.

• RPKM reduces the maximum number of Lloyd's iterations by up to a Stirling-number order of magnitude.

• Experimentally, a monotone descent of the error function is consistently observed.
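
To make these highlights concrete, here is a minimal Python sketch of the general idea behind partition-based approximations such as RPKM. The data set is replaced by the means of the non-empty cells of a grid that is refined level by level, each mean weighted by its point count, and weighted Lloyd's iterations run on those few representatives, warm-started from the previous level's centers. The grid scheme, the "heaviest cells" initialization and all function names (`grid_representatives`, `weighted_lloyd`, `rpkm_sketch`, `sse`, `stirling2`) are assumptions made for illustration. This is not the paper's exact algorithm. `stirling2` computes the Stirling number of the second kind S(n, k), the number of ways to split n points into k non-empty clusters. The standard argument is that Lloyd's algorithm never revisits a partition, so roughly S(n, k) bounds its iterations in the worst case. Working on m representatives, with m much smaller than n, shrinks that bound to about S(m, k).

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def grid_representatives(data, level):
    """Hypothetical partition: split the bounding box into 2**level cells per axis
    and return the mean (representative) and point count (weight) of each non-empty cell."""
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    per_axis = 2 ** level
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for p in data:
        # Points on the upper edge of the box are clamped into the last cell.
        cell = tuple(
            min(int((p[d] - lo[d]) / ((hi[d] - lo[d]) or 1.0) * per_axis), per_axis - 1)
            for d in range(dim)
        )
        counts[cell] += 1
        for d in range(dim):
            sums[cell][d] += p[d]
    reps = [[s / counts[cell] for s in sums[cell]] for cell in counts]
    weights = [counts[cell] for cell in counts]
    return reps, weights


def weighted_lloyd(reps, weights, centers, iters=20):
    """Lloyd's iterations in which representative i counts as weights[i] points."""
    centers = [list(c) for c in centers]
    dim = len(reps[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centers]
        mass = [0.0] * len(centers)
        for p, w in zip(reps, weights):
            j = nearest(p, centers)  # assignment step
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Update step: weighted mean; an empty cluster keeps its old center.
        new = []
        for j in range(len(centers)):
            if mass[j] > 0:
                new.append([s / mass[j] for s in sums[j]])
            else:
                new.append(centers[j])
        if new == centers:  # centers stopped changing
            break
        centers = new
    return centers


def rpkm_sketch(data, k, max_level=4, iters=20):
    """Run weighted Lloyd on successively finer grids, warm-starting each level."""
    if len({tuple(p) for p in data}) < k:
        raise ValueError("need at least k distinct points")
    level = 1
    reps, weights = grid_representatives(data, level)
    while len(reps) < k:  # refine until there are at least k representatives
        level += 1
        reps, weights = grid_representatives(data, level)
    # Assumed initialization: the k heaviest cells of the first usable grid.
    heaviest = sorted(range(len(reps)), key=lambda i: -weights[i])[:k]
    centers = [reps[i] for i in heaviest]
    for lvl in range(level, max(level, max_level) + 1):
        reps, weights = grid_representatives(data, lvl)
        centers = weighted_lloyd(reps, weights, centers, iters)
    return centers


def sse(data, centers):
    """K-means error: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def stirling2(n, k):
    """Stirling number of the second kind S(n, k) via S(i, j) = j*S(i-1, j) + S(i-1, j-1)."""
    row = [1] + [0] * k  # row[j] = S(0, j)
    for i in range(1, n + 1):
        new = [0] * (k + 1)
        for j in range(1, min(i, k) + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]
```

For example, with two well-separated 5 x 5 clusters of points starting at (0, 0) and (10, 10), every grid level collapses each cluster into one cell, so `rpkm_sketch(data, k=2)` returns centers at the two cluster means, about (0.2, 0.2) and (10.2, 10.2), after touching only two representatives per level. The iteration bound gap is also easy to see: `stirling2(50, 2)` equals 2**49 - 1, while S(m, 2) for a handful of representatives is tiny.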

Keywords: K-means, Clustering, K-means++, Minibatch K-means

Review history: Received 29 March 2016, Revised 13 June 2016, Accepted 27 June 2016, Available online 28 June 2016, Version of Record 20 December 2016.

Paper URL: https://doi.org/10.1016/j.knosys.2016.06.031