Scalable model-based cluster analysis using clustering features

Authors:

Highlights:

Abstract

We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
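To illustrate the summarization step, here is a minimal sketch of how a sub-cluster might be condensed into a clustering feature. It assumes the BIRCH-style triple (count, per-attribute linear sum, per-attribute sum of squares), which the abstract does not spell out. The function names `clustering_features` and `mean_and_variance` are hypothetical, and this is not the EMACF algorithm itself, only the kind of summary it could consume.

```python
from collections import defaultdict


def clustering_features(points, labels):
    """Summarize each sub-cluster as (n, linear_sum, square_sum).

    Assumption: a clustering feature is the BIRCH-style triple, kept
    per attribute because attributes are independent within clusters.
    """
    dim = len(points[0])
    feats = defaultdict(lambda: [0, [0.0] * dim, [0.0] * dim])
    for x, lab in zip(points, labels):
        cf = feats[lab]
        cf[0] += 1                      # number of items in the sub-cluster
        for d, v in enumerate(x):
            cf[1][d] += v               # linear sum of attribute d
            cf[2][d] += v * v           # sum of squares of attribute d
    return {lab: (n, ls, ss) for lab, (n, ls, ss) in feats.items()}


def mean_and_variance(cf):
    """Recover the per-attribute mean and (biased) variance from one feature."""
    n, ls, ss = cf
    mean = [s / n for s in ls]
    var = [q / n - m * m for q, m in zip(ss, mean)]
    return mean, var


# Toy data: three 2-D points already assigned to two sub-clusters.
points = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0)]
labels = [0, 0, 1]
cfs = clustering_features(points, labels)
mean0, var0 = mean_and_variance(cfs[0])
```

Because the triple is additive, two sub-clusters can be merged by summing their features. The first and second moments of each sub-cluster can therefore be recovered without revisiting the individual data items, which is presumably what lets a mixture model be fitted to the summaries alone.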

Keywords: Cluster analysis, Data mining, Scalable, Gaussian mixture model, Expectation maximization, Clustering feature, Convergence

Review history: Received 9 May 2003, Revised 15 July 2004, Accepted 15 July 2004, Available online 7 January 2005.

Official paper URL: https://doi.org/10.1016/j.patcog.2004.07.012