Gradual model generator for single-pass clustering

Authors:

Abstract

We present an algorithm for generating a mixture model from a data set by converting the data into a model. The method is applicable when only part of the data fits in the main memory at the same time. The generated model is a Gaussian mixture model but the algorithm can be adapted to other types of models, too. The user cannot specify the size of the generated model. We also introduce a post-processing method, which can reduce the size of the model without using the original data. This will result in a more compact model with fewer components, but with approximately the same representation accuracy as the original model. Our comparisons show that the algorithm produces good results and is quite efficient. The whole process requires only 0.5–10% of the time spent by the expectation-maximization algorithm.
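The abstract does not say how the post-processing step shrinks the model, so the following is only a minimal sketch of one common way to reduce a Gaussian mixture without touching the original data: greedily merging pairs of components by moment matching. It is not the authors' method. The one-dimensional `Component` type, the `merge_cost` criterion (a Ward-style weighted distance between means), and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One 1-D Gaussian component (hypothetical type for this sketch)."""
    weight: float  # mixing weight
    mean: float
    var: float     # variance


def merge(a: Component, b: Component) -> Component:
    """Moment-matching merge: keeps the pair's total weight, mean and variance."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    # Each component contributes its own variance plus the squared shift
    # of its mean relative to the merged mean.
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def merge_cost(a: Component, b: Component) -> float:
    """Assumed merge criterion: weighted squared distance between the means."""
    return a.weight * b.weight / (a.weight + b.weight) * (a.mean - b.mean) ** 2


def reduce_model(components: List[Component], target_size: int) -> List[Component]:
    """Greedily merge the cheapest pair until target_size components remain."""
    comps = list(components)
    while len(comps) > max(target_size, 1):
        i, j = min(
            ((i, j) for i in range(len(comps)) for j in range(i + 1, len(comps))),
            key=lambda p: merge_cost(comps[p[0]], comps[p[1]]),
        )
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


# Usage: two nearly identical components collapse into one,
# while the distant component is left alone.
model = [
    Component(0.25, 0.0, 1.0),
    Component(0.25, 0.2, 1.0),
    Component(0.50, 10.0, 2.0),
]
reduced = reduce_model(model, target_size=2)
```

Only the component parameters are read, never the data points, which is the property the abstract claims for the post-processing step. The mixture's total weight and overall mean are unchanged by each merge; how closely the reduced model matches the original in likelihood depends on the merge criterion, which here is an arbitrary choice.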

Keywords: Clustering, Gaussian mixture model, Single-pass, Large data sets

Article history: Received 20 July 2005, Revised 19 May 2006, Accepted 22 June 2006, Available online 20 September 2006.

Official paper URL: https://doi.org/10.1016/j.patcog.2006.06.023