Model-based cluster analysis
Abstract
The problem of dot clustering is studied from a model-based viewpoint. A set of “placement” processes is chosen, each of which associates a probability with each location in a discrete space; in other words, a placement is a probability mass function (pmf) on the space. A number of dots is then distributed in accordance with each of these pmfs; the pmf and its associated cardinality define a subpopulation of dots. This model is extremely general; the pmfs are arbitrary. Given a set of dots generated by such a model, maximum a posteriori (MAP) methods are applied to recover the most likely set of placements and cardinalities that could have given rise to the dots. This identification problem is different from the partitioning problem, which asks for the most likely partition of the dot population into subpopulations. It is shown how and why MAP methods are useful in cluster analysis, especially when the placement pmfs are non-Gaussian. It is also shown that although the general identification problem is intractable, there is a polynomial time solution if the number of subpopulations is bounded. It is shown that a similar result holds for the partitioning problem.
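The placement model and the MAP identification step can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a single subpopulation, a finite set of hypothetical candidate pmfs with known priors, and a six-location space, all chosen only for illustration. The function names (`sample_dots`, `log_posterior`, `map_placement`) are likewise invented here.

```python
import math
import random


def sample_dots(pmf, n, rng):
    """Draw n dot locations (indices into the discrete space) from one placement pmf."""
    locations = list(range(len(pmf)))
    return rng.choices(locations, weights=pmf, k=n)


def log_posterior(dots, pmf, prior):
    """Unnormalized log posterior of one candidate placement given the dots."""
    total = math.log(prior)
    for x in dots:
        if pmf[x] == 0.0:
            return float("-inf")  # this placement cannot produce a dot here
        total += math.log(pmf[x])
    return total


def map_placement(dots, candidates, priors):
    """Return the index of the candidate placement with the highest posterior."""
    scores = [log_posterior(dots, p, w) for p, w in zip(candidates, priors)]
    return max(range(len(candidates)), key=scores.__getitem__)


# Two hypothetical non-Gaussian placements on a 6-location space (assumed values).
candidates = [
    [0.40, 0.40, 0.05, 0.05, 0.05, 0.05],  # mass concentrated on the left
    [0.05, 0.05, 0.05, 0.05, 0.40, 0.40],  # mass concentrated on the right
]
priors = [0.5, 0.5]

rng = random.Random(0)
dots = sample_dots(candidates[1], 50, rng)  # subpopulation with cardinality 50
best = map_placement(dots, candidates, priors)
```

Because the score is a sum of log pmf values, nothing in it depends on the pmfs being Gaussian or even unimodal, which is the property the abstract emphasizes. Handling several subpopulations at once, or searching over unrestricted pmfs, is outside the scope of this sketch.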
Keywords: Cluster analysis, Dot clusters, MAP estimation, Population models
Article history: Received 1 July 1992, Accepted 16 December 1992, Available online 19 May 2003.
Paper URL: https://doi.org/10.1016/0031-3203(93)90061-Z