Cluster validity profiles

Authors:

Highlights:

Abstract

The quantitative evaluation of clusters has lagged far behind the development of clustering algorithms. This paper introduces a new procedure, based on probability profiles, for judging the validity of clusters established from rank-order proximity data. Probability profiles furnish a comprehensive picture of the compactness and isolation of a cluster, scaled in probability units. Given a rank-order proximity matrix and a cluster to be examined, profiles compare the cluster's compactness and isolation with upper bounds on the best compactness and isolation indices one would expect in a randomly chosen graph.

After reviewing the pertinent literature, this paper explains the background from graph theory and cluster analysis needed to treat cluster validity. The probabilities and bounds needed to form cluster profiles are derived, and strategies for using profiles are suggested. Special attention is given to the underlying probability models.

Profiles are demonstrated on four artificially generated data sets, two of which have good hierarchical structure, and on data from a speaker recognition project. They reject spurious clusters and accept apparently valid clusters. Since profiles quantify the interaction between a cluster and its environment, they provide a much richer source of information on cluster structure than the single-number indices proposed in the literature.
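To make the idea of a probability-scaled profile concrete, here is a minimal Python sketch. It is not the paper's procedure: the abstract does not give the index definitions or the derived bounds, so the sketch makes simplifying assumptions of its own. It assumes compactness is the number of within-cluster edges and isolation is the number of edges linking the cluster to the rest of the graph, both measured in the threshold graph that keeps the v lowest-ranked pairs. It also assumes the null model is a graph whose v edges are drawn uniformly at random, and it scores a fixed node subset with exact hypergeometric tail probabilities rather than with bounds on the best cluster in a random graph. The names `profile` and `hypergeom_pmf` are hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_pmf(x, total, marked, draws):
    """P(exactly x marked items among `draws` items drawn without replacement)."""
    return comb(marked, x) * comb(total - marked, draws - x) / comb(total, draws)


def profile(rank, cluster):
    """Return one row per threshold v: (v, internal, linking, p_compact, p_isolated).

    rank    : symmetric n x n matrix of distinct ranks (1 = most similar pair);
              only entries rank[i][j] with i < j are read.
    cluster : iterable of node indices forming the cluster under examination.
    Assumed null model: the v edges are a uniform random subset of all pairs.
    """
    n = len(rank)
    cluster = set(cluster)
    k = len(cluster)
    total = n * (n - 1) // 2      # all node pairs
    inside = k * (k - 1) // 2     # pairs with both nodes in the cluster
    linking = k * (n - k)         # pairs with exactly one node in the cluster

    # Add edges in rank order, so step v yields the threshold graph with v edges.
    pairs = sorted(combinations(range(n), 2), key=lambda p: rank[p[0]][p[1]])
    rows = []
    e_in = e_link = 0
    for v, (i, j) in enumerate(pairs, start=1):
        hits = (i in cluster) + (j in cluster)
        e_in += hits == 2
        e_link += hits == 1
        # Chance of at least this many internal edges in a random graph.
        p_compact = sum(hypergeom_pmf(x, total, inside, v)
                        for x in range(e_in, min(inside, v) + 1))
        # Chance of at most this many linking edges in a random graph.
        p_isolated = sum(hypergeom_pmf(x, total, linking, v)
                         for x in range(0, e_link + 1))
        rows.append((v, e_in, e_link, p_compact, p_isolated))
    return rows
```

Under these assumptions, small values in both probability columns over a range of thresholds would suggest that the subset is more compact and more isolated than chance alone would explain. A fixed-subset test like this one ignores the fact that a clustering algorithm selects the cluster from the data, which is presumably why the paper works with bounds on the best indices expected in a random graph.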

Keywords: Clustering, Cluster validity, Verification, Random graph, Single-link method, Complete-link method

Review history: Received 7 July 1980, Revised 27 January 1981, Available online 19 May 2003.

Paper URL: https://doi.org/10.1016/0031-3203(82)90002-4