A theory of proximity based clustering: structure detection by optimization

Abstract

In this paper, a systematic optimization approach for clustering proximity or similarity data is developed. Starting from fundamental invariance and robustness properties, a set of axioms is proposed and discussed to distinguish different cluster compactness and separation criteria. The approach covers the case of sparse proximity matrices and is extended to nested partitionings for hierarchical data clustering. To solve the associated optimization problems, a rigorous mathematical framework for deterministic annealing and mean-field approximation is presented. Efficient optimization heuristics are derived in a canonical way, which also clarifies the relation to stochastic optimization by Gibbs sampling. Similarity-based clustering techniques have a broad range of possible applications in computer vision, pattern recognition, and data analysis. As a major practical application, we present a novel approach to the problem of unsupervised texture segmentation, which relies on statistical tests as a measure of homogeneity. The quality of the algorithms is empirically evaluated on a large collection of Brodatz-like micro-texture Mondrians and on a set of real-world images. To demonstrate the broad usefulness of the theory of proximity-based clustering, the performance of different criteria and algorithms is compared on an information retrieval task for a document database. The superiority of optimization algorithms for clustering is supported by extensive experiments.

Keywords: Clustering, Proximity data, Similarity, Deterministic annealing, Texture segmentation, Document retrieval

Review history: Received 15 March 1999, Available online 7 June 2001.

Paper URL: https://doi.org/10.1016/S0031-3203(99)00076-X