Statistical modeling of dissimilarity increments for d-dimensional data: Application in partitional clustering

Abstract

This paper addresses the use of high-order dissimilarity models in data mining problems. We explore dissimilarities between triplets of nearest neighbors, called dissimilarity increments (DIs). We derive a statistical model of DIs for d-dimensional data (d-DID), assuming that the objects follow a multivariate Gaussian distribution. Empirical evidence shows that the d-DID is well approximated by the particular case d = 2. We apply this model to clustering through a partitional algorithm that uses a merge strategy on Gaussian components. Experimental results on synthetic and real datasets show that clustering algorithms using the DID usually outperform well-known clustering algorithms.
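The abstract does not spell out how a dissimilarity increment is computed. The sketch below assumes the usual definition from earlier DI work, with Euclidean distance as the dissimilarity. For each point x_i, let x_j be its nearest neighbor and x_k the nearest neighbor of x_j other than x_i. The increment is then |d(x_i, x_j) - d(x_j, x_k)|. The helper names `nearest_neighbor` and `dissimilarity_increments` are hypothetical. Neighbor search is brute force, O(n^2). The sketch does not implement the paper's d-DID model or its clustering algorithm.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def nearest_neighbor(points: List[Point], i: int, exclude: Set[int]) -> int:
    """Return the index of the point closest to points[i] (Euclidean distance),
    skipping i itself and any index in `exclude`. Ties keep the lowest index."""
    best, best_d = -1, math.inf
    for k, p in enumerate(points):
        if k == i or k in exclude:
            continue
        d = math.dist(points[i], p)
        if d < best_d:
            best, best_d = k, d
    return best


def dissimilarity_increments(points: List[Point]) -> List[float]:
    """Compute one dissimilarity increment per point (assumed definition):
    j = NN(i), k = NN(j) excluding i, DI = |d(i, j) - d(j, k)|."""
    if len(points) < 3:
        raise ValueError("need at least three points to form a triplet")
    increments = []
    for i in range(len(points)):
        j = nearest_neighbor(points, i, exclude=set())
        k = nearest_neighbor(points, j, exclude={i})
        d_ij = math.dist(points[i], points[j])
        d_jk = math.dist(points[j], points[k])
        increments.append(abs(d_ij - d_jk))
    return increments


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (7.0,)]
    print(dissimilarity_increments(pts))  # prints [1.0, 2.0, 1.0, 2.0]
```

For larger datasets, a spatial index such as a k-d tree would replace the brute-force search. The increments themselves are computed the same way.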

Keywords: Dissimilarity increments, Partitional clustering, Likelihood-ratio test, Minimum description length, Gaussian mixture decomposition

Article history: Available online 23 December 2011.

Paper URL: https://doi.org/10.1016/j.patcog.2011.12.009