A new distance between multivariate clusters of varying locations, elliptical shapes, and directions

Authors:

Highlights:

• A new method is proposed for measuring the distance between pairs of clusters in a dataset.

• The proposed distance accurately captures the variability of the cluster centers as well as the variability in the shapes and directions of their respective covariance matrices.

• The method has several advantages, including simplicity, interpretability, and computational efficiency.

• Both classical and robust versions of the distance are provided.

• The distance is illustrated by several motivating examples that demonstrate the need for it, and it is applied to both real and synthetic data.

• The Ward distance and the Euclidean distance are proved to be equivalent.

Keywords: Clustering methods, Complete linkage, Elliptical distance, Euclidean distance, Hamming distance, Hierarchical clustering, Iris data, K-Means clustering, Manhattan distance, Single linkage, Robust estimation, Ward method

Review history: Received 22 July 2021, Revised 25 April 2022, Accepted 6 May 2022, Available online 8 May 2022, Version of Record 12 May 2022.

Official paper URL: https://doi.org/10.1016/j.patcog.2022.108780