A novel scale-invariant, dynamic method for hierarchical clustering of data affected by measurement uncertainty

Authors:

Highlights:

Abstract

An enhanced technique for hierarchical agglomerative clustering is presented. Classical clusterings suffer from non-uniqueness, which results from the scaling adopted for the data and from the arbitrary choice of the function used to measure the proximity between elements. Moreover, most classical methods cannot account for the effect of measurement uncertainty in the initial data, when present. To overcome these limitations, a weighted, asymmetric function is introduced to quantify the proximity between any two elements. The data weighting depends dynamically on how far the clustering procedure has advanced. The novel proximity measure is derived from a geometric approach to clustering; it both decouples the result from the data scaling and indicates the robustness of a clustering against the measurement uncertainty of the initial data. The method applies to both flat and hierarchical clustering and maintains the computational cost of the classical methods.
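The scale dependence of classical clustering mentioned in the abstract can be illustrated with a small sketch. It uses plain single-linkage agglomerative clustering with Euclidean distance, which is a classical baseline and not the method proposed in the paper; the function names `single_linkage` and `rescale` and the three sample points are assumptions made purely for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Naive classical single-linkage clustering, merged down to k clusters.

    Illustrative baseline only; this is NOT the paper's proximity measure.
    """
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > k:
        # Pick the two clusters whose closest members are nearest (Euclidean).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] |= clusters[b]
        del clusters[b]  # safe because a < b
    return sorted(sorted(c) for c in clusters)


def rescale(points, factors):
    """Multiply each coordinate by its own factor (e.g. a change of units)."""
    return [tuple(x * f for x, f in zip(p, factors)) for p in points]


# Hypothetical sample data: three points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]

original = single_linkage(points, 2)
stretched = single_linkage(rescale(points, (10.0, 1.0)), 2)

print(original)   # grouping with the original units
print(stretched)  # grouping after multiplying the first coordinate by 10
```

With these points, changing only the unit of the first coordinate changes which pair is merged first, so the two-cluster result differs between the two runs. This is the kind of non-uniqueness that a scale-invariant proximity measure is meant to remove.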

Keywords: 62H30, 68T99, Hierarchical clustering, Non-uniqueness, Proximity measure, Computational cost, Uncertainty

Review history: Received 31 May 2017, Revised 30 May 2018, Available online 5 June 2018, Version of Record 18 June 2018.

Official paper URL: https://doi.org/10.1016/j.cam.2018.05.062