General formulation and evaluation of agglomerative clustering methods with metric and non-metric distances

Authors:

Highlights:

Abstract

Agglomerative clustering methods with stopping criteria are generalized. Clustering-related concepts are rigorously formulated, with special attention to the metricity of the object space. A new definition of combinatoriality is given, and a stronger proposition of monotonicity is proven. Specializations of the general method are applied to non-attributive non-metric and attributive pseudometric representations of biosequences. The furthest neighbor method is shown to be suitable for non-metric use. In metric object space, four inter-cluster distance functions, including a new, truly context-sensitive method, are compared using a method-independent goodness criterion. For biosequence clustering, the new method outperforms the UPGMA, UPGMC, and furthest neighbor methods.
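
As a rough illustration of the kind of procedure the abstract refers to, the following minimal sketch runs furthest neighbor (complete linkage) agglomeration on a precomputed dissimilarity matrix and stops once the closest pair of clusters is farther apart than a threshold. The function name `furthest_neighbor_clustering`, the `threshold` parameter, and the threshold-based stopping rule are assumptions made for this example; they are not the paper's general formulation or its stopping criteria. Only pairwise dissimilarities are used, so the matrix does not need to satisfy the triangle inequality.

```python
from itertools import combinations


def furthest_neighbor_clustering(dist, threshold):
    """Merge clusters until the closest pair is farther apart than `threshold`.

    dist: symmetric square list of lists of pairwise dissimilarities.
    threshold: hypothetical stopping parameter (an assumption of this sketch).
    Returns the clusters as sorted lists of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Furthest neighbor: the largest dissimilarity between any two members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest furthest-neighbor distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        if linkage(clusters[p], clusters[q]) > threshold:
            break  # assumed stopping rule: no pair is close enough to merge
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy dissimilarities for four items (made-up values, not data from the paper).
example = [
    [0, 1, 6, 7],
    [1, 0, 5, 8],
    [6, 5, 0, 2],
    [7, 8, 2, 0],
]
result = furthest_neighbor_clustering(example, threshold=3)
```

With the toy matrix above and a threshold of 3, items 0 and 1 merge first, then items 2 and 3, and the loop stops because the two remaining clusters are at furthest-neighbor distance 8. Swapping the body of `linkage` for an average would give a UPGMA-style variant under the same assumptions.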

Keywords: Agglomerative clustering, Metricity, Combinatoriality, Monotonicity, Stopping criteria, Goodness, UPGMA, Furthest neighbor, Sequence similarity, Representatives

Article history: Received 14 December 1992, Revised 3 March 1993, Accepted 3 March 1993, Available online 19 May 2003.

Paper URL: https://doi.org/10.1016/0031-3203(93)90145-M