Categorical data clustering: What similarity measure to recommend?

Authors:

Highlights:

• The main difficulty in clustering categorical data lies in choosing the similarity measure.

• Available similarity measures range from those based on simple matching to far more complex ones.

• We ask whether any similarity measure has characteristics that make it more stable than the others.

• We compared nine similarity measures using three quality measures.

• We observed that the simplest similarity measure gave the best results.
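
To make the idea of a simple-matching measure concrete, here is a minimal sketch in Python. It assumes the common textbook definition (the fraction of attributes on which two records agree); the exact formulation used in the paper is not given here, and the function name and sample records are hypothetical.

```python
from typing import Sequence


def simple_matching_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Return the fraction of attributes on which two categorical records agree.

    Assumption: this is the standard overlap (simple matching) definition,
    not necessarily the exact variant evaluated in the paper.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if len(a) == 0:
        raise ValueError("records must have at least one attribute")
    # Count positions where both records carry the same category.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical records with three categorical attributes each.
record_1 = ("red", "small", "round")
record_2 = ("red", "large", "round")

# The records agree on two of three attributes.
similarity = simple_matching_similarity(record_1, record_2)
```

A measure of this kind needs no frequency statistics from the data set, which is one reason it is usually treated as the simplest baseline for comparison.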

Keywords: Categorical data, Clustering, Clustering criterion, Clustering goal, Similarity

Review history: Available online 28 September 2014.

Official paper URL: https://doi.org/10.1016/j.eswa.2014.09.012