A dissimilarity measure for mixed nominal and ordinal attribute data in k-Modes algorithm
作者:Fang Yuan, Youlong Yang, Tiantian Yuan
摘要
Among the existing clustering algorithms, the k-Means algorithm is one of the most commonly used clustering methods. As an extension of the k-Means algorithm, the k-Modes algorithm has been widely applied to categorical data clustering by replacing means with modes. However, there are more mixed-type data containing categorical, ordinal and numerical attributes. Mixed-type data clustering problem has recently attracted much attention from the data mining research community, but most of them fail to notice the ordinal attributes and establish explicit metric similarity of ordinal attributes. In this paper, the limitations of some existing dissimilarity measure of k-Modes algorithm in mixed ordinal and nominal data are analyzed by using some illustrative examples. Based on the idea of mining ordinal information of ordinal attribute, a new dissimilarity measure for the k-Modes algorithm to cluster this type of data is proposed. The distinct characteristic of the new dissimilarity measure is to take account of the ordinal information of ordinal attribute. A convergence study and time complexity of the k-Modes algorithm based on this new dissimilarity measure indicates that it can be effectively used for large data sets. The results of comparative experiments on nine real data sets from UCI show the effectiveness of the new dissimilarity measure.
论文关键词:Clustering algorithm, Mixed-type data, k-Modes algorithm, Dissimilarity measure of ordinal attribute
论文评审过程:
论文官网地址:https://doi.org/10.1007/s10489-019-01583-5