Hierarchical clustering algorithm for categorical data using a probabilistic rough set model

Authors:

Abstract

Several clustering techniques for categorical data exist for dividing similar objects into groups. Some can handle uncertainty in the clustering process, whereas others have stability issues. In this paper, we propose a new technique called TMDP (Total Mean Distribution Precision) for selecting the partitioning attribute based on probabilistic rough set theory. Building on this technique and the concept of granularity, we derive a new clustering algorithm for categorical data, MTMDP (Maximum Total Mean Distribution Precision). The MTMDP algorithm is a robust clustering algorithm that handles uncertainty in the process of clustering categorical data. We compare the MTMDP algorithm with the MMR (Min–Min–Roughness) algorithm, which is the most relevant clustering algorithm, and with other, unstable clustering algorithms such as k-modes, fuzzy k-modes and fuzzy centroids. The experimental results indicate that the MTMDP algorithm can be successfully used to analyze grouped categorical data because it produces better clustering results than the compared algorithms.

Keywords: Cluster analysis, Categorical data, Probabilistic rough sets, Distribution approximation precision, Approximation accuracy

Review history: Received 8 April 2013, Revised 4 April 2014, Accepted 5 April 2014, Available online 18 April 2014.

Official paper URL: https://doi.org/10.1016/j.knosys.2014.04.008