Rough set approach for clustering categorical data using information-theoretic dependency measure

Authors:

Highlights:

Abstract

A variety of clustering algorithms exist to group objects with similar characteristics, but many of them are difficult to apply to categorical data. Some cannot handle categorical data at all, while others cannot handle the uncertainty inherent in categorical data. Handling such uncertainty is a prerequisite for clustering categorical data. An algorithm termed minimum-minimum roughness (MMR) was proposed, which uses rough set theory to address these problems in clustering categorical data. Later, many algorithms were developed to improve the handling of hybrid data. This research proposes information-theoretic dependency roughness (ITDR), another technique for categorical data clustering that takes into account the information-theoretic degree of attribute dependency in categorical-valued information systems. In addition, it is second to none among its predecessors: MMR, MMeR, SDR, and standard-deviation of standard-deviation roughness (SSDR). Experimental results on two benchmark UCI datasets show that the ITDR technique outperforms the baseline categorical data clustering technique with respect to computational complexity and cluster purity.

Keywords: Clustering, Categorical data, Rough set theory, Information system, Attribute dependency

Review history: Received 8 January 2014, Accepted 3 June 2014, Available online 14 July 2014.

Official URL: https://doi.org/10.1016/j.is.2014.06.008