Diversity based cluster weighting in cluster ensemble: an information theory approach

Authors: Frouzan Rashidi, Samad Nejatian, Hamid Parvin, Vahideh Rezaie

Abstract

Clustering ensembles have become increasingly popular in recent years. They consolidate several base clusterings into a single consensus clustering that is potentially better and more robust. However, most existing clustering ensemble methods ignore cluster dependability, which exposes them to low-quality base clusterings and, consequently, to low-quality base clusters. Some methods do attempt to evaluate the base clusterings, but they appear to assess each base clustering individually, without regard to the diversity of the ensemble. This paper proposes a new clustering ensemble approach built on a cluster weighting strategy, which performs consensus clustering by exploiting the concept of cluster uncertainty. Each cluster receives a contribution weight computed from its undependability. This undependability is evaluated with an information-theoretic measure over all of the cluster labels predicted in the ensemble. The paper proposes two measures that estimate cluster dependability (certainty) from cluster undependability (uncertainty), and the multiple clusters are then reconciled through their uncertainty. On this basis, two consensus approaches are proposed: cluster-wise weighted evidence accumulation and cluster-wise weighted graph partitioning. The former is based on co-association matrices and hierarchical agglomerative clustering; its first step builds a cluster-wise weighted co-association matrix to represent the clustering ensemble. The latter is based on formulating and partitioning a bipartite graph. The proposed approaches are evaluated on 19 real-world datasets. Extensive experiments, supported by statistical analysis, indicate that the proposed methods effectively capture the implicit relationships among objects and achieve higher clustering accuracy, stability, and robustness than a large number of state-of-the-art techniques.
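
The cluster-wise weighted evidence accumulation idea can be illustrated with a small, dependency-free Python sketch. It is a hedged illustration under stated assumptions, not the authors' implementation. A cluster's uncertainty is assumed to be the summed Shannon entropy (in bits) of its members' labels across all M base clusterings. That entropy is mapped to a weight with the hypothetical formula w = exp(-H / (theta * M)), a form borrowed from locally weighted evidence accumulation rather than taken from this paper. The function names and the parameter `theta` are illustrative. The weighted co-association matrix is converted to distances (1 - A) and merged with naive average-linkage agglomerative clustering; the bipartite graph partitioning variant is not sketched.

```python
import math
from collections import Counter


def cluster_entropy(members, ensemble):
    """Assumed uncertainty: summed label entropy (bits) of a cluster's
    members, measured against every base clustering in the ensemble."""
    h = 0.0
    size = len(members)
    for labels in ensemble:
        counts = Counter(labels[i] for i in members)
        h -= sum((c / size) * math.log2(c / size) for c in counts.values())
    return h


def cluster_weights(ensemble, theta=0.4):
    """Hypothetical weight w = exp(-H / (theta * M)) for every cluster.
    Keys are (index of base clustering, cluster label)."""
    M = len(ensemble)
    weights = {}
    for m, labels in enumerate(ensemble):
        for c in set(labels):
            members = [i for i, lab in enumerate(labels) if lab == c]
            h = cluster_entropy(members, ensemble)
            weights[(m, c)] = math.exp(-h / (theta * M))
    return weights


def weighted_coassociation(ensemble, theta=0.4):
    """A[i][j] = (1/M) * sum of the weights of clusters containing both i and j."""
    M, n = len(ensemble), len(ensemble[0])
    w = cluster_weights(ensemble, theta)
    A = [[0.0] * n for _ in range(n)]
    for m, labels in enumerate(ensemble):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    A[i][j] += w[(m, labels[i])] / M
    return A


def average_linkage(D, k):
    """Naive average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [D[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(D)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def consensus(ensemble, k, theta=0.4):
    """Weighted co-association -> distance 1 - A -> average linkage."""
    A = weighted_coassociation(ensemble, theta)
    D = [[1.0 - a for a in row] for row in A]
    return average_linkage(D, k)


if __name__ == "__main__":
    # Three base clusterings of six objects; the third one is noisy.
    ensemble = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 2, 2],
    ]
    print(consensus(ensemble, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

A cluster whose members stay together in every base clustering (entropy 0) gets weight 1. A cluster that the other clusterings split apart is down-weighted, so the noisy pair {2, 3} in the third clustering contributes little to the co-association between objects 2 and 3. The linkage step is deliberately naive and only suitable for small examples.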

Keywords: Clustering, Clustering ensemble, Consensus function, Cluster dependability, Cluster weighting

Official paper page: https://doi.org/10.1007/s10462-019-09701-y