On voting-based consensus of cluster ensembles

作者:

Highlights:

摘要

Voting-based consensus clustering refers to a distinct class of consensus methods in which the cluster label mismatch problem is explicitly addressed. The voting problem is defined as the problem of finding the optimal relabeling of a given partition with respect to a reference partition. It is commonly formulated as a weighted bipartite matching problem. In this paper, we present a more general formulation of the voting problem as a regression problem with multiple-response and multiple-input variables. We show that a recently introduced cumulative voting scheme is a special case corresponding to a linear regression method. We use a randomized ensemble generation technique, where an overproduced number of clusters is randomly selected for each ensemble partition. We apply an information theoretic algorithm for extracting the consensus clustering from the aggregated ensemble representation and for estimating the number of clusters. We apply it in conjunction with bipartite matching and cumulative voting. We present empirical evidence showing substantial improvements in clustering accuracy, stability, and estimation of the true number of clusters based on cumulative voting. The improvements are achieved in comparison to consensus algorithms based on bipartite matching, which perform very poorly with the chosen ensemble generation technique, and also to other recent consensus algorithms.

论文关键词:Clustering,Cluster ensembles,Voting-based consensus

论文评审过程:Received 31 March 2009, Revised 20 October 2009, Accepted 12 November 2009, Available online 20 November 2009.

论文官网地址:https://doi.org/10.1016/j.patcog.2009.11.012