Clustering ensemble selection considering quality and diversity

作者:Sadr-olah Abbasi, Samad Nejatian, Hamid Parvin, Vahideh Rezaie, Karamolah Bagherifard

摘要

It is highly likely that there is a partition that is judged by a stability measure as a bad one while it contains one (or more) high quality cluster(s); and then it is totally neglected. So, inspiring from the evaluation of partitions, researchers turn to define measures for evaluation of clusters. Many stability measures have been proposed such as Normalized Mutual Information to validate a partition. The defined measures are based on Normalized Mutual Information. The drawback of the commonly used approach will be discussed in this paper and a criterion is proposed to assess the association between a cluster and a partition which is called Edited Normalized Mutual Information, ENMI criterion. The ENMI criterion compensates the drawback of the common Normalized Mutual Information (NMI) measure. Also, a clustering ensemble method that is based on aggregating a subset of primary clusters is proposed. The proposed method uses the Average ENMI as fitness measure to select a number of clusters. The clusters that satisfy a predefined threshold of the mentioned measure are selected to participate in the final ensemble. To combine the chosen clusters a set of consensus function methods are employed. One class of the used consensus functions is the co-association based consensus functions. Since the Evidence Accumulation Clustering, EAC, method can’t derive the co-association matrix from a subset of clusters, Extended EAC, EEAC, is employed to construct the co-association matrix from the chosen subset of clusters. The second class of the used consensus functions is based on hyper graph partitioning algorithms. The other class of the used consensus functions considers the chosen clusters as a new feature space and uses a simple clustering algorithm to extract the consensus partitioning. The empirical studies show that the proposed method outperforms other well-known ensembles.

论文关键词:Clustering ensemble, Stability measure, Improved stability, Evidence accumulation, Extended EAC, Co-association matrix, Cluster evaluation

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10462-018-9642-2