Cooperative clustering

Authors:

Abstract

Data clustering plays an important role in many disciplines, including data mining, machine learning, bioinformatics, and pattern recognition, wherever the inherent grouping structure of data must be learned in an unsupervised manner. Many clustering approaches have been proposed in the literature, each with a different quality/complexity tradeoff. Each algorithm works well within its own domain, and none is optimal for all datasets across different properties, sizes, structures, and distributions. In this paper, a novel cooperative clustering (CC) model is presented. It involves cooperation among multiple clustering techniques with the goal of increasing the homogeneity of objects within the clusters. The CC model handles datasets with different properties by means of two data structures: a histogram representation of the pair-wise similarities and a cooperative contingency graph. These two data structures are designed to find the matching sub-clusters between different clusterings and to obtain the final set of clusters through a coherent merging process. The cooperative model is consistent and scalable in terms of the number of adopted clustering approaches. Experimental results show that the cooperative clustering model outperforms the individual clustering algorithms on a number of gene expression and text document datasets.
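The abstract names the two data structures without defining them, so the following is only an illustrative sketch of the general ideas, not the paper's method. It assumes cosine similarity as the pair-wise measure, equal-width bins over [0, 1], and a plain contingency count between two labelings as a stand-in for the cooperative contingency graph. The function names `similarity_histogram` and `contingency` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_histogram(points, bins=10):
    """Count pair-wise similarities into equal-width bins over [0, 1].

    Assumption: similarities are clipped to [0, 1]; the paper may use a
    different measure or binning.
    """
    counts = [0] * bins
    for u, v in combinations(points, 2):
        s = min(max(cosine(u, v), 0.0), 1.0)
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def contingency(labels_a, labels_b):
    """Map (cluster in A, cluster in B) to the number of shared objects.

    Non-zero entries can be read as weighted edges between sub-clusters
    of two clusterings (an assumed simplification of the graph).
    """
    return Counter(zip(labels_a, labels_b))


if __name__ == "__main__":
    # Toy data: four 2-D points and two hypothetical clusterings of them.
    points = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
    labels_a = [0, 0, 1, 1]
    labels_b = [0, 0, 0, 1]
    print(similarity_histogram(points))
    print(dict(contingency(labels_a, labels_b)))
```

A histogram with most of its mass in the high-similarity bins suggests a homogeneous cluster, and large contingency counts point to sub-clusters on which two clusterings agree. How the paper actually uses these quantities in its merging process is not stated in the abstract.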

Keywords: Cooperative clustering, Similarity histogram, Cooperative contingency graph

Review history: Received 15 October 2008, Revised 17 December 2009, Accepted 27 December 2009, Available online 11 January 2010.

Paper URL: https://doi.org/10.1016/j.patcog.2009.12.018