A quality driven Hierarchical Data Divisive Soft Clustering for information retrieval

Authors:

Highlights:

Abstract

This paper presents an adaptive hierarchical fuzzy clustering algorithm named Hierarchical Data Divisive Soft Clustering (H2D-SC). Its main novelty is that it is quality driven: it dynamically evaluates a multi-dimensional quality measure of the clusters to drive the generation of the soft hierarchy. Specifically, each node of the hierarchy is split into a variable number of sub-nodes, determined by an innovative quality assessment of soft clusters that evaluates multiple dimensions: the cluster's cohesion, cardinality, mass, and fuzziness, as well as the partition's entropy. Clusters at the same hierarchical level share a minimum quality value, and this value is higher at lower levels, so that more specific clusters (lower levels) have a higher quality than more general clusters (upper levels). Further, since the algorithm generates a soft partition, a document can belong to several sub-clusters with distinct membership degrees. The algorithm is divisive and combines a modified bisecting K-Means algorithm with a flat soft clustering algorithm that is used to partition each node. The paper describes the algorithm and its evaluation on two standard collections.
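
The abstract does not give the formulas behind the quality measure, so the following is only a minimal sketch of two standard ingredients it mentions: soft membership degrees and partition entropy. It assumes the textbook Fuzzy C-Means membership formula with fuzzifier `m` and Bezdek's partition entropy; the function names `fcm_memberships` and `partition_entropy` are hypothetical and are not taken from the paper, and H2D-SC may define these quantities differently.

```python
import math


def fcm_memberships(point, centroids, m=2.0):
    """Textbook Fuzzy C-Means membership degrees of one point to each centroid.

    Assumption: Euclidean distance and fuzzifier m > 1 (not specified in the abstract).
    """
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to it (ties split evenly).
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(dists))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]


def partition_entropy(memberships):
    """Bezdek's partition entropy of a membership matrix (one row per document).

    It is 0 for a crisp partition and log(c) when every document is spread
    evenly over c clusters.
    """
    n = len(memberships)
    return -sum(u * math.log(u) for row in memberships for u in row if u > 0.0) / n


# A document closer to the first centroid gets a higher, but not exclusive, membership.
U = [fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])]
print(U[0], partition_entropy(U))
```

In a quality-driven divisive scheme such as the one described, a score built from quantities like these could decide whether a node is split further; how H2D-SC combines them is described in the paper itself, not here.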

Keywords: Soft Hierarchical Clustering, Fuzzy C-Means, Cluster's quality, Document clustering, Quality measures

Article history: Received 7 December 2010, Revised 16 June 2011, Accepted 17 June 2011, Available online 1 July 2011.

Official paper URL: https://doi.org/10.1016/j.knosys.2011.06.012