Density-sensitive fuzzy kernel maximum entropy clustering algorithm

Authors:

Highlights:

Abstract

The maximum entropy clustering algorithm (ME) has recently received considerable attention for its high performance in large-scale data clustering and its simplicity of implementation. However, previous studies have shown that, depending on the regularization coefficient, different clusters obtained by traditional ME tend to converge to the same cluster during iteration, and that the cluster centers are biased because the algorithm is sensitive to how the objects are distributed. These drawbacks can prevent traditional ME from revealing the natural groupings in most datasets, especially non-Gaussian distributed ones. To address these limitations, we present a novel density-sensitive fuzzy kernel maximum entropy clustering algorithm. To accommodate non-Gaussian distributed cases, the dataset to be clustered is first implicitly mapped from the original space into a high-dimensional feature space through a kernel function. Introducing kernel function-based similarity terms into the update formula of the cluster centers counteracts the effect of objects that do not belong to the current cluster on the update of its center and, at the same time, restricts the influence of the regularization coefficient on the clustering result, which effectively overcomes the convergence of different clusters encountered by traditional ME. In addition, to prevent the cluster centers from being biased by the different distributions of the objects in the feature space, relative density-based weights are incorporated into the cost function, which helps the proposed approach produce more reasonable and accurate clustering results. In the experiments, the influence of the different parameters on clustering performance is discussed in detail and some suggestions are provided. Theoretical analysis and experimental results on several synthetic datasets, UCI benchmark datasets and generated large MNIST handwritten digits datasets demonstrate that the proposed approach is superior to other existing clustering techniques and has good robustness.
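
The cluster collapse that the abstract attributes to traditional ME can be illustrated with a minimal sketch. It assumes the commonly used baseline ME updates: memberships are a softmax over squared Euclidean distances divided by a regularization coefficient, and each center is the membership-weighted mean of the objects. The function name `me_cluster`, the parameter name `gamma`, the toy data and the chosen coefficient values are illustrative assumptions. This is the baseline only, not the proposed kernel-based, density-weighted algorithm, whose update formulas are not given in the abstract.

```python
import math

def me_cluster(points, centers, gamma, iters=100):
    """Baseline maximum entropy clustering (illustrative sketch only).

    gamma is the regularization coefficient (assumed name): larger values
    make the memberships more uniform across clusters.
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(iters):
        # Membership step: softmax over negative scaled squared distances.
        memberships = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            shift = min(d2)  # subtract the minimum for numerical stability
            w = [math.exp(-(d - shift) / gamma) for d in d2]
            total = sum(w)
            memberships.append([wi / total for wi in w])
        # Center step: membership-weighted mean of all objects.
        dim = len(points[0])
        for j in range(len(centers)):
            mass = sum(u[j] for u in memberships)
            centers[j] = [
                sum(u[j] * x[k] for u, x in zip(memberships, points)) / mass
                for k in range(dim)
            ]
    return centers, memberships

# Two well-separated toy groups (made-up data).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
init = [(0.0, 1.0), (4.0, 4.0)]

sharp_centers, _ = me_cluster(points, init, gamma=0.5)
blurred_centers, _ = me_cluster(points, init, gamma=1000.0)

print("small gamma:", sharp_centers)    # centers stay in separate groups
print("large gamma:", blurred_centers)  # centers merge near the global mean
```

With the small coefficient each center settles on one group, while with the large coefficient every object belongs almost equally to both clusters, so both centers drift to the overall mean of the data. This is the behavior the proposed kernel similarity terms are intended to counteract.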

Keywords: Clustering, Relative density-based weight, Maximum entropy clustering algorithm, Robustness

Review history: Received 15 May 2018, Revised 9 September 2018, Accepted 5 December 2018, Available online 21 December 2018, Version of Record 23 January 2019.

Official URL: https://doi.org/10.1016/j.knosys.2018.12.007