Extended Gaussian kernel version of fuzzy c-means in the problem of data analyzing

作者:

Highlights:

摘要

Fuzzy c-means clustering with spatial constraints is considered as suitable algorithm for data clustering or data analyzing. But FCM has still lacks enough robustness to employ with noise data, because of its Euclidean distance measure objective function for finding the relationship between the objects. It can only be effective in clustering ‘spherical’ clusters, and it may not give reasonable clustering results for “non-compactly filled” spherical data such as “annular-shaped” data. This paper realized the drawbacks of the general fuzzy c-mean algorithm and it tries to introduce an extended Gaussian version of fuzzy C-means by replacing the Euclidean distance in the original object function of FCM. Firstly, this paper proposes initial kernel version of fuzzy c-means to aim at simplifying its computation and then extended it to extended Gaussian kernel version of fuzzy c-means. It derives an effective method to construct the membership matrix for objects, and it derives a robust method for updating centers from extended Gaussian version of fuzzy C-means. Furthermore, this paper proposes a new prototypes learning method and it obtains initial cluster centers using new mathematical initialization centers for the new effective objective function of fuzzy c-means, so that this paper tries to minimize the iteration of algorithms to obtain more accurate result. Initial experiment will be done with an artificially generated data to show how effectively the new proposed Gaussian version of fuzzy C-means works in obtaining clusters, and then the proposed methods can be implemented to cluster the Wisconsin breast cancer database into two clusters for the classes benign and malignant. To show the effective performance of proposed fuzzy c-means with new initialization of centers of clusters, this work compares the results with results of recent fuzzy c-means algorithm; in addition, it uses Silhouette method to validate the obtained clusters from breast cancer datasets.

论文关键词:Cluster centers,Fuzzy c-means,Memberships,Objective function,Data grouping

论文评审过程:Available online 8 October 2010.

论文官网地址:https://doi.org/10.1016/j.eswa.2010.09.040