Information-theoretic clustering: A representative and evolutionary approach
Abstract
This paper proposes a new perspective on non-parametric entropy-based clustering. We developed a new cost function for clustering that measures the cross information potential (CIP) between clusters of a dataset using representative points, which we call representative CIP (rCIP). The approach rests on the idea that optimizing the cross information potential is equivalent to minimizing the cross entropy between clusters. Our measure differs from the standard one because, instead of using all points in a dataset, it uses only representative points to quantify the interaction between distributions, without losing any of the original properties of the cross information potential. This brings a double advantage: it decreases the computational cost of the cross information potential, drastically reducing the running time, and it uses the underlying statistics of the region of space where the representative points lie to measure the interaction. The result is a useful non-parametric estimator of entropy that makes it possible to use the cross information potential in applications where it previously could not be used. Due to the nature of clustering problems, we proposed a genetic algorithm that uses rCIP as its cost function. We ran several tests and compared the results with the single-linkage hierarchical algorithm, a finite mixture of Gaussians, and spectral clustering on both synthetic and real image segmentation datasets. Experiments showed that our approach achieved better results than the other algorithms and was capable of capturing the real structure of the data in most cases, regardless of its complexity. It also produced good image segmentation, with the advantage of a tuning parameter that provides a way of refining the segmentation.
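For orientation, the quantity that rCIP approximates can be sketched with the standard Parzen-window estimator of the cross information potential, which averages a Gaussian kernel over every pair of points drawn from two clusters. This is a minimal sketch of the conventional all-pairs CIP, not the representative-point version proposed in the paper; the function name `cross_information_potential`, the kernel width `sigma`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def cross_information_potential(x, y, sigma=1.0):
    """All-pairs Parzen estimate of the CIP between two sample sets.

    x has shape (N, d), y has shape (M, d). `sigma` is an assumed
    kernel width; the pairwise kernel has variance 2 * sigma**2
    because it is the convolution of two Gaussian Parzen kernels.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.shape[1]
    var = 2.0 * sigma ** 2
    # Squared Euclidean distance between every x_i and every y_j: (N, M).
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Isotropic Gaussian kernel evaluated at each pairwise difference.
    norm = (2.0 * np.pi * var) ** (-d / 2.0)
    kernel = norm * np.exp(-sq_dist / (2.0 * var))
    # The CIP estimate is the mean kernel value over all N * M pairs.
    return float(np.mean(kernel))


# Toy data (hypothetical): one cluster near the origin, one overlapping
# cluster, and one well-separated cluster.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(50, 2))
b_near = rng.normal(0.5, 1.0, size=(50, 2))
b_far = rng.normal(10.0, 1.0, size=(50, 2))

cip_near = cross_information_potential(a, b_near)
cip_far = cross_information_potential(a, b_far)
```

In this sketch, overlapping clusters give a larger value than well-separated ones, so a low CIP indicates little interaction between two clusters. The double loop over all pairs costs O(NM) kernel evaluations, which is the cost that the abstract says the representative-point formulation is meant to reduce.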
Keywords: Clustering, Entropy, Information theory, Information potential, Complex data, Image segmentation
Article history: Available online 23 January 2013.
Official page: https://doi.org/10.1016/j.eswa.2013.01.027