Fast and scalable support vector clustering for large-scale data analysis

作者:Yuan Ping, Yun Feng Chang, Yajian Zhou, Ying Jie Tian, Yi Xian Yang, Zhili Zhang

摘要

As an important boundary-based clustering algorithm, support vector clustering (SVC) benefits multiple applications for its capability of handling arbitrary cluster shapes. However, its popularity is degraded by both its highly intensive pricey computation and poor label performance which are due to redundant kernel function matrix required by estimating a support function and ineffectively checking segmers between all pairs of data points, respectively. To address these two problems, a fast and scalable SVC (FSSVC) method is proposed in this paper to achieve significant improvement on efficiency while guarantees a comparable accuracy with the state-of-the-art methods. The heart of our approach includes (1) constructing the hypersphere and support function by cluster boundaries which prunes unnecessary computation and storage of kernel functions and (2) presenting an adaptive labeling strategy which decomposes clusters into convex hulls and then employs a convex-decomposition-based cluster labeling algorithm or cone cluster labeling algorithm on the basis of whether the radius of the hypersphere is greater than 1. Both theoretical analysis and experimental results (e.g., the first rank of a nonparametric statistical test) show the superiority of our method over the others, especially for large-scale data analysis under limited memory requirements.

论文关键词:Large-scale problem, Support vector clustering, Convex decomposition, Cluster boundary, Cluster labeling

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10115-013-0724-9