Clustering Large Graphs via the Singular Value Decomposition

作者:P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay

摘要

We consider the problem of partitioning a set of m points in the n-dimensional Euclidean space into k clusters (usually m and n are variable, while k is fixed), so as to minimize the sum of squared distances between each point and its cluster center. This formulation is usually the objective of the k-means clustering algorithm (Kanungo et al. (2000)). We prove that this problem in NP-hard even for k = 2, and we consider a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. This relaxation can be solved by computing the Singular Value Decomposition (SVD) of the m × n matrix A that represents the m points; this solution can be used to get a 2-approximation algorithm for the original problem. We then argue that in fact the relaxation provides a generalized clustering which is useful in its own right.

论文关键词:Singular Value Decomposition, randomized algorithms, k-means clustering

论文评审过程:

论文官网地址:https://doi.org/10.1023/B:MACH.0000033113.59016.96