Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Ulrike Luxburg, Olivier Bousquet, Mikhail Belkin
An important aspect of clustering algorithms is whether the partitions constructed on finite samples converge to a useful clustering of the whole data space as the sample size increases. This paper investigates this question for normalized and unnormalized versions of the popular spec- tral clustering algorithm. Surprisingly, the convergence of unnormalized spectral clustering is more difficult to handle than the normalized case. Even though recently some first results on the convergence of normal- ized spectral clustering have been obtained, for the unnormalized case we have to develop a completely new approach combining tools from numerical integration, spectral and perturbation theory, and probability. It turns out that while in the normalized case, spectral clustering usually converges to a nice partition of the data space, in the unnormalized case the same only holds under strong additional assumptions which are not always satisfied. We conclude that our analysis gives strong evidence for the superiority of normalized spectral clustering. It also provides a basis for future exploration of other Laplacian-based methods.