Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)
Christos Boutsidis, Petros Drineas, Michael W. Mahoney
We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ∈(0,1), selects and appropriately rescales in an unsupervised manner Θ(klog(k/ϵ)/ϵ2) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ≥1) on the features selected using our method, we can find a (1+(1+ϵ)γ)-approximate partition with high probability.