Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)



Boaz Nadler, Nathan Srebro, Xueyuan Zhou


We study the behavior of the popular Laplacian Regularization method for Semi-Supervised Learning in the regime of a fixed number of labeled points but a large number of unlabeled points. We show that in $\mathbb{R}^d$, $d \geq 2$, the method is actually not well-posed, and as the number of unlabeled points increases the solution degenerates to a noninformative function. We also contrast the method with the Laplacian Eigenvector method, and discuss the ``smoothness'' assumptions associated with this alternate method.
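
For readers unfamiliar with the method, the following is a minimal sketch of one common form of graph Laplacian regularization: the hard-constrained harmonic solution on a fully connected graph with Gaussian weights. It is an illustration under stated assumptions, not necessarily the exact formulation analyzed in the paper. The function name `laplacian_regularization`, the kernel choice, and the bandwidth `sigma` are all hypothetical choices made for this example.

```python
import numpy as np

def laplacian_regularization(X, labeled_idx, y_labeled, sigma=0.5):
    """Harmonic solution of graph Laplacian regularization (illustrative sketch).

    Minimizes sum_ij W_ij (f_i - f_j)^2 subject to f matching the given labels
    on the labeled points. The Gaussian kernel and sigma are assumptions.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian similarity weights, with no self-loops.
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Setting the gradient to zero on unlabeled points gives
    # L_uu f_u = -L_ul y_l.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]

    f = np.empty(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ y_labeled)
    return f
```

One way to explore the regime described above would be to hold the labeled points fixed, increase the number of unlabeled rows in `X` while shrinking `sigma`, and inspect how the returned values on the unlabeled points change.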