Statistical Analysis of Semi-Supervised Regression

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)

Bibtex Metadata Paper


Larry Wasserman, John Lafferty


Semi-supervised methods use unlabeled data in addition to labeled data to con- struct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regularization using graph Laplacians do not lead to faster minimax rates of con- vergence. Thus, the estimators that use the unlabeled data do not have smaller risk than the estimators that use only labeled data. We then develop several new approaches that provably lead to improved performance. The statistical tools of minimax analysis are thus used to offer some new perspective on the problem of semi-supervised learning.