Part of Advances in Neural Information Processing Systems 29 (NIPS 2016)
Vitaly Feldman
In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \mathbf{E}_{f \sim D}[f(x)]$ over a convex set $\mathcal{K} \subset \mathbb{R}^d$, where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\mathcal{K}$. The optimization is based on i.i.d. samples $f_1, f_2, \ldots, f_n$ from $D$. A common approach to such problems is empirical risk minimization (ERM), which optimizes $F_S(x) \doteq \frac{1}{n}\sum_{i \le n} f_i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $\mathcal{K}$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $\mathcal{K}$ of bounded radius, ERM requires a sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on the $\Omega(\log d)$ dependence proved for the $\ell_2/\ell_2$ setting in (Shalev-Shwartz et al. 2009). In stark contrast, these problems can be solved using a dimension-independent number of samples in the $\ell_2/\ell_2$ setting and with $\log d$ dependence in the $\ell_1/\ell_\infty$ setting using other approaches. We also demonstrate that, for a more general class of range-bounded (but not Lipschitz-bounded) stochastic convex programs, an even stronger gap appears already in dimension 2.
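To make the ERM setup concrete, below is a minimal sketch (not taken from the paper) of empirical risk minimization for stochastic convex optimization in a hypothetical instance: each sampled function is linear, $f_i(x) = \langle v_i, x\rangle$ with $v_i$ drawn i.i.d., and $\mathcal{K}$ is the unit $\ell_2$ ball. The empirical objective $F_S$ is minimized by projected (sub)gradient descent; all names and parameters here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def erm_projected_gd(V, steps=500, lr=0.05):
    """Minimize F_S(x) = (1/n) * sum_i <V[i], x> over the unit l2 ball.

    V is an (n, d) array whose rows are the sampled linear functions.
    """
    n, d = V.shape
    x = np.zeros(d)
    for _ in range(steps):
        grad = V.mean(axis=0)      # gradient of the empirical objective F_S
        x -= lr * grad             # (sub)gradient step
        norm = np.linalg.norm(x)
        if norm > 1.0:             # project back onto the unit l2 ball (K)
            x /= norm
    return x

# Usage: n i.i.d. samples in dimension d.  The ERM solution minimizes F_S,
# but its value under the true F may be far off when n is small relative to d.
rng = np.random.default_rng(0)
n, d = 50, 200
V = rng.standard_normal((n, d))
x_hat = erm_projected_gd(V)
print("empirical value F_S(x_hat):", float(V.mean(axis=0) @ x_hat))
```

The sketch only illustrates what "optimizing $F_S$" means; the paper's lower bounds concern how large $n$ must be for such an empirical minimizer to also be near-optimal for $F$.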