Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)
Yossi Arjevani
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes to finite-sum problems. First, we show that, perhaps surprisingly, the finite-sum structure by itself is not sufficient for obtaining a complexity bound of $\tilde{\mathcal{O}}((n + L/\mu)\ln(1/\epsilon))$ for $L$-smooth and $\mu$-strongly convex finite sums; one must also know exactly which individual function is being referred to by the oracle at each iteration. Next, we show that for a broad class of first-order and coordinate-descent finite-sum algorithms (including, e.g., SDCA, SVRG, and SAG), it is not possible to obtain an 'accelerated' complexity bound of $\tilde{\mathcal{O}}((n + \sqrt{nL/\mu})\ln(1/\epsilon))$ unless the strong convexity parameter is given explicitly. Lastly, we show that when this class of algorithms is used for minimizing $L$-smooth and non-strongly convex finite sums, the optimal complexity bound is $\tilde{\mathcal{O}}(n + L/\epsilon)$, assuming that (on average) the same update rule is used in every iteration, and $\tilde{\mathcal{O}}(n + \sqrt{nL/\epsilon})$ otherwise.
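For orientation, these bounds concern the standard finite-sum minimization setting; the following is a minimal statement of the problem and notation as commonly formulated (a sketch for reference, not verbatim from the paper):

$$
\min_{x \in \mathbb{R}^d} \; F(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x),
$$

where $n$ is the number of individual functions, each $f_i$ is $L$-smooth (i.e. $\|\nabla f_i(x) - \nabla f_i(y)\| \le L\|x - y\|$ for all $x, y$), strong convexity with parameter $\mu > 0$ is assumed in the strongly convex case, $\epsilon$ denotes the target accuracy, and $\tilde{\mathcal{O}}$ suppresses logarithmic factors.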