Accelerated Gradient Methods for Stochastic Optimization and Online Learning[PDF] [BibTeX]
Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., $\ell_1$-regularizer). Gradient descent methods, though highly scalable and easy to implement, are known to converge slowly on these problems. In this paper, we develop novel accelerated gradient methods for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic optimization with both convex and strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple but powerful algorithm.