NIPS 2018
Sun Dec 2nd through Sat Dec 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 1069 SEGA: Variance Reduction via Gradient Sketching

### Reviewer 1

This paper presents a first-order optimization method for composite optimization problems called SEGA (SkEtched GrAdient method). The method applies to problems where the smooth part of the objective is strongly convex, the regularizer is convex (but not necessarily separable), and the proximal operator of the regularizer is available. The setup is that the user does not have access to the true gradient; instead, an oracle returns a random linear transformation (a sketch) of the gradient. A random variable $\theta_k$ is introduced and used to construct an unbiased estimate of the gradient. SEGA is a variance-reduced algorithm because, as the iterations progress, the variance of the gradient estimate $g^k$ goes to $0$. The authors provide convergence theory for their algorithm, including a linear rate of convergence for SEGA under smoothness and strong convexity assumptions. A comparison of the complexity results for SEGA with those of various coordinate descent methods is given in Table 1, and the results of several numerical experiments are presented to demonstrate the practical performance of SEGA.

The paper is well written and organized, although I encourage the authors to proofread the paper carefully for typos (spelling and grammatical errors). The authors also took care to explain their work well, providing examples to aid understanding, and referencing the specific places in the appendices that support claims made in the main body of the paper (this made the paper much easier to read and navigate).
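To make the mechanism concrete, below is a minimal sketch of the SEGA update in the special case of coordinate sketches, where the oracle reveals a single partial derivative $\nabla_i f(x^k)$ per iteration and the bias-correction variable reduces to $\theta_k = n$. This is my own illustration under those assumptions, not the authors' code: the function name `sega_coordinate`, the step size, and the lasso-style example are hypothetical.

```python
import numpy as np

def sega_coordinate(grad, prox, x0, alpha, n_iters, rng=None):
    """Minimal SEGA sketch with coordinate sketches S = e_i (hypothetical helper).

    grad(x)        -- full gradient of the smooth part f; only one coordinate
                      of its output is "observed" per step, mimicking the
                      sketched-gradient oracle
    prox(z, alpha) -- proximal operator of the regularizer R with step alpha
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    n = x.size
    h = np.zeros(n)                      # running gradient estimate h^k
    for _ in range(n_iters):
        i = rng.integers(n)              # sample a coordinate sketch e_i
        gi = grad(x)[i]                  # oracle reveals only grad_i f(x^k)
        g = h.copy()                     # unbiased estimate g^k = h^k + ...
        g[i] += n * (gi - h[i])          # ... n * e_i (grad_i f(x^k) - h_i^k)
        h[i] = gi                        # update h^{k+1}: overwrite coordinate i
        x = prox(x - alpha * g, alpha)   # proximal step
    return x
```

Taking expectation over $i$ gives $\mathbb{E}[g^k] = h^k + \nabla f(x^k) - h^k = \nabla f(x^k)$, so $g^k$ is unbiased; as $h^k \to \nabla f(x^\star)$, the variance of $g^k$ vanishes, which is the variance-reduction property described above. A hypothetical usage on a lasso-type problem, with $f(x) = \tfrac12\|Ax - b\|^2$ and $R(x) = \lambda\|x\|_1$:

```python
rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(50, 10)), rng.normal(size=50), 0.1
grad = lambda x: A.T @ (A @ x - b)
prox = lambda z, a: np.sign(z) * np.maximum(np.abs(z) - a * lam, 0.0)  # soft-thresholding
x_hat = sega_coordinate(grad, prox, np.zeros(10), alpha=1e-3, n_iters=20000, rng=rng)
```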