Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
This paper proposes an online stochastic optimization algorithm (similar to SGD) whose last iterate achieves the optimal convergence rate in two settings (O(1/sqrt(T)) for Lipschitz convex functions and O(1/T) for strongly convex functions), and which additionally allows an arbitrary non-smooth regularizer (e.g., the L1-norm to induce sparsity). Various subsets of these properties were achieved by prior work: it was known how to achieve these rates up to O(log T) factors, and it was known how to achieve the optimal rates with iterate averaging, which, however, destroys sparsity. This paper gives the first algorithm that has all the properties simultaneously and removes the log factors. The paper provides rigorous proofs of the convergence rates and extensive numerical experiments. While many of the mathematical techniques used in the convergence analysis are standard, several key steps in the algorithm and the analysis are non-trivial, which makes this paper an important contribution to stochastic optimization research. The paper is clearly written, and the proofs appear to be correct. The authors addressed all of the reviewers' suggestions and questions in their rebuttal.
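To make the problem setting concrete: the paper studies composite stochastic optimization, minimizing a stochastic loss plus a non-smooth regularizer such as the L1-norm. The sketch below illustrates this setup with standard proximal SGD and a 1/sqrt(t) step size on a synthetic sparse regression problem; it is NOT the paper's algorithm, and all problem data and parameter choices here are assumptions for illustration. Note how the soft-thresholding (prox) step keeps the last iterate itself sparse, which is exactly the property that iterate averaging would destroy.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrinks each coordinate toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_sgd(A, b, lam=0.1, T=2000, seed=0):
    # Minimize (1/2n) * ||A x - b||^2 + lam * ||x||_1 with proximal SGD.
    # Illustrative baseline only, not the algorithm proposed in the paper.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)                            # sample one data point
        grad = (A[i] @ x - b[i]) * A[i]                # stochastic gradient of the smooth part
        eta = 1.0 / np.sqrt(t)                         # decaying step size, O(1/sqrt(T)) regime
        x = soft_threshold(x - eta * grad, eta * lam)  # gradient step, then L1 prox
    return x

# Synthetic sparse regression instance (an assumption for the demo).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = proximal_sgd(A, b)
```

Because the prox is applied at every step, the returned last iterate `x_hat` stays sparse; averaging the iterates instead would blur the zeros away, which is one of the trade-offs this paper's algorithm avoids.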