Sun Dec 8th through Sat Dec 14th, 2019, Vancouver Convention Center
The paper examines the implicit bias of AdaGrad on linear classification with separable data. The authors show that, with a sufficiently small step size, AdaGrad converges in the direction of the minimal-norm max-margin solution. The paper offers several interesting insights, including novel convergence results for AdaGrad and instructive 2D examples. Overall, this is a good addition to the line of work on understanding the implicit bias of optimization algorithms. Based on the reviewers' feedback, the authors are encouraged to include numerical results confirming their theoretical findings, and to move more proof details and novel ideas/insights into the main paper.
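As a quick illustration of the kind of numerical check suggested above (this sketch is not from the paper; the toy dataset, step size, and iteration count are all invented for illustration), diagonal AdaGrad on logistic loss over a symmetric, linearly separable 2D dataset drives the iterate's direction toward the max-margin direction, which for this symmetric data is (1, 1)/√2:

```python
import numpy as np

# Toy separable 2D data (invented for this sketch), labels ±1.
# By symmetry, the L2 max-margin direction through the origin is (1, 1)/sqrt(2).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def grad(w):
    """Gradient of the average logistic loss (1/n) sum log(1 + exp(-y_i x_i.w))."""
    margins = y * (X @ w)
    s = -y / (1.0 + np.exp(margins))  # derivative of log(1+exp(-m)) w.r.t. margin
    return X.T @ s / len(y)

w = np.zeros(2)
G = np.zeros(2)            # accumulated squared gradients (diagonal AdaGrad)
eta, eps = 0.05, 1e-8      # small step size, per the paper's assumption
for _ in range(20000):
    g = grad(w)
    G += g * g
    w -= eta * g / (np.sqrt(G) + eps)

direction = w / np.linalg.norm(w)
print(direction)  # approaches the max-margin direction (1, 1)/sqrt(2)
```

Note that on asymmetric data the limiting direction can depend on the accumulated adaptive scaling, so this symmetric example is chosen so that the AdaGrad and L2 max-margin directions coincide.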