This is a very interesting submission studying the effect of L2 regularization on training overparameterized networks. The authors make significant and intuitive theoretical contributions that are well supported by empirical evaluation. We strongly urge the authors to incorporate their rebuttal points on learning rate schedules into the paper.