Reviews: Implicit Regularization for Optimal Sparse Recovery

The authors study a reparametrization of the least squares problem such that early-stopped gradient descent mimics L1 penalization. The idea is very creative and the reviewers were mostly positive. However, several important critiques were raised. I also lean positive, but want to see the paper improved in several places in the camera-ready version. The intuition for the reparametrization needs to be much more prominent. The main pros and cons of the new proposal should also be highlighted more clearly. Practically speaking, is this a method that is restricted to use under RIP? Or should we think of it as a general-purpose tool (like the lasso), with theory that describes its behavior in idealized cases? The presentation in the experimental section can be improved as well; the figures are currently far too small to read properly. Finally, the coordinate descent (aka forward stagewise) view should be highlighted, explained, and compared more thoroughly. This is of course the main "competitor", in that it is a simple iterative algorithm that, when stopped early, produces something like L1 regularization. In addition to the references given in the related work section, the authors should pay attention to Tibshirani (2015), "A General Framework for Fast Stagewise Algorithms", where the connection between stagewise and L1 regularization is explained clearly and intuitively.
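To make the requested intuition concrete, here is a minimal sketch of the kind of reparametrization under discussion. It rests on assumptions that this review does not state: the weights are written as w = u*u - v*v (elementwise), both factors start at a small constant `alpha`, and plain gradient descent is run for a fixed number of iterations. The function name `hadamard_gd`, the step size, the iteration count, and the synthetic Gaussian design are illustrative choices, not the authors' settings.

```python
import numpy as np

def hadamard_gd(X, y, alpha=1e-4, step=0.05, iters=300):
    """Early-stopped gradient descent on (1/2n)||Xw - y||^2 with w = u*u - v*v.

    Assumed parametrization (hypothetical, for illustration only).
    """
    n, d = X.shape
    u = np.full(d, alpha)  # small initialization keeps inactive coordinates near zero
    v = np.full(d, alpha)
    for _ in range(iters):
        w = u * u - v * v
        g = X.T @ (X @ w - y) / n   # gradient with respect to w
        u_new = u - step * 2.0 * g * u  # chain rule: dw/du = 2u
        v_new = v + step * 2.0 * g * v  # chain rule: dw/dv = -2v
        u, v = u_new, v_new
    return u * u - v * v

# Synthetic sparse regression problem (illustrative sizes and noise level).
rng = np.random.default_rng(0)
n, d, k = 200, 500, 3
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:k] = [1.0, -1.0, 1.0]
y = X @ w_star + 0.01 * rng.standard_normal(n)

w_hat = hadamard_gd(X, y)
rel_err = np.linalg.norm(w_hat - w_star) / np.linalg.norm(w_star)
top_k = sorted(np.argsort(np.abs(w_hat))[-k:].tolist())
print("relative error:", rel_err)
print("largest coordinates:", top_k)
```

Each update is multiplicative in u and v, so coordinates whose gradient stays large grow geometrically from `alpha` while the rest remain close to zero for many iterations. That is the sense in which stopping early can act like a sparsity-inducing penalty, and the iteration count plays the role of a regularization parameter.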

Paper ID:	1696
Title:	Implicit Regularization for Optimal Sparse Recovery