NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 1909
Title: First order expansion of convex regularized estimators


		
The present paper proposes an approximation based on the first order Taylor expansion of the convex regularizer. In the regularized regression setting, and under some mild conditions on the loss function and on the underlying distribution that generates the data, the authors prove that one can replace the regularization term of the regression algorithm by its Taylor approximation, with a guarantee that the solution obtained with this approximation will be close to the original solution (according to the Mahalanobis distance). The authors then give examples of such proxies for the square loss and logistic regression, and also for Constrained Lasso, Penalized Lasso and Group Lasso. The paper also discusses settings where this approach can be useful. Although this paper is a bit technical, it is well written, and the results are, in my opinion, nontrivial and interesting.
The reviewers point out that the approach requires the user to define a set $T$ with precise properties in order to make it work for a particular regression algorithm. This was in fact the principal weakness raised by the reviewers. In my opinion, this is indeed a problem, but not one big enough to prevent acceptance. Logistic regression, Lasso and Group Lasso are important enough to justify the interest of the approach. Moreover, in the rebuttal, the authors pointed out that this set $T$ has already been used in other situations in the literature: “Sets T and their gaussian complexity have already been studied for most high-dimensional estimators by many authors: (Group-)Lasso, Slope [6], Nuclear norm [32], tensor norms, etc, see surveys [4, 16, 23, 32]. For all these examples, set T is already available and extension of our results to such penalty straightforward–we’ll clarify this.” I have therefore decided not to give this “weakness” much weight in my final decision.
Indeed, I consider the results contained in this paper nontrivial and interesting for the supervised regression community, and I therefore recommend its acceptance.