NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 5520 Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin

### Reviewer 1

The paper is of great quality, both in terms of the writing style and the technical content. One question I have, which is not very well address in the paper, is if the sample complexity upper bound of 1/\gamma^2 \epsilon^2 is optimal especially under the restricted approximation factor considered in Theorem 3.2 and Theorem 3.3, and would appreciate if the authors can shed some light on the same. There are some technical typos in the paper, which I am sure the authors will fix in the final version. Some of them are listed below: 1. Line 110, should be w^{i} in the RHS instead of w^{i+1}. 2. Like 256, incomplete parenthesis. 3. Like 268, should be 1/gamma^2 instead of 1/\gamma. 4. Line 254, by the standard arguments, it seems that the \epsilon term should contain a multiplicative factor of 1/\gamma. __________________ Post Author Feedback: Thanks for the feedback. I still strongly support to accept the paper.

### Reviewer 2

This paper studies proper agnostic learning of the class of large margin halfspaces. Prior to this work the best known result of [BS00] proposed a proper learning algorithm that achieves error OPT_gamma + eps in time poly(d/eps)2^O(log(1/eps) gamma^2). The current paper shows that if one is happy with a PTAS, i.e, (1+delta)OPT_gamma + eps, then there is a proper robust learning algorithm that uses O(1/eps^2 \gamma^2) samples and runs in time poly(d/eps)2^O(1/\gamma^2) (for constant delta). The paper also shows a matching lower bound that any constant factor approximation that is a proper learner must incur exp(1/gamma) run time. Techniques: --------------- The key algorithmic insight is that if the current guess w has a large error, namely more than (1+delta)OPT then there must be many examples where w has margin less than gamma/2 whereas w*, the optimal halfspaces has margin more than gamma. This implies that w-w* has a large projection onto the matrix M = E[xx^T] where the expectation is over examples where w has low margin. Hence, in time exp(1/gamma^2) one can try all large projections onto the subspace, add them to the current vector to produce a new set of candidate vectors and proceed. One then needs to argue that the depth of this search tree is small. Overall the paper is well written and makes a solid contribution towards the understanding of proper agnostic learning of large margin halfspaces.

### Reviewer 3

This paper gives a new algorithm for properly agnostically learning halfspaces with a misclassification error that is epsilon-far from (almost) the optimal gamma-margin error (up to some small approximation factor), with a run time that is poly(d/epsilon)2^(1/gamma^2). This result can be compared to previous work, which gave an algorithm that is poly(d) (1/epsilon)^(1/gamma^2), but gave an algorithm with no approximation factor. The proposed algorithm uses an interesting iterative approach, wherein in each stage, a projection that is aligned with the difference between the current halfspace and the optimal half space is sought, using the part of the distribution which incurs a low margin under the current half space. The paper also provides lower bounds on the computational complexity of any proper learning algorithm, under some standard complexity assumptions. These lower bounds show that the dependence on \gamma in the proposed algorithm is essentially tight. They also show that if one removes the approximation factor completely, then it is not possible to obtain a similar result, i.e. one with a polynomial dependence on $d$ and $1/epsilon$. Overall, the paper is well-written, and the results appear to be important and interesting. Detailed comments: - In section 1.4, Chow parameters are mentioned, but they are not subsequently used in the presented analysis, they are only used in the supplementary material. It is not clear how the algorithm and analysis in the supplementary relate to the ones in the paper. - Page 3, line 113: f and x_i are not defined. - Page 4, line 161: "Ruling out alpha-agnostic learners are slightly more complicated", "are" ==> "is" - Page 5, display equation in Claim 2.2: P[D'] should be P[y<= gamma/2] - Page 7, line 256: an extra parenthesis in the math formula. - Page 8, line 306: "learners learners"