Paper ID: | 4058 |
---|---|

Title: | On the Hardness of Robust Classification |

1. Novelty and Significance: The paper mostly presents some impossibility results on robust binary classification under adversarial perturbation, which could be of independent interest for a mathematical perspective. However it has not been made clear how do these impossibility results have any impact from a practical point of view. I would have appreciated if authors have provided some algorithms which achieves robust learnability even for a slightly relaxed setting, e.g. at least a robust learning algorithm for monotone conjugate functions for Thm. 10's setting would have been more useful. 2. Writing: The writing/presentations of the paper requires a lot of polishing: a) I could not find the definition of "PAC-learnability" in the paper (used in Sec 5). b) The class of "Monotone conjugation function of length l" is nowhere defined, neither authors put any reference for this. c) What is meant by "non-trivial" function class? No definitions provided. d) What is the point of introducing Def. 2 as all the results are stated only for the risk measure defined in Def. 1 only. e) By "distribution free setting" -- does that imply a classification problem setting where instances do not come from any underlying distribution? f)There are lot of (minor) typos throughout the paper: Line 135 --> "fhas". Line 200 --> unexpected occurrence of ",", Line 239 "a universal" repeated, Line 310: given --> gave etc. etc. 3. Seemingly incomplete results/ confusing theorem statements: a) Its very confusing as authors claim in the Introduction that: "On the other hand, a more powerful learning algorithm that has access to membership queries can exactly learn monotone conjunctions and as a result can also robustly learn with respect to exact in the ball loss." --- I could not find details of these results anywhere in the paper later. b) Thm. 10 also seems to be confusing on the distributional assumption D -- does these include any distribution defined on the extreme points of the boolean hypercube? c) Statement of Prop. 12: I am surprised the sample complexity does not depend on the properties of the underlying distribution and k --- I expect sample complexity to go up with increasing k. Should it be |C'_{k,n}| instead of |C_{n}|? d) Another typo in Proposition 12: D' --> D'_{(k,n)}? e) Its also slightly disappointing to see that the sample complexity results holds only for finite size concept classes, precisely since authors only have assumed the instance set to be extreme points of the boolean hypercube. I would rather have appreciated the result lot more if proved for a more general concept class where sample complexity depends on some complexity measure (e.g. VC dimension) of C_n, etc. 4. Experiments: The paper does not provide any empirical studies, but I understand that this is rather out of the scope of the current results since all the claims are mostly made on non-existence of any robust-learning algorithms, but atleast authors could have proposed some algorithms for the monotone conjugate functions (i.e. for Thm. 10) with learnability guarantee to show the tightness of their analysis and simulated it.

* Line 47: "conceptual conceptual" * Line 67: redundant "examples" * Discussion of [5] (Bubeck et al.): I find the use of the word "computable" very confusing here. I couldn't understand why they would need to solve the Halting Problem... * Line 224: I can't find Equation (8) (also, please capitalize E in Equation)

This work by considering different efficiceny aspects of robust learning is new and brings important contributions to an important field of research. The positionning to recent related works is very clear stressing the limits of the state-of-the-art when integrating computational efficiency and sample complexity in the metrics of designing robust learning models. The article is theoretically sound and well written. I think this article is important in the field of robust learning by focusing on the efficiency of solutions and on what problems are then achievable when efficiency has to be considered. Their contributions open new directions to see whether this result can be generalised to other classes of functions than the simple monotone conjunctions addressed here and how state-of-the-art recent results hold when adding constraints on sample complexity and computational efficiency.