NeurIPS 2020

### Review 1

Summary and Contributions: The paper proposes the new problem statement of PQ-learning a selective classifier (one which can abstain) and two algorithms (Rejectron and Urejectron) that are PQ-learners of selective classifiers. Informally, PQ-learning refers to a problem where the learner gets labeled examples from a distribution P and unlabeled examples from a distribution Q, and must learn, in polynomial time, a selective classifier that has low error on Q and a low rejection rate on P. The main contributions are:

* formalizing PQ-learning of a selective classifier
* proposing two algorithms (Rejectron and Urejectron) within this framework
* proving bounds for these algorithms in terms of error on Q and rejection rate on P
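The two quantities in this objective are easy to make concrete. Below is a minimal, illustrative sketch (not from the paper; the `predict`/`accept` representation of a selective classifier and all of the data are invented for this example) of how rej_P and err_Q would be estimated from finite samples:

```python
import numpy as np

def rej_rate(accept, X):
    """Fraction of a sample from P that falls in the rejection set."""
    return float(np.mean([not accept(x) for x in X]))

def err_rate(predict, accept, X, y):
    """Fraction of a sample from Q that is accepted *and* mislabeled;
    rejected points do not count as errors."""
    mistakes = [accept(x) and predict(x) != yi for x, yi in zip(X, y)]
    return float(np.mean(mistakes))

# Toy selective classifier: threshold at 0.5, abstain outside [0, 1].
predict = lambda x: int(x > 0.5)
accept = lambda x: 0.0 <= x <= 1.0

P_x = [0.1, 0.4, 0.6, 0.9]    # in-distribution points, all accepted
Q_x = [0.2, 0.8, 1.5, -0.3]   # test points, two of them out of range
Q_y = [1, 1, 1, 0]            # true labels for the test points

print(rej_rate(accept, P_x))               # 0.0: nothing from P rejected
print(err_rate(predict, accept, Q_x, Q_y)) # 0.25: one accepted point mislabeled
```

Note that the out-of-range points in Q_x contribute to the rejection rate on Q, not to err_Q, which is exactly the trade-off the PQ objective rej_P + err_Q is designed around.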

Strengths: Novelty: the main contributions of the paper (PQ-learning, Rejectron, Urejectron) are novel to me and would clearly justify a publication at NeurIPS in terms of novelty. Significance: generalization and controlled behaviour on out-of-distribution data is one of the core challenges in robust machine learning, and the presented framework and algorithms make substantial progress on it. As outlined below, however, I am not fully convinced that the specific setting studied in this paper is of practical relevance (the authors may convince me of the opposite in their response). Soundness of claims: the definitions, theorems, and proofs are well presented, and I have not found any flaws in the material.

Correctness: As stated above, the empirical evaluation is performed on a simplified version of the proposed algorithms. It is not clear to me why the full Urejectron or Rejectron algorithms are never used in the evaluation.

Clarity: The paper is well written; the introduction in particular does a very good job of familiarizing the reader with the problem statement and the main contributions of the paper.

Relation to Prior Work: Most relevant prior work is cited; however, work on "unrestricted adversarial examples" (e.g. https://openreview.net/forum?id=Sye_OgHFwH) should also be discussed. I am not very familiar with the field of selective classification, so I might miss relevant work from this field.

Reproducibility: Yes

Additional Feedback: Generally, I am willing to increase my score if my major criticisms raised under Weaknesses are addressed.

### Post-rebuttal

Thanks for your kind and extensive response to my review. Most of my main points have been addressed. However, I still miss an actual evaluation of the proposed methods (at least on toy data). Evaluating only a heavily ablated version of Urejectron and leaving an actual evaluation to future work, as stated in the response, is not sufficient in my opinion, particularly as it should not be a major effort on toy data. I thus keep my score of 5, even though I otherwise share much of the positive feedback of the other reviewers.

### Review 2

Summary and Contributions: Short summary:
==============
This paper considers the learning setting where (a) the learner is tested on a distribution that is different from the training distribution; additionally, (b) the learner can output what is called a selective classifier (i.e., a classifier that can either output a label or just say "don't know" and abstain from predicting anything); and furthermore (c) the learner is given some unlabeled data from the test distribution. In this setting, the paper provides two algorithms with novel types of guarantees.

More details
=============
Specifically, the paper proposes two learning models:

a) "The generalization setting", where the learner is given labeled data from a training distribution (P) and unlabeled data from a test distribution (Q). The goal of the learner is to make sure that "rej_P + err_Q" is small. Here, rej_P is the fraction of the P distribution that falls in the "rejection set", and err_Q is the fraction of the Q distribution on which the learner mislabels (which does NOT include the rejection set).

b) "The transductive setting", where Q = P. Here, the learner first chooses an h using data from P. Then a white-box adversary that knows h and can see some test data Z from P creates arbitrary test data \tilde{X}. The learner can look at \tilde{X} to come up with a rejection set such that "rej_Z + err_{\tilde{X}}" is small.

Under these settings, the paper presents

-- a supervised learning algorithm, Rejectron, that outputs a hypothesis + the acceptance set;
-- an unsupervised learning algorithm, Urejectron, that outputs only an acceptance set (and has access only to unlabeled data from P).

For both of these algorithms the paper demonstrates

-- computational efficiency, assuming access to an appropriate ERM oracle;
-- upper bounds on the above guarantees (in terms of the VC dimension).
-- Since these bounds grow as \sqrt{d/n}, unlike standard VC-dimension bounds (d/n), the paper also presents a matching lower bound to show that this is indeed tight.

Finally, the paper presents some experiments running a practical version of the unsupervised algorithm on the EMNIST dataset, for two different classifiers (neural network & random forest) and two different test distributions (natural and adversarial). These experiments are to be considered a proof of concept.
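To make the flavor of these algorithms concrete, here is a simplified, illustrative disagreement-based sketch in the spirit of Rejectron. This is not the paper's algorithm: the finite hypothesis list `H`, the brute-force search in place of the ERM oracle, the additive penalty `gap`, and the toy data are all simplifications invented for this example.

```python
import numpy as np

def rejectron_sketch(H, h, P_x, Q_x, n_iters=10, threshold=0.1):
    """Accept x iff every selected 'contender' c agrees with h at x.
    A contender is a hypothesis that mimics h on the training sample from P
    but disagrees with h noticeably on the still-accepted part of Q."""
    contenders = []

    def accept(x):
        return all(c(x) == h(x) for c in contenders)

    for _ in range(n_iters):
        best, best_gap = None, threshold
        for c in H:
            disagree_P = np.mean([c(x) != h(x) for x in P_x])
            accepted_Q = [x for x in Q_x if accept(x)]
            if not accepted_Q:
                break
            disagree_Q = np.mean([c(x) != h(x) for x in accepted_Q])
            # Reward disagreement with h on Q, penalize disagreement on P.
            gap = disagree_Q - disagree_P
            if gap > best_gap:
                best, best_gap = c, gap
        if best is None:
            break
        contenders.append(best)
    return accept

# Toy example: h is a threshold at 0.5; P lives in [0, 1]; Q also contains
# out-of-range points on which some hypothesis can mimic h on P yet disagree.
h = lambda x: x > 0.5
H = [h, lambda x: 0.5 < x <= 1.0, lambda x: x > 0.0]
P_x = [0.1, 0.3, 0.6, 0.9]
Q_x = [0.2, 0.8, 2.0, 3.0]

accept = rejectron_sketch(H, h, P_x, Q_x)
# In-range test points stay accepted; the out-of-range points 2.0 and 3.0
# are rejected, because the second hypothesis agrees with h everywhere on P
# but disagrees with h there.
```

The selected contenders play the role of "evidence of ambiguity": wherever some hypothesis that is consistent with the training data disagrees with h, the learner abstains rather than guesses, which is exactly what drives the rej_P + err_Q guarantee.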

Strengths: This paper was an exciting and enjoyable read! I have many positive comments and honestly nothing negative to say (maybe some suggestions at best).

1. The problem of being robust to adversarial shifts in the distribution is an important problem, and it is even more important and interesting to get a learning-theoretic grounding for it. Admittedly, unlike most practical settings, this paper works with a selective classifier. While the selective classifier might seem restrictive, such classifiers are interesting in their own right, as ultimately a "don't know" is always better than a misclassification. It might also seem restrictive that the model uses unlabeled data, but the problem becomes much harder without such an assumption. Overall, even though the paper makes some seemingly restrictive assumptions, these assumptions make sense, as they help us provide guarantees without making the problem trivial.

2. I'd also like to note that both the high-level approach and the low-level details are novel here. At a high level, the idea of using a selective classifier together with formulating the goal as the sum rej_{train dist} + err_{test dist} is novel. This also gives way to a neat, formal treatment of the algorithms. At the low level, the algorithms are novel, yet simple to understand and natural (indeed they resemble many standard disagreement-based algorithms common to KWIK learning, mistake-bound learning, active learning, co-training, etc.). The proofs are simple to understand. It is interesting that the sample complexity captures a tradeoff between the two goals of minimizing error and minimizing rejection rate.

3. The paper also leaves open the possibility for quite a lot of follow-up work, since the framework is general. One could imagine trying to come up with more sample-efficient or computationally efficient algorithms under a variety of other assumptions within this framework.
Overall I think that this is a valuable foundational paper in the theory of learning to be robust to arbitrary adversaries.

Weaknesses: I don't think I have anything significant to criticize in the paper.

Correctness: Yes, they seem correct. However, I've not checked the appendix proofs in detail.

Relation to Prior Work: Yes. The paper describes existing theoretical results in selective classification, noting why they don't apply in this setting. The paper is also honest about the fact that existing algorithms fail in the considered setting only because they don't assume access to unlabeled data.

In L159, it would be nice to be clearer regarding "as opposed to... we consider arbitrary test examples". This seems to (wrongly) suggest that the notion of arbitrary test examples is neither weaker nor stronger than the notion of adversarial examples in the deep learning literature. However, the notion of arbitrary test examples here does capture (and is stronger than) the latter, right? If it is right to say that the notion of adversarial examples is subsumed by "the arbitrary adversary", it might also be worth outlining/distinguishing work on adversarial examples that uses unlabeled data, e.g., https://papers.nips.cc/paper/9298-unlabeled-data-improves-adversarial-robustness.pdf, https://arxiv.org/abs/1905.13725, https://papers.nips.cc/paper/8792-robustness-to-adversarial-perturbations-in-learning-from-incomplete-data.pdf, https://arxiv.org/abs/1906.00555. (I don't need any clarification regarding these papers; just a suggestion for what could be added in the related work section.) Thanks for the paper!

Reproducibility: Yes

Additional Feedback: Updates after response: Thank you for responding to my questions. Although I'm not increasing the score, I would continue to argue for acceptance of this paper since it has strong theoretical value and novelty.

### Review 3

Summary and Contributions: The paper proposes the setting where the learning algorithm gets i.i.d. labeled training data from a distribution P and unlabeled samples from a different distribution Q, and learns a classifier that can abstain from predicting on some examples. In the realizable case, assuming the target concept class C has finite VC dimension and given an Empirical Risk Minimizer (ERM) for C, the paper gives an algorithm that learns a model with a low abstention rate with respect to P and a low generalization error with respect to Q. It also considers the transductive and agnostic settings.

Strengths:
+ The direction is novel. The paper proposes an interesting novel setting, and can potentially open up an interesting and useful research direction.
+ The results are also novel (since no prior guarantees were known even for simple classes of functions). They are also quite powerful (several bounds nearly match the lower bounds).
+ The results are based on algorithms that are quite natural and can be practical (the assumption of an ERM oracle can be approximately satisfied in quite rich application settings).

Weaknesses:
- The given setup does not seem to match some practical considerations. For example, it may be the case that the classifier has a large abstention rate with respect to Q. Are there ways to improve this with some assumptions on Q? When the statistical distance between P and Q is small, the current guarantee implies a small abstention rate with respect to Q. Can this be generalized/improved?

============ After response ============
I've read all the reviews and the response. I would like to thank the authors for their points, and remain positive about the paper.
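As a brief note on the statistical-distance observation above: if the distance is taken to be total variation (an assumption made here for concreteness), the implication follows in one line. For any acceptance set $S$ with complement $\bar S$,

```latex
\mathrm{rej}_Q \;=\; Q(\bar S) \;\le\; P(\bar S) + d_{\mathrm{TV}}(P,Q) \;=\; \mathrm{rej}_P + d_{\mathrm{TV}}(P,Q),
```

so a small rejection rate on P together with a small distance between P and Q forces a small rejection rate on Q.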

Correctness: Correct

Clarity: Yes

Relation to Prior Work: Yes

Reproducibility: Yes