NeurIPS 2020

A Computational Separation between Private Learning and Online Learning


Review 1

Summary and Contributions: This work shows that there is a class that is privately PAC learnable in polynomial time but not efficiently learnable in the online setting (assuming the existence of one-way functions), i.e., there is no polynomial-time algorithm with a polynomial mistake bound. A line of recent work (focused on sample complexity) has demonstrated various equivalences between private PAC and online learning; this result focuses on the question of efficiency, proving that efficient private learnability does not imply efficient online learnability if one-way functions exist. The cryptographic assumption is standard for such results and is needed when ruling out general polynomial-time learners. The class of functions that cannot be efficiently learned in the online model was given by [Blum 1994], which showed a separation between distribution-free PAC learning and online learning. Much of the work toward separating the models of learning, which involves working out the cryptographic construction and giving an efficient PAC algorithm, was done there -- the technical contribution here is to give a private version of that algorithm. Blum’s algorithm is not private, in that the resulting hypothesis reveals a ‘minimum’ value of the dataset and a key example; privacy is achieved here by a careful composition of existing algorithms that privately select a minimum and key example. Ensuring privacy incurs a dependence on the domain size in the sample complexity.

Strengths: This work settles in the negative (using a minimal and widely accepted assumption) the question of whether there is an efficient reduction from online learning to private PAC learning. Though the question was only recently posed, since the sample-complexity equivalences are themselves recent, it is a natural and fundamental one already studied by a few works (Gonen et al. 2019 and Neel et al. 2019). The proof is relatively simple (in a good way) and builds nicely on the work of Blum. It also complements and clarifies the result of Gonen et al. 2019. The discussion around the uniformity/efficient-sample-complexity requirement for their reduction could be important and useful for other researchers.

Weaknesses: There is not a lot of technical novelty here, as the result is obtained by combining Blum’s work with existing privacy algorithms. I find this to be a minor weakness, since the result answers a fundamental question and is well presented and executed. Update after author response: ======================= After the authors' response my overall view of the paper, which is positive, is the same, and I'm glad they will include the details of the proof in the final version.

Correctness: The claims in the paper are correct; some details of the proofs are deferred to a full version, and I hope the authors will provide an appendix with the full argument in the next round (that said, the omitted parts are quite believable).

Clarity: The paper is generally clear enough. A few points of confusion I had:
- I’m not clear on the indexing of sigma in the description of Algorithm 1. Shouldn’t the non-hat sigmas be indexed by their corresponding i, their place in the OWS? At least this seems to be the convention at the end of step 4. In that case sigma_1 should be sigma_{i_1}, etc., in step 1.
- Line 228: should it be x*?
- It might be helpful to explicitly point out that an example (i, sigma) has i in {0,1}^k when describing the OWS, and then to make some statement in the proof of Thm 16 about the number of samples, i.e., since Min_pure is run over a domain of size 2^k = 2^{sqrt{d}}, and Freq_pure, etc. It sounds from the proof like only (8/alpha) log(2/beta) samples are needed.

Relation to Prior Work: The paper’s relationship to previous work is explained well and thoroughly. One question I had is whether the work of Feldman and Xiao 2015 isn’t also relevant around lines 52-58 (don’t they show that pure-DP learning with sample complexity d implies Littlestone dimension O(d)?).

Reproducibility: Yes

Additional Feedback:


Review 2

Summary and Contributions: The paper proves that, assuming the existence of one-way functions, there exists a class of hypotheses that is poly-time private PAC learnable but is not poly-time online learnable. The proof is via an easy adaptation of an analogous separation between (non-private) PAC learning and online learning.

Strengths: Simple proof + resolves an open problem from a FOCS paper!

Weaknesses: The result, which the authors describe as a “barrier to barrier,” is very theoretical, and I’m not sure whether NeurIPS is the best venue.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Yes. *** OBSOLETE *** There is an existing huge separation in terms of sample complexity, albeit for non-efficient algorithms. I’m wondering whether that example can be padded to make the algorithms “polynomial” time. [Thanks to the authors for clarifying this point! If I understand correctly, the issue is that an unconditional exponential separation is formally impossible since Littlestone's dimension is bounded by log(|H|) \le log(|U|^{VC(H)}) = [sample description length] * [sample complexity of PAC learning]. Right?]
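To spell out the bound I have in mind (notation assumed: a finite class H over a finite domain U with d = VC(H); by the standard mistake-tree argument and Sauer's lemma):

\[
  \mathrm{Ldim}(H) \;\le\; \log_2 |H|
    \;\le\; \log_2 \sum_{i=0}^{d} \binom{|U|}{i}
    \;\le\; d \,\log_2\!\Big(\tfrac{e\,|U|}{d}\Big)
    \;=\; O\big(\mathrm{VC}(H) \cdot \log |U|\big),
\]

i.e., the Littlestone dimension is at most (roughly) the PAC sample complexity times the number of bits needed to describe a single example, which is why an unconditional exponential separation with succinctly describable examples is formally impossible.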

Reproducibility: Yes

Additional Feedback: Typos:
- Line 30: “C is *privately* learnable”
- Line 67: “the latter for which”


Review 3

Summary and Contributions: This paper studies the relationship between private PAC learning and online learning from a computational-complexity perspective. It asks whether every class that is privately PAC learnable in polynomial time is also online learnable in polynomial time. The authors give a negative answer to this question by providing a class that is privately learnable in poly time (giving both pure and approximate learners) but not online learnable, assuming the existence of one-way functions. They also explore the connection of this result with Gonen et al., which appeared in NeurIPS 2019 and gave a computationally efficient reduction from online learning to uniform pure-private learning (a strong private learning setting over an infinite concept class, with sample complexity independent of the size of the concept). In this paper, the authors prove that such uniform pure-private learners cannot be computationally efficient. However, they point to a relaxation of uniform pure-private learners to highly sample-efficient pure-private learners that would still allow the application of the reduction of Gonen et al.

Strengths: -Recent results on the equivalence of online and private learning left important open questions about whether the sample and computational complexity gaps that these general reductions incur can be reduced. The specific question of whether computationally efficient private learning implies efficient online learning was also explicitly stated as an interesting open problem in a FOCS 2019 paper. This paper resolves it in the negative. -The complementary results in relation to the Gonen et al. paper further clarify the space, directing future research to classes or private learning models where these reductions could be efficient.

Weaknesses: -There is a slight drawback, which is that the authors have omitted to submit the supplementary material. However, I find the details provided in the main body to be enough to establish the correctness of the results with sufficient confidence (in particular, the private learning algorithms are fully analyzed, and the impossibility of online learning this class is largely based on Blum 1994). As this is the main contribution of the paper, I don't consider this to be significant.

Correctness: All claims are either fully proven or sufficient evidence is given to be convinced of their correctness.

Clarity: The paper is well written.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: The paper was overall very well written. I think that having a section (Section 4) to explore the complicated relationship between this paper and Gonen et al. was a good idea, as it gives a thorough exploration of how these results could be interpreted. But I do think it is a bit harder to read. Perhaps breaking it into three parts (Gonen et al. and what uniform pure-private learning is, impossibility of efficient uniform pure-private learning, and relaxations that allow the reduction to be useful) would help. Minor typos:
- Line 215: is q(S,r) is -> , q(S,r), is
- Line 223: identify -> identity
========================== Thank you for your response. I am keeping my score as is and support acceptance of this paper.