Paper ID: 82

Title: Private Hypothesis Selection

Overall, I think this is a well-written paper that provides a significant technical contribution on an important problem in differential privacy, and it would be a good addition to the NeurIPS program. Minor comments, mostly on presentation, are given below.

--pg 2, Thm 1: does alpha need to be known in advance? If not (and alpha has to be chosen by the user), then what happens if no such H* exists? This is all made clear later in the technical sections, but should be clarified here.

--pg 2, lines 57-61: For applications where m is so large that Omega(m) time is already inefficient, this algorithm's m^2 runtime is even more infeasible.

--pg 3, lines 90-92, 103: Is d still the VC-based measure from a few paragraphs ago? Relatedly, the earlier mention of Theorem 3 feels sufficiently separated that I thought the mention of it here was a typo. These paragraphs could be better linked together.

--pg 5, lines 200-201: What does it mean for a distribution to behave like a function applied to a set? I can imagine, thinking of a distribution as a hypothesis, interpreting it as the average label over that set (although I'm not sure this is the correct interpretation, since we're not talking about labels here), but then there is P(W_1), and P is the true distribution. When defining W_1, should \mathcal{X} be D? Confusion over these details also led to some confusion when interpreting the results.

--pg 6, proof of Lemma 2: The claim that Gamma(H_1,H_2) has sensitivity 1 is not obvious and should be proven or argued. This confusion is related to the point above, because Gamma is a piecewise-linear function whose conditions and values depend on the terms W_1, p_1, p_2, and tau that were unclear.

--pg 6, line 263: "indepdently"

--pg 7, line 267: concentrated differential privacy hasn't been mentioned yet. It should either be defined (formally or informally) in the body or include a pointer to a formal definition in the supplementary materials.
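To make the comment on W_1, p_1, p_2, and tau concrete, here is a minimal sketch of the standard (non-private) Scheffé-style pairwise test over a finite domain. The function name and the discrete-domain setup are my own illustration, not the paper's notation or algorithm:

```python
import numpy as np

def scheffe_winner(h1, h2, samples):
    """Non-private Scheffe test between two discrete hypotheses.

    h1, h2: probability vectors over a finite domain {0, ..., k-1}
            (illustrative setup; the paper works in more generality).
    samples: observed points drawn from the unknown distribution P.
    W_1 is the set where H_1 puts more mass than H_2; p_1 = H_1(W_1),
    p_2 = H_2(W_1), and tau is the empirical mass of the samples on W_1.
    Returns 1 if H_1 wins the comparison, else 2.
    """
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    w1 = h1 > h2                              # the set W_1
    p1, p2 = h1[w1].sum(), h2[w1].sum()       # note p_1 >= p_2 by construction
    tau = np.mean(w1[np.asarray(samples)])    # empirical mass on W_1
    # H_1 wins iff tau is closer to p_1 than to p_2, i.e. tau > (p_1 + p_2) / 2
    return 1 if tau > (p1 + p2) / 2 else 2
```

The private score Gamma(H_1, H_2) discussed in the proof of Lemma 2 is built on top of this comparison, which is why the sensitivity claim deserves an explicit argument.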

This paper is a solid contribution toward understanding the sample complexity costs of achieving differential privacy during hypothesis testing. The paper explores this setting quite well, along with several motivating applications of a fairly abstract general mechanism.

Originality: The scoring function that measures the number of points that must be changed to flip a pairwise comparison of hypotheses appears to be a novel contribution, and its application in the exponential mechanism (or GAP-MAX) achieves novel utility bounds. The paper sufficiently cites related work on this commonly explored problem.

Quality: The theoretical results appear quite sound, and the strengths and limitations of the analysis are explored.

Clarity: The work is quite clearly written, carefully building up the mechanism and its analysis.

Significance: The wide reach of the improved bounds across several applications is significant.
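For readers less familiar with how a sensitivity-1 score yields private selection, here is a minimal sketch of exponential-mechanism selection given precomputed scores. The function and its parameters are illustrative assumptions of mine, not the paper's implementation (which additionally considers GAP-MAX):

```python
import numpy as np

def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Select an index with probability proportional to
    exp(epsilon * score / (2 * sensitivity)).

    With a sensitivity-1 score, as with the pairwise-comparison
    score discussed above, this selection is epsilon-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()          # shift for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)
```

The utility bounds then follow from the standard fact that the selected score is within O(log(m) / epsilon) of the maximum with high probability, where m is the number of hypotheses.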

This work studies the problem of selecting, from a class of hypotheses, one that is close in TV-distance to an unknown distribution P from which samples are drawn, under the added constraint of DP. The overall algorithm is the exponential mechanism, but some care needs to be taken in selecting the correct score function. The paper needs a lot more explanatory text to help the reader understand the intuition. Some things are defined with no explanation, such as \Gamma(H,H',D). The paper dismisses in a footnote the issue that p_1 and p_2 cannot be evaluated exactly, on the grounds that they can be estimated to arbitrary accuracy. How does an error in p_1 or p_2 propagate through the analysis? Overall, I think the paper tackles an interesting problem and uses standard DP techniques to solve it. There are many different applications that the paper points out to show the impact of this result. The paper would benefit from a thorough proofreading with more explanations to improve the writing. This could all be fixed.

Comments:
- "indepdently" on page 2.
- Page 1 should cite the various constraints studied previously in hypothesis selection.
- Lots of terms are used before they are defined, like DP and \alpha-cover.
- The variable "d" is overloaded in Section 1.1 (it refers to the VC dimension and, I assume, the dimension of the Gaussian distribution).
- The Preliminaries just state lots of definitions, with no explanatory text.
- In Definition 3, how is d_{TV}(P,\mathcal{H}) defined, when d_{TV} is only defined over distributions? Also, a \hat{H} is missing in the d_{TV} bound.
- The footnote on page 5 refers to H_j, but should this be H'?
- In the proof of Lemma 2, are H_j and H_k supposed to be H_1 and H_2?
- "probablity" on page 6.
- In the Applications of Hypothesis Selection section, the notation O(x)^d is used. Is this supposed to be O(x^d)?

I have read the author feedback and will keep my score unchanged.
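An aside on the Definition 3 question about d_{TV}(P,\mathcal{H}): presumably the intended meaning is the standard distance-to-a-class convention (this is my reading, not something the paper states explicitly):

d_{TV}(P, \mathcal{H}) := \inf_{H \in \mathcal{H}} d_{TV}(P, H).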