NeurIPS 2020

Private Identity Testing for High-Dimensional Distributions

Review 1

Summary and Contributions: Review Update: I agree that reducing dependence on privacy parameters, even marginally in some regimes, can have great practical importance and impact. I'm comfortable sticking with my score however. This paper studies the problem of private distribution testing, giving algorithms and bounds that improve the dependence of the error on the dimensionality, and hence scale better to high dimensions. Informally, they give results for (1) distinguishing the uniform distribution over (0,1)^d from anything alpha close in TV (uniformity testing over the cube) (2) They can reduce testing the mean of two Gaussians to uniformity testing over the cube (3) They can reduce testing of specific product distributions called "extreme" to univariate testing, and thus use lower bounds from univariate testing to provide the first known lower bounds for private product testing. (1) is the key result - and they give two algorithms, one efficient, and one not. Both rely on the idea of privately computing a test statistic that has high global sensitivity and low average sensitivity by recursively taking Lipschitz extensions. In general computing the extension is inefficent, but it can be done approximately giving an efficient algorithm with worse sample complexity. Their bounds give an additive cost for privacy that only scales like sqrt(dimension) rather than linear. For epsilon = Omega(alpha) their bound shows that you can get privacy for free.

Strengths: - new results for privately computing a fundamental statistical primitive - interesting use of Lipschitz extensions to reduce global sensitivity to average sensitivity while still maintaining privacy, although this is similar to prior work - novel lower bounds for this problem

Weaknesses: * unclear how practically important improving the bound from (1/epsilon) multiplicatively to an additive term really is - in many applications epsilon could be O(1)

Correctness: - yes

Clarity: - yes

Relation to Prior Work: - yes

Reproducibility: Yes

Additional Feedback: This is thorough work that represents a definite contribution to private hypothesis testing. What prevents it from receiving a higher score is that I'm not sure how significant the results are, and of how much interest they are to the Neurips community.

Review 2

Summary and Contributions: The paper studies the problem of privately testing whether a set of data samples is drawn from the uniform distribution, or \alpha far from it. The paper also provides solution to generalizations of this problem, e.g., biased product distributions and Gaussian distribution with diagonal covariance. The paper provides improved sample complexity bounds that has better dependence on dimensions, and show that without computational assumptions the sample complexity is almost optimal in certain parameter regimes.

Strengths: 1. The paper is very well-written, and the results are very easy to follow. 2. The paper uses (by now standard techniques) like using proxy statistics which are stable on typical data sets, and techniques of Lipschitz extension in a fairly non-trivial way. 3. May be I am new to this line of research on property testing with differential privacy, but I found the technique of privacy filtering to be quite novel.

Weaknesses: I am not too much familiar with the area to criticize the paper much, beyond requesting the authors to demonstrate some motivation for practical importance of these problems in real-world settings of design of privacy preserving algorithms.

Correctness: The paper seems to be correct intuitively.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: The paper gives algorithms and techniques for differentially private hypothesis testing for relatively simple distributions in high dimensions. The main test case is to distinguish a sample from a uniform distribution on the d-dimensional hypercube to a product distribution over {-1,1}^d that is "far" from uniform in total variation distance

Strengths: The paper gives two interesting techniques to perform these tests. 1) Lipschitz extensions. This gives good sample complexity bounds but inefficient 2) Private filtering

Weaknesses: Some of the better bounds are inefficient, some of the bounds are suboptimal

Correctness: What i looked at seems ok, did not verify all

Clarity: yes

Relation to Prior Work: I am not too familiar with prior work on private hypothesis testing

Reproducibility: Yes

Additional Feedback: I have read the author's response

Review 4

Summary and Contributions: Differential privacy is an important area of continued research with increased applicability in industry. This paper derives an optimal hypothesis test with differential privacy. The results are theoretic in nature but clearly quantify the cost of privacy in terms of the tests themselves. The approach itself is clearly derived and focusses on the average case in order to manage challenges associated with metrics of extreme values (the dependence of the algorithm on single observations which in term conflict with the definitions of differential privacy).

Strengths: The paper was clear, well-motivated and offer concrete theoretical contributions. The authors described the motivations, algorithmic design, theoretical guarantee and broader impact of the approach. While there was no empirical evidence to support the approach, the argument via optimal guarantees, correctness and associated theorems left the paper complete. I believe it will be of interest to the NeurIPS community.

Weaknesses: Unfortunately, the paper was not the most readable. The ordering of the sections jumped around a lot (intro, background, results, techniques, related work, core section, impact). I would have suggested that background and related works are tied together and results and impact due to their natural relations. Nevertheless, this is merely nitpicking and there are not many overwhelming concerns with the work. One question that the reader is left with however is the applicability of the approach in practice and whether the additional cost of privacy (despite the obvious benefits) leads to poor hypothesis testing in the wild.

Correctness: There is no obvious issue with the papers theory itself as far as I am aware.

Clarity: Please see weaknesses.

Relation to Prior Work: The paper links to related research such as hypothesis testing and its applicability in the ML community and the Lipschitz-extension technique. Differential privacy literature and related topics are assumed to be well understood by the reader.

Reproducibility: Yes

Additional Feedback: