NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:84
Title:Differentially Private Algorithms for Learning Mixtures of Separated Gaussians

Reviewer 1

The point about strong bounds on parameters not being required is somewhat subtle (and perhaps overstated in the abstract and intro) because bounds on these parameters are required to be known, but the sample complexity depends only polynomially and polylogarithmically on these bounds, so an analyst can have very loose bounds on these terms and it will not affect the sample complexity significantly. From a first read of the abstract/intro, it seems like these parameters are not required to known at all. I would encourage the authors to re-phrase the informal statements of these claims to more accurately represent the results of the paper. The algorithm works similarly to that of Achlioptas and McSherry by using PCA to project the data into a low-dimensional space and then recursively clustering data points. It differs from this previous work by using differentially private versions of PCA and clustering. The authors develop a new algorithm for private clustering that is based on private clustering algorithms of Nissim, Stemmer, and Vadhan, but satisfies additional properties needed for this particular problem. The authors also derive sample complexity results for solving the problem using subsample-and-aggregate, and show their algorithm has asymptotically better sample complexity. Overall, this paper provides novel technical results on a relevant problem in private machine learning, and would be a good fit for NeurIPS. The writing and presentation is also very clear. Minor comment: pg 4, line 132: "an secluded cluster"

Reviewer 2

The paper considers the fundamental problem of learning a mixture of Gaussians under differential privacy. It builds on an algorithm by Achlioptas and McSherry and the robustness of PCA-projection to noise. The resulting sample complexity (for achieving a given level of accuracy) matches that of the non-private version of the algorithm. In particular, it shows a provable advantage compared with the well-known sample-and-aggregate framework. The paper is well written in general and the contributions are clearly stated.

Reviewer 3

Post rebuttal: the author feedback addresses my concerns, though I still think the readability of the paper needs improvement. I increased the overall score. --------- My main concern is that the main algorithm (Sec 4) and the privacy analysis are not clearly stated, making it hard to check the quality of the paper. More specifically, In terms of clarity, - Algorithm 1 and 2 needs to be better explained in the text. The pseudo code can also be made more readable, for example, Line 4 and 7 are actually if-then-else statements but the else parts are implicit (the full version is better). - PTERRIFICBALL should be explained more. It is unclear how much noise is added. - It seems to me Algorithm 1 would find k clusters. Yet the recursive part calls it twice with k-1. Doesn’t that mean we would have 2(k-1) clusters? - Line 3 in Algorithm 2 mentioned a subroutine PGE which is not defined. PGE seems to be important as it is the main component in Algorithm 2 that guarantees Differential Privacy. - Line 251, is TerrificRadius the same as TerrificBall? In terms of privacy analysis: - In Algorithm 2, why do we add Lap(1/epsilon) noise to |Cj|? Is the sensitivity 1?