NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, Vancouver Convention Center
Reviewer 1
This paper considers the problem of privately estimating the "fingerprint" of a dataset. A dataset over some universe consists of (x, n_x) pairs, where x is a label and n_x is the number of times that label x occurs. The fingerprint of this dataset is the histogram over the n_x's: it discards the labels, and, for each r >= 0, it counts the number of labels which were observed exactly r times. This is equivalent to approximating the histogram of the distribution up to a permutation. Under an appropriate distance measure (the minimum L1 distance over datasets which may correspond to the given fingerprints), this paper gives an algorithm which estimates the fingerprint up to accuracy ~O(sqrt(n/epsilon)) in the high-privacy regime (in the low-privacy regime, there is also an O_epsilon(sqrt(n)) upper bound). There is a known lower bound of Omega(sqrt(n/epsilon)) in the high-privacy regime, showing that the upper bound here is optimal up to a log(1/epsilon) factor. There is a new lower bound in this paper for the low-privacy regime, which is close to tight (up to a constant in the exponent multiplying epsilon; as such, the use of "near-optimal" in lines 119-120 is slightly imprecise, as I usually read this as optimal up to polylogarithmic factors). The authors also provide a corollary of their main upper bound for estimating symmetric properties of distributions.

The results seem quite strong. It is impressive that this paper manages to essentially settle the problem in both the high- and low-privacy regimes. (Very) naive methods would give error of O(n/eps) (the authors should provide a comparison with the sample complexity for the problem of estimating the histogram itself, which I believe is this same value of n/eps), whereas this paper improves the accuracy quadratically in both n and eps. The algorithm itself is fairly "elementary": there are a lot of steps, but pretty much all of them are simple operations, which may result in some adaptation of this method being practically realizable. There is a rather careful "partitioning" of items into high and low counts in order to achieve the sqrt(n) error bound, rather than the naive n.

My biggest criticism of this work is perhaps the clarity of the actual method. While a prefix of the paper is rather well written and provides a nice exposition, I unfortunately found the technical core of the actual result difficult to follow. The pseudocode (page 8) is rather inscrutable on its own, so one must consult the accompanying prose description (Section 4). However, the critical portions of this (Sections 4.2 and 4.3) are a bit terse, and marred by typos and poor language in some crucial places (I point out some of these instances below). I think this paper would benefit greatly from a more thorough and carefully written technical overview of the algorithm, potentially even in the supplementary material. I would be slightly hesitant to accept the paper in its present state, since I feel a small improvement here would greatly reduce the burden placed on any future reader. Thus, I would consider improving my score given a firm promise to address this issue in the final version.

It would have been nice to see some concrete instantiations of Corollary 1, even in the supplementary material. What are the sample complexities for these problems, and how do they compare with Acharya et al.'s work? What types of properties can you handle that they cannot (distance to uniformity, I think)?
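To make the fingerprint object summarized at the start of this review concrete, here is a minimal sketch (illustrative only, not code from the paper) of computing a fingerprint, i.e. the count-of-counts of a sample; the r = 0 entry for unseen labels is omitted since it requires knowing the universe size:

```python
# Illustrative only (not code from the paper): compute the fingerprint,
# i.e. for each r >= 1, how many labels were observed exactly r times.
from collections import Counter

def fingerprint(samples):
    label_counts = Counter(samples)               # n_x for each label x
    return dict(Counter(label_counts.values()))   # discard labels, keep count-of-counts

# 'a' appears 3 times, 'b' and 'c' twice, 'd' once:
print(fingerprint(list("aaabbccd")))   # {3: 1, 2: 2, 1: 1}
```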
I think it would be worth expanding on the implications of this result. Also, wouldn't entropy depend doubly-logarithmically on the domain size, since f_max can be as large as log k? This is contrary to what is stated in lines 173-174.

Some typos I noticed (not comprehensive):
- The crucial definition in line 180 appears to be incorrect at first glance, as the s in the summation is never used.
- Line 227: typo ("deferentially").
- Line 237 has poor grammar (and generally, this sentence could be expanded a bit: why does a large L1 distance imply a large sensitivity in the histogram, and what does the sensitivity of a histogram even mean?).
- Superscript in line 236: h^2_2 should be h^s_2.
- Missing period in line 243.
- Algorithm PrivHist has Z ~ G(e^-eps_1); this should be Z^a. Also, Z^b should have eps_2 instead of eps_1.
- The input H^\ell to PrivHist-LowPrivacy is missing a c in the superscript.

EDIT: Based on the authors' promise to improve the explanation of the main algorithm, I have raised my score.
Reviewer 2
This paper considers the problem of DP data release. Given a dataset, the task is to release its "profile" in a DP way. This problem is important in DP and has several applications, e.g., the discrete distribution estimation problem. This paper proposes a polynomial-time algorithm that achieves almost optimal l_1 loss, which is a sound theoretical contribution. Their algorithm is based on the following nice observation: any dataset of size n has at most O(\sqrt{n}) distinct multiplicities (nonzero entries of the profile). Naively releasing either the profile or the counts loses linearly in n (or even worse), since all n items may have to be output in the worst case; but by combining the two in a smarter way, only \sqrt{n} needs to be lost. This paper is well written overall. The writing is generally neat and there are only a few typos.
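The observation can be checked directly: if a dataset of size n has m distinct multiplicities, the m smallest of them already account for at least 1 + 2 + ... + m = m(m+1)/2 samples, so m <= sqrt(2n). A small illustrative check (not from the paper):

```python
# Illustrative check (not from the paper): a dataset of n samples has at most
# sqrt(2n) distinct multiplicities, because the m smallest distinct multiplicities
# account for at least 1 + 2 + ... + m = m(m+1)/2 samples, which must be <= n.
import math
import random
from collections import Counter

def num_distinct_multiplicities(samples):
    return len(set(Counter(samples).values()))

random.seed(0)
n = 10_000
samples = [random.randrange(200) for _ in range(n)]   # arbitrary toy data
m = num_distinct_multiplicities(samples)
assert m <= math.sqrt(2 * n)                           # holds for any dataset of size n
print(m, "distinct multiplicities, bound", math.isqrt(2 * n))
```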
Reviewer 3
The paper is written very clearly; it spends just the right amount of time motivating the current results, as well as discussing prior work. The derivation of the final algorithm is introduced gradually, step by step, which for me personally was really important in gaining intuition. As mentioned above, the algorithm closes a gap, since its l1 utility matches an existing lower bound, which earned my attention pretty early on.

However, I wish there were a bit more motivation for PrivHist in comparison with the algorithm that achieves a similar rate for (epsilon, delta)-DP with delta > 0. I want to confirm: does the algorithm you refer to in "Pure vs approximate differential privacy" have error O(sqrt{n}/epsilon)? In that case, your solution also improves the dependence on epsilon? If so, I think this should be added when discussing prior work.

The derivation of PrivHist is non-trivial and goes beyond mere geometric noise addition. However, given that there has been some prior work on differentially private release of anonymized histograms, I would appreciate it if the authors commented on whether those existing algorithms use some of the techniques discussed here, such as splitting r around sqrt{n} and using prevalences in one regime and counts in the other, or the smoothing idea used to zero out the prevalence vector. I think this discussion would be helpful in assessing the novelty of the presented ideas. Another minor question: is there a formal reason why the algorithm splits epsilon evenly across epsilon1, epsilon2, and epsilon3?

To conclude, I think the proposed algorithm is clearly presented and comes with formal guarantees, and the current work has many potential applications, as suggested in the paper. What I also really liked is that on several occasions the authors discussed why "natural" solutions don't work (e.g., why Laplace noise or adding noise to all counts would give a worse rate). For these reasons, I recommend acceptance.

P.S. Here are a few typos/stylistic comments I had while reading:
- In Corollary 1: "suppose f-hat satisfies" and "let there exist".
- On page 6: "deferentially private" -> "differentially private".
- In Definition 1: "if and only if" is somewhat unnecessary, since you are using this equality as the definition.

Post-feedback: The authors addressed my questions and comments in a satisfactory manner, and I think that making the above comparisons with prior work more explicit will only make the paper stronger. I still recommend acceptance.
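As an aside, here is a minimal sketch (illustrative only, and not the paper's PrivHist) of the plain geometric-noise baseline that Reviewer 3 contrasts the algorithm against: adding two-sided geometric (discrete Laplace) noise to a single sensitivity-1 count to obtain pure eps-DP.

```python
# Illustrative sketch only (not the paper's PrivHist): the plain "geometric
# noise addition" baseline, releasing one sensitivity-1 count under pure eps-DP
# by adding two-sided geometric (discrete Laplace) noise.
import math
import random

def two_sided_geometric(eps):
    """Sample Z with Pr[Z = z] proportional to exp(-eps * |z|)."""
    alpha = math.exp(-eps)
    def geometric():
        # number of failures before the first success, success probability 1 - alpha
        k = 0
        while random.random() < alpha:
            k += 1
        return k
    return geometric() - geometric()   # difference of two i.i.d. geometrics

def private_count(true_count, eps):
    # A single count has sensitivity 1, so this release satisfies eps-DP.
    return true_count + two_sided_geometric(eps)

random.seed(0)
print(private_count(42, eps=1.0))
```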