NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

### Reviewer 1

The authors consider differentially private “bagging.” A single dataset is divided into k partitions. Each partition splits the data in n subsets, for a total of kn subsets. This is analogous to classical bagging since a single datapoint can appear in multiple subsets (in this case k). Furthermore, each subset will be used to train a classifier. A final classifier takes the majority vote of the individual classifiers with Lap(k/lambda) noise added to n_c, the number of sub-classifiers that output label c. This immediately yields a lambda-DP bound on the counts, and therefore the label of the final classifier. The authors are next interested in answering a sequence of queries using this classifier. The key analytic tool for understand this setting is the personalized moments accountant, which allows for a stronger composition of the privacy guarantees for each individual query than simple composition, advanced composition, or even the standard moments accountant. Ordinarily the l-th moment accountant is defined with respect to the privacy-loss random variable defined on two neighboring datasets. The privacy random variable is defined by drawing an outcome from the mechanism applied first dataset and computing the log odds ratio of the outcome with respect to the neighboring dataset. The worst-case (across datasets) l-th moment of this random variable essentially defines the lth moments accountant. In this work, the authors split the definition above into neighboring datasets that add an element, and neighboring datasets that subtract an element. They then show that all the same composition theorems hold for these “upwards” and “downwards” moments accountants individually. Much of the proofs here simply mirror the original moments accountant. They offer this as a tool for attaining better privacy bounds by first applying composition before taking the maximum between the upwards and downwards accountants. With this decomposition they can bound the moments accountant for the bagging setting by a data-dependent quantity. Given a new query point, for each class c and user u, we ask what fraction of classifiers that utilize user u’s data classify this point as c. If there is any user and class for which this fraction is 1, the personalized moments accountant yields a bound equivalent to the standard moments accountant. However, if the query point induces disagreement for all users, the bound is strictly better than the moments accountant. Across many query points, we should expect the latter case to sometimes happen, allowing us to use less privacy budget (although any budgeting will be data-dependent). This is born out in the experiments provided in the paper. The paper is well-written and easy to understand.

### Reviewer 2

The authors modify the moments accountant analysis of Abadi et al for a subsample-and-aggregate style algorithm. The key technical idea is that their analysis treats differently privacy loss when a data point is removed vs added. The results seem plausible and rigorous (although I did not verify all details), but I wish more effort had gone toward comparing the results here to the analog without the separate downwards and upwards moment accounting to help show the power of this technique. At many times the prose was too high-level/imprecise to help me understand the significance of the pieces not immediately inherited from Abadi et al. Comments: *Avoid opinion language like “we believe” in comparing techniques qualitatively and speculating about their future impact. *The paragraph before 4.2 seems to be the main idea, but it could use some clarification. How much better should we expect (4) to be than (3)? You make a comment about how it is “unlikely” that the two bounds are the same, but what does unlikely mean in this sentence? More rigorous claims along these lines could strengthen the paper. *The paper should be reorganized so Section 4 is (at least almost) all new contributions; as it is, almost all of 4.1 is just inherited from Abadi et al. *Use $\ell$ instead of $l$ for readability. *Is there a typo in Thm 2? alpha does not appear to be defined with u as a parameter. *Thm 3: “The mechanism” -> “Any mechanism” *m is defined to be the the fraction of teachers that voted for all but the least popular vote c_min, which is different from the claim at Line 203 that unanimity is the only way to get m=1. Thus Line 203 seems to be an overstatement. Can you clarify? *The simulations are useful baselines, but a formal accuracy guarantee is really required in 4.3 to assess the quality of this technique.

### Reviewer 3

The paper explores a remarkably simple but consequential improvement on the standard sample-and-aggregate framework. We have two main concerns about the paper. First, it is the relatively niche appeal of the result - for pragmatical reasons, sample-and-aggregate, or PATE, frameworks are very rarely used. Second, the paper compares its personalized accountant mechanism with the original sample-and-aggregate, outperforming it slightly. A more relevant comparison would have been with either PATE, or the "Scalable PATE" (ICLR 2018), both of which apply their own versions of data-dependent accounting mechanisms. In its current form the paper is somewhat half-baked: rather than improving on the latest state-of-the-art, it uses as the main benchmark a 2007 paper.