NeurIPS 2019
Sun Dec 8 – Sat Dec 14, 2019, Vancouver Convention Center
This work studies the problem of building a representative dataset for a private dataset from the union of datasets held by several agents. The constraints are (a) differential privacy between agents, and (b) the agent building the dataset does not learn substantially more points than necessary. This is an interesting and relevant model of the problem. The authors propose an adaptation of the greedy algorithm in a kernel space, and the adaptation involves interesting techniques. I like the paper both for the question asked and for the techniques. The reviewers had some concerns about the writeup, and I hope the authors will revise it to address those concerns, as promised in the rebuttal.

Comments:

Lines 107-108, on Abadi et al. and "variance is linear in the number of iterations": the variance is also scaled by the sampling factor, which in practice translates to a constant sigma for most applications. There are also a few works in DP where the privacy cost is constant "per acquisition", e.g., private Frank-Wolfe for private LASSO (Talwar et al., NeurIPS 2015) and private marginal release (Dwork et al., SoCG 2014).
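To make the point about the sampling factor concrete, here is a rough restatement, assuming the Abadi et al. reference is to the 2016 moments-accountant paper: with subsampling probability $q$ and $T$ iterations, DP-SGD satisfies $(\varepsilon,\delta)$-differential privacy provided (up to constants, and for $\varepsilon$ in the regime covered by their theorem)

$$
\sigma \;\gtrsim\; \frac{q\,\sqrt{T \log(1/\delta)}}{\varepsilon},
$$

so when $q$ is small the required per-step noise scale $\sigma$ remains essentially constant even as the number of iterations $T$ grows, which is what "constant sigma for most applications" refers to.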