Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)


Authors

James Jordon, Jinsung Yoon, Mihaela van der Schaar

Abstract

Differential privacy is a popular and well-studied notion of privacy. In the era of big data, privacy concerns are becoming ever more prevalent, and differential privacy is increasingly being turned to as a solution. A popular method for ensuring differential privacy of a classifier is subsample-and-aggregate, in which the dataset is divided into distinct chunks, a model is learned on each chunk, and the resulting models are then aggregated. This approach allows for easy analysis of the model's sensitivity to the data, and thus differential privacy can be easily applied. In this paper, we extend this approach by dividing the data several times (rather than just once) and learning a model on each chunk within each division. The first benefit of this approach is a natural improvement in utility, obtained by aggregating models trained on a more diverse range of subsets of the data (as demonstrated by the well-known bagging technique). The second benefit is that, through analysis that we provide in the paper, we can derive tighter differential privacy guarantees when several queries are made to this mechanism. In order to derive these guarantees, we introduce the upwards and downwards moments accountants and derive bounds for these moments accountants in a data-driven fashion. We demonstrate the improvements our model makes over standard subsample-and-aggregate on two datasets (Heart Failure (private) and UCI Adult (public)).
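
The multi-partition structure described above can be sketched in a few lines. The following Python sketch is illustrative only: the function names, the scikit-learn-style `train_fn`/`predict` interface, and the Laplace-noised majority vote are assumptions made for exposition, not the paper's exact aggregation mechanism or its moments-accountant analysis (in particular, the noise scale must be calibrated via the privacy analysis, which this sketch does not do).

```python
# Illustrative sketch of multi-partition ("bagged") subsample-and-aggregate.
# Each of n_partitions independent divisions splits the full dataset into
# n_chunks disjoint chunks; one "teacher" model is trained per chunk.
import numpy as np

def make_partitions(n_samples, n_partitions, n_chunks, rng):
    """Return n_partitions independent divisions of {0, ..., n_samples-1},
    each splitting the index set into n_chunks disjoint chunks."""
    partitions = []
    for _ in range(n_partitions):
        idx = rng.permutation(n_samples)
        partitions.append(np.array_split(idx, n_chunks))
    return partitions

def train_teachers(X, y, partitions, train_fn):
    """Train one model per chunk within each division (hypothetical
    train_fn returns a fitted model with a scikit-learn-style API)."""
    teachers = []
    for chunks in partitions:
        for chunk in chunks:
            teachers.append(train_fn(X[chunk], y[chunk]))
    return teachers

def noisy_majority_vote(teachers, x, n_classes, noise_scale, rng):
    """Aggregate integer-labeled teacher predictions by a Laplace-noised
    vote. noise_scale is a placeholder: in the paper it would be set by
    the privacy analysis (upwards/downwards moments accountants)."""
    votes = np.zeros(n_classes)
    for t in teachers:
        votes[t.predict(x.reshape(1, -1))[0]] += 1
    votes += rng.laplace(scale=noise_scale, size=n_classes)
    return int(np.argmax(votes))
```

Note the structural fact this sketch makes visible: because every division partitions the entire dataset, a single sample falls in exactly one chunk per division, and hence influences at most n_partitions of the teachers. This bounded influence is what permits the tighter, data-driven privacy accounting described in the abstract, while aggregating over many diverse subsets recovers the utility benefit of bagging.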