{"title": "Renyi Differential Privacy Mechanisms for Posterior Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 5289, "page_last": 5298, "abstract": "With the newly proposed privacy definition of R\u00e9nyi Differential Privacy (RDP) in (Mironov, 2017), we re-examine the inherent privacy of releasing a single sample from a posterior distribution. We exploit the impact of the prior distribution in mitigating the influence of individual data points. In particular, we focus on sampling from an exponential family and specific generalized linear models, such as logistic regression. We propose novel RDP mechanisms as well as offering a new RDP analysis for an existing method in order to add value to the RDP framework. Each method is capable of achieving arbitrary RDP privacy guarantees, and we offer experimental results of their efficacy.", "full_text": "R\u00e9nyi Differential Privacy Mechanisms for Posterior\n\nSampling\n\nJoseph Geumlek\n\nUniversity of California, San Diego\n\njgeumlek@cs.ucsd.edu\n\nShuang Song\n\nUniversity of California, San Diego\n\nshs037@eng.ucsd.edu\n\nKamalika Chaudhuri\n\nUniversity of California, San Diego\n\nkamalika@cs.ucsd.edu\n\nAbstract\n\nWith the newly proposed privacy de\ufb01nition of R\u00e9nyi Differential Privacy (RDP)\nin [14], we re-examine the inherent privacy of releasing a single sample from a\nposterior distribution. We exploit the impact of the prior distribution in mitigating\nthe in\ufb02uence of individual data points. In particular, we focus on sampling from\nan exponential family and speci\ufb01c generalized linear models, such as logistic\nregression. We propose novel RDP mechanisms as well as offering a new RDP\nanalysis for an existing method in order to add value to the RDP framework. 
Each method is capable of achieving arbitrary RDP privacy guarantees, and we offer experimental results of their efficacy.

1 Introduction

As data analysis continues to expand and permeate ever more facets of life, the concerns over the privacy of one's data grow too. Many results have arrived in recent years to tackle the inherent conflict of extracting usable knowledge from a data set without over-extracting or leaking the private data of individuals. Before one can strike a balance between these competing goals, one needs a framework by which to quantify what it means to preserve an individual's privacy.

Since 2006, Differential Privacy (DP) has reigned as the privacy framework of choice [6]. It quantifies privacy by measuring how indistinguishable the output distribution of a mechanism remains whether or not any one individual is in the data set. This gave not just privacy semantics, but also robust mathematical guarantees. However, its requirements have been cumbersome for utility, leading to many proposed relaxations. One common relaxation is approximate DP, which allows arbitrarily bad events to occur with probability at most δ. A more recent relaxation is Rényi Differential Privacy (RDP), proposed in [14], which uses the measure of Rényi divergences to smoothly vary between bounding the average and the maximum privacy loss. However, RDP has very few mechanisms compared to the more established approximate DP. We expand the RDP repertoire with novel mechanisms inspired by Rényi divergences, as well as re-analyzing an existing method in this new light.

Inherent to DP and RDP is that the mechanism must involve some uncertainty; it cannot be deterministic. Many privacy methods have been motivated by exploiting pre-existing sources of randomness in machine learning algorithms.
One promising area has been Bayesian data analysis, which focuses on maintaining and tracking the uncertainty within probabilistic models. Posterior sampling is prevalent in many Bayesian methods, serving to introduce randomness that matches the currently held uncertainty.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

We analyze the privacy arising from posterior sampling as applied to two domains: sampling from exponential families and Bayesian logistic regression. Along with these analyses, we offer tunable mechanisms that can achieve stronger privacy guarantees than directly sampling from the posterior. These mechanisms work by controlling the relative strength of the prior in determining the posterior, building off the common intuition that concentrated prior distributions can prevent overfitting in Bayesian data analysis. We experimentally validate our new methods on synthetic and real data.

2 Background

Privacy Model. We say two data sets X and X' are neighboring if they differ in the private record of a single individual. We use n to refer to the number of records in the data set.

Definition 1. Differential Privacy (DP) [6]. A randomized mechanism A(X) is said to be (ε, δ)-differentially private if for any subset U of the output range of A and any neighboring data sets X and X', we have p(A(X) ∈ U) ≤ exp(ε) p(A(X') ∈ U) + δ.

DP is concerned with the difference the participation of an individual might have on the output distribution of the mechanism. When δ > 0, it is known as approximate DP, while the δ = 0 case is known as pure DP. The requirements for DP can be phrased in terms of a privacy loss variable, a random variable that captures the effective privacy loss of the mechanism output.

Definition 2. Privacy Loss Variable [2].
We can define a random variable Z that measures the privacy loss of a given output of a mechanism across two neighboring data sets X and X':

Z = \log \frac{p(A(X) = o)}{p(A(X') = o)} \Big|_{o \sim A(X)} .    (1)

(ε, δ)-DP is the requirement that for any two neighboring data sets, Z ≤ ε with probability at least 1 − δ. The exact nature of the trade-off and semantics between ε and δ is subtle, and choosing them appropriately is difficult. For example, setting δ = 1/n permits (ε, δ)-DP mechanisms that always violate the privacy of a random individual [12]. However, there are other ways to specify that a random variable is mostly small. One such way is to bound the Rényi divergence of A(X) and A(X').

Definition 3. Rényi Divergence [2]. The Rényi divergence of order λ between the two distributions P and Q is defined as

D_\lambda(P \| Q) = \frac{1}{\lambda - 1} \log \int P(o)^{\lambda} Q(o)^{1 - \lambda} \, do .    (2)

As λ → ∞, the Rényi divergence becomes the max divergence; moreover, setting P = A(X) and Q = A(X') ensures that D_\lambda(P \| Q) = \frac{1}{\lambda - 1} \log E_Z[e^{(\lambda - 1) Z}], where Z is the privacy loss variable. Thus, a bound on the Rényi divergence over all orders λ ∈ (0, ∞) is equivalent to (ε, 0)-DP, and as λ → 1, this approaches the expected value of Z, equal to KL(A(X)||A(X')). This leads us to Rényi Differential Privacy, a flexible privacy notion that covers this intermediate behavior.

Definition 4. Rényi Differential Privacy (RDP) [14].
A randomized mechanism A(X) is said to be (λ, ε)-Rényi differentially private if for any neighboring data sets X and X' we have D_λ(A(X)||A(X')) ≤ ε.

The choice of λ in RDP is used to tune how much concern is placed on unlikely large values of Z versus the average value of Z. One can consider a mechanism's privacy as being quantified by the entire curve of ε values associated with each order λ, but the results of [14] show that almost identical results can be achieved when this curve is known at only a finite collection of possible λ values.

Posterior Sampling. In Bayesian inference, we have a model class Θ, and are given observations x1, . . . , xn assumed to be drawn from a θ ∈ Θ. Our goal is to maintain our beliefs about θ given the observational data in the form of the posterior distribution p(θ|x1, . . . , xn). This is often done by drawing samples from the posterior.

Our goal in this paper is to develop privacy preserving mechanisms for two popular and simple posterior sampling methods. The first is sampling from the exponential family posterior, which we address in Section 3; the second is sampling from posteriors induced by a subset of Generalized Linear Models, which we address in Section 4.

Related Work. Differential privacy has emerged as the gold standard for privacy in a number of data analysis applications – see [8, 15] for surveys. Since enforcing pure DP sometimes requires the addition of high noise, a number of relaxations have been proposed in the literature. The most popular relaxation is approximate DP [6], and a number of uniquely approximate DP mechanisms have been designed by [7, 16, 3, 1], among others. However, while this relaxation has some nice properties, recent work [14, 12] has argued that it can also lead to privacy pitfalls in some cases.
Approximate differential privacy is related to, but weaker than, δ-probabilistic privacy [11] and (1, ε, δ)-indistinguishability [4].

Our privacy definition of choice is Rényi differential privacy [14], which is motivated by two recent relaxations – concentrated DP [9] and z-CDP [2]. Concentrated DP has two parameters, µ and τ, controlling the mean and concentration of the privacy loss variable. Given a privacy parameter α, z-CDP essentially requires (λ, αλ)-RDP for all λ. While [2, 9, 14] establish tighter bounds on the privacy of existing pure and approximate DP mechanisms, we provide mechanisms based on posterior sampling from exponential families that are uniquely RDP. RDP is also a generalization of the notion of KL-privacy [19], which has been shown to be related to generalization in machine learning.

There has also been some recent work on the privacy properties of Bayesian posterior sampling; however, most of it has focused on establishing pure or approximate DP. [5] establishes conditions under which some popular Bayesian posterior sampling procedures directly satisfy pure or approximate DP. [18] provides a pure DP way to sample from a posterior that satisfies certain mild conditions by raising the temperature. [10, 20] provide a simple statistically efficient algorithm for sampling from exponential family posteriors. [13] shows that directly sampling from the posterior of certain GLMs, such as logistic regression, with the right parameters provides approximate differential privacy.
While our work draws inspiration from all of [5, 18, 13], the main difference between their work and ours is that we provide RDP guarantees.

3 RDP Mechanisms based on Exponential Family Posterior Sampling

In this section, we analyze the Rényi divergences between distributions from the same exponential family, which will lead to our RDP mechanisms for sampling from exponential family posteriors. An exponential family is a family of probability distributions over x ∈ X indexed by the parameter θ ∈ Θ ⊆ R^d that can be written in this canonical form for some choice of functions h : X → R, S : X → R^d, and A : Θ → R:

p(x_1, \ldots, x_n | \theta) = \left( \prod_{i=1}^{n} h(x_i) \right) \exp \left( \left( \sum_{i=1}^{n} S(x_i) \right) \cdot \theta - n \cdot A(\theta) \right) .    (3)

Of particular importance are S, the sufficient statistics function, and A, the log-partition function of this family. Our analysis will be restricted to families that satisfy the following three properties.

Definition 5. The natural parameterization of an exponential family is the one that indexes the distributions of the family by the vector θ that appears in the inner product of equation (3).

Definition 6. An exponential family is minimal if the coordinates of the function S are not linearly dependent for all x ∈ X.

Definition 7. For any Δ ∈ R, an exponential family is Δ-bounded if Δ ≥ sup_{x, y ∈ X} ||S(x) − S(y)||. This constraint can be relaxed with some caveats explored in the appendix.

A minimal exponential family will always have a minimal conjugate prior family. This conjugate prior family is also an exponential family, and it satisfies the property that the posterior distribution formed after observing data is also within the same family.
It has the following form:

p(\theta | \eta) = \exp(T(\theta) \cdot \eta - C(\eta)) .    (4)

The sufficient statistics of θ can be written as T(θ) = (θ, −A(θ)), and p(θ|η_0, x_1, . . . , x_n) = p(θ|η') where \eta' = \eta_0 + \sum_{i=1}^{n} (S(x_i), 1).

Beta-Bernoulli System. A specific example of an exponential family that we will be interested in is the Beta-Bernoulli system, where an individual's data is a single i.i.d. bit modeled as a Bernoulli variable with parameter ρ, along with a Beta conjugate prior: p(x|ρ) = ρ^x (1 − ρ)^{1−x}. The Bernoulli distribution can be written in the form of equation (3) by letting h(x) = 1, S(x) = x, θ = log(ρ/(1 − ρ)), and A(θ) = log(1 + exp(θ)) = −log(1 − ρ). The Beta distribution with the usual parameters α_0, β_0 will be parameterized by η_0 = (η_0^{(1)}, η_0^{(2)}) = (α_0, α_0 + β_0) in accordance with equation (4). This system satisfies the properties we require, as this natural parameterization is minimal and Δ-bounded for Δ = 1. In this system, C(η) = log Γ(η^{(1)}) + log Γ(η^{(2)} − η^{(1)}) − log Γ(η^{(2)}).

Closed Form Rényi Divergence.
The Rényi divergences of two distributions within the same family can be written in terms of the log-partition function:

D_\lambda(P \| Q) = \frac{1}{\lambda - 1} \log \left( \int_{\Theta} P(\theta)^{\lambda} Q(\theta)^{1 - \lambda} d\theta \right) = \frac{C(\lambda \eta_P + (1 - \lambda) \eta_Q) - \lambda C(\eta_P)}{\lambda - 1} + C(\eta_Q) .    (5)

To help analyze the implications of equation (5) for Rényi Differential Privacy, we define some sets of prior/posterior parameters η that arise in our analysis.

Definition 8. Normalizable Set E. We say a posterior parameter η is normalizable if C(\eta) = \log \int_{\Theta} \exp(T(\theta) \cdot \eta) \, d\theta is finite. Let E contain all normalizable η for the conjugate prior family.

Definition 9. Let pset(η_0, n) be the convex hull of all parameters η of the form η_0 + n(S(x), 1) for x ∈ X. When n is an integer, this represents the hull of possible posterior parameters after observing n data points starting with the prior η_0.

Definition 10. Let Diff be the difference set for the family, where Diff is the convex hull of all vectors of the form (S(x) − S(y), 0) for x, y ∈ X.

Definition 11. Two posterior parameters η_1 and η_2 are neighboring iff η_1 − η_2 ∈ Diff. They are r-neighboring iff η_1 − η_2 ∈ r · Diff.

3.1 Mechanisms and Privacy Guarantees

We begin with our simplest mechanism, Direct Sampling, which samples according to the true posterior. This mechanism is presented as Algorithm 1.

Algorithm 1 Direct Posterior
Require: η_0, {x_1, . . . , x_n}
1: Sample \theta \sim p(\theta | \eta') where \eta' = \eta_0 + \sum_{i=1}^{n} (S(x_i), 1)

Even though Algorithm 1 is generally not differentially private [5], Theorem 12 shows that it offers RDP for Δ-bounded exponential families and certain orders λ.

Theorem 12. For a Δ-bounded minimal exponential family of distributions p(x|θ) with continuous log-partition function A(θ), there exists λ* ∈ (1, ∞] such that Algorithm 1 achieves (λ, ε(η_0, n, λ))-RDP for λ < λ*. λ* is the supremum over all λ such that all η in the set η_0 + (λ − 1)Diff are normalizable.

Corollary 1. For the Beta-Bernoulli system with a prior Beta(α_0, β_0), Algorithm 1 achieves (λ, ε)-RDP iff λ > 1 and λ < 1 + min(α_0, β_0).

Notice the implication of Corollary 1: for any η_0 and n > 0, there exists a finite λ such that direct posterior sampling does not guarantee (λ, ε)-RDP for any finite ε. This also prevents (ε, 0)-DP as an achievable goal. Algorithm 1 is inflexible; it offers us no way to change the privacy guarantee.

This motivates us to propose two different modifications to Algorithm 1 that are capable of achieving arbitrary privacy parameters. Algorithm 2 modifies the contribution of the data X to the posterior by introducing a coefficient r, while Algorithm 3 modifies the contribution of the prior η_0 by introducing a coefficient m. These simple ideas have shown up before in variations: [18] introduces a temperature scaling that acts similarly to r, while [13, 5] analyze concentration constraints for prior distributions much like our coefficient m.

Algorithm 2 Diffused Posterior
Require: η_0, {x_1, . . . , x_n}, ε, λ
1: Find r ∈ (0, 1] such that for all r-neighboring η_P, η_Q ∈ pset(η_0, rn), D_λ(p(θ|η_P)||p(θ|η_Q)) ≤ ε
2: Sample \theta \sim p(\theta | \eta') where \eta' = \eta_0 + r \sum_{i=1}^{n} (S(x_i), 1)

Theorem 13. For any Δ-bounded minimal exponential family with prior η_0 in the interior of E, any λ > 1, and any ε > 0, there exists r* ∈ (0, 1] such that using r ∈ (0, r*] in Algorithm 2 will achieve (λ, ε)-RDP.

Algorithm 3 Concentrated Posterior
Require: η_0, {x_1, . . . , x_n}, ε, λ
1: Find m ∈ (0, 1] such that for all neighboring η_P, η_Q ∈ pset(η_0/m, n), D_λ(p(θ|η_P)||p(θ|η_Q)) ≤ ε
2: Sample \theta \sim p(\theta | \eta') where \eta' = \eta_0 / m + \sum_{i=1}^{n} (S(x_i), 1)

Theorem 14. For any Δ-bounded minimal exponential family with prior η_0 in the interior of E, any λ > 1, and any ε > 0, there exists m* ∈ (0, 1] such that using m ∈ (0, m*] in Algorithm 3 will achieve (λ, ε)-RDP.

Theorems 13 and 14 can be interpreted as demonstrating that any RDP privacy level can be achieved by setting r or m arbitrarily close to zero. A small r implies a weak contribution from the data, while a small m implies a strong prior that outweighs the contribution from the data. Setting r = 1 and m = 1 reduces to Algorithm 1, in which a sample is released from the true posterior without any modifications for privacy.

We have not yet specified how to find the appropriate values of r or m, and the condition requires checking the supremum of divergences across the possible pset range of parameters arising as posteriors.
However, with an additional assumption this supremum of divergences can be efficiently computed.

Theorem 15. Let e(η_P, η_Q, λ) = D_λ(p(θ|η_P)||p(θ|η_Q)). For a fixed λ and fixed η_P, the function e is a convex function over η_Q. If for any direction v ∈ Diff, the function g_v(\eta) = v^{\top} \nabla^2 C(\eta) v is convex over η, then for a fixed λ, the function f_\lambda(\eta_P) = \sup_{\eta_Q \, r\text{-neighboring} \, \eta_P} e(\eta_P, \eta_Q, \lambda) is convex over η_P in the directions spanned by Diff.

Corollary 2. The Beta-Bernoulli system satisfies the conditions of Theorem 15, since the functions g_v(η) have the form (v^{(1)})^2 (\psi_1(\eta^{(1)}) + \psi_1(\eta^{(2)} - \eta^{(1)})), where ψ_1 is the trigamma function. Both pset and Diff are defined as convex sets. The expression \sup_{r\text{-neighboring} \, \eta_P, \eta_Q \in pset(\eta_0, n)} D_\lambda(p(\theta|\eta_P)||p(\theta|\eta_Q)) is therefore equivalent to the maximum of D_\lambda(p(\theta|\eta_P)||p(\theta|\eta_Q)) where η_P ∈ η_0 + {(0, n), (n, n)} and η_Q ∈ η_P ± (r, 0).

The higher dimensional Dirichlet-Categorical system also satisfies the conditions of Theorem 15. This result is located in the appendix.

We can do a binary search over (0, 1] to find an appropriate value of r or m. At each candidate value, we only need to consider the boundary situations to evaluate whether this value achieves the desired RDP privacy level. These boundary situations depend on the choice of model, and not on the data size n. For example, in the Beta-Bernoulli system, evaluating the supremum involves calculating the Rényi divergence across at most 4 pairs of distributions, as in Corollary 2.
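To make the search concrete, here is a minimal sketch (our own illustration, not the authors' code) of Algorithm 2's binary search for r in the Beta-Bernoulli system. It uses the closed form (5) with the Beta log-partition C(η) = log Γ(η⁽¹⁾) + log Γ(η⁽²⁾ − η⁽¹⁾) − log Γ(η⁽²⁾) in the η = (α, α + β) coordinates, and the corner reduction of Corollary 2; all function names are ours.

```python
from math import lgamma, inf

def C(eta):
    """Log-partition of the Beta family in eta = (alpha, alpha + beta) coordinates."""
    a, s = eta
    if a <= 0 or s - a <= 0:
        return inf  # parameter is not normalizable (Definition 8)
    return lgamma(a) + lgamma(s - a) - lgamma(s)

def renyi_beta(eta_p, eta_q, lam):
    """Closed-form Renyi divergence D_lam between two Beta posteriors, equation (5)."""
    mix = tuple(lam * p + (1 - lam) * q for p, q in zip(eta_p, eta_q))
    c_mix, c_p, c_q = C(mix), C(eta_p), C(eta_q)
    if inf in (c_mix, c_p, c_q):
        return inf
    return (c_mix - lam * c_p) / (lam - 1) + c_q

def worst_case_eps(eta0, n, r, lam):
    """Supremum of (5) over r-neighboring posteriors in pset(eta0, r*n).
    By Corollary 2 only the extreme corners (all failures / all successes)
    and the two neighbor directions +-(r, 0) need to be checked."""
    corners = [(eta0[0], eta0[1] + r * n), (eta0[0] + r * n, eta0[1] + r * n)]
    sup = 0.0
    for eta_p in corners:
        for sign in (+1, -1):
            eta_q = (eta_p[0] + sign * r, eta_p[1])
            sup = max(sup, renyi_beta(eta_p, eta_q, lam))
    return sup

def find_r(eta0, n, lam, eps, iters=500):
    """Binary search over (0, 1] for the data coefficient r of Algorithm 2,
    assuming (as the search in the paper does) that shrinking r only improves privacy."""
    if worst_case_eps(eta0, n, 1.0, lam) <= eps:
        return 1.0  # direct posterior sampling (Algorithm 1) already suffices
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if worst_case_eps(eta0, n, mid, lam) <= eps:
            lo = mid  # feasible: try a larger (more useful) r
        else:
            hi = mid
    return lo
```

For example, with the prior Beta(6, 18) (i.e., η_0 = (6, 24) in these coordinates), n = 100, and λ = 2 < λ*, a loose budget ε lets the search return r = 1 (direct sampling), while a tight budget drives r toward 0; the same routine adapts to Algorithm 3 by searching over m instead.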
In the d-dimensional Dirichlet-Categorical setting, there are O(d^3) distribution pairs to evaluate.

Eventually, the search process is guaranteed to find a non-zero choice for r or m that achieves the desired privacy level, although the utility optimality of this choice is not guaranteed. If stopped early and none of the tested candidate values satisfy the privacy constraint, the analyst can either continue to iterate or decide not to release anything.

Extensions. These methods have convenient privacy implications in settings where some data is public, such as after a data breach, and for releasing a statistical query. They can also be applied to non-Δ-bounded exponential families with some caveats. These additional results are located in the appendix.

4 RDP for Generalized Linear Models with Gaussian Prior

In this section, we reinterpret some existing algorithms in [13] in the light of RDP, and use ideas from [13] to provide new RDP algorithms for posterior sampling for a subset of generalized linear models with Gaussian priors.

4.1 Background: Generalized Linear Models (GLMs)

The goal of generalized linear models (GLMs) is to predict an outcome y given an input vector x; y is assumed to be generated from a distribution in the exponential family whose mean depends on x through E[y|x] = g^{-1}(w^{\top} x), where w represents the weights of the linear combination of x, and g is called the link function. For example, in logistic regression the link function g is the logit and g^{-1} is the sigmoid function; in linear regression the link function is the identity function. Learning in GLMs means learning the actual linear combination w.

Specifically, the likelihood of y given x can be written as p(y | w, x) = h(y) \exp(y w^{\top} x - A(w^{\top} x)), where x ∈ X, y ∈ Y, A is the log-partition function, and h(y) is the scaling constant. Given a dataset D = {(x_1, y_1), . . . , (x_n, y_n)} of n examples with x_i ∈ X and y_i ∈ Y, our goal is to learn the parameter w. Let p(D|w) denote p({y_1, . . . , y_n} | w, {x_1, . . . , x_n}) = \prod_{i=1}^{n} p(y_i | w, x_i). We set the prior p(w) to be a multivariate Gaussian distribution with covariance Σ = (nβ)^{-1} I, i.e., p(w) ∼ N(0, (nβ)^{-1} I). The posterior distribution of w given D can be written as

p(w | D) = \frac{p(D | w) p(w)}{\int_{R^d} p(D | w') p(w') dw'} \propto \exp \left( - \frac{n \beta \|w\|^2}{2} \right) \prod_{i=1}^{n} p(y_i | w, x_i) .    (6)

4.2 Mechanisms and Privacy Guarantees

First, we introduce some assumptions that characterize the subset of GLMs and the corresponding training data on which RDP can be guaranteed.

Assumption 1. (1) X is a bounded domain such that ||x||_2 ≤ c for all x ∈ X, and x_i ∈ X for all (x_i, y_i) ∈ D. (2) Y is a bounded domain such that Y ⊆ [y_min, y_max], and y_i ∈ Y for all (x_i, y_i) ∈ D. (3) g^{-1} has bounded range such that g^{-1} ∈ [γ_min, γ_max]. Then, let B = max{|y_min − γ_max|, |y_max − γ_min|}.

Example: Binary Regression with Bounded X. Binary regression is used in the case where y takes values in Y = {0, 1}. There are three common types of binary regression: logistic regression with g^{-1}(w^{\top} x) = 1 / (1 + \exp(-w^{\top} x)), probit regression with g^{-1}(w^{\top} x) = \Phi(w^{\top} x) where Φ is the Gaussian cdf, and complementary log-log regression with g^{-1}(w^{\top} x) = 1 - \exp(-\exp(w^{\top} x)). In these three cases, Y = {0, 1}, g^{-1} has range (0, 1), and thus B = 1.
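As a concrete instance of the posterior in (6), the following minimal sketch (our own illustration, assuming NumPy) evaluates the unnormalized log-posterior for logistic regression with the sigmoid inverse link; a sampler would target the exponential of this function.

```python
import numpy as np

def log_posterior(w, X, y, beta):
    """Unnormalized log of the posterior in equation (6) for logistic regression:
    Gaussian prior N(0, (n*beta)^{-1} I) plus Bernoulli log-likelihoods with
    sigmoid inverse link g^{-1}(t) = 1 / (1 + exp(-t))."""
    n = X.shape[0]
    t = X @ w  # linear predictions w^T x_i
    # log p(y_i | w, x_i) = y_i * t_i - log(1 + exp(t_i)), computed stably
    log_lik = np.sum(y * t - np.logaddexp(0.0, t))
    log_prior = -0.5 * n * beta * np.dot(w, w)
    return log_prior + log_lik
```

Exponentiating this function recovers (6) up to the normalizing constant, so any MCMC scheme targeting it draws w approximately from the posterior.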
Moreover, it is often assumed for binary regression that any example lies in a bounded domain, i.e., ||x||_2 ≤ c for x ∈ X.

Now we establish the privacy guarantee for sampling directly from the posterior in (6) in Theorem 16. We also show that this privacy bound is tight for logistic regression; a detailed analysis is in the Appendix.

Theorem 16. Suppose we are given a GLM and a dataset D of size n that satisfies Assumption 1, and a Gaussian prior with covariance Σ = (nβ)^{-1} I. Then sampling from the posterior in (6) satisfies (\lambda, \frac{2 c^2 B^2}{n \beta} \lambda)-RDP for all λ ≥ 1.

Notice that direct posterior sampling cannot achieve (λ, ε)-RDP for arbitrary λ and ε. We next present Algorithms 4 and 5, analogous to Algorithms 3 and 2 for the exponential family respectively, that guarantee any given RDP requirement. Algorithm 4 achieves a given RDP level by setting a stronger prior, while Algorithm 5 does so by raising the temperature of the likelihood.

Algorithm 4 Concentrated Posterior
Require: Dataset D of size n; Gaussian prior with covariance (nβ_0)^{-1} I; (λ, ε).
1: Set \beta = \max\{\frac{2 c^2 B^2 \lambda}{n \epsilon}, \beta_0\} in (6).
2: Sample w ∼ p(w|D) in (6).

Algorithm 5 Diffuse Posterior
Require: Dataset D of size n; Gaussian prior with covariance (nβ)^{-1} I; (λ, ε).
1: Replace p(y_i|w, x_i) with p(y_i|w, x_i)^{\rho} in (6), where \rho = \min\{1, \sqrt{\frac{\epsilon n \beta}{2 c^2 B^2 \lambda}}\}.
2: Sample w ∼ p(w|D) in (6).

It follows directly from Theorem 16 that under Assumption 1, Algorithm 4 satisfies (λ, ε)-RDP.

Theorem 17. Suppose we are given a GLM and a dataset D of size n that satisfies Assumption 1, and a Gaussian prior with covariance Σ = (nβ)^{-1} I. Then Algorithm 5 guarantees (λ, ε)-RDP. In fact, it guarantees (\tilde{\lambda}, \frac{\epsilon}{\lambda} \tilde{\lambda})-RDP for any \tilde{\lambda} ≥ 1.

5 Experiments

In this section, we present the experimental results for our proposed algorithms for both exponential families and GLMs. Our experimental design focuses on two goals – first, analyzing the relationship between λ and ε in our privacy guarantees, and second, exploring the privacy-utility trade-off of our proposed methods in relation to existing methods.

5.1 Synthetic Data: Beta-Bernoulli Sampling Experiments

In this section, we consider posterior sampling in the Beta-Bernoulli system. We compare three algorithms. As a baseline, we select a modified version of the algorithm in [10], which privatizes the sufficient statistic of the data to create a privatized posterior. Instead of the Laplace noise used by [10], we use Gaussian noise to do the privatization; [14] shows that if Gaussian noise with variance σ² is added, then this offers an RDP guarantee of (\lambda, \frac{\lambda \Delta^2}{\sigma^2}) for Δ-bounded exponential families. We also consider the two algorithms presented in Section 3.1 – Algorithms 2 and 3; observe that Algorithm 1 is a special case of both. 500 iterations of binary search were used to select r and m when needed.

Achievable Privacy Levels. We plot the (λ, ε)-RDP parameters achieved by Algorithms 2 and 3 for a few values of r and m. These parameters are plotted for a prior η_0 = (6, 18) and data size n = 100, which are selected arbitrarily for illustrative purposes. We plot over six values {0.1, 0.3, 0.5, 0.7, 0.9, 1} of the scaling constants r and m. The results are presented in Figure 1. Our primary observation is the presence of the vertical asymptotes for our proposed methods. Recall that any privacy level is achievable with our algorithms given small enough r or m; these plots demonstrate the interaction of λ and ε.
As r and m decrease, the ε guarantees improve at each λ and even become finite at larger orders λ, but a vertical asymptote still exists. The results for the baseline are not plotted: it achieves RDP along any line of positive slope passing through the origin.

Privacy-Utility Tradeoff. We next evaluate the privacy-utility tradeoff of the algorithms by plotting KL(P||A) as a function of ε with λ fixed, where P is the true posterior and A is the output distribution of a mechanism. For Algorithms 2 and 3, the KL divergence can be evaluated in closed form. For the Gaussian mechanism, numerical integration was used to evaluate the KL divergence integral. We have arbitrarily chosen η_0 = (6, 18) and a data set X with 100 total trials and 38 successful trials. We have plotted the resulting divergences over a range of ε for λ = 2 in (a) and for λ = 15 in (b) of Figure 2. When λ = 2 < λ*, both Algorithms 2 and 3 reach zero KL divergence once direct sampling is possible. The Gaussian mechanism must always add nonzero noise. As ε → 0, Algorithm 3 approaches a point mass distribution heavily penalized by the KL divergence. Due to its projection step, the Gaussian mechanism follows a bimodal distribution as ε → 0. Algorithm 2 degrades to the prior, with modest KL divergence. When λ = 15 > λ*, the divergences for Algorithms 2 and 3 are bounded away from 0, while the Gaussian mechanism still approaches the truth as ε → ∞. In a non-private setting, the KL divergence would be zero.

Finally, we plot log p(X_H|θ) as a function of ε, where θ comes from one of the mechanisms applied to X. Both X and X_H consist of 100 Bernoulli trials with proportion parameter ρ = 0.5.
This experiment was run 10000 times, and we report the mean and standard deviation. Similar to the previous section, we have a fixed prior of η0 = (6, 18). The results are shown for λ = 2 in (c) and for λ = 15 in (d) of Figure 2. These results agree with the limit behaviors in the KL test. This experiment is more favorable for Algorithm 3, as it degrades only to the log likelihood under the mode of the prior. In this plot, we have included sampling from the true posterior as a non-private baseline.

Figure 1: Illustration of Potential (λ, ε)-RDP Curves for Exponential Family Sampling. (a) Algorithm 2; (b) Algorithm 3.

Figure 2: Exponential Family Synthetic Data Experiments. (a) KL: λ = 2 < λ*; (b) KL: λ = 15 > λ*; (c) −log p(XH): λ = 2; (d) −log p(XH): λ = 15.

5.2 Real Data: Bayesian Logistic Regression Experiments

We now experiment with Bayesian logistic regression with a Gaussian prior on three real datasets. We consider three algorithms: Algorithms 4 and 5, as well as the OPS algorithm proposed in [18] as a sanity check. OPS achieves pure differential privacy when the posterior has bounded support; for this algorithm, we thus truncate the Gaussian prior to make its support the L2 ball of radius c/β, which is the smallest data-independent ball guaranteed to contain the MAP classifier.

Achievable Privacy Levels. We consider the achievable RDP guarantees for our algorithms and OPS under the same set of parameters β, c, ρ and B = 1. [18] shows that with the truncated prior, OPS guarantees 4c²ρ/β-differential privacy, which implies (λ, 4c²ρ/β)-RDP for all λ ∈ [1, ∞]; whereas our algorithm guarantees (λ, 2c²ρ²λ/(nβ))-RDP for all λ ≥ 1. Therefore our algorithm achieves better RDP guarantees at λ ≤ 2n/ρ, which is quite high in practice as n is the dataset size.

Privacy-Utility: Test Log-Likelihood and Error. We conduct Bayesian logistic regression on three real datasets: Abalone, Adult and MNIST. We perform binary classification tasks: abalones with fewer than 10 rings vs. the rest for Abalone, digit 3 vs. digit 8 for MNIST, and income ≤ 50K vs. > 50K for Adult. We encode all categorical features with one-hot encoding, resulting in 9 dimensions for Abalone, 100 dimensions for Adult and 784 dimensions for MNIST. We then scale each feature to the range [−0.5, 0.5], and normalize each example to norm 1. One third of each dataset is used for testing, and the rest for training. Abalone has 2784 training and 1393 test samples, Adult has 32561 and 16281, and MNIST has 7988 and 3994 respectively.

For all algorithms, we use an original Gaussian prior with β = 10⁻³. The posterior sampling is done using slice sampling with 1000 burn-in samples. Notice that slice sampling does not give samples from the exact posterior. However, a number of MCMC methods are known to converge in total variation distance in time polynomial in the data dimension for log-concave posteriors (which is the case here) [17]. Thus, provided that the burn-in period is long enough, we expect the induced distribution to be quite close, and we leave an exact RDP analysis of the MCMC sampling as future work. For privacy parameters, we set λ = 1, 10, 100 and ε ∈ {e−5, e−4, . . . , e3}. Figure 3 shows the test error averaged over 50 repeated runs. More experiments for test log-likelihood are presented in the Appendix.

Figure 3: Test error vs. privacy parameter ε; λ = 1, 10, 100 from top to bottom. (a) Abalone; (b) Adult; (c) MNIST 3 vs 8. Curves: Concentrated, Diffuse, OPS, True Posterior.

We see that both Algorithms 4 and 5 achieve lower test error than OPS at all privacy levels and across all datasets. This is to be expected, since OPS guarantees pure differential privacy, which is stronger than RDP. Comparing Algorithms 4 and 5, we can see that the latter always achieves better utility.

6 Conclusion

The inherent randomness of posterior sampling and the mitigating influence of a prior can be made to offer a wide range of privacy guarantees. Our proposed methods outperform existing methods in specific situations. The privacy analyses of the mechanisms fit nicely into the recently introduced RDP framework, which continues to present itself as a relaxation of DP worthy of further investigation.

Acknowledgements

This work was partially supported by NSF under IIS 1253942, ONR under N00014-16-1-2616, and a Google Faculty Research Award.

References

[1] M. Bun, K. Nissim, U. Stemmer, and S. Vadhan. Differentially private release and learning of threshold functions. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 634–649. IEEE, 2015.

[2] M. Bun and T. Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635–658.
Springer, 2016.

[3] K. Chaudhuri, D. Hsu, and S. Song. The large margin mechanism for differentially private maximization. In Neural Information Processing Systems, 2014.

[4] K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In Annual International Cryptology Conference, pages 198–213. Springer, 2006.

[5] C. Dimitrakakis, B. Nelson, A. Mitrokotsa, and B. I. Rubinstein. Robust and private Bayesian inference. In International Conference on Algorithmic Learning Theory, pages 291–305. Springer, 2014.

[6] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486–503. Springer, 2006.

[7] C. Dwork and J. Lei. Differential privacy and robust statistics. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, pages 371–380. ACM, 2009.

[8] C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy, volume 9. Now Publishers, Inc., 2014.

[9] C. Dwork and G. N. Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.

[10] J. Foulds, J. Geumlek, M. Welling, and K. Chaudhuri. On the theory and practice of privacy-preserving Bayesian data analysis.
In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), 2016.

[11] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: Theory meets practice on the map. In IEEE 24th International Conference on Data Engineering (ICDE), pages 277–286. IEEE, 2008.

[12] F. McSherry. How many secrets do you have? https://github.com/frankmcsherry/blog/blob/master/posts/2017-02-08.md, 2017.

[13] K. Minami, H. Arai, I. Sato, and H. Nakagawa. Differential privacy without sensitivity. In Advances in Neural Information Processing Systems, pages 956–964, 2016.

[14] I. Mironov. Rényi differential privacy. In Proceedings of the IEEE 30th Computer Security Foundations Symposium (CSF 2017), pages 263–275. IEEE, 2017.

[15] A. D. Sarwate and K. Chaudhuri. Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data. IEEE Signal Processing Magazine, 30(5):86–94, 2013.

[16] A. G. Thakurta and A. Smith. Differentially private feature selection via stability arguments, and the robustness of the lasso. In Conference on Learning Theory, pages 819–850, 2013.

[17] S. Vempala. Geometric random walks: A survey. Combinatorial and Computational Geometry, 52:573–612, 2005.

[18] Y.-X. Wang, S. E. Fienberg, and A. Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 2493–2502, 2015.

[19] Y.-X. Wang, J. Lei, and S. E. Fienberg. On-average KL-privacy and its equivalence to generalization for max-entropy mechanisms. In International Conference on Privacy in Statistical Databases, pages 121–134. Springer, 2016.

[20] Z. Zhang, B. Rubinstein, and C. Dimitrakakis. On the differential privacy of Bayesian inference.
In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.