{"title": "Privacy Odometers and Filters: Pay-as-you-Go Composition", "book": "Advances in Neural Information Processing Systems", "page_first": 1921, "page_last": 1929, "abstract": "In this paper we initiate the study of adaptive composition in differential privacy when the length of the composition, and the privacy parameters themselves can be chosen adaptively, as a function of the outcome of previously run analyses. This case is much more delicate than the setting covered by existing composition theorems, in which the algorithms themselves can be chosen adaptively, but the privacy parameters must be fixed up front. Indeed, it isn't even clear how to define differential privacy in the adaptive parameter setting. We proceed by defining two objects which cover the two main use cases of composition theorems. A privacy filter is a stopping time rule that allows an analyst to halt a computation before his pre-specified privacy budget is exceeded. A privacy odometer allows the analyst to track realized privacy loss as he goes, without needing to pre-specify a privacy budget. We show that unlike the case in which privacy parameters are fixed, in the adaptive parameter setting, these two use cases are distinct. We show that there exist privacy filters with bounds comparable (up to constants) with existing privacy composition theorems. We also give a privacy odometer that nearly matches non-adaptive private composition theorems, but is sometimes worse by a small asymptotic factor. 
Moreover, we show that this is inherent, and that any valid privacy odometer in the adaptive parameter setting must lose this factor, which shows a formal separation between the filter and odometer use-cases.", "full_text": "Privacy Odometers and Filters: Pay-as-you-Go\n\nComposition\n\nRyan Rogers\u2217\n\nAaron Roth\u2020\n\nJonathan Ullman\u2021\n\nSalil Vadhan\u00a7\n\nAbstract\n\nIn this paper we initiate the study of adaptive composition in differential privacy\nwhen the length of the composition, and the privacy parameters themselves can\nbe chosen adaptively, as a function of the outcome of previously run analyses.\nThis case is much more delicate than the setting covered by existing composition\ntheorems, in which the algorithms themselves can be chosen adaptively, but the\nprivacy parameters must be \ufb01xed up front. Indeed, it isn\u2019t even clear how to de\ufb01ne\ndifferential privacy in the adaptive parameter setting. We proceed by de\ufb01ning two\nobjects which cover the two main use cases of composition theorems. A privacy\n\ufb01lter is a stopping time rule that allows an analyst to halt a computation before his\npre-speci\ufb01ed privacy budget is exceeded. A privacy odometer allows the analyst\nto track realized privacy loss as he goes, without needing to pre-specify a privacy\nbudget. We show that unlike the case in which privacy parameters are \ufb01xed, in the\nadaptive parameter setting, these two use cases are distinct. We show that there\nexist privacy \ufb01lters with bounds comparable (up to constants) with existing pri-\nvacy composition theorems. We also give a privacy odometer that nearly matches\nnon-adaptive private composition theorems, but is sometimes worse by a small\nasymptotic factor. 
Moreover, we show that this is inherent, and that any valid\nprivacy odometer in the adaptive parameter setting must lose this factor, which\nshows a formal separation between the filter and odometer use-cases.\n\n\u2217Department of Applied Mathematics and Computational Science, University of Pennsylvania. ryrogers@sas.upenn.edu.\n\u2020Department of Computer and Information Sciences, University of Pennsylvania. aaroth@cis.upenn.edu. Supported in part by an NSF CAREER award, NSF grant CNS-1513694, and a grant from the Sloan Foundation.\n\u2021College of Computer and Information Science, Northeastern University. jullman@ccs.neu.edu.\n\u00a7Center for Research on Computation & Society and John A. Paulson School of Engineering & Applied Sciences, Harvard University. salil@seas.harvard.edu. Work done while visiting the Department of Applied\nMathematics and the Shing-Tung Yau Center at National Chiao-Tung University in Taiwan. Also supported by\nNSF grant CNS-1237235, a grant from the Sloan Foundation, and a Simons Investigator Award.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n1 Introduction\n\nDifferential privacy [DMNS06] is a stability condition on a randomized algorithm, designed to guarantee\nindividual-level privacy during data analysis. Informally, an algorithm is differentially private\nif any pair of close inputs maps to similar probability distributions over outputs, where similarity is\nmeasured by two parameters \u03b5 and \u03b4. Informally, \u03b5 measures the amount of privacy and \u03b4 measures\nthe failure probability that the privacy loss is much worse than \u03b5. A signature property of differential\nprivacy is that it is preserved under composition: combining many differentially private subroutines\ninto a single algorithm preserves differential privacy and the privacy parameters degrade gracefully.\nComposability is essential both for privacy and for algorithm design. 
Since differential privacy is\ncomposable, we can design a sophisticated algorithm and prove it is private without having to reason\ndirectly about its output distribution. Instead, we can rely on the differential privacy of the basic\nbuilding blocks and derive a privacy bound on the whole algorithm using the composition rules.\nThe composition theorem for differential privacy is very strong, and holds even if the choice of\nwhich differentially private subroutine to run is adaptive, that is, the choice of the next algorithm\nmay depend on the output of previous algorithms. This property is essential in algorithm design,\nbut also more generally in modeling unstructured sequences of data analyses that might be run\nby a human data analyst, or even by many data analysts on the same data set, while only loosely\ncoordinating with one another. Even setting aside privacy, it can be very challenging to analyze\nthe statistical properties of general adaptive procedures for analyzing a dataset, and the fact that\nadaptively chosen differentially private algorithms compose has recently been used to give strong\nguarantees of statistical validity for adaptive data analysis [DFH+15, BNS+16].\nHowever, all the known composition theorems for differential privacy [DMNS06, DKM+06,\nDRV10, KOV15, MV16] have an important and generally overlooked caveat. Although the choice\nof the next subroutine in the composition may be adaptive, the number of subroutines called and\nthe choice of the privacy parameters \u03b5 and \u03b4 for each subroutine must be fixed in advance. 
Indeed, it is\nnot even clear how to define differential privacy if the privacy parameters are not fixed in advance.\nThis is generally acceptable when designing a single algorithm (that has a worst-case analysis),\nsince worst-case eventualities need to be anticipated and budgeted for in order to prove a theorem.\nHowever, it is not acceptable when modeling the unstructured adaptivity of a data analyst, who may\nnot know ahead of time (before seeing the results of intermediate analyses) what he wants to do with\nthe data. When controlling privacy loss across multiple data analysts, the problem is even worse.\nAs a simple stylized example, suppose that A is some algorithm (possibly modeling a human data\nanalyst) for selecting statistical queries5 as a function of the answers to previously selected queries.\nIt is known that for any one statistical query q and any data set x, releasing the perturbed answer\n$\\hat{a} = q(x) + Z$, where $Z \\sim \\mathrm{Lap}(1/\\varepsilon)$ is a Laplace random variable, ensures (\u03b5, 0)-differential privacy.\nComposition theorems allow us to reason about the composition of k such operations, where the\nqueries can be chosen adaptively by A, as in the following simple program.\nExample1(x):\n\nFor i = 1 to k: Let $q_i = A(\\hat{a}_1, \\ldots, \\hat{a}_{i-1})$ and let $\\hat{a}_i = q_i(x) + \\mathrm{Lap}(1/\\varepsilon)$.\nOutput $(\\hat{a}_1, \\ldots, \\hat{a}_k)$.\n\nThe \u201cbasic\u201d composition theorem [DMNS06] asserts that Example1 is (\u03b5k, 0)-differentially private.\nThe \u201cadvanced\u201d composition theorem [DRV10] gives a more sophisticated bound and asserts that\n(provided that \u03b5 is sufficiently small) the algorithm satisfies $(\\varepsilon\\sqrt{8k \\ln(1/\\delta)}, \\delta)$-differential privacy\nfor any \u03b4 > 0. There is even an \u201coptimal\u201d composition theorem [KOV15] too complicated to describe here. 
These analyses crucially assume that both the number of iterations k and the parameter\n\u03b5 are fixed up front, even though they allow the queries $q_i$ to be adaptively chosen.6\nNow consider a similar example where the number of iterations is not fixed up front, but actually\ndepends on the answers to previous queries. This is a special case of a more general setting where the\nprivacy parameter $\\varepsilon_i$ in every round may be chosen adaptively; halting in our example is equivalent\nto setting $\\varepsilon_i = 0$ in all future rounds.\nExample2(x, \u03c4):\n\nLet $i \\leftarrow 1$, $\\hat{a}_1 \\leftarrow q_1(x) + \\mathrm{Lap}(1/\\varepsilon)$.\nWhile $\\hat{a}_i \\le \\tau$: Let $i \\leftarrow i + 1$, $q_i = A(\\hat{a}_1, \\ldots, \\hat{a}_{i-1})$, and let $\\hat{a}_i = q_i(x) + \\mathrm{Lap}(1/\\varepsilon)$.\nOutput $(\\hat{a}_1, \\ldots, \\hat{a}_i)$.\n\nExample2 cannot be said to be differentially private ex ante for any non-trivial fixed values of \u03b5 and \u03b4,\nbecause the computation might run for an arbitrarily long time and privacy may degrade indefinitely.\nWhat can we say about privacy after we run the algorithm? 
If the algorithm/data-analyst happens to\nstop after k rounds, can we apply the composition theorem ex post to conclude that it is (\u03b5k, 0)- and\n$(\\varepsilon\\sqrt{8k \\log(1/\\delta)}, \\delta)$-differentially private, as we could if the algorithm were constrained to always\nrun for at most k rounds?\n\n5A statistical query is parameterized by a predicate \u03c6, and asks \u201chow many elements of the dataset satisfy\n\u03c6?\u201d Changing a single element of the dataset can change the answer to the statistical query by at most 1.\n6The same analysis holds if heterogeneous parameters $(\\varepsilon_1, \\ldots, \\varepsilon_k)$ are used in each round, as long as\nthey are all fixed in advance. For basic composition $\\varepsilon k$ is replaced with $\\sum_{i=1}^{k} \\varepsilon_i$, and for advanced composition\n$\\varepsilon\\sqrt{k}$ is replaced with $\\sqrt{\\sum_{i=1}^{k} \\varepsilon_i^2}$.\n\nIn this paper, we study the composition properties of differential privacy when everything (the\nchoice of algorithms, the number of rounds, and the privacy parameters in each round) may be\nadaptively chosen. We show that this setting is much more delicate than the settings covered by\npreviously known composition theorems, but that these sorts of ex post privacy bounds do hold\nwith only a small (but in some cases unavoidable) loss over the standard setting. We note that the\nconceptual discussion of differential privacy places great emphasis on the idea of arbitrary composition, and\nour results give more support for this conceptual interpretation.\n\n1.1 Our Results\n\nWe give a formal framework for reasoning about the adaptive composition of differentially private\nalgorithms when the privacy parameters themselves can be chosen adaptively. When the parameters\nare chosen non-adaptively, a composition theorem gives a high probability bound on the worst case\nprivacy loss that results from the output of an algorithm. 
In the adaptive parameter setting, it no\nlonger makes sense to have fixed bounds on the privacy loss. Instead, we propose two kinds of\nprimitives capturing two natural use cases for composition theorems:\n\n1. A privacy odometer takes as input a global failure parameter \u03b4g. After every round i in the\ncomposition of differentially private algorithms, the odometer outputs a number \u03c4i that may\ndepend on the realized privacy parameters \u03b5i, \u03b4i in the previous rounds. 
The privacy odometer\nguarantees that with probability 1 \u2212 \u03b4g, for every round i, \u03c4i is an upper bound on the privacy\nloss in round i.\n\n2. A privacy \ufb01lter is a way to cut off access to the dataset when the privacy loss is too large. It\ntakes as input a global privacy \u201cbudget\u201d (\u03b5g, \u03b4g). After every round, it either outputs CONT (\u201ccon-\ntinue\u201d) or HALT depending on the privacy parameters from the previous rounds. The privacy \ufb01lter\nguarantees that with probability 1 \u2212 \u03b4g, it will output HALT before the privacy loss exceeds \u03b5g.\nWhen used, it guarantees that the resulting interaction is (\u03b5g, \u03b4g)-DP.\n\nA tempting heuristic is to take the realized privacy parameters \u03b51, \u03b41, . . . , \u03b5i, \u03b4i and apply one of the\nexisting composition theorems to those parameters, using that value as a privacy odometer or im-\nplementing a privacy \ufb01lter by halting when getting a value that exceeds the global budget. However\nthis heuristic does not necessarily give valid bounds.\nWe \ufb01rst prove that the heuristic does work for the basic composition theorem [DMNS06] in which\nthe parameters \u03b5i and \u03b4i add up. We prove that summing the realized privacy parameters yields both\na valid privacy odometer and \ufb01lter. The idea of a privacy \ufb01lter was also considered in [ES15], who\nshow that basic composition works in the privacy \ufb01lter application.\nWe then show that the heuristic breaks for the advanced composition theorem [DRV10]. However,\nwe give a valid privacy \ufb01lter that gives the same asymptotic bound as the advanced composition\ntheorem, albeit with worse constants. On the other hand, we show that, in some parameter regimes,\nthe asymptotic bounds given by our privacy \ufb01lter cannot be achieved by a privacy odometer. 
This\nresult gives a formal separation between the two models when the parameters may be chosen adaptively,\nwhich does not exist when the privacy parameters are fixed. Finally, we give a valid privacy\nodometer with a bound that is only slightly worse asymptotically than the bound that the advanced\ncomposition theorem would give if it were used (improperly) as a heuristic. Our bound is worse\nby a factor that is never larger than $\\sqrt{\\log\\log(n)}$ (here, n is the size of the dataset) and for some\nparameter regimes is only a constant.\n\n2 Privacy Preliminaries\n\nDifferential privacy is defined based on the following notion of similarity between two distributions.\nDefinition 2.1 (Indistinguishable). Two random variables X and Y taking values from domain\nD are (\u03b5, \u03b4)-indistinguishable, denoted as $X \\approx_{\\varepsilon,\\delta} Y$, if for all $S \\subseteq D$, $P[X \\in S] \\le e^{\\varepsilon} P[Y \\in S] + \\delta$\nand $P[Y \\in S] \\le e^{\\varepsilon} P[X \\in S] + \\delta$.\nThere is a slight variant of indistinguishability, called point-wise indistinguishability, which is nearly\nequivalent, but will be the more convenient notion for the generalizations we give in this paper.\nDefinition 2.2 (Point-wise Indistinguishable). Two random variables X and Y taking values from\nD are (\u03b5, \u03b4)-point-wise indistinguishable if with probability at least 1 \u2212 \u03b4 over either a \u223c X or\na \u223c Y, we have $\\left|\\log\\left(\\frac{P[X=a]}{P[Y=a]}\\right)\\right| \\le \\varepsilon$.\nLemma 2.3 ([KS14]). Let X and Y be two random variables taking values from D. If X and\nY are (\u03b5, \u03b4)-point-wise indistinguishable, then $X \\approx_{\\varepsilon,\\delta} Y$. 
Also, if $X \\approx_{\\varepsilon,\\delta} Y$ then X and Y are\n$\\left(2\\varepsilon, \\frac{2\\delta}{e^{\\varepsilon}\\varepsilon}\\right)$-point-wise indistinguishable.\n\nWe say two databases $x, x' \\in \\mathcal{X}^n$ are neighboring if they differ in at most one entry, i.e. if there\nexists an index $i \\in [n]$ such that $x_{-i} = x'_{-i}$. We can now state differential privacy in terms of\nindistinguishability.\nDefinition 2.4 (Differential Privacy [DMNS06]). A randomized algorithm $M : \\mathcal{X}^n \\to \\mathcal{Y}$ with\narbitrary output range $\\mathcal{Y}$ is (\u03b5, \u03b4)-differentially private (DP) if for every pair of neighboring databases\n$x, x'$: $M(x) \\approx_{\\varepsilon,\\delta} M(x')$.\nWe then define the privacy loss $\\mathrm{Loss}_M(a; x, x')$ for outcome $a \\in \\mathcal{Y}$ and neighboring datasets\n$x, x' \\in \\mathcal{X}^n$ as $\\mathrm{Loss}_M(a; x, x') = \\log\\left(\\frac{P[M(x)=a]}{P[M(x')=a]}\\right)$. We note that if we can bound\n$\\mathrm{Loss}_M(a; x, x')$ for any neighboring datasets $x, x'$ with high probability over $a \\sim M(x)$, then\nLemma 2.3 tells us that M is differentially private. Moreover, Lemma 2.3 also implies that\nthis approach is without loss of generality (up to a small difference in the parameters). Thus, our\ncomposition theorems will focus on bounding the privacy loss with high probability.\nA useful property of differential privacy is that it is preserved under post-processing without degrading\nthe parameters:\nTheorem 2.5 (Post-Processing [DMNS06]). Let $M : \\mathcal{X}^n \\to \\mathcal{Y}$ be (\u03b5, \u03b4)-DP and $f : \\mathcal{Y} \\to \\mathcal{Y}'$ be\nany randomized algorithm. Then $f \\circ M : \\mathcal{X}^n \\to \\mathcal{Y}'$ is (\u03b5, \u03b4)-DP.\nWe next recall a useful characterization from [KOV15]: any DP algorithm can be written as the\npost-processing of a simple, canonical algorithm which is a generalization of randomized response.\nDefinition 2.6. 
For any \u03b5, \u03b4 \u2265 0, we define the randomized response algorithm $\\mathrm{RR}_{\\varepsilon,\\delta} : \\{0, 1\\} \\to \\{0, \\top, \\bot, 1\\}$\nas follows. (Note that if \u03b4 = 0, we will simply write the algorithm $\\mathrm{RR}_{\\varepsilon,\\delta}$ as $\\mathrm{RR}_{\\varepsilon}$.)\n\n$P[\\mathrm{RR}_{\\varepsilon,\\delta}(0) = 0] = \\delta$, $P[\\mathrm{RR}_{\\varepsilon,\\delta}(1) = 0] = 0$\n$P[\\mathrm{RR}_{\\varepsilon,\\delta}(0) = \\top] = (1 - \\delta) \\frac{e^{\\varepsilon}}{1+e^{\\varepsilon}}$, $P[\\mathrm{RR}_{\\varepsilon,\\delta}(1) = \\top] = (1 - \\delta) \\frac{1}{1+e^{\\varepsilon}}$\n$P[\\mathrm{RR}_{\\varepsilon,\\delta}(0) = \\bot] = (1 - \\delta) \\frac{1}{1+e^{\\varepsilon}}$, $P[\\mathrm{RR}_{\\varepsilon,\\delta}(1) = \\bot] = (1 - \\delta) \\frac{e^{\\varepsilon}}{1+e^{\\varepsilon}}$\n$P[\\mathrm{RR}_{\\varepsilon,\\delta}(0) = 1] = 0$, $P[\\mathrm{RR}_{\\varepsilon,\\delta}(1) = 1] = \\delta$\n\nKairouz, Oh, and Viswanath [KOV15] show that any (\u03b5, \u03b4)-DP algorithm can be viewed as a post-processing\nof the output of $\\mathrm{RR}_{\\varepsilon,\\delta}$ for an appropriately chosen input.\nTheorem 2.7 ([KOV15], see also [MV16]). 
For every (\u03b5, \u03b4)-DP algorithm M and for all neighboring\ndatabases $x^0$ and $x^1$, there exists a randomized algorithm T where $T(\\mathrm{RR}_{\\varepsilon,\\delta}(b))$ is identically\ndistributed to $M(x^b)$ for $b \\in \\{0, 1\\}$.\nThis theorem will be useful in our analyses, because it allows us, without loss of generality, to analyze\ncompositions of these simple algorithms $\\mathrm{RR}_{\\varepsilon,\\delta}$ with varying privacy parameters.\nWe now define the adaptive composition of differentially private algorithms in the setting introduced\nby [DRV10] and then extended to heterogeneous privacy parameters in [MV16], in which all of the\nprivacy parameters are fixed prior to the start of the computation. The following \u201ccomposition\ngame\u201d is an abstract model of composition in which an adversary can adaptively select between\nneighboring datasets at each round, as well as a differentially private algorithm to run at each round\n\u2013 both choices can be a function of the realized outcomes of all previous rounds. 
However, crucially,\nthe adversary must select at each round an algorithm that satisfies the privacy parameters which\nhave been fixed ahead of time \u2013 the choice of parameters cannot itself be a function of the realized\noutcomes of previous rounds. We define this model of interaction formally in Algorithm 1 where\nthe output is the view of the adversary A which includes any random coins she uses $R_{\\mathcal{A}}$ and the\noutcomes $A_1, \\ldots, A_k$ of every round.\n\nAlgorithm 1 FixedParamComp$(\\mathcal{A}, \\mathcal{E} = (\\mathcal{E}_1, \\ldots, \\mathcal{E}_k), b)$, where $\\mathcal{A}$ is a randomized algorithm,\n$\\mathcal{E}_1, \\ldots, \\mathcal{E}_k$ are classes of randomized algorithms, and $b \\in \\{0, 1\\}$.\n\nSelect coin tosses $R_{\\mathcal{A}}^b$ for $\\mathcal{A}$ uniformly at random.\nfor $i = 1, \\ldots, k$ do\n  $\\mathcal{A} = \\mathcal{A}(R_{\\mathcal{A}}^b, A_1^b, \\ldots, A_{i-1}^b)$ gives neighboring datasets $x^{i,0}, x^{i,1}$, and $M_i \\in \\mathcal{E}_i$\n  $\\mathcal{A}$ receives $A_i^b = M_i(x^{i,b})$\nreturn view $V^b = (R_{\\mathcal{A}}^b, A_1^b, \\ldots, A_k^b)$\n\nDefinition 2.8 (Adaptive Composition [DRV10], [MV16]). 
We say that the sequence of parameters\n$\\varepsilon_1, \\ldots, \\varepsilon_k \\ge 0$, $\\delta_1, \\ldots, \\delta_k \\in [0, 1)$ satisfies $(\\varepsilon_g, \\delta_g)$-differential privacy under adaptive composition\nif for every adversary $\\mathcal{A}$, and $\\mathcal{E} = (\\mathcal{E}_1, \\ldots, \\mathcal{E}_k)$ where $\\mathcal{E}_i$ is the class of $(\\varepsilon_i, \\delta_i)$-DP algorithms, we\nhave FixedParamComp$(\\mathcal{A}, \\mathcal{E}, \\cdot)$ is $(\\varepsilon_g, \\delta_g)$-DP in its last argument, i.e. $V^0 \\approx_{\\varepsilon_g,\\delta_g} V^1$.\nWe first state a basic composition theorem which shows that the adaptive composition satisfies differential\nprivacy where \u201cthe parameters just add up.\u201d\nTheorem 2.9 (Basic Composition [DMNS06], [DKM+06]). 
The sequence $\\varepsilon_1, \\ldots, \\varepsilon_k$ and $\\delta_1, \\ldots, \\delta_k$\nsatisfies $(\\varepsilon_g, \\delta_g)$-differential privacy under adaptive composition where $\\varepsilon_g = \\sum_{i=1}^{k} \\varepsilon_i$ and $\\delta_g = \\sum_{i=1}^{k} \\delta_i$.\n\nWe now state the advanced composition bound from [DRV10] which gives a quadratic improvement\nto the basic composition bound.\nTheorem 2.10 (Advanced Composition). For any $\\hat{\\delta} > 0$, the sequence $\\varepsilon_1, \\ldots, \\varepsilon_k$ and $\\delta_1, \\ldots, \\delta_k$\nwhere $\\varepsilon = \\varepsilon_i$ and $\\delta = \\delta_i$ for all $i \\in [k]$ satisfies $(\\varepsilon_g, \\delta_g)$-differential privacy under adaptive\ncomposition where $\\varepsilon_g = \\varepsilon (e^{\\varepsilon} - 1) k + \\varepsilon \\sqrt{2k \\log(1/\\hat{\\delta})}$ and $\\delta_g = k\\delta + \\hat{\\delta}$.\n\nThis theorem can be easily generalized to hold for values of \u03b5i that are not all equal (as done in\n[KOV15]). However, this is not as all-encompassing as it would appear at first blush, because this\nstraightforward generalization would not allow for the values of \u03b5i and \u03b4i to be chosen adaptively by\nthe data analyst. Indeed, the definition of differential privacy itself (Definition 2.4) does not straightforwardly\nextend to this case. The remainder of this paper is devoted to laying out a framework for\nsensibly talking about the privacy parameters \u03b5i and \u03b4i being chosen adaptively by the data analyst,\nand to proving composition theorems (including an analogue of Theorem 2.10) in this model.\n\n3 Composition with Adaptively Chosen Parameters\n\nWe now introduce the model of composition with adaptive parameter selection, and define privacy\nin this setting.\nWe want to model composition as in the previous section, but allow the adversary the ability to also\nchoose the privacy parameters (\u03b5i, \u03b4i) as a function of previous rounds of interaction. 
We will define\nthe view of the interaction, similar to the view in FixedParamComp, to be the tuple that includes A\u2019s\nrandom coin tosses $R_{\\mathcal{A}}$ and the outcomes $A = (A_1, \\ldots, A_k)$ of the algorithms she chose. Formally,\nwe define an adaptively chosen privacy parameter composition game in Algorithm 2 which takes as\ninput an adversary A, a number of rounds of interaction k,7 and an experiment parameter $b \\in \\{0, 1\\}$.\n\n7Note that in the adaptive parameter composition game, the adversary has the option of effectively stopping\nthe composition early at some round $k' < k$ by simply setting $\\varepsilon_i = \\delta_i = 0$ for all rounds $i > k'$. Hence, the\nparameter k will not appear in our composition theorems the way it does when privacy parameters are fixed.\nThis means that we can effectively take k to be infinite. For technical reasons, it is simpler to have a finite\nparameter k, but the reader should imagine it as being an enormous number.\n\nAlgorithm 2 AdaptParamComp$(\\mathcal{A}, k, b)$\n\nSelect coin tosses $R_{\\mathcal{A}}^b$ for $\\mathcal{A}$ uniformly at random.\nfor $i = 1, \\ldots, k$ do\n  $\\mathcal{A} = \\mathcal{A}(R_{\\mathcal{A}}^b, A_1^b, \\ldots, A_{i-1}^b)$ gives neighboring $x^{i,0}, x^{i,1}$, parameters $(\\varepsilon_i, \\delta_i)$, $M_i$ that is $(\\varepsilon_i, \\delta_i)$-DP\n  $\\mathcal{A}$ receives $A_i^b = M_i(x^{i,b})$\nreturn view $V^b = (R_{\\mathcal{A}}^b, A_1^b, \\ldots, A_k^b)$\n\nWe then define the privacy loss with respect to AdaptParamComp(A, k, b) in the following way\nfor a fixed view $v = (r, a)$ where r represents the random coin tosses of A and we write $v_{<i} = (r, a_1, \\ldots, a_{i-1})$:\n\n$$\\mathrm{Loss}(v) = \\log\\left(\\frac{P[V^0 = v]}{P[V^1 = v]}\\right) = \\sum_{i=1}^{k} \\log\\left(\\frac{P[M_i(x^{i,0}) = v_i \\mid v_{<i}]}{P[M_i(x^{i,1}) = v_i \\mid v_{<i}]}\\right) \\stackrel{\\mathrm{def}}{=} \\sum_{i=1}^{k} \\mathrm{Loss}_i(v_{\\le i}). 
\\qquad (1)$$\n\nNote that the privacy parameters (\u03b5i, \u03b4i) depend on the previous outcomes that A receives. 
We will\nfrequently shorten our notation to $\\varepsilon_t = \\varepsilon_t(v_{<t})$ and $\\delta_t = \\delta_t(v_{<t})$ when the outcome is understood.\nIt no longer makes sense to claim that the privacy loss of the adaptive parameter composition experiment\nis bounded by any fixed constant, because the privacy parameters (which we would\npresumably want to use to bound the privacy loss) are themselves random variables. Instead, we\ndefine two objects which can be used by a data analyst to control the privacy loss of an adaptive\ncomposition of algorithms.\nThe first object, which we call a privacy odometer, will be parameterized by one global parameter\n$\\delta_g$ and will provide a running real-valued output that will, with probability $1 - \\delta_g$, upper bound the\nprivacy loss at each round of any adaptive composition in terms of the realized values of $\\varepsilon_i$ and $\\delta_i$\nselected at each round.\nDefinition 3.1 (Privacy Odometer). A function $\\mathrm{COMP}_{\\delta_g} : \\mathbb{R}^{2k}_{\\ge 0} \\to \\mathbb{R} \\cup \\{\\infty\\}$ is a valid privacy\nodometer if for all adversaries A in AdaptParamComp(A, k, b), with probability at most $\\delta_g$ over $v \\sim V^0$:\n$|\\mathrm{Loss}(v)| > \\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_k, \\delta_k)$.\nThe second object, which we call a privacy filter, is a stopping time rule. It takes two global parameters\n$(\\varepsilon_g, \\delta_g)$ and will at each round either output CONT or HALT. Its guarantee is that with probability\n$1 - \\delta_g$, it will output HALT if the privacy loss has exceeded $\\varepsilon_g$.\nDefinition 3.2 (Privacy Filter). 
A function $\\mathrm{COMP}_{\\varepsilon_g,\\delta_g} : \\mathbb{R}^{2k}_{\\ge 0} \\to \\{\\mathrm{HALT}, \\mathrm{CONT}\\}$ is a valid\nprivacy filter for $\\varepsilon_g, \\delta_g \\ge 0$ if for all adversaries A in AdaptParamComp(A, k, b), the following\n\u201cbad event\u201d occurs with probability at most $\\delta_g$ when $v \\sim V^0$:\n$|\\mathrm{Loss}(v)| > \\varepsilon_g$ and $\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_k, \\delta_k) = \\mathrm{CONT}$.\n\nWe note two things about the usage of these objects. First, a valid privacy odometer\ncan be used to provide a running upper bound on the privacy loss at each intermediate\nround: the privacy loss at round $k' < k$ must with high probability be upper bounded by\n$\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_{k'}, \\delta_{k'}, 0, 0, \\ldots, 0, 0)$ \u2013 i.e. the bound that results by setting all future privacy\nparameters to 0. This is because setting all future privacy parameters to zero is equivalent\nto stopping the computation at round $k'$, and is a feasible choice for the adaptive adversary\nA. Second, a privacy filter can be used to guarantee that with high probability, the\nstated privacy budget $\\varepsilon_g$ is never exceeded \u2013 the data analyst at each round $k'$ simply queries\n$\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_{k'}, \\delta_{k'}, 0, 0, \\ldots, 0, 0)$ before she runs algorithm $k'$, and runs it only if the\nfilter returns CONT. Again, this is guaranteed because the continuation is a feasible choice of the\nadversary, and the guarantees of both a filter and an odometer are quantified over all adversaries.\nWe first give an adaptive parameter version of the basic composition in Theorem 2.9. See the full\nversion for the proof.\nTheorem 3.3. 
For each nonnegative $\\delta_g$, $\\mathrm{COMP}_{\\delta_g}$ is a valid privacy odometer where\n$\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_k, \\delta_k) = \\infty$ if $\\sum_{i=1}^{k} \\delta_i > \\delta_g$ and otherwise $\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_k, \\delta_k) = \\sum_{i=1}^{k} \\varepsilon_i$. 
Additionally, for any $\\varepsilon_g, \\delta_g \\ge 0$, $\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}$ is a valid privacy filter where\n$\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}(\\varepsilon_1, \\delta_1, \\ldots, \\varepsilon_k, \\delta_k) = \\mathrm{HALT}$ if $\\sum_{t=1}^{k} \\delta_t > \\delta_g$ or $\\sum_{i=1}^{k} \\varepsilon_i > \\varepsilon_g$ and CONT otherwise.\n\n4 Concentration Preliminaries\n\nWe give a useful concentration bound that will be pivotal in proving an improved valid privacy\nodometer and filter over those given in Theorem 3.3. To set this up, we present some notation: let\n$(\\Omega, \\mathcal{F}, P)$ be a probability triple where $\\emptyset = \\mathcal{F}_0 \\subseteq \\mathcal{F}_1 \\subseteq \\cdots \\subseteq \\mathcal{F}$ is an increasing sequence of\n\u03c3-algebras. Let $X_i$ be a real-valued $\\mathcal{F}_i$-measurable random variable, such that $E[X_i \\mid \\mathcal{F}_{i-1}] = 0$ a.s.\nfor each i. We then consider the martingale where $M_0 = 0$ and $M_k = \\sum_{i=1}^{k} X_i$ for all $k \\ge 1$. We\nuse results from [dlPKLL04] and [vdG02] to prove the following (see supplementary file).\nTheorem 4.1. 
For $M_k$ given above, if there exist two random variables $C_i < D_i$ which are $\\mathcal{F}_{i-1}$\nmeasurable for $i \\ge 1$ such that $C_i \\le X_i \\le D_i$ almost surely for all $i \\ge 1$, and we define $U_0^2 = 0$\nand $U_k^2 = \\sum_{i=1}^{k} (D_i - C_i)^2$ for all $k \\ge 1$, then for any fixed $k \\ge 1$, $\\beta > 0$ and $\\delta \\le 1/e$, we have\n\n$$P\\left[ |M_k| \\ge \\sqrt{\\left(\\frac{U_k^2}{4} + \\beta\\right) \\left(2 + \\log\\left(\\frac{U_k^2}{4\\beta} + 1\\right)\\right) \\log(1/\\delta)} \\right] \\le \\delta.$$\n\nWe will use this martingale inequality in our analysis for deriving composition bounds for both\nprivacy filters and odometers. The martingale we form will be the sum of the privacy loss from\na sequence of randomized response algorithms from Definition 2.6. 
Note that for pure differential\nprivacy (where $\\delta_i = 0$) the privacy loss at round $i$ is then $\\pm\\varepsilon_i$, where $\\varepsilon_i$ is fixed given the previous\noutcomes. See the supplementary file for the case when $\\delta_i > 0$ at each round $i$.\nWe then use the result from Theorem 2.7 to conclude that every differentially private algorithm is\na post-processing function of randomized response. Thus determining a high-probability bound on\nthe martingale formed from the sum of the privacy losses of a sequence of randomized response\nalgorithms suffices for computing a valid privacy filter or odometer.\n\n5 Advanced Composition for Privacy Filters\n\nWe next show that we can essentially get the same asymptotic bound as Theorem 2.10 for the privacy\nfilter setting using the bound in Theorem 4.1 for the martingale based on the sum of privacy losses\nfrom a sequence of randomized response algorithms (see the supplementary file for more details).\nTheorem 5.1. $\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}$ is a valid privacy filter for $\\delta_g \\in (0, 1/e)$ and $\\varepsilon_g > 0$ where\n$\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}(\\varepsilon_1, \\delta_1, \\cdots, \\varepsilon_k, \\delta_k) = \\mathrm{HALT}$ if $\\sum_{i=1}^k \\delta_i > \\delta_g/2$ or if $\\varepsilon_g$ is smaller than\n\n$$\\sum_{j=1}^k \\varepsilon_j\\left(e^{\\varepsilon_j} - 1\\right)/2 + \\sqrt{2\\left(\\sum_{i=1}^k \\varepsilon_i^2 + \\frac{\\varepsilon_g^2}{28.04\\log(1/\\delta_g)}\\right)\\left(1 + \\frac{1}{2}\\log\\left(\\frac{28.04\\log(1/\\delta_g)\\sum_{i=1}^k \\varepsilon_i^2}{\\varepsilon_g^2} + 1\\right)\\right)\\log(2/\\delta_g)} \\tag{2}$$\n\nand otherwise $\\mathrm{COMP}_{\\varepsilon_g,\\delta_g}(\\varepsilon_1, \\delta_1, \\cdots, \\varepsilon_k, \\delta_k) = \\mathrm{CONT}$.\n\nNote that if we have $\\sum_{i=1}^k \\varepsilon_i^2 = O(1/\\log(1/\\delta_g))$ and set $\\varepsilon_g = \\Theta\\left(\\sqrt{\\sum_{i=1}^k \\varepsilon_i^2 \\log(1/\\delta_g)}\\right)$ in (2),\nwe are then getting the same asymptotic bound on the privacy loss as in [KOV15] and in Theorem 2.10\nfor the case when $\\varepsilon_i = \\varepsilon$ for $i \\in [k]$. If $k\\varepsilon^2 \\leq 8\\log(1/\\delta_g)$, then Theorem 2.10 gives a\nbound on the privacy loss of $\\varepsilon\\sqrt{8k\\log(1/\\delta_g)}$. Note that there may be better choices for the constant\n28.04 that we divide $\\varepsilon_g^2$ by in (2), but for the case when $\\varepsilon_g = \\varepsilon\\sqrt{8k\\log(1/\\delta_g)}$ and $\\varepsilon_i = \\varepsilon$ for every\n$i \\in [n]$, it is nearly optimal.\n\n6 Advanced Composition for Privacy Odometers\n\nOne might hope to achieve the same sort of bound on the privacy loss from Theorem 2.10 when\nthe privacy parameters may be chosen adversarially. However, we show that this cannot be the\ncase for any valid privacy odometer. In particular, even if an adversary selects the same privacy\nparameter $\\varepsilon = o\\left(\\sqrt{\\log(\\log(n)/\\delta_g)/k}\\right)$ each round but can adaptively select a time to stop interacting\nwith AdaptParamComp (which is a restricted special case of the power of the general adversary, since\nstopping is equivalent to setting all future $\\varepsilon_i, \\delta_i = 0$), then we show that there can be no valid\nprivacy odometer achieving a bound of $o\\left(\\varepsilon\\sqrt{k\\log(\\log(n)/\\delta_g)}\\right)$. This gives a separation between\nthe achievable bounds for valid privacy odometers and filters. But for privacy applications, it is\nworth noting that $\\delta_g$ is typically set to be (much) smaller than $1/n$, in which case this gap disappears\n(since $\\log(\\log(n)/\\delta_g) = (1 + o(1))\\log(1/\\delta_g)$). We prove the following with an anti-concentration\nbound for random walks from [LT91] (see full version).\nTheorem 6.1. 
For any $\\delta_g \\in (0, O(1))$ there is no valid $\\mathrm{COMP}_{\\delta_g}$ privacy odometer where\n\n$$\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, 0, \\cdots, \\varepsilon_k, 0) = \\sum_{i=1}^k \\varepsilon_i\\left(\\frac{e^{\\varepsilon_i} - 1}{e^{\\varepsilon_i} + 1}\\right) + o\\left(\\sqrt{\\sum_{i=1}^k \\varepsilon_i^2 \\log(\\log(n)/\\delta_g)}\\right).$$\n\nWe now give our main positive result for privacy odometers, which is similar to our privacy filter\nin Theorem 5.1 except that $\\delta_g$ is replaced by $\\delta_g/\\log(n)$, as is necessary from Theorem 6.1. Note\nthat the bound incurs an additive $1/n^2$ loss to the $\\sum_i \\varepsilon_i^2$ term that is present without privacy. In\nany reasonable setting of parameters, this translates to at most a constant-factor multiplicative loss,\nbecause there is no utility in running any differentially private algorithm with $\\varepsilon_i < \\frac{1}{10n}$ (we know\nthat if $A$ is $(\\varepsilon_i, 0)$-DP then $A(x)$ and $A(x')$ for neighboring inputs have statistical distance at most\n$e^{\\varepsilon_i n} - 1 < 0.1$, and hence the output is essentially independent of the input; a similar\nstatement holds for $(\\varepsilon_i, \\delta_i)$-DP). The proof of the following result uses Theorem 4.1 along with a\nunion bound over $\\log(n^2)$ choices for $\\beta$, which are discretized values for $\\sum_{i=1}^k \\varepsilon_i^2 \\in [1/n^2, 1]$. See\nthe full version for the complete proof.\nTheorem 6.2 (Advanced Privacy Odometer). $\\mathrm{COMP}_{\\delta_g}$ is a valid privacy odometer for $\\delta_g \\in (0, 1/e)$\nwhere $\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\cdots, \\varepsilon_k, \\delta_k) = \\infty$ if $\\sum_{i=1}^k \\delta_i > \\delta_g/2$, otherwise if $\\sum_{i=1}^k \\varepsilon_i^2 \\in [1/n^2, 1]$ then\n\n$$\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\cdots, \\varepsilon_k, \\delta_k) = \\sum_{i=1}^k \\varepsilon_i\\left(\\frac{e^{\\varepsilon_i} - 1}{2}\\right) + 2\\sqrt{\\sum_{i=1}^k \\varepsilon_i^2 \\left(1 + \\log\\left(\\sqrt{3}\\right)\\right)\\log(4\\log_2(n)/\\delta_g)} \\tag{3}$$\n\nand if $\\sum_{i=1}^k \\varepsilon_i^2 \\notin [1/n^2, 1]$ then $\\mathrm{COMP}_{\\delta_g}(\\varepsilon_1, \\delta_1, \\cdots, \\varepsilon_k, \\delta_k)$ is equal to\n\n$$\\sum_{i=1}^k \\varepsilon_i\\left(\\frac{e^{\\varepsilon_i} - 1}{2}\\right) + \\sqrt{2\\left(1/n^2 + \\sum_{i=1}^k \\varepsilon_i^2\\right)\\left(1 + \\frac{1}{2}\\log\\left(1 + n^2\\sum_{i=1}^k \\varepsilon_i^2\\right)\\right)\\log(4\\log_2(n)/\\delta_g)}. \\tag{4}$$\n\nAcknowledgements\n\nThe authors are grateful to Jack Murtagh for his collaboration in the early stages of this work, and for\nsharing his preliminary results with us. We thank Andreas Haeberlen, Benjamin Pierce, and Daniel\nWinograd-Cort for helpful discussions about composition. We further thank Daniel Winograd-Cort\nfor catching an incorrectly set constant in an earlier version of Theorem 5.1.\n\nReferences\n\n[BNS+16] Raef Bassily, Kobbi Nissim, Adam D. Smith, Thomas Steinke, Uri Stemmer, and\nJonathan Ullman. Algorithmic stability for adaptive data analysis. In Proceedings\nof the 48th Annual ACM on Symposium on Theory of Computing, STOC, 2016.\n\n[DFH+15] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and\nAaron Leon Roth. 
Preserving statistical validity in adaptive data analysis. In Proceedings\nof the Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages\n117\u2013126. ACM, 2015.\n\n[DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni\nNaor. Our data, ourselves: Privacy via distributed noise generation. In Advances in\nCryptology-EUROCRYPT 2006, pages 486\u2013503. Springer, 2006.\n\n[dlPKLL04] Victor H. de la Pe\u00f1a, Michael J. Klass, and Tze Leung Lai. Self-normalized processes:\nexponential inequalities, moment bounds and iterated logarithm laws. Ann. Probab.,\n32(3):1902\u20131933, 07 2004.\n\n[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise\nto sensitivity in private data analysis. In TCC \u201906, pages 265\u2013284, 2006.\n\n[DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and differential\nprivacy. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS\n2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 51\u201360, 2010.\n\n[ES15] Hamid Ebadi and David Sands. Featherweight PINQ. CoRR, abs/1505.02642, 2015.\n\n[KOV15] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for\ndifferential privacy. In Proceedings of the 32nd International Conference on Machine\nLearning, ICML 2015, Lille, France, 6-11 July 2015, pages 1376\u20131385, 2015.\n\n[KS14] S.P. Kasiviswanathan and A. Smith. On the \u2018Semantics\u2019 of Differential Privacy: A\nBayesian Formulation. Journal of Privacy and Confidentiality, Vol. 6: Iss. 1, Article\n1, 2014.\n\n[LT91] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and\nProcesses. A Series of Modern Surveys in Mathematics Series. Springer, 1991.\n\n[MV16] Jack Murtagh and Salil P. Vadhan. The complexity of computing the optimal\ncomposition of differential privacy. 
In Theory of Cryptography - 13th International Conference,\nTCC 2016-A, Tel Aviv, Israel, January 10-13, 2016, Proceedings, Part I, pages\n157\u2013175, 2016.\n\n[vdG02] Sara A. van de Geer. On Hoeffding\u2019s inequality for dependent random variables.\nSpringer, 2002.\n", "award": [], "sourceid": 1049, "authors": [{"given_name": "Ryan", "family_name": "Rogers", "institution": "University of Pennsylvania"}, {"given_name": "Aaron", "family_name": "Roth", "institution": "University of Pennsylvania"}, {"given_name": "Jonathan", "family_name": "Ullman", "institution": "Northeastern University"}, {"given_name": "Salil", "family_name": "Vadhan", "institution": "Harvard University"}]}