{"title": "Query Complexity of Bayesian Private Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 2431, "page_last": 2440, "abstract": "We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? \n\nOur main result is a query complexity lower bound that is tight up to the first order. We show that if the learner wants to estimate the target within an error of $\\epsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\\log(1/\\epsilon)$ as $\\epsilon \\to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a \\emph{multiplicative} increase in query complexity. The proof builds on Fano's inequality and properties of certain proportional-sampling estimators.", "full_text": "Query Complexity of Bayesian Private Learning\n\nKuang Xu\nStanford Graduate School of Business\nStanford, CA 94305, USA\nkuangxu@stanford.edu\n\nAbstract\n\nWe study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? Our main result is a query complexity lower bound that is tight up to the first order. 
We show that if the learner wants to estimate the target within an error of ε, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than 1/L, then the query complexity is on the order of L log(1/ε) as ε → 0. Our result demonstrates that increased privacy, as captured by L, comes at the expense of a multiplicative increase in query complexity. The proof builds on Fano's inequality and properties of certain proportional-sampling estimators.\n\n1 Introduction\n\nHow to learn, while ensuring that a spying adversary does not learn? Enabled by rapid advancements in the Internet, surveillance technologies and machine learning, companies and governments alike have become increasingly capable of monitoring the behavior of individuals or competitors, and of using such data for inference and prediction. Motivated by these developments, the present paper investigates the extent to which it is possible for a learner to protect her knowledge from an adversary who observes, completely or partially, her actions.\n\nWe will approach these questions by studying the query complexity of Bayesian Private Learning, a framework proposed by [17] and [13] to investigate the privacy-efficiency trade-off in sequential learning. Our main result is a tight lower bound on query complexity, showing that there is a price the learner must pay in exchange for improved privacy, and that its magnitude scales multiplicatively with the level of privacy desired. In addition, we provide a family of inference algorithms for the adversary, based on proportional sampling, which is provably effective in estimating the target against any learner who does not employ a large number of queries.\n\n1.1 The Model: Bayesian Private Learning\n\nWe begin by describing the Bayesian Private Learning model formulated by [17] and [13]. 
A learner is trying to accurately identify the location of a random target, X*, up to some constant additive error, ε, where X* is uniformly distributed in the unit interval, [0, 1). The learner gathers information about X* by submitting n queries, (Q_1, . . . , Q_n) ∈ [0, 1)^n, for some n ∈ N. For each query, Q_i, she receives a binary response indicating the target's location relative to the query: R_i = I(X* ≤ Q_i), i = 1, 2, . . . , n, where I(·) denotes the indicator function.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\nThe learner submits the queries in a sequential manner, and subsequent queries may depend on previous responses. Once all n queries are submitted, the learner produces an estimator for the target. The learner's behavior is formally captured by a learner strategy, defined as follows.\n\nDefinition 1.1 (Learner Strategy). Fix n ∈ N. Let Y be a uniform random variable over [0, 1), independent from other parts of the system; Y will be referred to as the random seed. A learner strategy, φ = (q, l), consists of two components:\n1. Querying mechanism: q = (q_1, . . . , q_n) is a sequence of deterministic functions, where q_i : [0, 1)^{i−1} × [0, 1) → [0, 1) takes as input past responses and the random seed, Y, and generates the next query, i.e.,¹\n\nQ_i = q_i(R^{i−1}, Y), i = 1, . . . , n, (1.1)\n\nwhere R^i denotes the responses from the first i queries: R^i = (R_1, . . . , R_i), and R^0 ≜ ∅.\n2. Estimator: l : [0, 1)^n × [0, 1) → [0, 1) is a deterministic function that maps all responses, R^n, and Y to a point in the unit interval that serves as a “guess” for X*: X̂ = l(R^n, Y). 
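As a concrete illustration, the interaction prescribed by Definition 1.1 can be sketched in a few lines of Python; the function names and signatures below are ours, not part of the model.

```python
import random

def run_learner(n, querying_mechanism, estimator, x_star=None, seed=None):
    """Simulate one round of the query protocol of Definition 1.1.

    `querying_mechanism(i, responses, y)` and `estimator(responses, y)`
    stand in for the deterministic maps q_i and l; these names are our
    own illustration, not part of the paper's formalism.
    """
    if x_star is None:
        x_star = random.random()                      # target X* ~ Uniform[0, 1)
    y = seed if seed is not None else random.random() # random seed Y
    responses = []
    for i in range(n):
        q_i = querying_mechanism(i, responses, y)     # next query from past responses and Y
        responses.append(1 if x_star <= q_i else 0)   # R_i = I(X* <= Q_i)
    return estimator(responses, y), x_star
```

Plugging in a querying rule that replays the past responses (for example, the bisection rule of Section 4) recovers standard binary search.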
The estimator X̂ will be referred to as the learner estimator.\n\nWe will use Φ_n to denote the family of learner strategies that submit n queries.\n\nThe first objective of the learner is to accurately estimate the target, as formalized in the following definition.\n\nDefinition 1.2 (ε-Accuracy). Fix ε ∈ (0, 1). A learner strategy, φ, is ε-accurate if its estimator approximates the target within an absolute error of ε/2 almost surely, i.e.,\n\nP(|X̂ − X*| ≤ ε/2) = 1, (1.2)\n\nwhere the probability is with respect to the randomness in the target, X*, and the random seed, Y.\n\nWe now introduce the notion of privacy: in addition to estimating X*, the learner would like to simultaneously conceal X* from an eavesdropping adversary. Specifically, there is an adversary who knows the learner's query strategy, and observes all of the queries but not the responses. The adversary then uses the query locations to generate her own adversary estimator for X*, denoted by X̂_a, which depends on the queries, (Q_1, . . . , Q_n), and any internal, idiosyncratic randomness.\n\nWith the adversary's presence in mind, we define the notion of a private learner strategy.\n\nDefinition 1.3 ((δ, L)-Privacy). Fix δ ∈ (0, 1) and L ∈ N. A learner strategy, φ, is (δ, L)-private if, for any adversary estimator, X̂_a,\n\nP(|X̂_a − X*| ≤ δ/2) ≤ 1/L, (1.3)\n\nwhere the probability is measured with respect to the randomness in the target, X*, and any randomness employed by the learner strategy and the adversary estimator.²\n\nIn particular, if a learner employs a (δ, L)-private strategy, then no adversary estimator can be within an absolute error of δ/2 of the target with probability greater than 1/L. 
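As a quick sanity check on this definition, the left-hand side of (1.3) can be estimated by Monte Carlo simulation for the trivial adversary who ignores the queries altogether and guesses uniformly at random; the code below is our own illustration, not part of the paper's analysis.

```python
import random

def uniform_adversary_success(delta, trials=200_000, seed=1):
    """Estimate P(|X_hat_a - X*| <= delta/2) for an adversary that
    ignores the queries and guesses uniformly on [0, 1).
    Illustration only; names are ours."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x_star = rng.random()   # target X* ~ Uniform[0, 1)
        guess = rng.random()    # query-independent adversary estimate
        hits += abs(guess - x_star) <= delta / 2
    return hits / trials
```

For a uniform guess the success probability works out to δ − δ²/4, roughly δ for small δ; this is the calculation behind the standing assumption δ < 1/L discussed in Section 2.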
Therefore, for any fixed δ, the parameter L can be interpreted as the level of desired privacy. We are now ready to define the main quantity of interest in this paper: query complexity.\n\nDefinition 1.4. Fix ε and δ in [0, 1], and L ∈ N. The query complexity, N(ε, δ, L), is the least number of queries needed for an ε-accurate learner strategy to be (δ, L)-private:\n\nN(ε, δ, L) ≜ min{n : Φ_n contains a strategy that is both ε-accurate and (δ, L)-private}.\n\n¹Note that the query Q_i does not explicitly depend on the previous queries, {Q_1, . . . , Q_{i−1}}, but only on their responses. This is without loss of generality, since for a given value of Y it is easy to see that {Q_1, . . . , Q_{i−1}} can be reconstructed once we know their responses and the functions q_1, . . . , q_n.\n²This definition of privacy is reminiscent of the error metric used in Probably Approximately Correct (PAC) learning ([14]), if we view the adversary as trying to learn a (trivial) constant function to within an L1 error of δ/2 with probability greater than 1/L.\n\n2 Main Result\n\nThe main objective of the paper is to understand how N(ε, δ, L) varies as a function of the input parameters, ε, δ and L. Our result will focus on the regime of parameters where³\n\n0 < ε ≤ δ < 1/L.\n\nTheorem 2.1 shows that, in this regime, N(ε, δ, L) ∼ L log(1/ε) as ε → 0.⁴\n\n³The case where δ < ε is arguably much less interesting, because it is not natural to expect the adversary, who is not engaged in the querying process, to have a higher accuracy requirement than the learner. The requirement that δ < 1/L stems from the following argument. If δ > 1/L, then the adversary can simply draw a point uniformly at random in [0, 1) and be guaranteed that the target will be within δ/2 with a probability greater than 1/L. Thus, the privacy constraint is automatically violated, and no private learner strategy exists. 
To obtain a nontrivial problem, we therefore need only consider the case where δ < 1/L.\n⁴We will use the asymptotic notation f(x) ∼ g(x) to mean that f is on the order of g: f(x)/g(x) → 1 as x approaches a certain limit.\n\n3 Related Work\n\nIn [13], the authors establish upper and lower bounds on query complexity for Private Sequential Learning. They also propose the Replicated Bisection algorithm as a learner strategy for the Bayesian variant, but without a matching query complexity lower bound. The present paper closes this gap.\n\nAt a higher level, our work is related to a growing body of literature on privacy-preserving mechanisms in computer science (cf. [5, 9, 6]), operations research (cf. [4, 12]), and statistical learning theory (cf. [2, 8, 16]), but diverges significantly in models and applications. On the methodological front, our proof uses Fano's inequality, a fundamental tool for deriving lower bounds in statistics, information theory, and active learning ([3]).\n\n4 The Upper Bound\n\nThe next two sections are devoted to the proof of Theorem 2.1. We first prove the query complexity upper bound, and begin by giving an overview of the main ideas. Consider the special case of L = 1, where the learner is solely interested in finding the target, X*, and not at all concerned with concealing it from the adversary. Here, the problem reduces to the classical setting, where it is well known that the bisection strategy achieves the optimal query complexity (cf. [15]). The bisection strategy recursively queries the mid-point of the interval that the learner knows to contain X*. For instance, the learner would set Q_1 = 1/2, and if the response is R_1 = 1, then she knows that X* lies in the interval [0, 1/2], and sets Q_2 to 1/4; otherwise, Q_2 is set to 3/4. 
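In code, the bisection strategy just described can be sketched as follows (a minimal illustration; the function name is ours).

```python
def bisection_queries(x_star, n):
    """Run n steps of bisection against a target x_star in [0, 1).

    Returns the list of queries and the final estimate: the midpoint
    of the last interval known to contain x_star. Illustration only.
    """
    lo, hi = 0.0, 1.0
    queries = []
    for _ in range(n):
        q = (lo + hi) / 2.0   # query the midpoint of the current interval
        queries.append(q)
        if x_star <= q:       # response R = I(X* <= Q)
            hi = q            # target lies in the lower half
        else:
            lo = q            # target lies in the upper half
    return queries, (lo + hi) / 2.0
```

After n queries the interval known to contain X* has length 2⁻ⁿ, which is exactly the halving argument behind the query complexity stated next; note also how the final query itself pins down the target's location, foreshadowing the privacy failure discussed below.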
This process repeats for n steps. Because the size of the smallest interval known to contain X* is halved with each additional query, this yields the query complexity\n\nN(ε, δ, 1) = log(1/ε), ε ∈ (0, 1). (4.1)\n\nUnfortunately, once the level of privacy L increases above 1, the bisection strategy is almost never private: it is easy to verify that if the adversary sets X̂_a to be the learner's last query, Q_n, then the target is sure to be within a distance of at most ε. That is, the bisection strategy is not (δ, L)-private for any L > 1, whenever ε […]\n\n[…] log(x) for all x > 0, we have that […] < δ/ε. Substituting […] with (log(δ/ε))⁻¹ in Eq. (5.12), we have that N(ε, δ, L)/L ≥ log(δ/ε) − 1 − 3 log log(δ/ε) or, equivalently, N(ε, δ, L) ≥ L log(1/ε) − L log(2/δ) − 3L log log(δ/ε). This completes the proof of the lower bound in Theorem 2.1.\n\n⁶To avoid the use of rounding in our notation, we will assume that δ is an integer multiple of ε.\n\n6 Concluding Remarks\n\nThe main contribution of the present paper is a tight query complexity lower bound for the Bayesian Private Learning problem, which, together with an upper bound in [13], shows that the learner's query complexity depends multiplicatively on the level of privacy, L: if an ε-accurate learner wishes to ensure that an adversary's probability of making a δ-accurate estimation is at most 1/L, then she needs to employ on the order of L log(δ/ε) queries. Moreover, we show that the multiplicative dependence on L holds even under the more general models of high-dimensional queries and partial adversary monitoring. 
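In display form, and under the convention that logarithms are base 2 (our reading, consistent with the bisection analysis of Section 4), the lower bound summarized above reads:

```latex
N(\epsilon, \delta, L) \;\ge\; L \log(1/\epsilon) \;-\; L \log(2/\delta) \;-\; 3L \log\log(\delta/\epsilon),
```

which, combined with the upper bound of [13], gives N(ε, δ, L)/(L log(1/ε)) → 1 as ε → 0 for fixed δ and L.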
To prove the lower bound, we develop a set of information-theoretic arguments which involve, as a main ingredient, the analysis of proportional-sampling adversary estimators that exploit the action-information proximity inherent in the learning problem.\n\nThe present work leaves open a few interesting directions. Firstly, the current upper and lower bounds are not tight in the regime where the adversary's error criterion, δ, is significantly smaller than 1/L. Making progress in this regime is likely to require a more delicate argument and possibly new tools. Secondly, our query model assumes that the responses are noiseless, and it would be interesting to explore how the presence of noise (cf. [10, 1, 15]) may impact the design of private query strategies. For instance, a natural generalization of the bisection search algorithm to the noisy setting is the Probabilistic Bisection Algorithm ([7, 15]), where the nth query point is the median of the target's posterior distribution in the nth time slot. It is conceivable that one may construct a probabilistic query strategy analogous to the Replicated Bisection strategy by replicating queries in L pre-determined sub-intervals. However, it appears challenging to prove that such replications preserve privacy, and still more difficult to see how one may obtain a matching query complexity lower bound in the noisy setting. Finally, one may want to consider richer, and potentially more realistic, active learning models, such as one in which each query reveals to the learner the full gradient of a function at the queried location, instead of only the sign of the gradient, as in the present model.\n\nReferences\n\n[1] Michael Ben-Or and Avinatan Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on, pages 221–230. 
IEEE, 2008.\n\n[2] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.\n\n[3] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, 2nd edition. John Wiley & Sons, 2006.\n\n[4] Rachel Cummings, Federico Echenique, and Adam Wierman. The empirical implications of privacy-aware choice. Operations Research, 64(1):67–78, 2016.\n\n[5] Cynthia Dwork. Differential privacy: A survey of results. In International Conference on Theory and Applications of Models of Computation, pages 1–19. Springer, 2008.\n\n[6] Giulia Fanti, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Spy vs. spy: Rumor source obfuscation. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 271–284. ACM, 2015.\n\n[7] Michael Horstein. Sequential transmission using noiseless feedback. IEEE Transactions on Information Theory, 9(3):136–143, 1963.\n\n[8] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Conference on Learning Theory, pages 24.1–24.34, 2012.\n\n[9] Yehuda Lindell and Benny Pinkas. Secure multiparty computation for privacy-preserving data mining. Journal of Privacy and Confidentiality, 1(1):5, 2009.\n\n[10] Ronald L. Rivest, Albert R. Meyer, Daniel J. Kleitman, Karl Winklmann, and Joel Spencer. Coping with errors in binary search procedures. Journal of Computer and System Sciences, 20(3):396–404, 1980.\n\n[11] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.\n\n[12] John N. Tsitsiklis and Kuang Xu. Delay-predictability trade-offs in reaching a secret goal. Operations Research, 66(2):587–596, 2018.\n\n[13] John N. Tsitsiklis, Kuang Xu, and Zhi Xu. Private sequential learning. In Conference on Learning Theory (COLT), 2018. 
https://arxiv.org/abs/1805.02136.\n\n[14] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.\n\n[15] Rolf Waeber, Peter I. Frazier, and Shane G. Henderson. Bisection search with noisy responses. SIAM Journal on Control and Optimization, 51(3):2261–2279, 2013.\n\n[16] Martin J. Wainwright, Michael I. Jordan, and John C. Duchi. Privacy aware learning. In Advances in Neural Information Processing Systems, pages 1430–1438, 2012.\n\n[17] Zhi Xu. Private sequential search and optimization. Master's thesis, Massachusetts Institute of Technology, 2017. https://dspace.mit.edu/handle/1721.1/112054.\n", "award": [], "sourceid": 1236, "authors": [{"given_name": "Kuang", "family_name": "Xu", "institution": "Stanford Graduate School of Business"}]}