{"title": "A Market Framework for Eliciting Private Data", "book": "Advances in Neural Information Processing Systems", "page_first": 3510, "page_last": 3518, "abstract": "We propose a mechanism for purchasing information from a sequence of participants.The participants may simply hold data points they wish to sell, or may have more sophisticated information; either way, they are incentivized to participate as long as they believe their data points are representative or their information will improve the mechanism's future prediction on a test set.The mechanism, which draws on the principles of prediction markets, has a bounded budget and minimizes generalization error for Bregman divergence loss functions.We then show how to modify this mechanism to preserve the privacy of participants' information: At any given time, the current prices and predictions of the mechanism reveal almost no information about any one participant, yet in total over all participants, information is accurately aggregated.", "full_text": "A Market Framework for Eliciting Private Data\n\nBo Waggoner\nHarvard SEAS\n\nbwaggoner@fas.harvard.edu\n\nRafael Frongillo\n\nUniversity of Colorado\nraf@colorado.edu\n\nJacob Abernethy\n\nUniversity of Michigan\n\njabernet@umich.edu\n\nAbstract\n\nWe propose a mechanism for purchasing information from a sequence of partici-\npants. The participants may simply hold data points they wish to sell, or may have\nmore sophisticated information; either way, they are incentivized to participate as\nlong as they believe their data points are representative or their information will\nimprove the mechanism\u2019s future prediction on a test set. The mechanism, which\ndraws on the principles of prediction markets, has a bounded budget and mini-\nmizes generalization error for Bregman divergence loss functions. 
We then show how to modify this mechanism to preserve the privacy of participants' information: At any given time, the current prices and predictions of the mechanism reveal almost no information about any one participant, yet in total over all participants, information is accurately aggregated.\n\n1 Introduction\n\nA firm that relies on the ability to make difficult predictions can gain a lot from a large collection of data. The goal is often to estimate values y ∈ Y given observations x ∈ X according to an appropriate class of hypotheses F describing the relationship between x and y (for example, y = a·x + b for linear regression). In classic statistical learning theory, the goal is formalized as attempting to approximately solve\n\nmin_{f∈F} E_{x,y} Loss(f; (x, y))   (1)\n\nwhere Loss(·) is an appropriate inutility function and (x, y) is drawn from an unknown distribution. In the present paper we are concerned with the case in which the data are not drawn or held by a central authority but are instead inherently distributed. By this we mean that the data is (disjointly) partitioned across a set of agents, with agent i privately possessing some portion S_i of the dataset, and agents have no obvious incentive to reveal this data to the firm seeking it. The vast swaths of data available in our personal email accounts could provide massive benefits to a range of companies, for example, but users are typically loath to provide account credentials, even when asked politely.\n\nWe will be concerned with the design of financial mechanisms that provide a community of agents, each holding a private set of data, an incentive to contribute to the solution of a large learning or prediction task. Here we use the term 'mechanism' to mean an algorithmic interface that can receive and answer queries, as well as engage in monetary exchange (deposits and payouts). 
Our aim will be to design such a mechanism that satisfies the following three properties:\n\n1. The mechanism is efficient in that it approaches a solution to (1) as the amount of data and participation grows, while spending a constant, fixed total budget.\n\n2. The mechanism is incentive-compatible in the sense that agents are rewarded when their contributions provide marginal value in terms of improved hypotheses, and are not rewarded for bad or misleading information.\n\n3. The mechanism provides reasonable privacy guarantees, so that no agent j (or outside observer) can manipulate the mechanism in order to infer the contributions of agent i ≠ j.\n\nUltimately we would like our mechanism to approach the performance of a learning algorithm that had direct access to all the data, while only spending a constant budget to acquire data and improve predictions, and while protecting participants' privacy.\n\nOur construction relies on the recent surge in literature on prediction markets [13, 14, 19, 20], popular for some time in the field of economics and recently studied in great detail in computer science [8, 16, 6, 15, 18, 1]. A prediction market is a mechanism designed for the purpose of information aggregation, particularly when there is some underlying future event about which many members of the population may have private and useful information. For instance, it may elicit predictions about which team will win an upcoming sporting event, or which candidate will win an election. These predictions are eventually scored on the actual outcome of the event.\n\nApplying these prediction market techniques allows participants to essentially "trade in a market" based on their data. (This approach is similar to prior work on crowdsourcing contests [3].) 
Members of the population have private information, just as with prediction markets (in this case, data points or beliefs), and the goal is to incentivize them to reveal and aggregate that information into a final hypothesis or prediction. Their final profits are tied to the outcome of a test set of data, with each participant being paid in accordance with how much their information improved the performance on the test set. Our techniques depart from the framework of [3] in two significant aspects: (a) we focus on the particular problem of data aggregation, and most of our results take advantage of kernel methods; and (b) our mechanisms are the first to combine differential privacy guarantees with data aggregation in a prediction-market framework.\n\nThis framework will provide efficiency and truthfulness. We will also show how to achieve privacy in many scenarios. We will give mechanisms where the published prices and predictions satisfy (ε, δ)-differential privacy [10] with respect to each participant's data. The mechanism's output can still give reasonable predictions while no observer can infer much about any participant's input data.\n\n2 Mechanisms for Eliciting and Aggregating Data\n\nWe now give a broad description of the mechanism we will study. In brief, we imagine a central authority (the mechanism, or market) maintaining a hypothesis f^t representing the current aggregation of all the contributions made thus far. A new (or returning) participant may query f^t at no cost, perhaps evaluating the quality of the predictions on a privately-held dataset, and can then propose an update df^{t+1} to f^t that possibly requires an investment (a "bet"). 
Bets are evaluated at the close of the market when a true data sample is generated (analogous to a test set), and payouts are distributed according to the quality of the updates.\n\nAfter describing this initial framework as Mechanism 1, which is based loosely on the setting of [3], we turn our attention to the special case in which our hypotheses must lie in a Reproducing Kernel Hilbert Space (RKHS) [17] for a given kernel k(·,·). This kernel-based "nonparametric mechanism" is particularly well-suited for the problem of data aggregation, as the betting space of the participants consists essentially of updates of the form df^t = α_t k(z_t, ·), where z_t is the data object offered by the participant and α_t ∈ R is the "magnitude" of the bet.\n\nA drawback of Mechanism 1 is the lack of privacy guarantees associated with the betting protocol: utilizing one's data to make bets or investments in the mechanism can lead to a loss of privacy for the owner of that data. When a participant submits a bet of the form df^t = α_t k(z_t, ·), where z_t could contain sensitive personal information, another participant may be able to infer z_t by querying the mechanism. One of the primary contributions of the present work, detailed in Section 3, is a technique to allow for productive participation in the mechanism while maintaining a guarantee on the privacy of the data submitted.\n\n2.1 The General Template\n\nThere is a space of examples X × Y, where x ∈ X are features and y ∈ Y are labels. The mechanism designer chooses a function space F consisting of f : X × Y → R, assumed to have Hilbert space structure; one may view F as either the hypothesis class or the associated loss class, that is, where f_h(x, y) measures the loss/performance of hypothesis h on observation x and label y. 
In each case we will refer to f ∈ F as a hypothesis, eliding the distinction between f_h and h.\n\nThe pricing scheme of the mechanism relies on a convex cost function C_x(·) : F → R which is parameterized by elements x ∈ X but whose domain is the set of hypotheses F. The cost function is publicly available and determined in advance. The interaction with the mechanism is a sequential process of querying and betting. On round t − 1 the mechanism publishes a hypothesis f^{t−1}, the "state" of the market, which participants may query. Each participant arrives sequentially, and on round t a participant may place a "bet" df^t ∈ F, also called a "trade" or "update", modifying the hypothesis f^{t−1} → f^t = f^{t−1} + df^t. Finally, participation ends and the mechanism samples (or reveals) a test example¹ (x, y) from the underlying distribution and pays (or charges) each participant according to the relative performance of their marginal contributions. Precisely, the total reward for participant t's bet df^t is the value df^t(x, y) minus the cost C_x(f^t) − C_x(f^{t−1}).\n\nMechanism 1: The Market Template\nMARKET announces f^0 ∈ F\nfor t = 1, 2, . . . , T do\n  PARTICIPANT may query the functions ∇_f C_x(f^{t−1}) and f^{t−1}(x, y) for examples (x, y)\n  PARTICIPANT t may submit a bet df^t ∈ F to MARKET\n  MARKET updates state f^t = f^{t−1} + df^t\nMARKET observes a true sample (x, y)\nfor t = 1, 2, . . . , T do\n  PARTICIPANT t receives payment df^t(x, y) + C_x(f^{t−1}) − C_x(f^t)\n\n¹This can easily be extended to a test set by taking the average performance over the test set.\n\nThe design of cost-function prediction markets has been an area of active research over the past several years, starting with [8] and many further refinements and generalizations [1, 6, 15]. 
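As a concrete illustration, Mechanism 1 can be simulated for a finite outcome space with the log-partition cost of the exponential family special case discussed below; this is a minimal sketch under simplifying assumptions (a single test observation, shares indexed by outcome), and the function names are ours:

```python
import numpy as np

def cost(f_x):
    # Log-partition cost over a finite outcome space:
    # C(f) = log sum_y exp(f(y)); an illustrative choice of cost function.
    m = np.max(f_x)
    return float(m + np.log(np.sum(np.exp(f_x - m))))

def run_market(f0, bets, test_outcome):
    # Mechanism 1: accumulate states f^t = f^{t-1} + df^t, then pay each
    # participant df^t(x, y) + C(f^{t-1}) - C(f^t) on the test outcome.
    states = [np.asarray(f0, dtype=float)]
    for df in bets:
        states.append(states[-1] + np.asarray(df, dtype=float))
    payments = [bets[t][test_outcome] + cost(states[t]) - cost(states[t + 1])
                for t in range(len(bets))]
    return states[-1], payments
```

Because the cost terms telescope, the market's total payout depends only on the initial and final states, which is the source of the bounded-budget property discussed in Section 2.2.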
The general idea is that the mechanism can efficiently provide price quotes via a function C(·) which acts as a potential on the space of outstanding shares; see [1] for a thorough review. In the present work we have added an additional twist, which is that the function C_x(·) is given an additional parameterization by the observation x. We will not dive too deeply into the theoretical aspects of this generalization, but it is a straightforward extension of existing theory.\n\nKey special case: exponential family mechanism. For those more familiar with statistics and machine learning, there is a natural and canonical family of problems that can be cast within the general framework of Mechanism 1, which we will call the exponential family prediction mechanism following [2]. Assume that F can be parameterized as F = {f_θ : θ ∈ R^d}, that we are given a sufficient statistics summary function φ : X × Y → R^d, and that function evaluation is given by f_θ(x, y) = ⟨θ, φ(x, y)⟩. We let C_x(f) := log ∫_Y exp(f(x, y)) dy, so that C_x(f_θ) = log ∫_Y exp(⟨θ, φ(x, y)⟩) dy. In other words, we have chosen our mechanism to encode a particular exponential family model, with C_x(·) chosen as the conditional log partition function over the distribution on y given x. If the market has settled on a function f_θ, then one may interpret that as the aggregate market belief that the distribution on X × Y is\n\np_θ(x, y) = exp(⟨θ, φ(x, y)⟩ − A(θ)),   where   A(θ) = log ∫_{X×Y} exp(⟨θ, φ(x, y)⟩) dx dy.\n\nHow may we view this as a "market aggregate" belief? Notice that if a trader observes the market state f_θ and she is considering a bet of the form df = f_{θ′} − f_θ, the eventual profit will be\n\nf_{θ′}(x, y) − f_θ(x, y) + C_x(f_θ) − C_x(f_{θ′}) = log [ p_{θ′}(y|x) / p_θ(y|x) ].\n\nI.e., the profit is precisely the conditional log-likelihood ratio of the update θ → θ′.\n\nExample: Logistic regression. Let X = R^k, Y = {−1, 1}, and take F to be the set of functions f_θ(x, y) = y·(θ⊤x) for θ ∈ R^k. Then by our construction, C_x(f_θ) = log(exp(f_θ(x, 1)) + exp(f_θ(x, −1))) = log(exp(θ⊤x) + exp(−θ⊤x)), and we let f^0 = f_0 ≡ 0. The payoff of a participant placing a bet which moves the market state to f^1 = f_θ, upon outcome (x, y), is:\n\nf_θ(x, y) + C_x(f_0) − C_x(f_θ) = yθ⊤x + log(2) − log(exp(θ⊤x) + exp(−θ⊤x)) = log(2) − log(1 + exp(−2yθ⊤x)),\n\nwhich is simply the negative logistic loss of the parameter choice 2θ. A participant wishing to maximize profit under a belief distribution p(x, y) should therefore choose θ via logistic regression,\n\nθ* = arg min_θ E_{(x,y)∼p} [ log(1 + exp(−2yθ⊤x)) ].   (2)\n\n2.2 Properties of the Market\n\nWe next describe two nice properties of Mechanism 1: incentive-compatibility and bounded budget. Recall that, for the exponential family markets discussed above, a trader moving the market hypothesis from f^{t−1} to f^t was compensated according to the conditional log-likelihood ratio of f^{t−1} and f^t on the test data point. 
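The closed-form payoff identity in the logistic regression example is easy to check numerically; a small sketch in plain Python (names ours), computing the payoff both ways:

```python
import math

def payoff(theta, x, y):
    # Payoff for moving the logistic market from f^0 = 0 to f_theta on
    # outcome (x, y): y * theta'x + log 2 - C_x(f_theta).
    s = sum(t * xi for t, xi in zip(theta, x))
    return y * s + math.log(2.0) - math.log(math.exp(s) + math.exp(-s))

def neg_logistic_loss(theta, x, y):
    # The closed form from the text: log 2 - log(1 + exp(-2 y theta'x)).
    s = sum(t * xi for t, xi in zip(theta, x))
    return math.log(2.0) - math.log1p(math.exp(-2.0 * y * s))
```

With theta = 0 the bet is the null trade and the payoff is identically zero, matching the choice f^0 ≡ 0.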
The implication is that traders are incentivized to minimize a KL divergence between the market's estimate of the distribution and the true underlying distribution. We refer to this property as incentive-compatibility because traders' interests are aligned with the mechanism designer's. This property indeed holds generally for Mechanism 1, where the KL divergence is replaced with a general Bregman divergence corresponding to the Fenchel conjugate of C_x(·); see Proposition 1 in the appendix for details.\n\nGiven that the mechanism must make a sequence of (possibly negative) payments to traders, a natural question is whether there is the potential for large downside for the mechanism in terms of total payment (budget). In the context of the exponential family mechanism, this question is easy to answer: after a sequence of bets moving the market state parameter θ_0 → θ_1 → . . . → θ_final, the total loss to the mechanism corresponds to the total payouts made to traders,\n\nΣ_i [ f_{θ_{i+1}}(x, y) − f_{θ_i}(x, y) + C_x(f_{θ_i}) − C_x(f_{θ_{i+1}}) ] = log [ p_{θ_final}(y|x) / p_{θ_0}(y|x) ];\n\nthat is, the worst-case loss is exactly the worst-case conditional log-likelihood ratio. In the context of logistic regression this quantity can always be guaranteed to be no more than log 2 as long as the initial parameter is set to θ = 0. 
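Both the telescoping of payouts and the log 2 budget bound for a logistic market started at θ = 0 can be verified directly; a minimal sketch, assuming the logistic market of the running example (names ours):

```python
import math

def C(theta, x):
    # Conditional log-partition of the logistic market: C_x(f_theta).
    s = sum(t * xi for t, xi in zip(theta, x))
    return math.log(math.exp(s) + math.exp(-s))

def f(theta, x, y):
    # f_theta(x, y) = y * theta'x
    return y * sum(t * xi for t, xi in zip(theta, x))

def total_payout(thetas, x, y):
    # Sum over consecutive updates theta_i -> theta_{i+1} of the payout
    # f_{i+1}(x,y) - f_i(x,y) + C_x(f_i) - C_x(f_{i+1}).
    return sum(f(b, x, y) - f(a, x, y) + C(a, x) - C(b, x)
               for a, b in zip(thetas, thetas[1:]))
```

The sum depends only on the first and last states, so the market's worst-case loss is the worst-case conditional log-likelihood ratio, at most log 2 here.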
For Mechanism 1 more generally, one has tight bounds on the worst-case loss following from analogous results for prediction markets [1, 8], and we give a more detailed statement in Proposition 2 in the appendix.\n\nPrice sensitivity parameter λ_C. In choosing the cost function family C = {C_x : x ∈ X}, an important consideration is the "scale" of each C_x, that is, how quickly changes in the market hypothesis f^t translate into changes in the "instantaneous prices" ∇C_x(f^t) (which give the marginal cost of an infinitesimal bet df^{t+1}). Formally, this is captured by the price sensitivity λ_C, defined as an upper bound on the operator norm (with respect to the L1 norm) of the Hessian of the cost function C_x (over all x). A choice of small λ_C translates to a small worst-case budget required by the mechanism. However, it also means that the market prices are insensitive, in that the same update df^t moves the prices more slowly. When we consider protecting the privacy of trader updates in Section 3, we will see that privacy imposes restrictions on the price sensitivity.\n\n2.3 A Nonparametric Mechanism via Kernel Methods\n\nThe framework we have discussed thus far has involved a general function space F as the "state" of the mechanism, with the contributions of participants taking the form of modifications to these functions. One downside of this generic template is that participants may not be able to reason about F directly, and they may have information about the optimal f only through their own privately-held dataset S ⊂ X × Y. A more specific class of functions would be those parameterized by actual data. This brings us to a well-studied type of non-parametric hypothesis class, namely the reproducing kernel Hilbert space (RKHS). 
We can design a market based on an RKHS, which we will refer to as a kernel market, that brings together a number of ideas, including recent work of [21] as well as kernel exponential families [4].\n\nWe have a positive semidefinite kernel k : Z × Z → R and associated reproducing kernel Hilbert space F, with basis {f_z(·) = k(z, ·) : z ∈ Z}. The reproducing property is that for all f ∈ F, ⟨f, k(z, ·)⟩ = f(z). Each hypothesis f ∈ F can then be expressed as f(·) = Σ_s α_s k(z_s, ·) for some collection of points {(α_s, z_s)}.\n\nThe kernel approach has several nice properties. One is a natural extension of the exponential family mechanism using an RKHS as a building block of the class of exponential family distributions [4]. A key assumption in the exponential family mechanism is that evaluating f can be viewed as an inner product in some feature space; this is precisely what a kernel framework provides. Specifically, assume we have some PSD kernel k : X × X → R, where Y = {−1, 1}. Then we can define the associated classification kernel k̂ : (X × Y) × (X × Y) → R according to k̂((x, y), (x′, y′)) := yy′·k(x, x′). Under certain conditions [4], we again can take C_x(f) = log ∫_Y exp(f(x, y)) dy, and for any f in the RKHS associated to k̂, we have an associated distribution of the form p_f(x, y) ∝ exp(f(x, y)). 
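The classification kernel construction is simple to sketch; here we assume an RBF base kernel purely for illustration (names ours). The Gram matrix it induces inherits positive semidefiniteness from the base kernel, since it equals D K D for D = diag(y):

```python
import numpy as np

def rbf(x, xp, gamma=1.0):
    # Illustrative base kernel on features (any PSD kernel works here).
    d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    return float(np.exp(-gamma * d.dot(d)))

def classification_kernel(a, b, base=rbf):
    # k_hat((x, y), (x', y')) := y * y' * k(x, x')
    (x, y), (xp, yp) = a, b
    return y * yp * base(x, xp)
```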
And again, a participant updating the market from f^{t−1} to f^t is rewarded by the conditional log-likelihood ratio of f^{t−1} and f^t on the test data.\n\nThe second nice property mirrors one of standard kernel learning methods, namely that under certain conditions one need only search the subset of the RKHS spanned by the basis {k((x_i, y_i), ·) : (x_i, y_i) ∈ S}, where S is the set of available data; this is a direct result of the Representer Theorem [17]. In the context of the kernel market, this suggests that participants need only interact with the mechanism by pushing updates that lie in the span of their own data. In other words, we only need to consider updates of the form df = αk((x, y), ·). This naturally suggests the idea of directly purchasing data points from traders.\n\nBuying Data Points. So far, we have supposed that a participant knows what trade df^t she prefers to make. But what if she simply has a data point (x, y) drawn from the underlying distribution? We would like to give this trader a "simple" trading interface in which she can sell her data to the mechanism without having to reason about the correct df^t for this data point.\n\nOur proposal is to mimic the behavior of natural learning algorithms, such as stochastic gradient descent, when presented with (x, y). The market can offer the trader the purchase bundle corresponding to the update of the learning algorithm on this data point. In principle, this approach can be used with any online learning algorithm. In particular, stochastic gradient descent gives a clean update rule, which we now describe. The expected profit (the negative of the expected loss) for trade df^t is E_x[ E_{y|x}[df^t(x, y)] − C_x(f^{t−1} + df^t) + C_x(f^{t−1}) ]. Given a draw (x, y), the loss function on which to take a gradient step is C_x(f^{t−1} + df^t) − C_x(f^{t−1}) − df^t(x, y), whose gradient at df^t = 0 is ∇_{f^{t−1}}C_x − δ_{x,y} (where δ_{x,y} is the indicator on data point (x, y)). This suggests that the market offer the participant the trade df^t = ε(δ_{x,y} − ∇_{f^{t−1}}C_x), where ε can be chosen arbitrarily as a "learning rate". This can be interpreted as buying a unit of shares in the participant's data point (x, y), then "hedging" by selling a small amount of all other shares in proportion to their current prices (recall that the current prices are given by ∇_{f^t}C_x).\n\nIn the kernel setting, the choice of stochastic gradient descent may be somewhat problematic, because it can result in non-sparse share purchases. It may instead be desirable to use algorithms that guarantee sparse updates; a modern discussion of such approaches can be found in [22, 23].\n\nGiven this framework, participants with access to a private set of samples from the true underlying distribution can simply opt for this "standard bundle" corresponding to their data point, which is precisely a stochastic gradient descent update. With a small enough learning rate, and assuming that the data point is truly independent of the current hypothesis (i.e. (x, y) has not been previously incorporated), the trade is guaranteed to make at least some positive profit in expectation. More sophisticated alternative strategies are also possible of course, but even this simple bet type has earning potential.\n\n3 Protecting Participants' Privacy\n\nWe now extend the mechanism to protect the privacy of the participants: An adversary observing the hypotheses and prices of the mechanism, and even controlling the trades of other participants, should not be able to infer too much about any one trader's update df^t. This is especially relevant when participants sell data to the mechanism and this data can be sensitive, e.g. 
medical data.\n\nHere, privacy is formalized by (ε, δ)-differential privacy, to be defined shortly. One intuitive characterization is that, for any prior distribution some adversary has about a trader's data, the adversary's posterior belief after observing the mechanism would be approximately the same even if the trader did not participate at all. The idea is that, rather than posting the exact prices and trades made in the market, we will publish noisy versions, with the random noise giving the above guarantee.\n\nA naive approach would be to add independent noise to each participant's trade. However, this would require a prohibitively large amount of noise; the final market hypothesis would be determined by the random noise just as much as by the data and trades. The central challenge is to add carefully correlated noise that is large enough to hide the effects of any one participant's data point, but not so large that the prices (equivalently, the hypothesis) become meaningless. We show this is possible by adjusting the "price sensitivity" λ_C of the mechanism, a measure of how fast prices change in response to trades, defined in Section 2.2. It will turn out to suffice to set the price sensitivity to be O(1/polylog T) when there are T participants. This can roughly be interpreted as saying that any one participant does not move the market price noticeably (so their privacy is protected), but just O(polylog T) traders together can move the prices completely.\n\nWe now formally define differential privacy and discuss two useful tools at our disposal.\n\n3.1 Differential Privacy and Tools\n\nDifferential privacy in our context is defined as follows. Consider a randomized function M operating on inputs of the form f⃗ = (df^1, . . . , df^T) and having outputs of the form s. 
Then M is (ε, δ)-differentially private if, for any coordinate t of the vector, any two distinct df^t_1, df^t_2, and any (measurable) set of outputs S, we have Pr[M(f_{−t}, df^t_1) ∈ S] ≤ e^ε Pr[M(f_{−t}, df^t_2) ∈ S] + δ. The notation f_{−t} means the vector f⃗ with the t-th entry removed.\n\nIntuitively, M is private if modifying the t-th entry in the vector to a different entry does not change the distribution on outputs too much. In our case, the data to be protected will be the trade df^t of each participant t, and the space of outputs will be the entire sequence of prices/predictions published by the mechanism.\n\nTo preserve privacy, each trade must have a bounded size (e.g. consist only of one data point). To enforce this, we define the following parameter chosen by the mechanism designer:\n\nΔ = max over allowed df of √⟨df, df⟩,   (3)\n\nwhere the maximum is over all trades df allowed by the mechanism. That is, Δ is a scalar capturing the maximum allowed size of any one trade. For instance, if all trades are restricted to be of the form df = αk(z, ·), then we would have Δ = max_{α,z} α√k(z, z).\n\nWe next describe the two tools we require.\n\nTool 1: Private functions via Gaussian processes. Given a current market state f^t = f^0 + df^1 + ··· + df^t, where f^t lies in a RKHS, we construct a "private" version f̂^t such that queries to f̂^t are "accurate" (close to the outputs of f^t) but also private with respect to each df^j. In fact, it will become convenient to privately output partial sums of trades, so we wish to output a f̂_{t1:t2} that is private and approximates f_{t1:t2} = Σ_{j=t1}^{t2} df^j. This is accomplished by the following construction due to [11].\n\nTheorem 1 ([11], Corollary 9). Let G be the sample path of a Gaussian process with mean zero and whose covariance is given by the kernel function k.² Then\n\nf̂_{t1:t2} = f_{t1:t2} + (Δ/ε)√(2 ln(2/δ)) · G   (4)\n\nis (ε, δ)-differentially private with respect to each df^j for j ∈ {t1, . . . , t2}.\n\nIn general, f̂_{t1:t2} may be an infinite-dimensional object and thus impossible to finitely represent. In this case, the theorem implies that releasing the results of any number of queries f̂_{t1:t2}(z) is differentially private. (Of course, the more queries that are released, the larger the chance of high error on some query.) This is computationally feasible as each sample G(z) is simply a sample from a Gaussian having known covariance with the previously drawn samples.\n\nUnfortunately, it would not be sufficient to independently release f̂_{1:t} at each time t, because the amount of noise required would be prohibitive. This leads us to our next tool.\n\n²Formally, each G(z) is a random variable and, for any finite subset of Z, the corresponding variables are distributed as a multivariate normal with covariance given by k.\n\n[Figure 1: time points df^0, df^1, . . . , df^16 on a line, with an arrow from each t back to s(t).]\n\nFigure 1: Picturing the continual observation technique for preserving privacy. Each df^t is a trade (e.g. a data point sold to the market). The goal is to release, at each time step t, a noisy version of f^t = Σ_{j=1}^{t} df^j. To do so, start at t and follow the arrow back to s(t). Take the partial sum of df^j for j from s(t) to t and add some random noise. Trace the next arrow from s(t) to s(s(t)) to get another partial sum and add noise to that sum as well. 
Repeat until 0 is reached, then add together all the noisy partial sums to get the output at time t, which will equal f^t plus noise. The key point is that we can re-use many of the noisy partial sums across many different time steps. For instance, the noisy partial sum from 0 to 8 can be re-used when releasing all of f^9, . . . , f^15. Meanwhile, each df^t participates in few noisy partial sums (the number of arrows passing above it).\n\nTool 2: Continual observation technique. The idea of this technique, pioneered by [9, 5], is to construct f̂^t ≈ f^t = Σ_{j=1}^{t} df^j by adding together noisy partial sums of the form f̂_{t1:t2} as constructed in Equation 4. The idea for choosing these partial sums is pictured in Figure 1: for a function s(t) that returns an integer smaller than t, we take f̂^t = f̂_{s(t)+1:t} + f̂_{s(s(t))+1:s(t)} + ··· + f̂_{0:0}. Specifically, s(t) is determined by writing t in binary, then flipping the rightmost "one" bit to zero. This is pictured in Figure 1. The intuition behind why this technique helps is twofold. First, the total noise in f̂^t is the sum of the noises of its partial sums, and it turns out that there are at most ⌈log T⌉ such terms. Second, the total noise we need to add to protect privacy is governed by how many different partial sums each df^j participates in, and it turns out that this number is also at most ⌈log T⌉. This allows for much better privacy and accuracy guarantees than naively treating each step independently.\n\n3.2 Mechanism and Results\n\nCombining our market template in Mechanism 1 with the above privacy tools, we obtain Mechanism 2. There are some key differences. First, we have a bound Q on the total number of queries. (Each query x returns the instantaneous prices in the market for x.) 
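The partial-sum scheme of Tool 2 is easy to implement; a noise-free sketch (names ours) of the binary decomposition, showing that it both reconstructs f^t exactly and uses only logarithmically many intervals:

```python
def s(t):
    # Flip the rightmost one bit of t's binary representation to zero.
    return t & (t - 1)

def chain(t):
    # Intervals [s(t)+1, t], [s(s(t))+1, s(t)], ... whose (noisy) partial
    # sums are added to release f^t; one interval per set bit of t.
    out = []
    while t > 0:
        out.append((s(t) + 1, t))
        t = s(t)
    return out

def release(trades, t):
    # Noise-free reconstruction: summing the chain recovers
    # f^t = sum_{j=1}^{t} df^j (trades[j-1] holds df^j).
    return sum(sum(trades[a - 1:b]) for a, b in chain(t))
```

In the mechanism each interval's partial sum would carry its own Tool 1 noise; here the noise is omitted so the reconstruction can be checked exactly.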
This is because each query reveals information about the participants, so intuitively, allowing too many queries must sacrifice either privacy or accuracy. Fortunately, this bound Q can be an arbitrarily large polynomial in the number of traders without affecting the quality of the results. Second, we have PAC-style guarantees on accuracy: with probability 1 − γ, all price queries return values within α of their true prices. Third, it is no longer straightforward to compute and represent the market prices ∇C_x(f̂^t) unless Y is finite. We leave the more general analysis of Mechanism 2 to future work.\n\nEither exactly or approximately, Mechanism 2 inherits the desirable properties of Mechanism 1, such as bounded budget and incentive-compatibility (that is, participants are incentivized to minimize the risk of the market hypothesis). In addition, we show that it preserves privacy while maintaining accuracy, for an appropriate choice of the price sensitivity λ_C.\n\nTheorem 2. Consider Mechanism 2, where Δ is the maximum trade size (Equation 3) and d = |Y|. Then Mechanism 2 is (ε, δ)-differentially private and, with T traders and Q price queries, has the following accuracy guarantee: with probability 1 − γ, for each query x the returned prices satisfy ‖∇C_x(f̂^t) − ∇C_x(f^t)‖_∞ ≤ α, by setting\n\nλ_C = αε / ( 2dΔ² log(T)³ √( ln(Qd/γ) · ln(2 log(T)/δ) ) ).\n\nIf one for example takes δ, γ = exp[−polylog(Q, T)], then except for a superpolynomially low failure probability, Mechanism 2 answers all queries to within accuracy α by setting the price sensitivity to be λ_C = O(αε/polylog(Q, T)). 
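The noise underlying these guarantees is the Gaussian-process mechanism of Tool 1 (Equation 4), which can be sampled lazily at a finite set of query points; a sketch, with an RBF kernel as an assumed illustrative choice (names ours):

```python
import numpy as np

def tool1_noise(points, kernel, Delta, eps, delta, rng):
    # Sample the Tool 1 noise (Delta/eps)*sqrt(2 ln(2/delta)) * G at a
    # finite set of query points, where G is a mean-zero Gaussian
    # process whose covariance is the kernel.
    K = np.array([[kernel(a, b) for b in points] for a in points])
    scale = (Delta / eps) * np.sqrt(2.0 * np.log(2.0 / delta))
    # tiny jitter keeps the covariance numerically PSD
    g = rng.multivariate_normal(np.zeros(len(points)), K + 1e-12 * np.eye(len(points)))
    return scale * g
```

Additional query points can be handled by conditioning the Gaussian on the samples already drawn, which is the computational feasibility claim made for Theorem 1.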
We note, however, that this is a somewhat weaker guarantee than is usually desired in the differential privacy literature, where ideally δ is exponentially small.

Mechanism 2: Privacy Protected Market
Parameters: ε, δ (privacy), α, γ (accuracy), k (kernel), Δ (trade size), Q (#queries), T (#traders)
MARKET announces ˆf^0 = f^0, sets r = 0, sets C with λ_C = λ_C(ε, δ, α, γ, Δ, Q, T) (Theorem 2)
for t = 1, 2, . . . , T do
    PARTICIPANT t proposes a bet df^t
    MARKET updates true position f^t = f^{t−1} + df^t
    MARKET instantiates ˆf_{s(t)+1:t} as defined in Equation 4
    while r ≤ Q and some OBSERVER wishes to make a query do
        OBSERVER r submits pricing query on x
        MARKET returns prices ∇Cx(ˆf^t), where ˆf^t = ˆf_{s(t)+1:t} + ˆf_{s(s(t))+1:s(t)} + ··· + ˆf_{0:0}
        MARKET sets r ← r + 1
MARKET observes a true sample (x, y)
for t = 1, 2, . . . , T do
    PARTICIPANT t receives payment f^{t−1}(x, y) − f^t(x, y) − Cx(ˆf^{t−1} + df^t) + Cx(ˆf^{t−1})

Computing ∇Cx(ˆf^t). We have already discussed limiting to finite |Y| in order to efficiently compute the marginal prices ∇Cx(ˆf^t). However, it is still not immediately clear how to compute these prices, and hence how to implement Mechanism 2. Here, we show that the problem can be solved when C comes from an exponential family, so that Cx(f) = log ∫_Y exp[f(x, y)] dy. In this case, the marginal prices given by the gradient of C have a nice exponential-weights form: the price of shares in (x, y) is p^t_x(y) = ∇_y Cx(f^t) = e^{f^t(x,y)} / Σ_{y′∈Y} e^{f^t(x,y′)}. Thus evaluating the prices can be done by evaluating f^t(x, y) for each y ∈ Y.

We also note that the worst-case bound used here could be greatly improved by taking into account the structure of the kernel.
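The exponential-weights prices described above are simply a softmax over the values f^t(x, ·). A minimal sketch for finite Y (the function name and dictionary representation are ours):

```python
import math

def marginal_prices(f_t, x, Y):
    """Exponential-weights prices p^t_x(y) = e^{f^t(x,y)} / sum_{y'} e^{f^t(x,y')}.

    f_t: any callable (x, y) -> float giving the market position's value.
    Y:   the finite outcome set.
    """
    scores = {y: f_t(x, y) for y in Y}
    # Subtract the max score before exponentiating for numerical stability;
    # this rescales numerator and denominator equally, leaving prices unchanged.
    m = max(scores.values())
    weights = {y: math.exp(v - m) for y, v in scores.items()}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}
```

As the text notes, each pricing query thus costs one evaluation of f^t(x, y) per outcome y ∈ Y, which is why finite |Y| is assumed.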
For \u201csmooth\u201d cases such as the Gaussian kernel, querying a second\npoint very close to the \ufb01rst one requires very little additional randomness and builds up very little\nadditional error. We gave only a worst-case bound that holds for all kernels.\nAdding a transaction fee.\nAdding a small \u0398(\u03b1) fee suf\ufb01ces to deter arbitrage opportunities introduced by noisy pricing.\n\nIn the appendix, we discuss the potential need for transaction fees.\n\nx(y) = \u2207yCx(f t) =\n\nef (x,y)\n\nDiscussion\n\nThe main contribution of this work was to bring together several tools to construct a mechanism\nfor incentivized data aggregation with \u201ccontest-like\u201d incentive properties, privacy guarantees, and\nlimited downside for the mechanism.\nOur proposed mechanisms are also extensions of the prediction market literature. Building upon the\nwork of Abernethy et al. [1] we introduce the following innovations:\n\u2022 Conditional markets. Our framework of Mechanism 1 can be interpreted as a prediction market\nfor conditional predictions p(y|x) rather than a classic market which would elicit the joint dis-\ntribution p(x, y), or just the marginals. (This is similar to decision markets [12, 7], but without\nout the associated incentive problems.) Naturally then, we couple conditional predictions with\nrestricted hypothesis spaces, allowing F to capture, e.g., a linear relationship between x and y.\n\u2022 Nonparametric securities. We also extend to nonparametric hypothesis spaces using kernels,\n\u2022 Privacy guarantees. We provide the \ufb01rst private prediction market (to our knowledge), showing\nthat information about individual trades is not revealed. Our approach for preserving privacy also\nholds in the classic prediction market setting with similar privacy and accuracy guarantees.\n\nfollowing the kernel-based scoring rules of [21].\n\nMany directions remain for future work. 
These mechanisms could be made more practical, and perhaps even stronger privacy guarantees derived, especially in nonparametric settings. One could also explore connections to similar settings, such as when agents have costs for acquiring data.

Acknowledgements J. Abernethy acknowledges the generous support of the US National Science Foundation under CAREER Grant IIS-1453304 and Grant IIS-1421391.

References

[1] Jacob Abernethy, Yiling Chen, and Jennifer Wortman Vaughan. Efficient market making via convex optimization, and a connection to online learning. ACM Transactions on Economics and Computation, 1(2), May 2013.

[2] Jacob Abernethy, Sindhu Kutty, Sébastien Lahaie, and Rahul Sami. Information aggregation in exponential family markets. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, pages 395–412. ACM, 2014.

[3] Jacob D. Abernethy and Rafael M. Frongillo. A collaborative mechanism for crowdsourcing prediction problems. In Advances in Neural Information Processing Systems, pages 2600–2608, 2011.

[4] Stéphane Canu and Alex Smola. Kernel methods and the exponential family. Neurocomputing, 69(7):714–720, 2006.

[5] T-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Transactions on Information and System Security (TISSEC), 14(3):26, 2011.

[6] Y. Chen and J. W. Vaughan. A new understanding of prediction markets via no-regret learning. In Proceedings of the 11th ACM Conference on Electronic Commerce (EC), pages 189–198, 2010.

[7] Yiling Chen, Ian Kash, Mike Ruberry, and Victor Shnayder. Decision markets with good incentives. In Internet and Network Economics, pages 72–83. Springer, 2011.

[8] Yiling Chen and David M. Pennock. A utility framework for bounded-loss market makers.
In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), pages 49–56, 2007.

[9] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, pages 715–724. ACM, 2010.

[10] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 2014.

[11] Rob Hall, Alessandro Rinaldo, and Larry Wasserman. Differential privacy for functions and functional data. The Journal of Machine Learning Research, 14(1):703–727, 2013.

[12] R. Hanson. Decision markets. Entrepreneurial Economics: Bright Ideas from the Dismal Science, pages 79–85, 2002.

[13] R. Hanson. Combinatorial information market design. Information Systems Frontiers, 5(1):105–119, 2003.

[14] R. Hanson. Logarithmic market scoring rules for modular combinatorial information aggregation. Journal of Prediction Markets, 1(1):3–15, 2007.

[15] Abraham Othman and Tuomas Sandholm. Automated market makers that enable new settings: extending constant-utility cost functions. In Proceedings of the Second Conference on Auctions, Market Mechanisms and their Applications (AMMA), pages 19–30, 2011.

[16] David M. Pennock and Rahul Sami. Computational aspects of prediction markets. In Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay V. Vazirani, editors, Algorithmic Game Theory, chapter 26. Cambridge University Press, 2007.

[17] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

[18] Amos J. Storkey. Machine learning markets. In Proceedings of AI and Statistics (AISTATS), pages 716–724, 2011.

[19] J. Wolfers and E. Zitzewitz. Prediction markets.
Journal of Economic Perspectives, 18(2):107–126, 2004.

[20] Justin Wolfers and Eric Zitzewitz. Interpreting prediction market prices as probabilities. Technical report, National Bureau of Economic Research, 2006.

[21] Erik Zawadzki and Sébastien Lahaie. Nonparametric scoring rules. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[22] Lijun Zhang, Rong Jin, Chun Chen, Jiajun Bu, and Xiaofei He. Efficient online learning for large-scale sparse kernel logistic regression. In AAAI, 2012.

[23] Lijun Zhang, Jinfeng Yi, Rong Jin, Ming Lin, and Xiaofei He. Online kernel learning with a near optimal sparsity bound. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 621–629, 2013.