{"title": "Privacy-Preserving Belief Propagation and Sampling", "book": "Advances in Neural Information Processing Systems", "page_first": 745, "page_last": 752, "abstract": null, "full_text": "Privacy-Preserving Belief Propagation and Sampling\n\nMichael Kearns, Jinsong Tan, and Jennifer Wortman\n\nDepartment of Computer and Information Science\nUniversity of Pennsylvania, Philadelphia, PA 19104\n\nAbstract\n\nWe provide provably privacy-preserving versions of belief propagation, Gibbs\nsampling, and other local algorithms \u2014 distributed multiparty protocols in which\neach party or vertex learns only its \ufb01nal local value, and absolutely nothing else.\n\n1 Introduction\n\nIn this paper we provide provably privacy-preserving versions of belief propagation, Gibbs sam-\npling, and other local message-passing algorithms on large distributed networks. Consider a network\nof human social contacts, and imagine that each party would like to compute or estimate their prob-\nability of having contracted a contagious disease, which depends on the exposures to the disease of\ntheir immediate neighbors in the network. If network participants (or their proxy algorithms) engage\nin standard belief propagation, each party would learn their probability of exposure conditioned on\nany evidence \u2014 and a great deal more, including information about the exposure probabilities of\ntheir neighbors. Obviously such leakage of non-local information is highly undesirable in settings\nwhere we regard each party in the network as a self-interested agent, and privacy is paramount. 
Other\nexamples include inference problems in distributed military sensor networks (where we would like\nthe \u201ccapture\u201d of one sensor to reveal as little non-local state information as possible), settings where\nnetworks of \ufb01nancial organizations would like to share limited information, and so on.\n\nBy a privacy-preserving version of inference (for example), we informally mean a protocol in which\neach party learns their conditional probability of exposure to the disease and absolutely nothing else.\nMore precisely, anything a party can ef\ufb01ciently compute after having participated in the protocol,\nthey could have ef\ufb01ciently computed alone given only the value of their conditional probability \u2014\nthus, the protocol leaked no additional information beyond its desired outputs. Classical and power-\nful tools from cryptography [6] provide solutions to this problem, but with the signi\ufb01cant drawback\nof entirely centralizing the privacy-preserving computation. Put another way, the straightforward\nsolution from cryptography requires every party in the network to have the ability to broadcast to\nall others, and the resulting algorithm may bear little resemblance to standard belief propagation.\nClearly this is infeasible in settings where the network is very large and entirely distributed, where\nindividuals may not even know the size of the overall network, much less its structure and the\nidentity of its constituents. While there has been work on minimizing the number of messages ex-\nchanged in centralized privacy-preserving protocols [9], ours are the \ufb01rst results that preserve the\nlocal communication structure of distributed algorithms like belief propagation.\n\nOur protocols are faithful to the network topology, requiring only the passing of messages between\nparties separated by one or two hops in the network. 
Furthermore, our protocols largely preserve the algebraic structure of the original algorithms (for instance, the sum-product structure of belief propagation) and enjoy all the correctness guarantees of the originals (such as exact inference in trees for belief prop or convergence of Gibbs sampling to the joint distribution). Our technical methods show how to blend tools from cryptography (secure multiparty computation and blindable encryption) with local message-passing algorithms in a way that preserves the original computations, but in which all messages appear to be randomly distributed from the viewpoint of any individual.

All results in this paper apply to the "semi-honest" or "honest but curious" model in the cryptography literature, in which participants obediently execute the protocol but may attempt to infer non-private information from it. We expect that via the use of zero-knowledge proof techniques, our protocols may be strengthened to models in which participants who deviate from the protocol are detected.

2 Background and Tools from Cryptography

2.1 Secure Multiparty Function Computation

Let f(x1, . . . , xk) be any function on k inputs. Imagine a scenario in which there are k distinct parties, each in possession of exactly one of these inputs (that is, party i initially knows xi) and the k parties would like to jointly compute the value of f(x1, . . . , xk). Perhaps the simplest protocol would have all parties share their private inputs and then individually compute the value of f. However, in many natural settings, we would like the parties to be able to perform this joint computation in a privacy-preserving fashion, with each party revealing as little as possible about their private input. 
Simple examples include voting — we would all like to learn the results of the election without having to broadcast our private votes — and the so-called "Millionaire's Problem" in which two individuals would like to learn who is wealthier, without revealing their precise wealth to each other. If a trusted "third party" is available, one solution would be to provide the private inputs to them, and have them perform the computation in secrecy, only announcing the final result. The purpose of the theory of secure multiparty function computation [6] is to show that under extremely general circumstances, a third party is surprisingly unnecessary.

Note that it is typically inevitable that some information is revealed just by the result of the computation of f itself. For example, in the Millionaire's Problem, there is no avoiding the poorer party learning a lower bound on the richer's wealth (namely, the poorer party's wealth). The goal is thus to reveal nothing beyond what is implied by the value of f.

To formalize this notion in a complexity-theoretic framework, let us assume without loss of generality that each input xi is n bits in length. We make the natural and common assumptions that the function f can be computed in time polynomial in kn, and that each party's computational resources are bounded by a polynomial in n. We (informally) define a protocol Π for the k parties to compute f to be a specific mechanism by which the parties exchange messages and perform computations, ending with every party learning the value y = f(x1, . . . , xk). One (uninteresting) protocol is the one in which each party sends their private inputs to all others, and every party computes y alone.

Definition 1¹ Let Π be any protocol for the k parties to jointly compute the value y = f(x1, . . . , xk) from their n-bit private inputs. 
We say that \u03a0 is privacy-preserving if for every\n1 \u2264 i \u2264 k, anything that party i can compute in time polynomial in n following the execution\nof \u03a0, they could also compute in polynomial time given only their private input xi and the value y.\n\nIn other words, whatever information party i is able to obtain from their view of the execution of\nprotocol \u03a0, it does not let them ef\ufb01ciently compute anything they couldn\u2019t ef\ufb01ciently compute just\nfrom being told the \ufb01nal output y of \u03a0 (and their private input xi). This captures the notion that\nwhile y itself may \u201cleak\u201d some information about the other private inputs xj, the protocol \u03a0 yields\nnothing further.2 Further, for the following theorem we can consider more general vector outputs\nand randomized functionalities, which we need for our technical results.\n\nTheorem 1 (See e.g. [6]) Let f (x1, . . . , xk) = (y1, . . . , yk) be any (possibly randomized) k-input,\nk-output functionality that can be computed in polynomial time. Then under standard cryptographic\nassumptions, 3 there exists a polynomial time privacy-preserving protocol \u03a0 for f (that is, a protocol\nin which party i learns nothing not already implied by their private input xi and private output yi).\n1We state this de\ufb01nition informally, as the complete technical de\ufb01nition is somewhat lengthy and adds little\nintuition. It involves both formalizing the notion of a multiparty computation protocol, as well as de\ufb01ning the\n\u201cview\u201d of an individual party of a speci\ufb01c execution of the protocol. The de\ufb01nition involves computational\nindistinguishability of probability distributions since the protocols may often use randomization.\n\n2Our de\ufb01nition of privacy does not imply that coalitions of parties cannot together compute additional\ninformation. 
In the extended version of this paper, we discuss the difficulty of achieving this stronger notion of privacy with any protocol that uses a truly distributed method of computation.

³ An example would be the existence of trapdoor permutations [6].

This remarkable and important theorem essentially says that whatever a population can jointly compute, it can jointly compute with arbitrary restrictions on who learns what. A powerful use of vector outputs is to enforce knowledge asymmetries on the parties. For instance, in the Millionaire's Problem, by defining one player's output to always be nil, we can ensure that this player learns absolutely nothing from the protocol, while the other learns which player is wealthier.

The proof of Theorem 1 is constructive, providing an algorithm to transform any polynomial-size circuit into a polynomial-time privacy-preserving protocol for k parties. As discussed in the introduction, this theorem can be immediately applied to (say) belief propagation to yield centralized privacy-preserving protocols for inference; our contribution is preserving the highly distributed, local message-passing structure of belief propagation and similar algorithms.

2.2 Public-Key Encryption with Blinding

The second cryptographic primitive that we shall require is standard public-key encryption with an additional property known as blinding. A standard public-key cryptosystem allows any party to generate a pair of keys (P, S), which we can think of as k-bit strings; k is often called the security parameter. Associated with the public key P there is a (possibly probabilistic) encryption function EP and associated with the secret or private key S there is a (deterministic) decryption function DS. Informally, the system should have the following security properties:

• For any n-bit x, the value of the function EP(x) can be computed in polynomial time from inputs x and P. 
Similarly, DS(y) can be computed ef\ufb01ciently given y and S.\n\n\u2022 For any n-bit input x, DS(EP (x)) = x. Thus, decryption is the inverse of encryption.\n\u2022 For any n-bit x, it is hard for a party knowing only the public key P and the encryption\n\nEP (x) to compute x. 4\n\nThus, in such a scheme, anyone knowing the public key of Alice can ef\ufb01ciently compute and send\nencrypted messages to Alice, but only Alice, who is the sole party knowing her private key, can\ndecrypt those messages. Such cryptosystems are widely believed to exist and numerous concrete\nproposals have been examined for decades. As one speci\ufb01c example that allows probabilistic en-\ncryption of individual bits, let the public key consist of an integer N = p \u00b7 q that is the product of\ntwo k/2-bit randomly generated prime numbers p and q, as well as a number z that has the property\nthat z is not equal to x2 mod N for any x. It is easy to generate such (N, z) pairs. In order to\nencrypt a 0, one simply chooses x at random and lets the encryption be y = x2 mod N, known\nas a quadratic residue. In order to encrypt a 1, one instead sends y = x2 \u00b7 z mod N, which is\nguaranteed to not be a quadratic residue. It is not dif\ufb01cult to show that given the factors p and q\n(which constitute the secret key), one can ef\ufb01ciently compute whether y is a quadratic residue and\nthus learn the decrypted bit. 
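The quadratic-residue scheme just described can be sketched concretely. The following is a toy illustration with tiny primes, not a secure instantiation; real keys use large random k/2-bit primes. Choosing p, q ≡ 3 (mod 4) makes z = N − 1 a non-residue modulo both factors, and the `blind` function anticipates the re-randomization property discussed below:

```python
import math
import random

# Toy quadratic-residue bit encryption. Secret key: the factorization (p, q).
# Public key: (N, z), where z is not a square mod N.
p, q = 499, 547          # toy primes, both congruent to 3 mod 4
N = p * q
z = N - 1                # equals -1 mod N: a non-residue mod both p and q

def is_qr(y, p, q):
    """True iff y is a quadratic residue mod N = p*q (requires the secret key)."""
    return (pow(y % p, (p - 1) // 2, p) == 1 and
            pow(y % q, (q - 1) // 2, q) == 1)

def rand_unit(N):
    """A random element of Z_N coprime to N."""
    x = random.randrange(1, N)
    while math.gcd(x, N) != 1:
        x = random.randrange(1, N)
    return x

def encrypt(bit, N, z):
    """Encrypt one bit: a random square for 0, a random square times z for 1."""
    y = pow(rand_unit(N), 2, N)
    return y if bit == 0 else y * z % N

def decrypt(y, p, q):
    """Quadratic residues decrypt to 0, non-residues to 1."""
    return 0 if is_qr(y, p, q) else 1

def blind(y, N):
    """Re-randomize a ciphertext without decrypting: multiply by a fresh square."""
    return pow(rand_unit(N), 2, N) * y % N

for bit in (0, 1):
    y = encrypt(bit, N, z)
    assert decrypt(y, p, q) == bit            # decryption inverts encryption
    assert decrypt(blind(y, N), p, q) == bit  # blinding preserves the plaintext
```

Multiplying by w² mod N preserves quadratic-residue status, which is why blinding leaves the encrypted bit unchanged while making the ciphertext look fresh.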
Furthermore, it is widely believed that decryption is actually equivalent to factoring N, and thus intractable without the secret key.

This simple public-key cryptosystem also has the additional blinding property that we will require. Given only the public key (N, z) and an encrypted bit y as above, it is the case that for any value w, w²y mod N is a quadratic residue if and only if y is a quadratic residue, and that furthermore w²y mod N is uniformly distributed among all (non-)quadratic residues if y is a (non-)quadratic residue. Thus, a party knowing only Alice's public key can nevertheless take any bit encrypted for Alice and generate random re-encryptions of that bit without needing to actually know the decryption. We refer to this operation as blinding an encrypted bit.

⁴ This is often formalized by asserting that the distribution of the encryption is computationally indistinguishable from true randomness in time polynomial in n and k.

3 Privacy-Preserving Belief Propagation

In this section we briefly review the standard algorithm for belief propagation on trees [10] and outline how to run this algorithm in a privacy-preserving manner such that each variable learns only its final marginals and no additional new information that is not implied by these marginals.

In standard belief propagation, we are given an undirected graphical model with vertex set X for which the underlying network is a tree. We denote by V(Xi) the set of possible values of Xi ∈ X, and by N(Xi) the set of Xi's neighbors. For each Xi ∈ X, we are given a non-negative potential function ψi over possible values xi ∈ V(Xi). Similarly, for each pair of adjacent vertices Xi and Xj, we are given a non-negative potential function ψi,j over joint assignments to Xi and Xj.

The main inductive phase of the belief propagation algorithm is the message-passing phase. 
At each step, a node Xi computes a message µi→j to send to some Xj ∈ N(Xi). This message is indexed by all possible assignments xj ∈ V(Xj), and is defined inductively by

\[ \mu_{i \to j}(x_j) \;=\; \sum_{x_i \in V(X_i)} \psi_i(x_i)\, \psi_{i,j}(x_i, x_j) \prod_{X_k \in N(X_i) \setminus X_j} \mu_{k \to i}(x_i). \qquad (1) \]

Belief propagation follows the so-called message-passing protocol, in which any vertex of degree d that has received the incoming messages from any d − 1 of its neighbors can perform the computation above in order to send an outgoing message to its remaining neighbor. Eventually, the vertex will receive a message back from this last neighbor, at which point it will be able to calculate messages to send to its remaining d − 1 neighbors. The protocol begins at the leaves of the tree, and it follows from standard arguments that until all nodes have received incoming messages from all of their neighbors, there must be some vertex that is ready to compute and send a new message. The message-passing phase ends when all vertices have received messages from all of their neighbors. Once vertex Xi has received all of its incoming messages, the marginal distribution P is proportional to their product. That is, if xi is any setting to Xi, then

\[ \mathrm{P}[X_i = x_i] \;\propto\; \psi_i(x_i) \prod_{X_j \in N(X_i)} \mu_{j \to i}(x_i). \qquad (2) \]

When there is evidence in the network, represented as a partial assignment e⃗ to some subset E of the variables, we can simply hard-wire this evidence into the potential functions ψj for each Xj ∈ E. In this case it is well-known that the algorithm computes the conditional marginals P[Xi = xi | E = e⃗]. For a more in-depth review of belief propagation, see Yedidia et al. 
[13] or Chapter 8 of Bishop [1].

3.1 Mask Propagation and the Privacy-Preserving Protocol

We assume that at the beginning of the privacy-preserving protocol, each node Xi knows its own individual potential function ψi, as well as the joint potential functions ψi,j for all Xj ∈ N(Xi). Recall that our fundamental privacy goal is to allow each vertex Xi to compute its own marginal distribution P[Xi = xi] (or P[Xi = xi | E = e⃗] if there is evidence), but absolutely nothing else. In particular, Xi should not be able to compute the values of any of the incoming messages from its neighbors. Knowledge of µj→i(xi), for example, along with µi→j and ψi,j, may give Xi information about the marginals over Xj, a clear privacy violation. We thus must somehow prevent Xi from being able to "read" any of its incoming messages — or even its own outgoing messages — yet still allow each variable to learn its own set of marginals at the end. To accomplish this we combine tools from secure multiparty function computation with a method we call "mask propagation", in which messages remain "masked" (that is, provably unreadable) to the vertices at all times. The keys required to unmask the messages are generated locally as the computation propagates through the tree, thus preserving the original communication pattern of the standard (non-private) algorithm.

Before diving into the secure protocol, we first must fix conventions regarding the encoding of numerical values. We will assume throughout that all potential function values, all message values and all the required products computed by the algorithm can be represented as n-bit natural numbers and thus fall in ZN = {0, . . . , N − 1} where N = 2ⁿ. 
As expressed by Equation (2), marginal probabilities are obtained by taking products of such n-bit numbers and then normalizing to obtain finite-precision real-valued numbers in the range [0, 1]. It will be convenient to think of values in ZN as elements of the cyclic group of order N with addition and subtraction modulo N. In particular, we will make frequent use of the following simple fact: for any fixed x ∈ ZN, if r ∈ ZN is chosen randomly among all n-bit numbers, then x + r mod N is also distributed randomly among all n-bit numbers. We can think of the random value r as "masking" or hiding the value of x to a party that does not know r, while leaving it readable to a party that does.

Let us now return to the message-passing phase of the algorithm described by Equation (1), and let us focus on the computation of µi→j for a fixed setting xj of Xj. For the secure version of the algorithm, we make the following inductive message and knowledge assumptions:

• For each Xℓ ∈ N(Xi)\Xj, and for each setting xi of Xi, Xi has already obtained a masked version of µℓ→i(xi):

\[ \mu_{\ell \to i}(x_i) + \beta_{j,\ell}(x_i) \bmod N \qquad (3) \]

  where βj,ℓ(xi) is uniformly distributed in ZN.

• Xi knows only the sum in Equation (3) (which again is uniformly distributed in ZN and thus meaningless by itself), and does not know the masking values βj,ℓ(xi).

• Vertex Xj knows only the masking values βj,ℓ(xi), and not the sum in Equation (3).

For all leaf nodes, these assumptions hold trivially at the start of the protocol, providing the base case for the induction. 
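The additive masking fact above is easy to check numerically. A minimal sketch (the values here are arbitrary):

```python
import random

n = 16
N = 2 ** n               # all quantities live in Z_N = {0, ..., N-1}

x = 12345                # a message value that should stay hidden
r = random.randrange(N)  # masking value known only to the masker

masked = (x + r) % N     # uniformly distributed in Z_N, whatever x is
assert (masked - r) % N == x   # a party that knows r recovers x exactly
assert 0 <= masked < N
```

Because r is uniform over Z_N, the sum x + r mod N is itself uniform, so the masked value conveys no information about x to a party that does not hold r.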
Now under these informational assumptions, vertex Xi knows the set Ii = {µℓ→i(xi) + βj,ℓ(xi) mod N : Xℓ ∈ N(Xi)\Xj, xi ∈ V(Xi)} while vertex Xj knows the set Ij = {βj,ℓ(xi) mod N : Xℓ ∈ N(Xi)\Xj, xi ∈ V(Xi)}.

Let us first consider the case in which Xj is not a leaf node and thus has neighbors other than Xi itself. In order to complete the inductive step, it will be necessary for each Xk ∈ N(Xj)\Xi to provide a set of masking values βk,i(xj) so that Xj can obtain a set of masked messages of the form µi→j(xj) + βk,i(xj). Here we focus on a single neighbor Xk of Xj.

Vertex Xk privately generates a masking value βk,i(xj) that is uniformly distributed in ZN. It is clear that, ignoring privacy concerns, Xi and Xj together could compute ψi(xi)ψi,j(xi, xj) Π_{Xℓ ∈ N(Xi)\Xj} µℓ→i(xi) for each fixed pair xi and xj. Thus from their joint inputs Ii, Ij, and βk,i(xj), ignoring privacy, Xi, Xj, and Xk could compute:

\[ \Bigg( \sum_{x_i \in V(X_i)} \psi_i(x_i)\, \psi_{i,j}(x_i, x_j) \prod_{X_\ell \in N(X_i) \setminus X_j} \mu_{\ell \to i}(x_i) \Bigg) + \beta_{k,i}(x_j) \bmod N \;=\; \mu_{i \to j}(x_j) + \beta_{k,i}(x_j) \bmod N \qquad (4) \]

Since this expression can be computed jointly by Xi, Xj and Xk without privacy considerations, Theorem 1 establishes that we can construct an efficient protocol for them to compute it securely, allowing Xj to learn only the value of the expression in Equation (4), while Xi and Xk learn no new information at all (i.e. nil output). 
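Ignoring the secure-computation wrapper, the arithmetic behind Equation (4) can be checked directly. The sketch below uses made-up potentials and messages on a toy neighborhood (the names `psi_i`, `psi_ij`, `mu_in`, and `beta` are illustrative, not part of any protocol), and verifies that combining Xi's masked inputs with Xj's masks and Xk's fresh mask reproduces µi→j(xj) + βk,i(xj) mod N:

```python
import random

n = 16
N = 2 ** n
random.seed(0)

# Toy setup: X_i has the recipient X_j plus two other neighbors, l1 and l2.
values_i = [0, 1]                       # V(X_i), binary for simplicity
psi_i = {0: 3, 1: 5}                    # node potential of X_i
psi_ij = {(0, 0): 2, (0, 1): 7,         # pairwise potential on (x_i, x_j)
          (1, 0): 4, (1, 1): 1}

# True incoming messages mu_{l->i}(x_i) for l in N(X_i)\X_j ...
mu_in = {('l1', 0): 6, ('l1', 1): 2, ('l2', 0): 3, ('l2', 1): 8}
# ... held by X_i only in masked form; X_j holds the masks beta_{j,l}(x_i).
beta = {k: random.randrange(N) for k in mu_in}
masked_in = {k: (mu_in[k] + beta[k]) % N for k in mu_in}

def true_message(xj):
    """The unmasked mu_{i->j}(x_j) of Equation (1)."""
    return sum(psi_i[xi] * psi_ij[(xi, xj)]
               * mu_in[('l1', xi)] * mu_in[('l2', xi)]
               for xi in values_i) % N

# Equation (4): the joint inputs I_i, I_j, and beta_ki determine the masked
# outgoing message, which is readable to no single party alone.
for xj in values_i:
    beta_ki = random.randrange(N)       # generated privately by X_k
    out = sum(psi_i[xi] * psi_ij[(xi, xj)]
              * ((masked_in[('l1', xi)] - beta[('l1', xi)]) % N)
              * ((masked_in[('l2', xi)] - beta[('l2', xi)]) % N)
              for xi in values_i) % N
    out = (out + beta_ki) % N
    assert out == (true_message(xj) + beta_ki) % N
```

The toy values are chosen small enough that all products fit in n bits, matching the encoding convention fixed above.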
Note that this expression, due to the presence of the unknown masking value βk,i(xj), is a uniformly distributed random number in ZN from Xj's point of view. After this masking process has been completed for all Xk ∈ N(Xj)\Xi, we will have begun to satisfy the inductive informational assumptions a step further in the propagation: for each neighbor Xk of Xj excluding Xi, Xj will know a masked version of µi→j(xj) in which the masking value βk,i(xj) is known only to Xk. Xj will obtain masked messages in a similar manner from all but one of its other neighbors in turn, and for all of its other values, until the inductive assumptions are fully satisfied at Xj. Every value received by Xi, Xj, and Xk during the above protocol is distributed uniformly at random in ZN from the perspective of its recipient, and thus conveys no information.

It remains to consider the case in which Xj is a leaf node. In this case, there is no need to satisfy the inductive assumptions at the next level, as the propagation ends at the leaves. Furthermore, it is acceptable for Xj to learn its incoming messages directly, since these messages will be implied by its final marginal. From their joint input Ii and Ij, it is clear that Xi and Xj together could compute µi→j(xj) as given in Equation (1). Thus by Theorem 1, we can construct a protocol for them to efficiently compute this value in such a way that Xj learns only µi→j(xj) and Xi learns nothing.

At the end of the message-passing phase, each internal (non-leaf) node Xi will know a set of masked messages from each of its neighbors. In particular, for each pair Xj, Xℓ ∈ N(Xi), for each xi ∈ V(Xi), Xi will know the values of µj→i(xi) + βℓ,j(xi). Ignoring privacy concerns, it is the case that Xi and any pair of its neighbors could compute the marginal of Xi in Equation (2). 
Invoking Theorem 1 again, we can construct an efficient protocol for Xi and this pair of neighbors to together compute the marginals such that Xi learns only the marginals and the neighbors learn nothing.

Each leaf vertex Xi will be in possession of its unmasked messages µj→i(xi) for every xi ∈ V(Xi) from its neighbor Xj, and can easily compute its marginals as given in Equation (2) without having learned anything not already implied by its initial potential functions and the marginals themselves.

We use PrivateBeliefProp(T) to denote the algorithm above when applied to a particular tree T. The full proof of the following is omitted, but follows the logic sketched in the preceding sections.

Theorem 2 Under standard cryptographic assumptions, PrivateBeliefProp(T) allows every variable Xi to compute its own marginal distribution P[Xi] and nothing else (that is, nothing not already computable in polynomial time from only P[Xi] and the initial potential functions). Direct communication occurs only between variables that are immediate neighbors or two steps away in T, and secure function computation is never invoked on sets of more than three variables.⁵

We briefly note a number of extensions to Theorem 2 and the methods described above.

Loopy Belief Propagation: Theorem 2 can be extended to privacy-preserving loopy belief propagation on graphs that contain cycles. Because of the protocol's faithfulness to the original algorithm, the same convergence and correctness claims hold as in standard loopy belief propagation [7].

Computing Only Partial Information: Allowing a variable to learn its exact numerical marginal distribution may actually convey a great deal of information. We might instead only want each variable to learn, for instance, whether its probability of taking on a given value is greater than 0.1 or not. 
Theorem 2 can easily be generalized to allow each variable to learn only partial information about its own marginal.

Privacy-Preserving Junction Tree: The protocol can also be modified to perform privacy-preserving belief propagation on a junction tree [11]. Here it is necessary to take intra-clique privacy into account in order to enforce that variables can learn only their own marginals and not, for example, the marginals of other nodes within the same clique.

NashProp and Other Message-Passing Algorithms: The methods described here can also be applied to provide privacy-preserving versions of the NashProp algorithm [8], allowing players in a multiparty game to jointly compute and draw actions from a Nash equilibrium, with each player learning only his own action and nothing else.⁶ We are investigating more general applications of our methods to a broad class of message-passing algorithms that would include many others.

4 Privacy-Preserving Gibbs Sampling

We now move on to the problem of secure Gibbs sampling on an undirected graphical model G. The local potential functions accompanying G can be preprocessed to obtain conditional distributions for each variable given a setting of all its neighbors (Markov blanket). Thus we henceforth assume that each variable has access to its local conditional distribution, which it will be convenient to represent in a particular tabular form. To simplify presentation, we will assume each variable is binary, taking on values in {0, 1}, but this assumption is easy to relax.

If a node Xi is of degree d, the conditional distribution of Xi given a particular assignment to its neighbors will be represented by a table Ti with 2ᵈ rows and d + 1 columns. The first d columns range over all 2ᵈ possible assignments x⃗ to N(Xi), while the final column contains the numerical value P[Xi = 1 | N(Xi) = x⃗]. 
We will use Ti(x⃗) to denote the value P[Xi = 1 | N(Xi) = x⃗] stored in the (d + 1)st column in the row corresponding to the assignment x⃗.

With this notation, the standard (non-private) Gibbs sampling algorithm [4, 2] can be easily described. After choosing an initial assignment to all of the variables in G (for instance, uniformly at random), the algorithm repeatedly resamples values for individual variables conditioned on the current values of their neighbors. More precisely, at each step, a variable Xi is chosen for resampling. Its current value is replaced by randomly drawing value 1 with probability Ti(x⃗) and value 0 with probability 1 − Ti(x⃗), where x⃗ is the current set of assignments to N(Xi).

To implement a privacy-preserving variant of Gibbs sampling, we must solve the following cryptographic problem: how can a set of vertices communicate with their neighbors in order to repeatedly resample their values from their conditional distributions given their neighbors' current assignments, without learning any information except their own final values at the end of the process and anything that is implied by these values? Again, we would like to accomplish this with limited communication so that no vertex is required to communicate with a vertex more than two hops away.

⁵ Since the application of standard secure function computation requires broadcast among all participants, it is a feature of the algorithm that it limits such invocations to three parties at a time.

⁶ See work by Dodis et al. 
[3] and Teague [12] for more on privacy-preserving computation in game theory.

In order for each variable to learn only its final sampled value after some number of iterations, and not its intermediate resampled values (which may be enough to provide a good approximation of the marginal distribution on the variable), we first provide a way of distributing the current value of a vertex so that it cannot be learned by any vertex in isolation. One way of accomplishing this is by assigning each vertex Xi a "distinguished neighbor" N*(Xi). Xi will hold one bit bi while N*(Xi) will hold a second bit b′i such that the current value of Xi is bi ⊕ b′i.

Using such an encoding, there is a simple but relatively inefficient construction for privacy-preserving Gibbs sampling that uses only secure multiparty function computation, but that invokes Theorem 1 on entire neighborhoods of the graph. In graphs with high degree, this requires broadcast communication between a large number of parties, which we would like to avoid. Here we describe a much more communication-efficient protocol using blinded encryption. For concreteness the reader may imagine below that we are using the blindable cryptosystem based on quadratic residues described in Section 2.2, though other choices are possible.

We begin by describing a sub-protocol for preprocessing the table Ti before resampling begins. Let S be the 2ᵈ indices of the rows of the table Ti. For ease of notation, we will refer to the d neighbors of Xi as V1, . . . , Vd. The purpose of the sub-protocol is for Xi and its neighbors to compute a random permutation π of S (which can be thought of as a random permutation of the rows of Ti) in such a way that during the protocol, each Vj ∈ N(Xi) learns only the sets {π(x⃗) : Vj = 0} and {π(x⃗) : Vj = 1} and Xi learns nothing.

The sub-protocol is quite simple. 
First each neighbor Vj of Xi encrypts column j of Ti using its own public key and passes the encrypted column to Xi. Next Xi encrypts column d + 1 using its own public key. Xi then concatenates the d + 1 encrypted columns together to form an encrypted version of Ti in which column j is encrypted using the public key of Vj for 1 ≤ j ≤ d and column d + 1 is encrypted using the public key of Xi. Xi then takes the resulting table, randomly permutes the rows, and blinds (randomly re-encrypts) each entry using the appropriate public keys (i.e. the key of Vj for column j where 1 ≤ j ≤ d and its own public key for column d + 1). At this point, Xi sends the resulting table to its distinguished neighbor N*(Xi).

The purpose of the blinding steps here is to prevent parties from tracking correspondences between cleartext and encrypted table entries. For instance, without blinding above, N*(Xi) could reconstruct the permutation chosen by Xi by seeing how its own encrypted values have been rearranged. Now from the perspective of N*(Xi), d columns of the table will look like uniformly distributed random bits. N*(Xi) will still be able to decrypt the column of the table that corresponds to its own values, but it will become clear that decrypting this column alone cannot yield useful information.

In the next step in the protocol, N*(Xi) re-encrypts column d + 1 of the table with its own public key. It then randomly permutes the rows of the table, blinds each entry using the appropriate public keys (those of Vj for columns 1 ≤ j ≤ d and its own for column d + 1), and sends the updated table back to Xi. At this point, every entry in the table will look like random bits to Xi. Each column j will be encrypted by the public key of Vj, with the exception of the final column, which will be encrypted by both Xi and N*(Xi). 
Call this new table T'i.

Once these encrypted tables have been computed for each node, we begin the main Gibbs sampling protocol. We inductively assume that at the start of each step, for each Xj ∈ X, the current value of Xj is distributed between Xj and N*(Xj). At the end of the step, the only information that has been learned is the new value of a particular node Xi, again distributed between Xi and N*(Xi).

Consider a neighbor Vj of Xi. Vj can decrypt column j of T'i in order to learn which rows correspond to its value being 0 and which rows correspond to its value being 1. While Vj alone does not know what its current value is, Vj and N*(Vj) could compute it together, and thus could together figure out which rows of the permutation correspond to Vj's current value. By Theorem 1, since there is a way for them to compute this information ignoring privacy, we can construct an efficient protocol for Vj, N*(Vj), and Xi to perform this computation such that Xi learns only the rows that correspond to Vj's value (and in particular does not learn what this value is), while Vj and N*(Vj) learn nothing.

After this secure computation of partitions has been completed for all neighbors of Xi, Xi will be able to compute the intersection of the subsets of rows it has received from each neighbor. This intersection will be a single row corresponding to the current values of all nodes in N(Xi). Initially, Xi will not be able to decrypt any of the entries in this row. However, Xi and N*(Xi) could together decrypt the value in column d + 1, use this value to sample Xi's new value according to the appropriate distribution, and distribute the new value between themselves.
Calling upon Theorem 1 once again, this means that we can construct an efficient protocol for Xi and N*(Xi) to complete these computations together in such a way that they learn only the new bits bi and b'i respectively.

Each time the value of a node Xi is resampled, Xi and N*(Xi) repeat the process of blinding and permuting the rows of T'i. This prevents Xi and its neighbors from learning how frequently they take on different values throughout the sampling process. After the value of each node has been privately resampled sufficiently many times, we can use one final application of secure multiparty computation between each node Xi and its distinguished neighbor N*(Xi) to allow Xi to learn its final value.

As with standard Gibbs sampling, we also need to specify a schedule by which vertices in the Markov network will have their values updated, as well as the number of iterations of this schedule, which will in turn determine how close the sampled distribution is to the true joint (stationary) distribution. Since our interest here is in privacy considerations only, let us use PrivateGibbs to refer to the protocol described above when applied to any fixed Markov network, combined with some fixed updating schedule (such as random or a fixed ordering) and some number r of iterations.

Theorem 3 Under standard cryptographic assumptions⁷, PrivateGibbs computes a sample from the joint distribution after r iterations, with every variable learning its own value and nothing else. Direct communication occurs only between variables that are immediate neighbors or two steps away, and secure function computation is never invoked on sets of more than three variables.

The full proof is again omitted, but largely follows the sketch above.
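The per-step row selection at the heart of the protocol can be made concrete. In the sketch below, the permutation and neighbor values are illustrative; in the actual protocol Xi receives only the row subsets (via secure computation with each Vj and N*(Vj)) and never the values themselves:

```python
from itertools import product
import random

d = 3                                    # degree of X_i (illustrative)
rows = list(product([0, 1], repeat=d))   # the 2^d joint settings of N(X_i)

# A random permutation pi of the row indices, as computed in the sub-protocol.
perm = {x: i for i, x in enumerate(random.sample(rows, len(rows)))}

# Hypothetical current values of the neighbors V_1, ..., V_d.
values = [1, 0, 1]

# Each V_j (together with N*(V_j)) contributes only the set of permuted
# row indices consistent with its current value.
subsets = [{perm[x] for x in rows if x[j] == values[j]} for j in range(d)]

# X_i intersects the subsets; exactly one row survives, corresponding to
# the current joint setting of its neighborhood.
(row_index,) = set.intersection(*subsets)
assert row_index == perm[tuple(values)]
```

Each subset reveals half of the permuted rows, but since Xi never sees the unpermuted table, no individual subset tells it any neighbor's value; only the final intersected row (whose entry in column d + 1 it still cannot decrypt alone) is determined.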
We note that PrivateGibbs enjoys an even stronger privacy property: even if any subset of parties collude by combining their post-protocol views, they can learn nothing not implied by their combined sampled values. Furthermore, any convergence guarantees that hold for standard Gibbs sampling [4, 5] with the same updating schedule will also hold for the secure version.

References

[1] C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[2] G. Casella and E. George. Explaining the Gibbs sampler. The American Statistician, 46:167–174, 1992.
[3] Y. Dodis, S. Halevi, and T. Rabin. A cryptographic solution to a game theoretic problem. In CRYPTO, pages 112–130, 2000.
[4] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.
[5] A. Gibbs. Bounding convergence time of the Gibbs sampler in Bayesian image restoration. Biometrika, 87:749–766, 2000.
[6] O. Goldreich. Foundations of Cryptography, Volume 2. Cambridge University Press, 2004.
[7] A. Ihler, J. Fisher III, and A. Willsky. Loopy belief propagation: Convergence and effects of message errors. Journal of Machine Learning Research, 6:905–936, 2005.
[8] M. Kearns, M. Littman, and S. Singh. Graphical models for game theory. In Uncertainty in Artificial Intelligence, 2001.
[9] M. Naor and K. Nissim. Communication preserving protocols for secure function evaluation. In ACM Symposium on Theory of Computing, pages 590–599, 2001.
[10] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[11] P. Shenoy and G. Shafer. Axioms for probability and belief-function propagation. In Uncertainty in Artificial Intelligence, pages 169–198, 1990.
[12] V. Teague. Selecting correlated random actions.
In Financial Cryptography, pages 181–195, 2004.
[13] J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, 2003.

⁷An example would be intractability of recognizing quadratic residues.