{"title": "Expectation Particle Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 3609, "page_last": 3617, "abstract": "We propose an original particle-based implementation of the Loopy Belief Propagation (LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous state space. The algorithm constructs adaptively efficient proposal distributions approximating the local beliefs at each node of the MRF. This is achieved by considering proposal distributions in the exponential family whose parameters are updated iteratively in an Expectation Propagation (EP) framework. The proposed particle scheme provides consistent estimation of the LBP marginals as the number of particles increases. We demonstrate that it provides more accurate results than the Particle Belief Propagation (PBP) algorithm of Ihler and McAllester (2009) at a fraction of the computational cost and is additionally more robust empirically. The computational complexity of our algorithm at each iteration is quadratic in the number of particles. We also propose an accelerated implementation with sub-quadratic computational complexity which still provides consistent estimates of the loopy BP marginal distributions and performs almost as well as the original procedure.", "full_text": "Expectation Particle Belief Propagation\n\nThibaut Lienart, Yee Whye Teh, Arnaud Doucet\n\nDepartment of Statistics\n\nUniversity of Oxford\n\n{lienart,teh,doucet}@stats.ox.ac.uk\n\nOxford, UK\n\nAbstract\n\nWe propose an original particle-based implementation of the Loopy Belief Propagation\n(LBP) algorithm for pairwise Markov Random Fields (MRF) on a continuous\nstate space. The algorithm constructs adaptively efficient proposal distributions\napproximating the local beliefs at each node of the MRF. 
This is achieved\nby considering proposal distributions in the exponential family whose parameters\nare updated iteratively in an Expectation Propagation (EP) framework. The proposed\nparticle scheme provides consistent estimation of the LBP marginals as the\nnumber of particles increases. We demonstrate that it provides more accurate results\nthan the Particle Belief Propagation (PBP) algorithm of [1] at a fraction of\nthe computational cost and is additionally more robust empirically. The computational\ncomplexity of our algorithm at each iteration is quadratic in the number\nof particles. We also propose an accelerated implementation with sub-quadratic\ncomputational complexity which still provides consistent estimates of the loopy\nBP marginal distributions and performs almost as well as the original procedure.\n\n1 Introduction\n\nUndirected Graphical Models (also known as Markov Random Fields) provide a flexible framework\nto represent networks of random variables and have been used in a large variety of applications in\nmachine learning, statistics, signal processing and related fields [2]. For many applications such as\ntracking [3, 4], sensor networks [5, 6] or computer vision [7, 8, 9] it can be beneficial to define MRFs\non continuous state-spaces.\nGiven a pairwise MRF, we are here interested in computing the marginal distributions at the nodes\nof the graph. A popular approach to do this is to consider the Loopy Belief Propagation (LBP)\nalgorithm [10, 11, 2]. LBP relies on the transmission of messages between nodes. However, when\ndealing with continuous random variables, computing these messages exactly is generally intractable.\nIn practice, one must select a way to tractably represent these messages and a way to update these\nrepresentations following the LBP algorithm. 
The Nonparametric Belief Propagation (NBP)\nalgorithm [12] represents the messages with mixtures of Gaussians while the Particle Belief Propagation\n(PBP) algorithm [1] uses an importance sampling approach. NBP relies on restrictive integrability\nconditions and does not offer consistent estimators of the LBP messages. PBP offers a way to\ncircumvent these two issues but the implementation suggested proposes sampling from the estimated\nbeliefs which need not be integrable. Moreover, even when they are integrable, sampling from\nthe estimated beliefs is very expensive computationally. In practice, the authors of [1] only sample\napproximately from them using short MCMC runs, leading to biased estimators.\nIn our method, we consider a sequence of proposal distributions at each node from which one can\nsample particles at a given iteration of the LBP algorithm. The messages are then computed using\nimportance sampling. The novelty of the approach is to propose a principled and automated way\nof designing a sequence of proposals in a tractable exponential family using the Expectation\nPropagation (EP) framework [13]. The resulting algorithm, which we call Expectation Particle Belief\nPropagation (EPBP), does not suffer from restrictive integrability conditions and sampling is done\nexactly which implies that we obtain consistent estimators of the LBP messages. The method is\nempirically shown to yield better approximations to the LBP beliefs than EP, and than the implementation\nsuggested in [1] at a much reduced computational cost.\n\n2 Background\n\n2.1 Notations\n\nWe consider a pairwise MRF, i.e. a distribution over a set of p random variables indexed by a set\nV = {1, . . . 
, p}, which factorizes according to an undirected graph G = (V, E) with\n\n$$p(x_V) \\propto \\prod_{u \\in V} \\psi_u(x_u) \\prod_{(u,v) \\in E} \\psi_{uv}(x_u, x_v). \\tag{1}$$\n\nThe random variables are assumed to take values on a continuous, possibly unbounded, space X.\nThe positive functions \u03c8u : X \u2192 R+ and \u03c8uv : X \u00d7 X \u2192 R+ are respectively known as the node\nand edge potentials. The aim is to approximate the marginals pu(xu) for all u \u2208 V. A popular\napproach is the LBP algorithm discussed earlier. This algorithm is a fixed point iteration scheme\nyielding approximations called the beliefs at each node [10, 2]. When the underlying graph is a tree,\nthe resulting beliefs can be shown to be proportional to the exact marginals. This is not the case in\nthe presence of loops in the graph. However, even in these cases, LBP has been shown to provide\ngood approximations in a wide range of situations [14, 11]. The LBP fixed-point iteration can be\nwritten as follows at iteration t:\n\n$$m^t_{uv}(x_v) = \\int \\psi_{uv}(x_u, x_v) \\psi_u(x_u) \\prod_{w \\in \\Gamma_u \\setminus v} m^{t-1}_{wu}(x_u) \\, dx_u, \\qquad B^t_u(x_u) = \\psi_u(x_u) \\prod_{w \\in \\Gamma_u} m^t_{wu}(x_u), \\tag{2}$$\n\nwhere \u0393u denotes the neighborhood of u i.e., the set of nodes {w | (w, u) \u2208 E}, muv is known as\nthe message from node u to node v and Bu is the belief at node u.\n\n2.2 Related work\n\nThe crux of any generic implementation of LBP for continuous state spaces is to select a way to\nrepresent the messages and design an appropriate method to compute/approximate the message update.\nIn Nonparametric BP (NBP) [12], the messages are represented by mixtures of Gaussians. In theory,\ncomputing the product of such messages can be done analytically but this is impractical\ndue to the exponential growth in the number of terms to consider. 
To circumvent this issue, the\nauthors suggest an importance sampling approach targeting the beliefs and fitting mixtures of\nGaussians to the resulting weighted particles. The computation of the update (2) is then always done over\na constant number of terms.\nA restriction of \u201cvanilla\u201d Nonparametric BP is that the messages must be finitely integrable for the\nmessage representation to make sense. This is the case if the following two conditions hold:\n\n$$\\sup_{x_v} \\int \\psi_{uv}(x_u, x_v) \\, dx_u < \\infty, \\quad \\text{and} \\quad \\int \\psi_u(x_u) \\, dx_u < \\infty. \\tag{3}$$\n\nThese conditions do not, however, hold in a number of important cases as acknowledged in [3]. For\ninstance, the potential \u03c8u(xu) is usually proportional to a likelihood of the form p(yu|xu) which\nneed not be integrable in xu. Similarly, in imaging applications for example, the edge potential can\nencode similarity between pixels which also need not satisfy the integrability condition as in [15].\nFurther, NBP does not offer consistent estimators of the LBP messages.\nParticle BP (PBP) [1] offers a way to overcome the shortcomings of NBP: the authors also consider\nimportance sampling to tackle the update of the messages but without fitting a mixture of Gaussians.\nFor a chosen proposal distribution qu on node u and a draw of N particles $\\{x_u^{(i)}\\}_{i=1}^N \\sim q_u(x_u)$, the\nmessages are represented as mixtures:\n\n$$\\widehat{m}^{\\mathrm{PBP}}_{uv}(x_v) := \\sum_{i=1}^N \\omega^{(i)}_{uv} \\psi_{uv}(x_u^{(i)}, x_v), \\quad \\text{with} \\quad \\omega^{(i)}_{uv} := \\frac{1}{N} \\frac{\\psi_u(x_u^{(i)})}{q_u(x_u^{(i)})} \\prod_{w \\in \\Gamma_u \\setminus v} \\widehat{m}^{\\mathrm{PBP}}_{wu}(x_u^{(i)}). \\tag{4}$$\n\nThis algorithm has the advantage that it does not require the conditions (3) to hold. The authors\nsuggest two possible choices of sampling distributions: sampling from the local potential \u03c8u, or\nsampling from the current belief estimate. 
The first case is only valid if \u03c8u is integrable w.r.t. xu\nwhich, as we have mentioned earlier, might not be the case in general and the second case implies\nsampling from a distribution of the form\n\n$$\\widehat{B}^{\\mathrm{PBP}}_u(x_u) \\propto \\psi_u(x_u) \\prod_{w \\in \\Gamma_u} \\widehat{m}^{\\mathrm{PBP}}_{wu}(x_u) \\tag{5}$$\n\nwhich is a product of mixtures. As in NBP, na\u00efve sampling of the proposal has complexity O(N^{|\u0393u|})\nand is thus in general too expensive to consider. Alternatively, as the authors suggest, one can run\na short MCMC simulation targeting it which reduces the complexity to order O(|\u0393u|N^2) since the\ncost of each iteration, which requires evaluating $\\widehat{B}^{\\mathrm{PBP}}_u$ point-wise, is of order O(|\u0393u|N), and we\nneed O(N) iterations of the MCMC simulation. The issue with this approach is that it is still\ncomputationally expensive, and it is unclear how many iterations are necessary to get N good samples.\n\n2.3 Our contribution\n\nIn this paper, we consider the general context where the edge and node-potentials might be\nnon-normalizable and non-Gaussian. Our proposed method is based on PBP, as PBP is theoretically\nbetter suited than NBP since, as discussed earlier, it does not require the conditions (3) to hold, and,\nprovided that one samples from the proposals exactly, it yields consistent estimators of the LBP\nmessages while NBP does not. Further, the development of our method also formally shows that\nconsidering proposals close to the beliefs, as suggested by [1], is a good idea. Our core observation\nis that since sampling from a proposal of the form (5) using MCMC simulation is very expensive,\nwe should consider using a more tractable proposal distribution instead. 
However, it is important that\nthe proposal distribution is constructed adaptively, taking into account evidence collected through\nthe message passing itself, and we propose to achieve this by using proposal distributions lying in a\ntractable exponential family, and adapted using the Expectation Propagation (EP) framework [13].\n\n3 Expectation Particle Belief Propagation\n\nOur aim is to address the issue of selecting the proposals in the PBP algorithm. We suggest using\nexponential family distributions as the proposals on a node for computational efficiency reasons,\nwith parameters chosen adaptively based on current estimates of beliefs and EP. Each step of our\nalgorithm involves both a projection onto the exponential family as in EP, as well as a particle\napproximation of the LBP message, hence we will refer to our method as Expectation Particle\nBelief Propagation or EPBP for short.\nFor each pair of adjacent nodes u and v, we will use $m_{uv}(x_v)$ to denote the exact (but unavailable)\nLBP message from u to v, $\\widehat{m}_{uv}(x_v)$ to denote the particle approximation of $m_{uv}$, and $\\eta_{uv}$ an\nexponential family projection of $\\widehat{m}_{uv}$. In addition, let $\\eta_{\\circ u}$ denote an exponential family projection of the\nnode potential $\\psi_u$. We will consider approximations consisting of N particles. 
In the following, we\nwill derive the form of our particle approximated message $\\widehat{m}_{uv}(x_v)$, along with the choice of the\nproposal distribution $q_u(x_u)$ used to construct $\\widehat{m}_{uv}$. Our starting point is the edge-wise belief over\n$x_u$ and $x_v$, given the incoming particle approximated messages,\n\n$$\\widehat{B}_{uv}(x_u, x_v) \\propto \\psi_{uv}(x_u, x_v) \\psi_u(x_u) \\psi_v(x_v) \\prod_{w \\in \\Gamma_u \\setminus v} \\widehat{m}_{wu}(x_u) \\prod_{\\nu \\in \\Gamma_v \\setminus u} \\widehat{m}_{\\nu v}(x_v). \\tag{6}$$\n\nThe exact LBP message $m_{uv}(x_v)$ can be derived by computing the marginal distribution $\\widehat{B}_{uv}(x_v)$,\nand constructing $m_{uv}(x_v)$ such that\n\n$$\\widehat{B}_{uv}(x_v) \\propto m_{uv}(x_v) \\widehat{M}_{vu}(x_v), \\tag{7}$$\n\nwhere $\\widehat{M}_{vu}(x_v) = \\psi_v(x_v) \\prod_{\\nu \\in \\Gamma_v \\setminus u} \\widehat{m}_{\\nu v}(x_v)$ is the (particle approximated) pre-message from v to\nu. It is easy to see that the resulting message is as expected,\n\n$$m_{uv}(x_v) \\propto \\int \\psi_{uv}(x_u, x_v) \\psi_u(x_u) \\prod_{w \\in \\Gamma_u \\setminus v} \\widehat{m}_{wu}(x_u) \\, dx_u. \\tag{8}$$\n\nSince the above exact LBP belief and message are intractable in our scenario of interest, the idea\nis to use an importance sampler targeting $\\widehat{B}_{uv}(x_u, x_v)$ instead. Consider a proposal distribution\nof the form $q_u(x_u) q_v(x_v)$. Since $x_u$ and $x_v$ are independent under the proposal, we can draw\nN independent samples, say $\\{x_u^{(i)}\\}_{i=1}^N$ and $\\{x_v^{(j)}\\}_{j=1}^N$, from $q_u$ and $q_v$ respectively. 
We can then\napproximate the belief using an N \u00d7 N cross product of the particles,\n\n$$\\widehat{B}_{uv}(x_u, x_v) \\approx \\frac{1}{N^2} \\sum_{i,j=1}^N \\frac{\\widehat{B}_{uv}(x_u^{(i)}, x_v^{(j)})}{q_u(x_u^{(i)}) q_v(x_v^{(j)})} \\delta_{(x_u^{(i)}, x_v^{(j)})}(x_u, x_v) \\propto \\frac{1}{N^2} \\sum_{i,j=1}^N \\frac{\\psi_{uv}(x_u^{(i)}, x_v^{(j)}) \\psi_u(x_u^{(i)}) \\widehat{M}_{vu}(x_v^{(j)}) \\prod_{w \\in \\Gamma_u \\setminus v} \\widehat{m}_{wu}(x_u^{(i)})}{q_u(x_u^{(i)}) q_v(x_v^{(j)})} \\delta_{(x_u^{(i)}, x_v^{(j)})}(x_u, x_v). \\tag{9}$$\n\nMarginalizing onto $x_v$, we have the following particle approximation to $\\widehat{B}_{uv}(x_v)$,\n\n$$\\widehat{B}_{uv}(x_v) \\approx \\frac{1}{N} \\sum_{j=1}^N \\frac{\\widehat{m}_{uv}(x_v^{(j)}) \\widehat{M}_{vu}(x_v^{(j)})}{q_v(x_v^{(j)})} \\delta_{x_v^{(j)}}(x_v), \\tag{10}$$\n\nwhere the particle approximated message $\\widehat{m}_{uv}(x_v)$ from u to v has the form of the message representation\nin the PBP algorithm (4).\nTo determine sensible proposal distributions, we can find $q_u$ and $q_v$ that are close to the target $\\widehat{B}_{uv}$.\nUsing the KL divergence $\\mathrm{KL}(\\widehat{B}_{uv} \\| q_u q_v)$ as the measure of closeness, the optimal $q_u$ required for the\nu to v message is the node belief,\n\n$$\\widehat{B}_{uv}(x_u) \\propto \\psi_u(x_u) \\prod_{w \\in \\Gamma_u} \\widehat{m}_{wu}(x_u), \\tag{11}$$\n\nthus supporting the claim in [1] that a good proposal to use is the current estimate of the node belief.\nAs pointed out in Section 2, it is computationally inefficient to use the particle approximated node\nbelief as the proposal distribution. An idea is to use a tractable exponential family distribution for\n$q_u$ instead, say\n\n$$q_u(x_u) \\propto \\eta_{\\circ u}(x_u) \\prod_{w \\in \\Gamma_u} \\eta_{wu}(x_u), \\tag{12}$$\n\nwhere $\\eta_{\\circ u}$ and $\\eta_{wu}$ are exponential family approximations of $\\psi_u$ and $\\widehat{m}_{wu}$ respectively. In Section\n4 we use a Gaussian family, but we are not limited to this. 
Using the framework of expectation\npropagation (EP) [13], we can iteratively find good exponential family approximations as follows.\nFor each $w \\in \\Gamma_u$, to update the $\\eta_{wu}$, we form the cavity distribution $q_u^{\\backslash w} \\propto q_u / \\eta_{wu}$ and the corresponding\ntilted distribution $\\widehat{m}_{wu} q_u^{\\backslash w}$. The updated $\\eta^+_{wu}$ is the exponential family factor minimising\nthe KL divergence,\n\n$$\\eta^+_{wu} = \\arg\\min_{\\eta \\in \\text{exp.fam.}} \\mathrm{KL}\\left[ \\widehat{m}_{wu}(x_u) q_u^{\\backslash w}(x_u) \\,\\|\\, \\eta(x_u) q_u^{\\backslash w}(x_u) \\right]. \\tag{13}$$\n\nGeometrically, the update projects the tilted distribution onto the exponential family manifold.\nThe optimal solution requires computing the moments of the tilted distribution through numerical\nquadrature, and selecting $\\eta_{wu}$ so that $\\eta_{wu} q_u^{\\backslash w}$ matches the moments of the tilted distribution. In\nour scenario the moment computation can be performed crudely on a small number of evaluation\npoints since it only concerns the updating of the importance sampling proposal. If an optimal \u03b7 in\nthe exponential family does not exist, e.g. in the Gaussian case that the optimal \u03b7 has a negative\nvariance, we simply revert $\\eta_{wu}$ to its previous value [13]. An analogous update is used for $\\eta_{\\circ u}$.\nIn the above derivation, the expectation propagation steps for each incoming message into u and for\nthe node potential are performed first, to fit the proposal to the current estimated belief at u, before\nit is used to draw N particles, which can then be used to form the particle approximated messages\nfrom u to each of its neighbours. Alternatively, once each particle approximated message $\\widehat{m}_{uv}(x_v)$\nis formed, we can update its exponential family projection $\\eta_{uv}(x_v)$ immediately. 
This alternative\nscheme is described in Algorithm 1.\n\nAlgorithm 1 Node update\n1: sample $\\{x_u^{(i)}\\} \\sim q_u(\\cdot)$\n2: compute $\\widehat{B}_u(x_u^{(i)}) = \\psi_u(x_u^{(i)}) \\prod_{w \\in \\Gamma_u} \\widehat{m}_{wu}(x_u^{(i)})$\n3: for $v \\in \\Gamma_u$ do\n4:   compute $\\widehat{M}_{uv}(x_u^{(i)}) := \\widehat{B}_u(x_u^{(i)}) / \\widehat{m}_{vu}(x_u^{(i)})$\n5:   compute the normalized weights $w_{uv}^{(i)} \\propto \\widehat{M}_{uv}(x_u^{(i)}) / q_u(x_u^{(i)})$\n6:   update the estimator of the outgoing message $\\widehat{m}_{uv}(x_v) = \\sum_{i=1}^N w_{uv}^{(i)} \\psi_{uv}(x_u^{(i)}, x_v)$\n7:   compute the cavity distribution $q_v^{\\backslash \\circ} \\propto q_v / \\eta_{\\circ v}$, get $\\eta^+_{\\circ v}$ in the exponential family such that\n     $\\eta^+_{\\circ v} q_v^{\\backslash \\circ}$ approximates $\\psi_v q_v^{\\backslash \\circ}$, update $q_v \\propto \\eta^+_{\\circ v}$ and let $\\eta_{\\circ v} \\leftarrow \\eta^+_{\\circ v}$\n8:   compute the cavity distribution $q_v^{\\backslash u} \\propto q_v / \\eta_{uv}$, get $\\eta^+_{uv}$ in the exponential family such that\n     $\\eta^+_{uv} q_v^{\\backslash u}$ approximates $\\widehat{m}_{uv} q_v^{\\backslash u}$, update $q_v \\propto \\eta^+_{uv}$ and let $\\eta_{uv} \\leftarrow \\eta^+_{uv}$\n9: end for\n\n3.1 Computational complexity and sub-quadratic implementation\n\nEach EP projection step costs O(N) computations since the message $\\widehat{m}_{wu}$ is a mixture of N components\n(see (4)). Drawing N particles from the exponential family proposal $q_u$ costs O(N). The step\nwith highest computational complexity is in evaluating the particle weights in (4). Indeed, evaluating\nthe mixture representation of a message on a single point is O(N), and we need to compute this for\neach of N particles. Similarly, evaluating the estimator of the belief on N sampling points at node\nu requires O(|\u0393u|N^2). This can be reduced since the algorithm still provides consistent estimators\nif we consider the evaluation of unbiased estimators of the messages instead. 
Since the messages\nhave the form $\\widehat{m}_{uv}(x_v) = \\sum_{i=1}^N w^i_{uv} \\psi^i_{uv}(x_v)$, we can follow a method presented in [16] where\none draws M indices $\\{i^\\star_\\ell\\}_{\\ell=1}^M$ from a multinomial with weights $\\{w^i_{uv}\\}_{i=1}^N$ and evaluates the corresponding\nM components $\\psi^{i^\\star_\\ell}_{uv}$. This reduces the cost of the evaluation of the beliefs to O(|\u0393u|MN)\nwhich leads to an overall sub-quadratic complexity if M is o(N). We show in the next section how\nit compares to the quadratic implementation when M = O(log N).\n\n4 Experiments\n\nWe investigate the performance of our method on MRFs for two simple graphs. This allows us\nto compare the performance of EPBP to the performance of PBP in depth. We also illustrate the\nbehavior of the sub-quadratic version of EPBP. Finally we show that EPBP provides good results in\na simple denoising application.\n\n4.1 Comparison with PBP\n\nWe start by comparing EPBP to PBP as implemented by Ihler et al. on a 3 \u00d7 3 grid (figure 1)\nwith random variables taking values on R. The node and edge potentials are selected such that the\nmarginals are multimodal, non-Gaussian and skewed with\n\n$$\\begin{cases} \\psi_u(x_u) = \\alpha_1 N(x_u - y_u; -2, 1) + \\alpha_2 G(x_u - y_u; 2, 1.3) \\\\ \\psi_{uv}(x_u, x_v) = L(x_u - x_v; 0, 2) \\end{cases}, \\tag{14}$$\n\nwhere yu denotes the observation at node u, N(x; \u00b5, \u03c3) \u221d exp(\u2212(x \u2212 \u00b5)^2/(2\u03c3^2)) (density of a Normal\ndistribution), G(x; \u00b5, \u03b2) \u221d exp(\u2212((x \u2212 \u00b5)/\u03b2 + exp(\u2212(x \u2212 \u00b5)/\u03b2))) (density of a Gumbel distribution)\nand L(x; \u00b5, \u03b2) \u221d exp(\u2212|x \u2212 \u00b5|/\u03b2) (density of a Laplace distribution). The parameters \u03b11 and \u03b12\nare respectively set to 0.6 and 0.4. We compare the two methods after 20 LBP iterations.1\n\n1 The scheduling used alternates between the classical orderings: top-down-left-right, left-right-top-down,\ndown-up-right-left and right-left-down-up. 
One \u201cLBP iteration\u201d implies that all nodes have been updated once.\n\nFigure 1: Illustration of the grid (left) and tree (right) graphs used in the experiments.\n\nPBP as presented in [1] is implemented using the same parameters as those in an implementation\ncode provided by the authors: the proposal on each node is the last estimated belief and sampled with\na 20-step MCMC chain, the MH proposal is a normal distribution. For EPBP, the approximations of\nthe messages are Gaussians. The ground truth is approximated by running LBP on a deterministic\nequally spaced mesh with 200 points. All simulations were run with Julia on a Mac with 2.5 GHz\nIntel Core i5 processor; our code is available online.2\nFigure 2 compares the performances of both methods. The error is computed as the mean L1 error\nover all nodes between the estimated beliefs and the ground truth evaluated over the same deterministic\nmesh. One can observe that not only does PBP perform worse than EPBP but also that the\nerror plateaus with increasing number of samples. This is because the sampling within PBP\nis done approximately and hence the consistency of the estimators is lost. The speed-up offered by\nEPBP is very substantial (figure 4 right). Hence, although it would be possible to use more MCMC\n(Metropolis-Hastings) iterations within PBP to improve its performance, it would make the method\nprohibitively expensive to use. Note that for EPBP, one observes the usual 1/\u221aN convergence of\nparticle methods.\nFigure 3 compares the estimator of the beliefs obtained by the two methods for three arbitrarily\npicked nodes (node 1, 5 and 9 as illustrated on figure 1). The figure also illustrates the last proposals\nconstructed with our approach and one notices that their supports match closely the support of the\ntrue beliefs. Figure 4 left illustrates how the estimated beliefs converge as compared to the true\nbeliefs with increasing number of iterations. 
One can observe that PBP converges more slowly and\nthat the results display more variability which might be due to the MCMC runs being too short.\nWe repeated the experiments on a tree with 8 nodes (figure 1 right) where we know that, at\nconvergence, the beliefs computed using BP are proportional to the true marginals. The node and edge\npotentials are again picked such that the marginals are multimodal with\n\n$$\\begin{cases} \\psi_u(x_u) = \\alpha_1 N(x_u - y_u; -2, 1) + \\alpha_2 N(x_u - y_u; 1, 0.5) \\\\ \\psi_{uv}(x_u, x_v) = L(x_u - x_v; 0, 1) \\end{cases}, \\tag{15}$$\n\nwith \u03b11 = 0.3 and \u03b12 = 0.7. On this example, we also show how \u201cpure EP\u201d with normal distributions\nperforms. We also try using the distributions obtained with EP as proposals for PBP (referred\nto as \u201cPBP after EP\u201d in figures). Both methods underperform compared to EPBP as illustrated\nvisually in Figure 5. In particular one can observe in Figure 2 (right) that \u201cPBP after EP\u201d converges slower\nthan EPBP with increasing number of samples.\n\n4.2 Sub-quadratic implementation and denoising application\n\nAs outlined in Section 3.1, in the implementation of EPBP one can use an unbiased estimator of\nthe edge weights based on a draw of M components from a multinomial. The complexity of the\nresulting algorithm is O(MN). We apply this method to the 3 \u00d7 3 grid example in the case where\nM is picked to be roughly of order log(N): i.e., for N = {10, 20, 50, 100, 200, 500}, we pick\nM = {5, 6, 8, 10, 11, 13}. The results are illustrated in Figure 6 where one can see that the N log N\nimplementation compares very well to the original quadratic implementation at a much reduced\ncost. We apply this sub-quadratic method on a simple probabilistic model for an image denoising\nproblem. The aim of this example is to show that the method can be applied to larger graphs and still\nprovide good results. 
The underlying model is chosen to showcase the flexibility and applicability\nof our method, in particular when the edge-potential is non-integrable. It is not claimed to be an\noptimal approach to image denoising.3 The node and edge potentials are defined as follows:\n\n$$\\begin{cases} \\psi_u(x_u) = N(x_u - y_u; 0, 0.1) \\\\ \\psi_{uv}(x_u, x_v) = L_\\lambda(x_u - x_v; 0, 0.03) \\end{cases}, \\tag{16}$$\n\nwhere L\u03bb(x; \u00b5, \u03b2) = L(x; \u00b5, \u03b2) if |x| \u2264 \u03bb and L(\u03bb; \u00b5, \u03b2) otherwise. In this example we set\n\u03bb = 0.2. The value assigned to each pixel of the reconstruction is the estimated mean obtained over\nthe corresponding node (figure 7). The image has size 50 \u00d7 50 and the simulation was run with\nN = 30 particles per node, M = 5 and 10 BP iterations taking under 2 minutes to complete. We\ncompare it with the result obtained with EP on the same model.\n\n2 https://github.com/tlienart/EPBP.\n3 In this case in particular, an optimization-based method such as [17] is likely to yield better results.\n\nFigure 2: (left) Comparison of the mean L1 error for PBP and EPBP for the 3 \u00d7 3 grid example.\n(right) Comparison of the mean L1 error for \u201cPBP after EP\u201d and EPBP for the tree example. In both\ncases, EPBP is more accurate for the same number of samples.\n\nFigure 3: Comparison of the beliefs on node 1, 5 and 9 as obtained by evaluating LBP on a deterministic\nmesh (true belief), with PBP and with EPBP for the 3 \u00d7 3 grid example. The proposal used\nby EPBP at the last step is also illustrated. The results are obtained with N = 100 samples on each\nnode and 20 BP iterations. One can observe visually that EPBP outperforms PBP.\n\nFigure 4: (left) Comparison of the convergence in L1 error with increasing number of BP iterations\nfor the 3 \u00d7 3 grid example when using N = 30 particles. 
(right) Comparison of the wall-clock time\nneeded to perform PBP and EPBP on the 3 \u00d7 3 grid example.\n\n5 Discussion\n\nWe have presented an original way to design adaptively efficient and easy-to-sample-from proposals\nfor a particle implementation of Loopy Belief Propagation. Our proposal is inspired by the\nExpectation Propagation framework.\nWe have demonstrated empirically that the resulting algorithm is significantly faster and more\naccurate than an implementation of PBP using the estimated beliefs as proposals and sampling from\nthem using MCMC as proposed in [1]. It is also more accurate than EP due to the nonparametric\nnature of the messages and offers consistent estimators of the LBP messages. A sub-quadratic\nversion of the method was also outlined and shown to perform almost as well as the original method on\nmildly multi-modal models. It was also applied successfully in a simple image denoising example\nillustrating that the method can be applied on graphical models with several hundred nodes.\nWe believe that our method could be applied successfully to a wide range of applications such as\nsmoothing for Hidden Markov Models [18], tracking or computer vision [19, 20]. In future work,\nwe will look at considering other divergences than the KL and the \u201cPower EP\u201d framework [21]. We\nwill also look at encapsulating the present algorithm within a sequential Monte Carlo framework\nand the recent work of Naesseth et al. 
[22].\n\nFigure 5: Comparison of the beliefs on node 1, 3 and 8 as obtained by evaluating LBP on a deterministic\nmesh, using EPBP, PBP, EP and PBP using the results of EP as proposals. This is for the\ntree example with N = 100 samples on each node and 20 LBP iterations. Again, one can observe\nvisually that EPBP outperforms the other methods.\n\nFigure 6: Comparison of the mean L1 error for PBP and EPBP on a 3 \u00d7 3 grid (left). For the\nsame number of samples, EPBP is more accurate. It is also faster by about two orders of magnitude\n(right). The simulations were run several times for the same observations to illustrate the variability\nof the results.\n\nFigure 7: From left to right: comparison of the original (first), noisy (second) and recovered image\nusing the sub-quadratic implementation of EPBP (third) and with EP (fourth).\n\nAcknowledgments\n\nWe thank Alexander Ihler and Drew Frank for sharing their implementation of Particle Belief\nPropagation. TL gratefully acknowledges funding from EPSRC (grant 1379622) and the Scatcherd\nEuropean scholarship scheme. YWT\u2019s research leading to these results has received funding from\nEPSRC (grant EP/K009362/1) and ERC under the EU\u2019s FP7 Programme (grant agreement no.\n617411). AD\u2019s research was supported by the EPSRC (grant EP/K000276/1, EP/K009850/1) and\nby AFOSR/AOARD (grant AOARD-144042).\n\nReferences\n\n[1] Alexander T. Ihler and David A. McAllester. Particle belief propagation. In Proc. 12th AISTATS, pages 256\u2013263, 2009.\n\n[2] Martin J. Wainwright and Michael I. Jordan. 
Graphical models, exponential families, and variational inference. Found. and Tr. in Mach. Learn., 1(1\u20132):1\u2013305, 2008.\n\n[3] Erik B. Sudderth, Alexander T. Ihler, Michael Isard, William T. Freeman, and Alan S. Willsky. Nonparametric belief propagation. Commun. ACM, 53(10):95\u2013102, 2010.\n\n[4] Jeremy Schiff, Erik B. Sudderth, and Ken Goldberg. Nonparametric belief propagation for distributed tracking of robot networks with noisy inter-distance measurements. In IROS \u201909, pages 1369\u20131376, 2009.\n\n[5] Alexander T. Ihler, John W. Fisher, Randolph L. Moses, and Alan S. Willsky. Nonparametric belief propagation for self-localization of sensor networks. In IEEE Sel. Ar. Comm., volume 23, pages 809\u2013819, 2005.\n\n[6] Christopher Crick and Avi Pfeffer. Loopy belief propagation as a basis for communication in sensor networks. In Proc. 19th UAI, pages 159\u2013166, 2003.\n\n[7] Jian Sun, Nan-Ning Zheng, and Heung-Yeung Shum. Stereo matching using belief propagation. In IEEE Trans. Patt. An. Mach. Int., volume 25, pages 787\u2013800, 2003.\n\n[8] Andrea Klaus, Mario Sormann, and Konrad Karner. Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In Proc. 18th ICPR, volume 3, pages 15\u201318, 2006.\n\n[9] Nima Noorshams and Martin J. Wainwright. Belief propagation for continuous state spaces: Stochastic message-passing with quantitative guarantees. JMLR, 14:2799\u20132835, 2013.\n\n[10] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.\n\n[11] Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Constructing free energy approximations and generalized belief propagation algorithms. MERL Technical Report, 2002.\n\n[12] Erik B. Sudderth, Alexander T. Ihler, William T. Freeman, and Alan S. Willsky. Nonparametric belief propagation. In Procs. IEEE Comp. Vis. Patt. Rec., volume 1, pages 605\u2013612, 2003.\n\n[13] Thomas P. Minka. 
Expectation propagation for approximate Bayesian inference. In Proc. 17th UAI, pages 362\u2013369, 2001.\n\n[14] Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propagation for approximate inference: an empirical study. In Proc. 15th UAI, pages 467\u2013475, 1999.\n\n[15] Mila Nikolova. Thresholding implied by truncated quadratic regularization. IEEE Trans. Sig. Proc., 48(12):3437\u20133450, 2000.\n\n[16] Mark Briers, Arnaud Doucet, and Sumeetpal S. Singh. Sequential auxiliary particle belief propagation. In Proc. 8th ICIF, volume 1, pages 705\u2013711, 2005.\n\n[17] Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60(1):259\u2013268, 1992.\n\n[18] M. Briers, A. Doucet, and S. Maskell. Smoothing algorithms for state-space models. Ann. Inst. Stat. Math., 62(1):61\u201389, 2010.\n\n[19] Erik B. Sudderth, Michael I. Mandel, William T. Freeman, and Alan S. Willsky. Visual hand tracking using nonparametric belief propagation. In Procs. IEEE Comp. Vis. Patt. Rec., 2004.\n\n[20] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient graph-based image segmentation. Int. Journ. Comp. Vis., 59(2), 2004.\n\n[21] Thomas P. Minka. Power EP. Technical Report MSR-TR-2004-149, 2004.\n\n[22] Christian A. Naesseth, Fredrik Lindsten, and Thomas B. Sch\u00f6n. Sequential Monte Carlo for graphical models. In Proc. 27th NIPS, pages 1862\u20131870, 2014.\n", "award": [], "sourceid": 2007, "authors": [{"given_name": "Thibaut", "family_name": "Lienart", "institution": "University of Oxford"}, {"given_name": "Yee Whye", "family_name": "Teh", "institution": "University of Oxford"}, {"given_name": "Arnaud", "family_name": "Doucet", "institution": "Oxford"}]}