{"title": "Local Algorithms for Approximate Inference in Minor-Excluded Graphs", "book": "Advances in Neural Information Processing Systems", "page_first": 729, "page_last": 736, "abstract": "We present a new local approximation algorithm for computing MAP and log-partition function for arbitrary exponential family distribution represented by a finite-valued pair-wise Markov random field (MRF), say G. Our algorithm is based on decomposing G into appropriately chosen small components; computing estimates locally in each of these components and then producing a good global solution. We prove that the algorithm can provide approximate solution within arbitrary accuracy when $G$ excludes some finite sized graph as its minor and G has bounded degree: all Planar graphs with bounded degree are examples of such graphs. The running time of the algorithm is $\\Theta(n)$ (n is the number of nodes in G), with constant dependent on accuracy, degree of graph and size of the graph that is excluded as a minor (constant for Planar graphs). Our algorithm for minor-excluded graphs uses the decomposition scheme of Klein, Plotkin and Rao (1993). In general, our algorithm works with any decomposition scheme and provides quantifiable approximation guarantee that depends on the decomposition scheme.", "full_text": "Local Algorithms for Approximate Inference in\n\nMinor-Excluded Graphs\n\nKyomin Jung\n\nDept. of Mathematics, MIT\n\nkmjung@mit.edu\n\nDevavrat Shah\n\nDept. of EECS, MIT\n\ndevavrat@mit.edu\n\nAbstract\n\nWe present a new local approximation algorithm for computing MAP and log-\npartition function for arbitrary exponential family distribution represented by a\n\ufb01nite-valued pair-wise Markov random \ufb01eld (MRF), say G. Our algorithm is\nbased on decomposing G into appropriately chosen small components; computing\nestimates locally in each of these components and then producing a good global\nsolution. 
We prove that the algorithm can provide an approximate solution within\narbitrary accuracy when G excludes some finite-sized graph as its minor and G\nhas bounded degree: all planar graphs with bounded degree are examples of such\ngraphs. The running time of the algorithm is \u0398(n) (n is the number of nodes in\nG), with the constant dependent on the accuracy, the degree of the graph and the size of the graph\nthat is excluded as a minor (constant for planar graphs).\nOur algorithm for minor-excluded graphs uses the decomposition scheme of\nKlein, Plotkin and Rao (1993). In general, our algorithm works with any decomposition scheme and provides a quantifiable approximation guarantee that depends\non the decomposition scheme.\n\n1 Introduction\n\nThe Markov Random Field (MRF) based exponential family of distributions allows for representing distributions in an intuitive parametric form. Therefore, it has been successful for modeling in many\napplications. Specifically, consider an exponential family on n random variables X = (X1, . . . , Xn)\nrepresented by a pair-wise (undirected) MRF with graph structure G = (V, E), where the vertex set is\nV = {1, . . . , n} and the edge set is E \u2282 V \u00d7 V . Each Xi takes value in a finite set \u03a3 (e.g. \u03a3 = {0, 1}).\nThe joint distribution of X = (Xi): for x = (xi) \u2208 \u03a3n,\n\nPr[X = x] \u221d exp( \u03a3_{i \u2208 V} \u03c6_i(x_i) + \u03a3_{(i,j) \u2208 E} \u03c8_{ij}(x_i, x_j) ).   (1)\n\nHere, the functions \u03c6_i : \u03a3 \u2192 R+ = {x \u2208 R : x \u2265 0} and \u03c8_{ij} : \u03a3\u00b2 \u2192 R+ are assumed to be arbitrary non-negative (real-valued) functions.1 The two most important computational questions of interest are: (i) finding the maximum a-posteriori (MAP) assignment x\u2217, where\nx\u2217 = arg max_{x \u2208 \u03a3^n} Pr[X = x]; and (ii) computing marginal distributions of variables, i.e. Pr[Xi = x] for x \u2208 \u03a3, 1 \u2264 i \u2264 n. MAP is equivalent to a minimal energy assignment (or ground state),\nwhere the energy E(x) of a state x \u2208 \u03a3^n is defined as E(x) = \u2212H(x) + Constant, with H(x) =\n\n\u03a3_{i \u2208 V} \u03c6_i(x_i) + \u03a3_{(i,j) \u2208 E} \u03c8_{ij}(x_i, x_j).\n\nSimilarly, computing marginals is equivalent to computing the log-partition function, defined as\n\nlog Z = log ( \u03a3_{x \u2208 \u03a3^n} exp( \u03a3_{i \u2208 V} \u03c6_i(x_i) + \u03a3_{(i,j) \u2208 E} \u03c8_{ij}(x_i, x_j) ) ).\n\nIn this paper, we will find \u03b5-approximate solutions of MAP and the log-partition function: that is, \u02c6x\nand log \u02c6Z such that: (1 \u2212 \u03b5)H(x\u2217) \u2264 H(\u02c6x) \u2264 H(x\u2217),\n(1 \u2212 \u03b5) log Z \u2264 log \u02c6Z \u2264 (1 + \u03b5) log Z.\n\n1Here, we assume the positivity of \u03c6_i's and \u03c8_{ij}'s for simplicity of analysis.\n\n1\n\n\fPrevious Work. The question of finding MAP (or the ground state) comes up in many important application areas such as coding theory, discrete optimization, and image denoising. Similarly, the log-partition\nfunction is used in counting combinatorial objects, loss-probability computation in computer networks, etc. Both problems are NP-hard for exact and even (constant) approximate computation for\narbitrary graphs G. However, applications require solving these problems using very simple algorithms.\nA plausible approach is as follows. First, identify a wide class of graphs that have simple algorithms\nfor computing MAP and the log-partition function. Then, try to build the system (e.g. codes) so that such\na good graph structure emerges and use the simple algorithm, or else use the algorithm as a heuristic.\n\nSuch an approach has resulted in many interesting recent results, starting with the Belief Propagation\n(BP) algorithm designed for tree graphs [1]. Since there is a vast literature on this topic, we will recall\nonly a few results. 
Two important algorithms are generalized belief propagation (BP) [2] and the\ntree-reweighted algorithm (TRW) [3,4]. Key properties of interest for these iterative procedures are\nthe correctness of fixed points and convergence. Many results characterizing properties of the fixed\npoints are known, starting from [2]. Various sufficient conditions for their convergence are known,\nstarting with [5]. However, simultaneous convergence and correctness of such algorithms have been established\nfor only specific problems, e.g. [6].\n\nFinally, we discuss two relevant results. The first result is about properties of TRW. The TRW\nalgorithm provides a provable upper bound on the log-partition function for arbitrary graphs [3]. However,\nto the best of the authors\u2019 knowledge, the error is not quantified. TRW for MAP estimation has\na strong connection to a specific Linear Programming (LP) relaxation of the problem [4]. This was\nmade precise in a sequence of works by Kolmogorov [7] and Kolmogorov and Wainwright [8] for binary\nMRFs. It is worth noting that the LP relaxation can be poor even for simple problems.\n\nThe second is an approximation algorithm proposed by Globerson and Jaakkola [9] to compute the\nlog-partition function using planar graph decomposition (PDC). PDC uses techniques of [3] in conjunction with a known result about exact computation of the partition function for a binary MRF when G is\nplanar and the exponential family has a specific form. Their algorithm provides a provable upper bound\nfor arbitrary graphs. However, they do not quantify the error incurred. Further, their algorithm is\nlimited to binary MRFs.\n\nContribution. We propose a novel local algorithm for approximate computation of MAP and the log-partition function. For any \u03b5 > 0, our algorithm can produce an \u03b5-approximate solution for MAP\nand the log-partition function for an arbitrary MRF G as long as G excludes a finite graph as a minor\n(precise definition later). 
For example, planar graphs exclude K3,3 and K5 as minors. The running\ntime of the algorithm is \u0398(n), with the constant dependent on \u03b5, the maximum vertex degree of G and\nthe size of the graph that is excluded as a minor. Specifically, for a planar graph with bounded degree,\nit takes \u2264 C(\u03b5)n time to find an \u03b5-approximate solution, with log log C(\u03b5) = O(1/\u03b5). In general, our\nalgorithm works for any G and we can quantify the bound on the error incurred by our algorithm. It is\nworth noting that our algorithm provides a provable lower bound on the log-partition function as well,\nunlike many previous works.\n\nThe precise results for minor-excluded graphs are stated in Theorems 1 and 2. The results concerning\ngeneral graphs are stated in the form of Lemmas 2-3-4 for the log-partition function and Lemmas 5-6-7 for MAP.\n\nTechniques. Our algorithm is based on the following idea: First, decompose G into small-size\nconnected components, say G1, . . . , Gk, by removing a few edges of G. Second, compute estimates\n(either MAP or log-partition) in each Gi separately. Third, combine these estimates to produce a\nglobal estimate while taking care of the effect induced by the removed edges. We show that the error in\nthe estimate depends only on the edges removed. This error-bound characterization is applicable for\narbitrary graphs.\n\nKlein, Plotkin and Rao [10] introduced a clever and simple decomposition method for minor-excluded graphs to study the gap between max-flow and min-cut for multicommodity flows. We\nuse their method to obtain a good edge-set for decomposing a minor-excluded G so that the error\ninduced in our estimate is small (it can be made as small as required).\n\nIn general, as long as G allows for such a good edge-set for decomposing G into small components,\nour algorithm will provide a good estimate. To compute estimates in individual components, we\nuse dynamic programming. 
Since each component is small, it is not computationally burdensome.\n\n2\n\n\fHowever, one may obtain even simpler heuristics by replacing dynamic programming with another\nmethod, such as BP or TRW, for computation in the components.\n\n2 Preliminaries\nHere we present useful definitions and previous results about the decomposition of minor-excluded\ngraphs from [10,11].\nDefinition 1 (Minor Exclusion) A graph H is called a minor of G if we can transform G into H\nthrough an arbitrary sequence of the following two operations: (a) removal of an edge; (b) merging of\ntwo connected vertices u, v: that is, remove edge (u, v) as well as vertices u and v; add a new vertex\nand make all edges that were incident on u or v incident on this new vertex. Now, if H is not a minor\nof G, then we say that G excludes H as a minor.\n\nThe following observation may help in understanding the definition: any graph H with\nr nodes is a minor of Kr, where Kr is the complete graph on r nodes. This is true because one may\nobtain H by removing from Kr those edges that are absent in H. More generally, if G is a subgraph of\nG0 and G has H as a minor, then G0 has H as its minor. Let Kr,r denote the complete bipartite graph\nwith r nodes in each partition. Then Kr is a minor of Kr,r. An important implication of this is as\nfollows: to prove property P for a graph G that excludes H, of size r, as a minor, it is sufficient to\nprove that any graph that excludes Kr,r as a minor has property P. This fact was cleverly used by\nKlein et al. [10] to obtain the good decomposition scheme described next. First, a definition.\nDefinition 2 ((\u03b4, \u2206)-decomposition) Given a graph G = (V, E), a randomly chosen subset of edges\nB \u2282 E is called a (\u03b4, \u2206)-decomposition of G if the following holds: (a) For any edge e \u2208 E,\nPr(e \u2208 B) \u2264 \u03b4. (b) Let S1, . . . , SK be the connected components of the graph G0 = (V, E\\B) obtained by\nremoving the edges of B from G. 
Then, for any such component Sj, 1 \u2264 j \u2264 K, and any u, v \u2208 Sj, the\nshortest-path distance between u and v in the original graph G is at most \u2206 with probability 1.\n\nThe existence of a (\u03b4, \u2206)-decomposition implies that it is possible to remove a \u03b4 fraction of edges so\nthat the graph decomposes into connected components whose diameter is small. We describe a simple\nand explicit construction of such a decomposition for the minor-excluded class of graphs. This scheme\nwas proposed by Klein, Plotkin, Rao [10] and Rao [11].\n\nDeC(G, r, \u2206)\n\n(0) Input is graph G = (V, E) and r, \u2206 \u2208 N. Initially, i = 0, G0 = G, B = \u2205.\n(1) For i = 0, . . . , r \u2212 1, do the following.\n    (a) Let S^i_1, . . . , S^i_{ki} be the connected components of Gi.\n    (b) For each S^i_j, 1 \u2264 j \u2264 ki, pick an arbitrary node vj \u2208 S^i_j.\n        \u25e6 Create a breadth-first search tree T^i_j rooted at vj in S^i_j.\n        \u25e6 Choose a number L^i_j uniformly at random from {0, . . . , \u2206 \u2212 1}.\n        \u25e6 Let B^i_j be the set of edges at levels L^i_j, \u2206 + L^i_j, 2\u2206 + L^i_j, . . . in T^i_j.\n        \u25e6 Update B = B \u222a^{ki}_{j=1} B^i_j.\n    (c) Set i = i + 1.\n(2) Output B and graph G0 = (V, E\\B).\n\nAs stated above, the basic idea is to use the following step recursively (up to depth r of recursion):\nin each connected component, say S, choose a node arbitrarily and create a breadth-first search tree,\nsay T . Choose a number, say L, uniformly at random from {0, . . . , \u2206 \u2212 1}. Remove (add to B) all\nedges that are at level L + k\u2206, k \u2265 0, in T . Clearly, the total running time of such an algorithm is\nO(r(n + |E|)) for a graph G = (V, E) with |V | = n, with possible parallel implementation across\ndifferent connected components.\nThe algorithm DeC(G, r, \u2206) is designed to provide a good decomposition for the class of graphs that\nexclude Kr,r as a minor. 
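The decomposition step above can be sketched in code. The following is a minimal illustration under our own conventions (adjacency-list dictionaries, edges as frozensets, and helper names of our choosing); for simplicity it cuts every edge crossing the selected BFS levels rather than only BFS-tree edges, so it is a sketch of the scheme, not the authors' implementation.

```python
import random
from collections import deque

def dec(adj, r, delta):
    """Sketch of DeC(G, r, Delta): r rounds of random BFS-level cutting.
    adj: dict mapping node -> set of neighbors (undirected graph).
    Returns B, the set of removed edges (each edge a frozenset)."""
    B = set()
    for _ in range(r):
        for comp in connected_components(adj, B):
            root = next(iter(comp))
            level = bfs_levels(adj, B, root)          # node -> BFS depth in this component
            L = random.randrange(delta)
            for u in comp:
                for v in adj[u]:
                    e = frozenset((u, v))
                    if e in B or level.get(v) is None:
                        continue
                    # an edge joining depths d-1 and d sits at level d; cut levels L, L+delta, ...
                    d = max(level[u], level[v])
                    if level[u] != level[v] and d % delta == L:
                        B.add(e)
    return B

def bfs_levels(adj, removed, root):
    """BFS depths from root, ignoring removed edges."""
    level = {root: 0}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in level and frozenset((u, v)) not in removed:
                level[v] = level[u] + 1
                q.append(v)
    return level

def connected_components(adj, removed):
    """Yield vertex sets of the components of (V, E \\ removed)."""
    seen = set()
    for s in adj:
        if s in seen:
            continue
        comp = set(bfs_levels(adj, removed, s))
        seen |= comp
        yield comp
```

For a line graph such as the one in Figure 1, a single round with \u2206 = 3 leaves components of at most 2\u2206 \u2212 1 consecutive nodes, whatever the random level offset.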
Figure 1 explains the algorithm for a line graph of n = 9 nodes, which\nexcludes K2,2 as a minor. The example is a sample run of DeC(G, 2, 3) (Figure 1 shows the\nfirst iteration of the algorithm).\n\n3\n\n\f[Figure 1 here: the line graph G0 on nodes 1, . . . , 9, the BFS tree T1, and the decomposed graph G1 with components S1, . . . , S5.]\n\nFigure 1: The first of two iterations in execution of DeC(G, 2, 3) is shown.\n\nLemma 1 If G excludes Kr,r as a minor, then algorithm DeC(G, r, \u2206) outputs B which is an\n(r/\u2206, O(\u2206))-decomposition of G.\n\nIt is known that planar graphs exclude K3,3 as a minor. Hence, Lemma 1 implies the following.\nCorollary 1 Given a planar graph G, the algorithm DeC(G, 3, \u2206) produces a (3/\u2206, O(\u2206))-decomposition for any \u2206 \u2265 1.\n\n3 Approximate log Z\nHere, we describe an algorithm for approximate computation of log Z for any graph G. The algorithm\nuses a decomposition algorithm as a subroutine. In what follows, we use the term DECOMP for a\ngeneric decomposition algorithm. The key point is that our algorithm provides provable upper and\nlower bounds on log Z for any graph; the approximation guarantee and computation time depend on\nthe properties of DECOMP. Specifically, for Kr,r minor-excluded G (e.g. a planar graph with r = 3),\nwe will use DeC(G, r, \u2206) in place of DECOMP. 
Using Lemma 1, we show that our algorithm based\non DeC provides approximation up to arbitrary multiplicative accuracy by tuning the parameter \u2206.\n\nLOG PARTITION(G)\n\n(1) Use DECOMP(G) to obtain B \u2282 E such that\n    (a) G0 = (V, E\\B) is made of connected components S1, . . . , SK.\n(2) For each connected component Sj, 1 \u2264 j \u2264 K, do the following:\n    (a) Compute the partition function Zj restricted to Sj by dynamic programming (or exhaustive computation).\n(3) Let \u03c8^L_{ij} = min_{(x,x') \u2208 \u03a3\u00b2} \u03c8_{ij}(x, x') and \u03c8^U_{ij} = max_{(x,x') \u2208 \u03a3\u00b2} \u03c8_{ij}(x, x'). Then\n\n    log \u02c6Z_LB = \u03a3^K_{j=1} log Zj + \u03a3_{(i,j) \u2208 B} \u03c8^L_{ij};   log \u02c6Z_UB = \u03a3^K_{j=1} log Zj + \u03a3_{(i,j) \u2208 B} \u03c8^U_{ij}.\n\n(4) Output: lower bound log \u02c6Z_LB and upper bound log \u02c6Z_UB.\n\nIn words, LOG PARTITION(G) produces upper and lower bounds on log Z of MRF G as follows:\ndecompose the graph G into (small) components S1, . . . , SK by removing (a few) edges B \u2282 E using\nDECOMP(G). Compute the exact log-partition function in each of the components. To produce the bounds\nlog \u02c6Z_LB, log \u02c6Z_UB, take the summation of the thus-computed component-wise log-partition functions along\nwith the minimal and maximal effect of the edges from B.\nAnalysis of LOG PARTITION for General G : Here, we analyze the performance of LOG PARTITION for any G. In the next section, we will specialize our analysis for minor-excluded G when\nLOG PARTITION uses DeC as the DECOMP algorithm.\nLemma 2 Given an MRF G described by (1), LOG PARTITION produces log \u02c6Z_LB, log \u02c6Z_UB such\nthat\n\nlog \u02c6Z_LB \u2264 log Z \u2264 log \u02c6Z_UB,   log \u02c6Z_UB \u2212 log \u02c6Z_LB = \u03a3_{(i,j) \u2208 B} (\u03c8^U_{ij} \u2212 \u03c8^L_{ij}).\n\n4\n\n\fIt takes O(|E| K |\u03a3|^{|S\u2217|}) + T_DECOMP time to produce this estimate, where |S\u2217| = max^K_{j=1} |Sj|, with\nDECOMP producing the decomposition of G into S1, . . . , SK in time T_DECOMP.\nLemma 3 If G has maximum vertex degree D, then log Z \u2265 (1/(D + 1)) \u03a3_{(i,j) \u2208 E} (\u03c8^U_{ij} \u2212 \u03c8^L_{ij}).\n\nLemma 4 If G has maximum vertex degree D and DECOMP(G) produces B that is a (\u03b4, \u2206)-decomposition, then\n\nE[log \u02c6Z_UB \u2212 log \u02c6Z_LB] \u2264 \u03b4(D + 1) log Z,\n\nwhere the expectation is w.r.t. the randomness in B, and LOG PARTITION takes time O(n D |\u03a3|^{D^\u2206}) + T_DECOMP.\n\nAnalysis of LOG PARTITION for Minor-excluded G : Here, we specialize the analysis of LOG PARTITION for minor-excluded graphs G. For G that excludes the minor Kr,r, we use the algorithm DeC(G, r, \u2206).\nNow, we state the main result for log-partition function computation.\nTheorem 1 Let G exclude Kr,r as a minor and have D as its maximum vertex degree. Given \u03b5 > 0, use the\nLOG PARTITION algorithm with DeC(G, r, \u2206) where \u2206 = \u2308r(D + 1)/\u03b5\u2309. Then,\n\nlog \u02c6Z_LB \u2264 log Z \u2264 log \u02c6Z_UB;   E[log \u02c6Z_UB \u2212 log \u02c6Z_LB] \u2264 \u03b5 log Z.\n\nFurther, the algorithm takes O(n C(D, |\u03a3|, \u03b5)) time, where the constant C(D, |\u03a3|, \u03b5) = D |\u03a3|^{D^{O(rD/\u03b5)}}.\n\nWe obtain the following immediate implication of Theorem 1.\nCorollary 2 For any \u03b5 > 0, the LOG PARTITION algorithm with the DeC algorithm, for an MRF based on a constant-degree\nplanar graph G, produces log \u02c6Z_LB, log \u02c6Z_UB so that\n\n(1 \u2212 \u03b5) log Z \u2264 log \u02c6Z_LB \u2264 log Z \u2264 log \u02c6Z_UB \u2264 (1 + \u03b5) log Z,\n\nin time O(n C(\u03b5)) where log log C(\u03b5) = O(1/\u03b5).\n\n4 Approximate MAP\nNow, we describe an algorithm to compute MAP approximately. It is very similar to the LOG PARTITION algorithm: given G, decompose it into (small) components S1, . . . , SK by removing (a few)\nedges B \u2282 E. Then, compute an approximate MAP assignment by computing the exact MAP restricted\nto the components. As in LOG PARTITION, the computation time and performance of the algorithm\ndepend on the properties of the decomposition scheme. 
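As a concrete illustration of the LOG PARTITION procedure, the following sketch computes the lower and upper bounds by exhaustive enumeration inside each component (a minimal version under our own naming conventions; the paper uses dynamic programming in place of enumeration):

```python
import math
from itertools import product

def log_partition_bounds(nodes, edges, phi, psi, B, sigma=(0, 1)):
    """Sketch of LOG PARTITION: remove edges B, sum exact per-component
    log-partition values, then correct with min/max removed-edge potentials.
    phi[i][x] and psi[(i,j)][(x,y)] are the (non-negative) potentials."""
    kept = [e for e in edges if e not in B]
    total = 0.0
    for comp in components(nodes, kept):
        comp = sorted(comp)
        inner = [e for e in kept if e[0] in comp and e[1] in comp]
        z = 0.0
        for assign in product(sigma, repeat=len(comp)):  # exhaustive; fine for small comps
            x = dict(zip(comp, assign))
            h = sum(phi[i][x[i]] for i in comp)
            h += sum(psi[e][(x[e[0]], x[e[1]])] for e in inner)
            z += math.exp(h)
        total += math.log(z)
    lo = total + sum(min(psi[e].values()) for e in B)
    hi = total + sum(max(psi[e].values()) for e in B)
    return lo, hi

def components(nodes, edges):
    """Connected components of (nodes, edges) via DFS."""
    adj = {i: set() for i in nodes}
    for (u, v) in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

The sandwich property holds because every removed edge contributes between its minimum and maximum potential value to H(x); with B empty, both outputs collapse to the exact log Z.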
We describe the algorithm for any graph G; it will then\nbe specialized for Kr,r minor-excluded G using DeC(G, r, \u2206).\n\nMODE(G)\n\n(1) Use DECOMP(G) to obtain B \u2282 E such that\n    (a) G0 = (V, E\\B) is made of connected components S1, . . . , SK.\n(2) For each connected component Sj, 1 \u2264 j \u2264 K, do the following:\n    (a) Through dynamic programming (or exhaustive computation), find the exact MAP x\u2217,j for\ncomponent Sj, where x\u2217,j = (x\u2217,j_i)_{i \u2208 Sj}.\n(3) Produce output \u02c6x\u2217, which is obtained by assigning values to nodes using x\u2217,j, 1 \u2264 j \u2264 K.\n\nAnalysis of MODE for General G : Here, we analyze the performance of MODE for any G. Later,\nwe will specialize our analysis for minor-excluded G when it uses DeC as the DECOMP algorithm.\n\nLemma 5 Given an MRF G described by (1), the MODE algorithm produces an output \u02c6x\u2217 such that\n\nH(x\u2217) \u2212 \u03a3_{(i,j) \u2208 B} (\u03c8^U_{ij} \u2212 \u03c8^L_{ij}) \u2264 H(\u02c6x\u2217) \u2264 H(x\u2217).\n\nIt takes O(|E| K |\u03a3|^{|S\u2217|}) + T_DECOMP time to\nproduce this estimate, where |S\u2217| = max^K_{j=1} |Sj|, with DECOMP producing the decomposition of G into\nS1, . . . , SK in time T_DECOMP.\nLemma 6 If G has maximum vertex degree D, then\n\nH(x\u2217) \u2265 (1/(D + 1)) \u03a3_{(i,j) \u2208 E} \u03c8^U_{ij} \u2265 (1/(D + 1)) \u03a3_{(i,j) \u2208 E} (\u03c8^U_{ij} \u2212 \u03c8^L_{ij}).\n\n5\n\n\fLemma 7 If G has maximum vertex degree D and DECOMP(G) produces B that is a (\u03b4, \u2206)-decomposition, then\n\nE[H(x\u2217) \u2212 H(\u02c6x\u2217)] \u2264 \u03b4(D + 1)H(x\u2217),\n\nwhere the expectation is w.r.t. the randomness in B. Further, MODE takes time O(n D |\u03a3|^{D^\u2206}) + T_DECOMP.\n\nAnalysis of MODE for Minor-excluded G : Here, we specialize the analysis of MODE for minor-excluded graphs G. For G that excludes the minor Kr,r, we use the algorithm DeC(G, r, \u2206). 
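Analogously to the log-partition case, the MODE step above can be illustrated with a brute-force per-component maximizer (again a minimal sketch with helper names of our own; the paper's algorithm uses dynamic programming inside each component):

```python
from itertools import product

def mode(nodes, edges, phi, psi, B, sigma=(0, 1)):
    """Sketch of MODE: drop edges B, solve the exact MAP in each connected
    component by enumeration, and concatenate the per-component assignments.
    phi[i][x] and psi[(i,j)][(x,y)] are the (non-negative) potentials."""
    kept = [e for e in edges if e not in B]
    adj = {i: set() for i in nodes}
    for (u, v) in kept:
        adj[u].add(v)
        adj[v].add(u)
    xhat, seen = {}, set()
    for s in nodes:
        if s in seen:
            continue
        # collect the connected component containing s
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comp = sorted(comp)
        inner = [e for e in kept if e[0] in comp and e[1] in comp]
        # exact MAP within the component by exhaustive search
        best, best_h = None, float("-inf")
        for assign in product(sigma, repeat=len(comp)):
            x = dict(zip(comp, assign))
            h = sum(phi[i][x[i]] for i in comp)
            h += sum(psi[e][(x[e[0]], x[e[1]])] for e in inner)
            if h > best_h:
                best, best_h = x, h
        xhat.update(best)
    return xhat
```

With B empty this returns an exact MAP assignment; with edges removed, H of the returned assignment can drop below H(x\u2217) by at most the sum of the removed edges' potential ranges, which is the content of Lemma 5.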
Now, we state\nthe main result for MAP computation.\nTheorem 2 Let G exclude Kr,r as a minor and have D as its maximum vertex degree. Given \u03b5 > 0,\nuse the MODE algorithm with DeC(G, r, \u2206) where \u2206 = \u2308r(D + 1)/\u03b5\u2309. Then,\n\n(1 \u2212 \u03b5)H(x\u2217) \u2264 H(\u02c6x\u2217) \u2264 H(x\u2217).\n\nFurther, the algorithm takes n \u00b7 C(D, |\u03a3|, \u03b5) time, where the constant C(D, |\u03a3|, \u03b5) = D |\u03a3|^{D^{O(rD/\u03b5)}}.\n\nWe obtain the following immediate implication of Theorem 2.\nCorollary 3 For any \u03b5 > 0, the MODE algorithm with the DeC algorithm, for an MRF based on a constant-degree planar\ngraph G, produces an estimate \u02c6x\u2217 so that\n\n(1 \u2212 \u03b5)H(x\u2217) \u2264 H(\u02c6x\u2217) \u2264 H(x\u2217),\n\nin time O(n C(\u03b5)) where log log C(\u03b5) = O(1/\u03b5).\n\n5 Experiments\nOur algorithm provides a provably good approximation for any MRF with a minor-excluded graph\nstructure, with planar graphs as a special case. In this section, we present an experimental evaluation of\nour algorithm on a popular synthetic model.\nSetup 1.2 Consider a binary (i.e. \u03a3 = {0, 1}) MRF on an n \u00d7 n lattice G = (V, E):\n\nPr(x) \u221d exp( \u03a3_{i \u2208 V} \u03b8_i x_i + \u03a3_{(i,j) \u2208 E} \u03b8_{ij} x_i x_j ),   for x \u2208 {0, 1}^{n\u00b2}.\n\nFigure 2 shows a lattice or grid graph with n = 4 (on the left side). There are two scenarios for\nchoosing parameters (with the notation U[a, b] denoting the uniform distribution over the interval [a, b]):\n(1) Varying interaction. \u03b8_i is chosen independently from the distribution U[\u22120.05, 0.05] and \u03b8_{ij} is chosen\nindependently from U[\u2212\u03b1, \u03b1] with \u03b1 \u2208 {0.2, 0.4, . . . , 2}.\n(2) Varying field. \u03b8_{ij} is chosen independently from the distribution U[\u22120.5, 0.5] and \u03b8_i is chosen independently from U[\u2212\u03b1, \u03b1] with \u03b1 \u2208 {0.2, 0.4, . . . , 2}.\nThe grid graph is planar. 
Hence, we run our algorithms LOG PARTITION and MODE with the decomposition scheme DeC(G, 3, \u2206), \u2206 \u2208 {3, 4, 5}. We consider two measures to evaluate performance:\nerror in log Z, defined as (1/n\u00b2)| log Z_alg \u2212 log Z|; and error in H(x\u2217), defined as (1/n\u00b2)|H(x_alg) \u2212 H(x\u2217)|.\n\nWe compare our algorithm, for error in log Z, with two recently very successful algorithms \u2013\nthe tree-reweighted algorithm (TRW) and the planar decomposition algorithm (PDC). The comparison is\nplotted in Figure 3, where n = 7 and results are averages over 40 trials. Figure 3(A) plots\nerror with respect to varying interaction, while Figure 3(B) plots error with respect to varying field\nstrength. Our algorithm essentially outperforms TRW for these values of \u2206 and performs very\ncompetitively with respect to PDC.\n\nThe key feature of our algorithm is scalability. Specifically, the running time of our algorithm with a\ngiven parameter value \u2206 scales linearly in n, while keeping the relative error bound exactly the\nsame. To explain this important feature, we plot the theoretically evaluated bound on the error in log Z\n\n2Though this setup has \u03c6_i, \u03c8_{ij} taking negative values, it is equivalent to the setup considered in the\npaper, as the function values are lower bounded and hence an affine shift will make them non-negative without\nchanging the distribution.\n\n6\n\n\fin Figure 4 with tags (A), (B) and (C). Note that the error bound plot is the same for n = 100 (A) and\nn = 1000 (B). Clearly, the actual error is likely to be smaller than these theoretically plotted bounds.\nWe note that these bounds depend only on the interaction strengths and not on the values of the field\nstrengths (C).\n\nResults similar to those of LOG PARTITION are expected from MODE. We plot the theoretically evaluated\nbounds on the error in MAP in Figure 4 with tags (A), (B) and (C). 
Again, the bound on the MAP relative\nerror for a given \u2206 parameter remains the same for all values of n, as shown in (A) for n = 100 and\n(B) for n = 1000. There is no change in the error bound with respect to the field strength (C).\nSetup 2. Everything is exactly the same as in the above setup, with the only difference that the grid graph\nis replaced by the cris-cross graph, which is obtained by adding four extra neighboring edges per node\n(with the exception of boundary nodes). Figure 2 shows the cris-cross graph with n = 4 (on the right side).\nWe again run the same algorithms as in the above setup on this graph. For the cris-cross graph, we obtained\nits graph decomposition from the decomposition of its grid subgraph. Though the cris-cross\ngraph is not planar, due to its structure it can be shown (proved) that the\nrunning time of our algorithm will remain the same (in order) and the error bound will become only 3\ntimes weaker than that for the grid graph! We compute these theoretical error bounds for log Z and\nMAP, which are plotted in Figure 5. This figure is similar to Figure 4 for the grid graph. This clearly\nexhibits the generality of our algorithm even beyond minor-excluded graphs.\nReferences\n[1] J. Pearl, \u201cProbabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference,\u201d San Francisco,\nCA: Morgan Kaufmann, 1988.\n[2] J. Yedidia, W. Freeman and Y. Weiss, \u201cGeneralized Belief Propagation,\u201d Mitsubishi Elect. Res. Lab., TR-2000-26, 2000.\n[3] M. J. Wainwright, T. Jaakkola and A. S. Willsky, \u201cTree-based reparameterization framework for analysis of\nsum-product and related algorithms,\u201d IEEE Trans. on Info. Theory, 2003.\n[4] M. J. Wainwright, T. S. Jaakkola and A. S. Willsky, \u201cMAP estimation via agreement on (hyper)trees:\nMessage-passing and linear-programming approaches,\u201d IEEE Trans. on Info. Theory, 51(11), 2005.\n[5] S. C. Tatikonda and M. I. 
Jordan, \u201cLoopy Belief Propagation and Gibbs Measure,\u201d Uncertainty in Artificial\nIntelligence, 2002.\n[6] M. Bayati, D. Shah and M. Sharma, \u201cMaximum Weight Matching via Max-Product Belief Propagation,\u201d\nIEEE ISIT, 2005.\n[7] V. Kolmogorov, \u201cConvergent Tree-reweighted Message Passing for Energy Minimization,\u201d IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.\n[8] V. Kolmogorov and M. Wainwright, \u201cOn optimality of tree-reweighted max-product message-passing,\u201d\nUncertainty in Artificial Intelligence, 2005.\n[9] A. Globerson and T. Jaakkola, \u201cBound on Partition function through Planar Graph Decomposition,\u201d NIPS,\n2006.\n[10] P. Klein, S. Plotkin and S. Rao, \u201cExcluded minors, network decomposition, and multicommodity flow,\u201d\nACM STOC, 1993.\n[11] S. Rao, \u201cSmall distortion and volume preserving embeddings for Planar and Euclidian metrics,\u201d ACM\nSCG, 1999.\n\n[Figure 2 here: the 4 \u00d7 4 grid graph (left) and the cris-cross graph (right).]\n\nFigure 2: Example of grid graph (left) and cris-cross graph 
(right) with n = 4.\n\n7\n\n\f[Figure 3 here: two panels. (1-A) Grid, n = 7: error in log Z vs. interaction strength. (1-B) Grid, n = 7: error in log Z vs. field strength. Curves: TRW, PDC, and our algorithm with \u2206 = 3, 4, 5.]\n\nFigure 3: Comparison of TRW, PDC and our algorithm for grid graph with n = 7 with respect to error in log Z. 
Our algorithm outperforms TRW and is\ncompetitive with respect to PDC.\n\n[Figure 4 here: six panels. (2-A) Grid, n = 100 and (2-B) Grid, n = 1000: log Z error bound vs. interaction strength; (2-C) Grid, n = 1000: log Z error bound vs. field strength. (3-A) Grid, n = 100 and (3-B) Grid, n = 1000: MAP error bound vs. interaction strength; (3-C) Grid, n = 1000: MAP error bound vs. field strength. Curves: \u2206 = 5, 10, 20.]\n\nFigure 4: The theoretically computable error bounds for log Z and MAP under our algorithm for grid with n = 100 and n = 1000 under varying\ninteraction and varying field model. 
This clearly shows scalability of our algorithm.\n\n[Figure 5 here: six panels. (4-A) Cris Cross, n = 100 and (4-B) Cris Cross, n = 1000: log Z error bound vs. interaction strength; (4-C) Cris Cross, n = 1000: log Z error bound vs. field strength. (5-A) Cris Cross, n = 100 and (5-B) Cris Cross, n = 1000: MAP error bound vs. interaction strength; (5-C) Cris Cross, n = 1000: MAP error bound vs. field strength. Curves: \u2206 = 5, 10, 20.]\n\nFigure 5: The theoretically computable error bounds for log Z and MAP under our algorithm for cris-cross with n = 100 and n = 1000 under varying\ninteraction and varying field model. This clearly shows scalability of our algorithm and robustness to graph structure.\n\n8\n\n\f", "award": [], "sourceid": 781, "authors": [{"given_name": "Kyomin", "family_name": "Jung", "institution": null}, {"given_name": "Devavrat", "family_name": "Shah", "institution": null}]}