{"title": "Semidefinite Relaxations for Approximate Inference on Graphs with Cycles", "book": "Advances in Neural Information Processing Systems", "page_first": 369, "page_last": 376, "abstract": "", "full_text": "Semide\ufb01nite relaxations for approximate\n\ninference on graphs with cycles\n\nMartin J. Wainwright\n\nElectrical Engineering and Computer Science\n\nUC Berkeley, Berkeley, CA 94720\n\nwainwrig@eecs.berkeley.edu\n\nMichael I. Jordan\n\nComputer Science and Statistics\nUC Berkeley, Berkeley, CA 94720\njordan@cs.berkeley.edu\n\nAbstract\n\nWe present a new method for calculating approximate marginals for\nprobability distributions de\ufb01ned by graphs with cycles, based on a Gaus-\nsian entropy bound combined with a semide\ufb01nite outer bound on the\nmarginal polytope. This combination leads to a log-determinant max-\nimization problem that can be solved by ef\ufb01cient interior point meth-\nods [8]. As with the Bethe approximation and its generalizations [12], the\noptimizing arguments of this problem can be taken as approximations to\nthe exact marginals. In contrast to Bethe/Kikuchi approaches, our vari-\national problem is strictly convex and so has a unique global optimum.\nAn additional desirable feature is that the value of the optimal solution\nis guaranteed to provide an upper bound on the log partition function. In\nexperimental trials, the performance of the log-determinant relaxation is\ncomparable to or better than the sum-product algorithm, and by a sub-\nstantial margin for certain problem classes. Finally, the zero-temperature\nlimit of our log-determinant relaxation recovers a class of well-known\nsemide\ufb01nite relaxations for integer programming [e.g., 3].\n\n1\n\nIntroduction\n\nGiven a probability distribution de\ufb01ned by a graphical model (e.g., Markov random \ufb01eld,\nfactor graph), a key problem is the computation of marginal distributions. 
Although highly efficient algorithms exist for trees, exact solutions are prohibitively complex for more general graphs of any substantial size. This difficulty motivates the use of algorithms for computing approximations to marginal distributions, a problem to which we refer as approximate inference. One widely used algorithm is belief propagation, also known as the sum-product algorithm. As shown by Yedidia et al. [12], it can be interpreted as a method for attempting to solve a variational problem in which the exact entropy is replaced by the Bethe approximation. Moreover, Yedidia et al. proposed extensions to the Bethe approximation based on clustering operations.\n\nAn unattractive feature of the Bethe approach and its extensions is that, with certain exceptions [e.g., 6], the associated variational problems are typically not convex, which leads to algorithmic complications and raises the possibility of multiple local optima. Secondly, in contrast to other variational methods (e.g., mean field [4]), the optimal values of Bethe-type variational problems fail to provide bounds on the log partition function. This function arises in various contexts, including approximate parameter estimation and large deviations exponents, so such bounds are of interest in their own right.\n\nThis paper introduces a new class of variational problems that are both convex and provide upper bounds. Our derivation relies on a Gaussian upper bound on the discrete entropy of a suitably regularized random vector, and a semidefinite outer bound on the set of valid marginal distributions. The combination leads to a log-determinant maximization problem with a unique optimum that can be found by efficient interior point methods [8]. As with the Bethe/Kikuchi approximations and sum-product algorithms, the optimizing arguments of this problem can be taken as approximations to the marginal distributions of the underlying graphical model. 
Moreover, taking the "zero-temperature" limit recovers a class of well-known semidefinite programming relaxations for integer programming problems [e.g., 3].\n\n2 Problem set-up\n\nWe consider an undirected graph G = (V, E) with n = |V| nodes. Associated with each vertex s ∈ V is a random variable x_s taking values in the discrete space X = {0, 1, …, m − 1}. We let x = {x_s | s ∈ V} denote a random vector taking values in the Cartesian product space X^n. Our analysis makes use of the following exponential representation of a graph-structured distribution p(x). For some index set I, we let φ = {φ_α | α ∈ I} denote a collection of potential functions associated with the cliques of G, and let θ = {θ_α | α ∈ I} be a vector of parameters associated with these potential functions. The exponential family determined by φ is the following collection:\n\np(x; θ) = exp{ Σ_α θ_α φ_α(x) − Φ(θ) },   (1a)\nΦ(θ) = log Σ_{x∈X^n} exp{ Σ_α θ_α φ_α(x) }.   (1b)\n\nHere Φ(θ) is the log partition function that serves to normalize the distribution. In a minimal representation, the functions {φ_α} are affinely independent, and d = |I| corresponds to the dimension of the family. For example, one minimal representation of a binary-valued random vector on a graph with pairwise cliques is the standard Ising model, in which φ = {x_s | s ∈ V} ∪ {x_s x_t | (s, t) ∈ E}. Here the index set is I = V ∪ E, and d = n + |E|. In order to incorporate higher order interactions, we simply add higher degree monomials (e.g., x_s x_t x_u for a third order interaction) to the collection of potential functions. 
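To make Eqns. (1a) and (1b) concrete, the following brute-force sketch evaluates the log partition function by exhaustive summation and checks that p(x; θ) normalizes; the 3-node cycle and its parameter values are illustrative choices, not taken from the paper.

```python
import itertools
import math

# Illustrative 3-node cycle with Ising potentials phi = {x_s} plus {x_s * x_t}.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]
theta_node = {0: 0.1, 1: -0.2, 2: 0.3}   # theta_s, one per node (made-up values)
theta_edge = {e: 0.5 for e in edges}     # theta_st, one per edge (made-up values)

def inner(x):
    # <theta, phi(x)> for the Ising sufficient statistics
    return (sum(theta_node[s] * x[s] for s in nodes)
            + sum(theta_edge[(s, t)] * x[s] * x[t] for (s, t) in edges))

configs = list(itertools.product([-1, +1], repeat=len(nodes)))

# Eqn. (1b): Phi(theta) = log sum_x exp <theta, phi(x)>
Phi = math.log(sum(math.exp(inner(x)) for x in configs))

# Eqn. (1a): p(x; theta) = exp{<theta, phi(x)> - Phi(theta)} must sum to one
total = sum(math.exp(inner(x) - Phi) for x in configs)
```

Exhaustive summation is exactly what becomes intractable for large graphs, which is the motivation for the relaxations developed below.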
Similar representations exist for discrete processes on alphabets with m > 2 elements.\n\n2.1 Duality and marginal polytopes\n\nIt is well known that Φ is convex in θ, and strictly so for a minimal representation. Accordingly, it is natural to consider its conjugate dual function, which is defined by the relation:\n\nΦ*(μ) = sup_{θ∈R^d} { ⟨μ, θ⟩ − Φ(θ) }.   (2)\n\nHere the vector of dual variables μ has the same dimension as the exponential parameter θ (i.e., μ ∈ R^d). It is straightforward to show that the partial derivatives of Φ with respect to θ correspond to cumulants of φ(x); in particular, the first order derivatives define mean parameters:\n\n∂Φ/∂θ_α (θ) = Σ_{x∈X^n} p(x; θ) φ_α(x) = E_θ[φ_α(x)].   (3)\n\nIn order to compute Φ*(μ) for a given μ, we take the derivative with respect to θ of the quantity within curly braces in Eqn. (2). Setting this derivative to zero and making use of Eqn. (3) yields defining conditions for a vector θ(μ) attaining the optimum in Eqn. (2):\n\nμ_α = E_{θ(μ)}[φ_α(x)]   for all α ∈ I.   (4)\n\nIt can be shown [10] that Eqn. (4) has a solution if and only if μ belongs to the relative interior of the set:\n\nMARG(G; φ) = { μ ∈ R^d | Σ_{x∈X^n} p(x) φ(x) = μ for some p(·) }.   (5)\n\nNote that this set is equivalent to the convex hull of the finite collection of vectors {φ(x) | x ∈ X^n}; consequently, the Minkowski-Weyl theorem [7] guarantees that it can be characterized by a finite number of linear inequality constraints. 
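The cumulant property in Eqn. (3) is easy to verify numerically. The sketch below (a hypothetical two-node example, not the paper's code) compares a central finite difference of Φ against the exact mean parameter E_θ[x₁x₂]:

```python
import itertools
import math

# Hypothetical 2-node Ising model with a single edge (illustrative values).
def log_partition(th1, th2, th12):
    # Phi(theta) by exhaustive summation over {-1,+1}^2
    return math.log(sum(math.exp(th1 * x1 + th2 * x2 + th12 * x1 * x2)
                        for (x1, x2) in itertools.product([-1, +1], repeat=2)))

th = (0.3, -0.4, 0.7)
Z = math.exp(log_partition(*th))

# Exact mean parameter mu_12 = E_theta[x1 x2], the right-hand side of Eqn. (3)
mu_12 = sum(x1 * x2 * math.exp(th[0] * x1 + th[1] * x2 + th[2] * x1 * x2) / Z
            for (x1, x2) in itertools.product([-1, +1], repeat=2))

# Central finite difference of Phi with respect to theta_12, the left-hand side
eps = 1e-6
deriv = (log_partition(th[0], th[1], th[2] + eps)
         - log_partition(th[0], th[1], th[2] - eps)) / (2 * eps)
```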
We refer to this set as the marginal polytope1 associated with the graph G and the potentials φ.\n\nIn order to calculate an explicit form for Φ*(μ) for any μ ∈ MARG(G; φ), we substitute the relation in Eqn. (4) into the definition of Φ*, thereby obtaining:\n\nΦ*(μ) = ⟨μ, θ(μ)⟩ − Φ(θ(μ)) = Σ_{x∈X^n} p(x; θ(μ)) log p(x; θ(μ)).   (6)\n\nThis relation establishes that for μ in the relative interior of MARG(G; φ), the value of the conjugate dual Φ*(μ) is given by the negative entropy of the distribution p(x; θ(μ)), where the pair θ(μ) and μ are dually coupled via Eqn. (4). For μ ∉ cl MARG(G; φ), it can be shown [10] that the value of the dual is +∞.\n\nSince Φ is lower semi-continuous, taking the conjugate twice recovers the original function [7]; applying this fact to Φ* and Φ, we obtain the following relation:\n\nΦ(θ) = max_{μ∈MARG(G;φ)} { ⟨θ, μ⟩ − Φ*(μ) }.   (7)\n\nMoreover, we are guaranteed that the optimum is attained uniquely at the exact marginals μ = {μ_α} of p(x; θ). This variational formulation plays a central role in our development in the sequel.\n\n2.2 Challenges with the variational formulation\n\nThere are two difficulties associated with the variational formulation (7). First of all, observe that the (negative) entropy Φ*, as a function of only the mean parameters μ, is implicitly defined; indeed, it is typically impossible to specify an explicit form for Φ*. Key exceptions are trees and hypertrees, for which the entropy is well known to decompose into a sum of local entropies defined by local marginals on the (hyper)edges [1]. 
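The tree decomposition of the entropy just mentioned can be checked directly. This sketch (a hypothetical 3-node chain, assuming NumPy is available; not the paper's code) verifies that H(x) = H(x₁, x₂) + H(x₂, x₃) − H(x₂) for a chain-structured distribution:

```python
import itertools
import math
import numpy as np

configs = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

# Chain-structured distribution 1 - 2 - 3: potentials only on edges (1,2), (2,3)
energies = (0.4 * configs[:, 0] * configs[:, 1]
            - 0.7 * configs[:, 1] * configs[:, 2]
            + 0.2 * configs[:, 1])
p = np.exp(energies)
p /= p.sum()

def H(labels):
    # entropy (in nats) of the marginal over the variables in `labels`
    marg = {}
    for prob, x in zip(p, configs):
        key = tuple(x[list(labels)])
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log(q) for q in marg.values())

joint = H((0, 1, 2))                      # exact entropy H(x)
bethe = H((0, 1)) + H((1, 2)) - H((1,))   # sum of local entropies on the edges
```

On a graph with cycles the analogous local decomposition is only an approximation, which is precisely the gap the Bethe approach leaves open.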
Secondly, for a general graph with cycles, the marginal polytope MARG(G; φ) is defined by a number of inequalities that grows rapidly with graph size [e.g., 2]. Trees and hypertrees again are important exceptions: in this case, the junction tree theorem [e.g., 1] provides a compact representation of the associated marginal polytopes.\n\nThe Bethe approach (and its generalizations) can be understood as consisting of two steps: (a) replacing the exact entropy −Φ* with a tree (or hypertree) approximation; and (b) replacing the marginal polytope MARG(G; φ) with constraint sets defined by tree (or hypertree) consistency conditions. However, since the (hyper)tree approximations used do not bound the exact entropy, the optimal values of Bethe-type variational problems do not provide a bound on the value of the log partition function Φ(θ). Bounding Φ requires two ingredients: an outer bound on the marginal polytope, and an upper bound on the entropy −Φ*.\n\n1When φ_α corresponds to an indicator function, μ_α is a marginal probability; otherwise, this choice entails a minor abuse of terminology.\n\n3 Log-determinant relaxation\n\nIn this section, we state and prove a set of upper bounds based on the solution of a variational problem involving determinant maximization and semidefinite constraints. Although the ideas and methods described here are more generally applicable, for the sake of clarity in exposition we focus here on the case of a binary vector x ∈ {−1, +1}^n of "spins". It is also convenient to define all problems with respect to the complete graph K_n (i.e., fully connected). We use the standard (minimal) Ising representation for a binary problem, in terms of the potential functions φ = {x_s | s ∈ V} ∪ {x_s x_t | (s, t)}. 
On the complete graph, there are d = n + (n choose 2) such potential functions in total. Of course, any problem can be embedded into the complete graph by setting to zero a subset of the {θ_st} parameters. (In particular, for a graph G = (V, E), we simply set θ_st = 0 for all pairs (s, t) ∉ E.)\n\n3.1 Outer bounds on the marginal polytope\n\nWe first focus on the marginal polytope MARG(K_n) ≡ MARG(K_n; φ) of valid dual variables {μ_s, μ_st}, as defined in Eqn. (5). In this section, we describe a set of semidefinite and linear constraints that any valid dual vector μ ∈ MARG(K_n) must satisfy.\n\n3.1.1 Semidefinite constraints\n\nGiven an arbitrary vector μ ∈ R^d, consider the following (n+1) × (n+1) matrix:\n\nM1[μ] := [ 1      μ_1     μ_2     ···   μ_n    ]\n         [ μ_1    1       μ_12    ···   μ_1n   ]\n         [ μ_2    μ_21    1       ···   μ_2n   ]\n         [ ...    ...     ...     ...   ...    ]\n         [ μ_n    μ_n1    μ_n2    ···   1      ]   (8)\n\nthat is, the symmetric matrix with unit diagonal, first row and column equal to (1, μ_1, …, μ_n), and off-diagonal entries μ_st. The motivation underlying this definition is the following: suppose that the given dual vector μ actually belongs to MARG(K_n), in which case there exists some distribution p(x; θ) such that μ_s = Σ_x p(x; θ) x_s and μ_st = Σ_x p(x; θ) x_s x_t. Thus, if μ ∈ MARG(K_n), the matrix M1[μ] can be interpreted as the matrix of second order moments for the vector (1, x), as computed under p(x; θ). (Note in particular that the diagonal elements are all one, since x_s² = 1 when x_s ∈ {−1, +1}.) 
Since any such moment matrix must be positive semidefinite,2 we have established the following:\n\nLemma 1 (Semidefinite outer bound). The binary marginal polytope MARG(K_n) is contained within the semidefinite constraint set:\n\nSDEF1 := { μ ∈ R^d | M1[μ] ⪰ 0 }.   (9)\n\nThis semidefinite relaxation can be further strengthened by including higher order terms in the moment matrices [5].\n\n3.1.2 Additional linear constraints\n\nIt is straightforward to augment these semidefinite constraints with additional linear constraints. Here we focus in particular on two classes of constraints, referred to as rooted and unrooted triangle inequalities by Deza and Laurent [2], that are of especial relevance in the graphical model setting.\n\n2To be explicit, letting z = (1, x), then for any vector a ∈ R^{n+1}, we have aᵀ M1[μ] a = aᵀ E[z zᵀ] a = E[‖aᵀ z‖²], which is certainly non-negative.\n\nPairwise edge constraints: It is natural to require that the subset of mean parameters associated with each pair of random variables (x_s, x_t) (namely μ_s, μ_t, and μ_st) specify a valid pairwise marginal distribution. Letting (a, b) take values in {−1, +1}², consider the set of four linear constraints of the following form:\n\n1 + a μ_s + b μ_t + ab μ_st ≥ 0.   (10)\n\nIt can be shown [11, 10] that these constraints are necessary and sufficient to guarantee the existence of a consistent pairwise marginal. By the junction tree theorem [1], this pairwise consistency guarantees that the constraints of Eqn. (10) provide a complete description of the binary marginal polytope for any tree-structured graph. 
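Both outer bounds can be sanity-checked by generating moments from an actual distribution, which must satisfy them. A sketch (hypothetical 3-spin example, assuming NumPy; not the paper's code) builds M1[μ] from exact moments and tests the semidefinite and pairwise constraints:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 3
configs = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
p = rng.random(len(configs))
p /= p.sum()                               # an arbitrary valid distribution

mu_s = p @ configs                          # first moments E[x_s]
mu_st = configs.T @ (p[:, None] * configs)  # second moments E[x_s x_t]

# Moment matrix of Eqn. (8): unit diagonal, mu_s on the border, mu_st inside
M1 = np.empty((n + 1, n + 1))
M1[0, 0] = 1.0
M1[0, 1:] = M1[1:, 0] = mu_s
M1[1:, 1:] = mu_st                          # diagonal entries are E[x_s^2] = 1

# Lemma 1: M1[mu] must be positive semidefinite
psd_ok = np.linalg.eigvalsh(M1).min() >= -1e-10

# Eqn. (10): 1 + a mu_s + b mu_t + ab mu_st >= 0 for all sign choices (a, b)
pairwise_ok = all(1 + a * mu_s[s] + b * mu_s[t] + a * b * mu_st[s, t] >= -1e-10
                  for s, t in itertools.combinations(range(n), 2)
                  for a in (-1, 1) for b in (-1, 1))
```

Conversely, a vector μ satisfying these constraints need not come from any distribution; that is exactly what makes them a relaxation.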
Moreover, for a general graph with cycles, they are equivalent to the tree-consistent constraint set used in the Bethe approach [12] when applied to a binary vector x ∈ {−1, +1}^n.\n\nTriplet constraints: Local consistency can be extended to triplets {x_s, x_t, x_u}, and even more generally to higher order subsets. For the triplet case, consider the following set of constraints (and permutations thereof) among the pairwise mean parameters {μ_st, μ_su, μ_tu}:\n\nμ_st + μ_su + μ_tu ≥ −1;   μ_st − μ_su − μ_tu ≥ −1.   (11)\n\nIt can be shown [11, 10] that these constraints, in conjunction with the pairwise constraints (10), are necessary and sufficient to ensure that the collection of mean parameters {μ_s, μ_t, μ_u, μ_st, μ_su, μ_tu} uniquely determines a valid marginal over the triplet (x_s, x_t, x_u). Once again, by applying the junction tree theorem [1], we conclude that the constraints (10) and (11) provide a complete characterization of the binary marginal polytope for hypertrees of width two. It is worth observing that this set of constraints is equivalent to those implicitly enforced by any Kikuchi approximation [12] with clusters of size three (when applied to a binary problem).\n\n3.2 Gaussian entropy bound\n\nWe now turn to the task of upper bounding the entropy. Our starting point is the familiar interpretation of the Gaussian as the maximum entropy distribution subject to covariance constraints:\n\nLemma 2. 
The (differential) entropy h(x̃) := −∫ p(x̃) log p(x̃) dx̃ is upper bounded by the entropy (1/2) log det cov(x̃) + (n/2) log(2πe) of a Gaussian with matched covariance.\n\nOf interest to us is the discrete entropy of a discrete-valued random vector x ∈ {−1, +1}^n, whereas the Gaussian bound of Lemma 2 applies to the differential entropy of a continuous-valued random vector. Therefore, we need to convert our discrete vector to the continuous space. In order to do so, we define a new continuous random vector via x̃ = (1/2) x + u, where u is a random vector independent of x, with each element independently and identically distributed3 as u_s ∼ U[−1/2, 1/2]. The motivation for rescaling x by 1/2 is to pack the boxes together as tightly as possible.\n\nLemma 3. We have h(x̃) = H(x), where h and H denote the differential and discrete entropies of x̃ and x respectively.\n\nProof. By construction, the differential entropy can be decomposed as a sum of integrals over hyperboxes of unit volume, one for each configuration, over which the probability density of x̃ is constant.\n\n3The notation U[a, b] denotes the uniform distribution on the interval [a, b].\n\n3.3 Log-determinant relaxation\n\nEquipped with these building blocks, we are now ready to state and prove a log-determinant relaxation for the log partition function.\n\nTheorem 1. Let x ∈ {−1, +1}^n, and let OUT(K_n) be any convex outer bound on MARG(K_n) that is contained within SDEF1. Then there holds\n\nΦ(θ) ≤ max_{μ∈OUT(K_n)} { ⟨θ, μ⟩ + (1/2) log det[ M1(μ) + (1/3) blkdiag[0, I_n] ] } + (n/2) log(πe/2),   (12)\n\nwhere blkdiag[0, I_n] is an (n+1) × (n+1) block-diagonal matrix. Moreover, the optimum is attained at a unique μ̂ ∈ OUT(K_n).\n\nProof. For any μ ∈ MARG(K_n), let x be a random vector with these mean parameters. Consider the continuous-valued random vector x̃ = (1/2) x + u. From Lemma 3, we have H(x) = h(x̃); combining this equality with Lemma 2, we obtain the upper bound H(x) ≤ (1/2) log det cov(x̃) + (n/2) log(2πe). Since x and u are independent and u ∼ U[−1/2, 1/2], we can write cov(x̃) = (1/4) cov(x) + (1/12) I_n. Next we use the Schur complement formula [8] to express the log determinant as follows:\n\nlog det cov(x̃) = log det{ M1[μ] + (1/3) blkdiag[0, I_n] } + n log(1/4).   (13)\n\nCombining Eqn. (13) with the Gaussian upper bound leads to the following expression:\n\nH(x) = −Φ*(μ) ≤ (1/2) log det( M1[μ] + (1/3) blkdiag[0, I_n] ) + (n/2) log(πe/2).\n\nSubstituting this upper bound into the variational representation of Eqn. (7) and using the fact that OUT(K_n) is an outer bound on MARG(K_n) yields Eqn. (12). By construction, the cost function is strictly convex, so the optimum is unique.\n\nThe inclusion OUT(K_n) ⊆ SDEF1 in the statement of Theorem 1 guarantees that the matrix M1(μ) will always be positive semidefinite. Importantly, the optimization problem in Eqn. (12) is a determinant maximization problem, for which efficient interior point methods have been developed [e.g., 8].\n\n4 Experimental results\n\nThe relevance of the log-determinant relaxation for applications is two-fold: it provides upper bounds on the log partition function, and the maximizing arguments μ̂ ∈ OUT(K_n) of Eqn. (12) can be taken as approximations to the exact marginals of the distribution p(x; θ). To test its performance in computing approximate marginals, we performed extensive experiments on the complete graph (fully connected) and the 2-D nearest-neighbor lattice model. 
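As a numerical sanity check on Theorem 1 (a sketch assuming NumPy, not the authors' implementation), one can evaluate the objective of Eqn. (12) at the exact mean parameters of a small model: since the exact μ is feasible, the resulting value already upper-bounds Φ(θ), and the maximum over OUT(K_n) can only be larger.

```python
import itertools
import math
import numpy as np

n = 3
rng = np.random.default_rng(2)
theta_s = rng.uniform(-0.25, 0.25, size=n)                 # illustrative parameters
theta_st = np.triu(rng.uniform(-1.0, 1.0, size=(n, n)), k=1)

configs = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
energies = configs @ theta_s + np.einsum('ks,kt,st->k', configs, configs, theta_st)
Phi = math.log(np.exp(energies).sum())                     # exact log partition fn
p = np.exp(energies - Phi)

mu_s = p @ configs                                         # exact mean parameters
mu_st = configs.T @ (p[:, None] * configs)

# Moment matrix M1[mu] of Eqn. (8)
M1 = np.empty((n + 1, n + 1))
M1[0, 0] = 1.0
M1[0, 1:] = M1[1:, 0] = mu_s
M1[1:, 1:] = mu_st

# Objective of Eqn. (12) evaluated at the exact mu
lin = theta_s @ mu_s + np.sum(theta_st * mu_st)            # <theta, mu>
B = np.zeros((n + 1, n + 1))
B[1:, 1:] = np.eye(n) / 3.0                                # (1/3) blkdiag[0, I_n]
bound = (lin + 0.5 * np.linalg.slogdet(M1 + B)[1]
         + 0.5 * n * math.log(math.pi * math.e / 2))
```

The gap `bound - Phi` is exactly the slack of the Gaussian entropy bound at this μ, since ⟨θ, μ⟩ + H(x) = Φ(θ) at the exact mean parameters.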
We treated relatively small problems with 16 nodes so as to enable comparison to the exact answer. For any given trial, we specified the distribution p(x; θ) by randomly choosing θ as follows. The single node parameters were chosen as θ_s ∼ U[−0.25, 0.25] independently4 for each node. For a given coupling strength d_coup > 0, we investigated three possible types of coupling: (a) for repulsive interactions, θ_st ∼ U[−2 d_coup, 0]; (b) for mixed interactions, θ_st ∼ U[−d_coup, +d_coup]; (c) for attractive interactions, θ_st ∼ U[0, 2 d_coup].\n\nFor each distribution p(x; θ), we performed the following computations: (a) the exact marginal probability p(x_s = 1; θ) at each node; (b) approximate marginals computed from the Bethe approximation with the sum-product algorithm; and (c) log-determinant approximate marginals from Theorem 1, using the outer bound OUT(K_n) given by the first semidefinite relaxation SDEF1 in conjunction with the pairwise linear constraints in Eqn. (10). We computed the exact marginal values either by exhaustive summation (complete graph) or by the junction tree algorithm (lattices). We used the standard parallel message-passing form of the sum-product algorithm with a damping factor5 γ = 0.05. The log-determinant problem of Theorem 1 was solved using interior point methods [8]. For each graph (fully connected or grid), we examined a total of 6 conditions: 2 different potential strengths (weak or strong) for each of the 3 types of coupling (attractive, mixed, and repulsive).\n\n4Here U[a, b] denotes the uniform distribution on [a, b]. 
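For reference, here is a minimal sketch of the sum-product baseline (assuming NumPy; the model size, seed, and iteration count are illustrative choices, not the paper's): parallel message passing on a small binary pairwise model, with the damped log-domain update γ log M_new + (1 − γ) log M_old.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 4
edges = [(s, t) for s in range(n) for t in range(s + 1, n)]   # complete graph K_4
theta_s = rng.uniform(-0.25, 0.25, size=n)
theta_st = {e: rng.uniform(-0.5, 0.5) for e in edges}          # weak mixed coupling

vals = np.array([-1.0, 1.0])
# msgs[(s, t)][j]: message from s to t, evaluated at x_t = vals[j]
msgs = {(s, t): np.ones(2) / 2 for e in edges for (s, t) in (e, e[::-1])}

gamma = 0.05
for _ in range(2000):
    new = {}
    for (s, t) in msgs:
        th = theta_st[(s, t)] if (s, t) in theta_st else theta_st[(t, s)]
        incoming = np.exp(theta_s[s] * vals)        # local potential at x_s
        for (u, v) in msgs:
            if v == s and u != t:                   # all messages into s except t's
                incoming = incoming * msgs[(u, v)]
        m = np.array([(np.exp(th * vals * xt) * incoming).sum() for xt in vals])
        m /= m.sum()
        # damped update in the log domain, then renormalize
        new[(s, t)] = np.exp(gamma * np.log(m) + (1 - gamma) * np.log(msgs[(s, t)]))
        new[(s, t)] /= new[(s, t)].sum()
    msgs = new

# node beliefs from the converged messages
beliefs = []
for s in range(n):
    b = np.exp(theta_s[s] * vals)
    for (u, v) in msgs:
        if v == s:
            b = b * msgs[(u, v)]
    beliefs.append(b / b.sum())

# exact marginals by exhaustive summation, and the paper's l1 error metric
configs = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
w = np.exp(configs @ theta_s + sum(theta_st[(s, t)] * configs[:, s] * configs[:, t]
                                   for (s, t) in edges))
w /= w.sum()
exact = [(w * (configs[:, s] == 1)).sum() for s in range(n)]
l1_err = float(np.mean([abs(exact[s] - beliefs[s][1]) for s in range(n)]))
```

In the weak-coupling regime used here the damped updates converge and the beliefs track the exact marginals closely, consistent with the weak-coupling rows of Table 1.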
We computed the ℓ1-error (1/n) Σ_{s=1}^n |p(x_s = 1; θ) − μ̂_s|, where μ̂_s was the approximate marginal computed either by SP or by LD.\n\nGraph  Coupling  Strength (d_obs, d_coup)  SP median  SP range      LD median  LD range\nFull   R         (0.25, 0.25)              0.035      [0.01, 0.10]  0.020      [0.01, 0.03]\nFull   R         (0.25, 0.50)              0.066      [0.03, 0.20]  0.017      [0.01, 0.04]\nFull   M*        (0.25, 0.25)              0.003      [0.00, 0.04]  0.019      [0.01, 0.03]\nFull   M         (0.25, 0.50)              0.035      [0.01, 0.31]  0.010      [0.01, 0.06]\nFull   A*        (0.25, 0.06)              0.021      [0.00, 0.08]  0.026      [0.01, 0.06]\nFull   A         (0.25, 0.12)              0.422      [0.08, 0.86]  0.023      [0.01, 0.09]\nGrid   R         (0.25, 1.0)               0.285      [0.04, 0.59]  0.041      [0.01, 0.12]\nGrid   R         (0.25, 2.0)               0.342      [0.04, 0.78]  0.033      [0.00, 0.12]\nGrid   M*        (0.25, 1.0)               0.008      [0.00, 0.20]  0.016      [0.01, 0.02]\nGrid   M         (0.25, 2.0)               0.053      [0.01, 0.54]  0.032      [0.01, 0.11]\nGrid   A         (0.25, 1.0)               0.404      [0.06, 0.90]  0.037      [0.01, 0.13]\nGrid   A         (0.25, 2.0)               0.550      [0.06, 0.94]  0.031      [0.00, 0.12]\n\nTable 1. Statistics of the ℓ1-approximation error for the sum-product (SP) and log-determinant (LD) methods for the fully connected graph K16, as well as the 4-nearest-neighbor grid with 16 nodes, with varying coupling and potential strengths.\n\nTable 1 shows quantitative results for 100 trials performed in each of the 12 experimental conditions, including only those trials for which SP converged. The potential strength is given as the pair (d_obs, d_coup); note that d_obs = 0.25 in all trials. For each method, we show the sample median and the range [min, max] of the errors. Overall, the performance of LD is better than that of SP, and often substantially so. 
The performance of SP is slightly better in the regime of weak coupling and relatively strong observations (θ_s values); see the entries marked with * in the table. In the remaining cases, the LD method outperforms SP, and by a large margin for many examples with strong coupling. The two methods also differ substantially in the ranges of the approximation error. The SP method exhibits some instability, with the error for certain problems being larger than 0.5; for the same problems, the LD error ranges are much smaller, with a worst-case maximum error over all trials and conditions of 0.13. In addition, the behavior of SP can change dramatically between the weakly coupled and strongly coupled conditions, whereas the LD results remain stable.\n\n5More precisely, we updated messages in the log domain as γ log M_st^new + (1 − γ) log M_st^old.\n\n5 Discussion\n\nIn this paper, we developed a new method for approximate inference based on the combination of a Gaussian entropy bound with semidefinite constraints on the marginal polytope. The resultant log-determinant maximization problem can be solved by efficient interior point methods [8]. In experimental trials, the log-determinant method was either comparable to or better than the sum-product algorithm, and better by a substantial margin for certain problem classes. Of particular interest is that, in contrast to the sum-product algorithm, its performance degrades gracefully as the interaction strength is increased. It can be shown [11, 10] that in the zero-temperature limit, the log-determinant relaxation (12) reduces to a class of semidefinite relaxations that are widely used in combinatorial optimization. 
One open question is whether techniques for bounding the performance of such semidefinite relaxations [e.g., 3] can be adapted to the finite temperature case.\n\nAlthough this paper focused exclusively on the binary problem, the methods described here can be extended to other classes of random variables. It remains to develop a deeper understanding of the interaction between the two components of these approximations (i.e., the entropy bound and the outer bound on the marginal polytope), as well as how to tailor approximations to particular graph structures. Finally, semidefinite constraints can be combined with entropy approximations (preferably convex) other than the Gaussian bound used in this paper, among them "convexified" Bethe/Kikuchi entropy approximations [9].\n\nAcknowledgements: Thanks to Constantine Caramanis and Laurent El Ghaoui for helpful discussions. Work funded by NSF grant IIS-9988642, ARO MURI DAA19-02-1-0383, and a grant from Intel Corporation.\n\nReferences\n\n[1] R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Statistics for Engineering and Information Science. Springer-Verlag, 1999.\n[2] M. Deza and M. Laurent. Geometry of Cuts and Metric Embeddings. Springer-Verlag, New York, 1997.\n[3] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42:1115–1145, 1995.\n[4] M. Jordan, editor. Learning in Graphical Models. MIT Press, Cambridge, MA, 1999.\n[5] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796–817, 2001.\n[6] R. J. McEliece and M. Yildirim. Belief propagation on partially ordered sets. In D. Gilliam and J. Rosenthal, editors, Mathematical Theory of Systems and Networks. 
Institute for Mathematics and its Applications, 2002.\n[7] G. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.\n[8] L. Vandenberghe, S. Boyd, and S. Wu. Determinant maximization with linear matrix inequality constraints. SIAM Journal on Matrix Analysis and Applications, 19:499–533, 1998.\n[9] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. A new class of upper bounds on the log partition function. In Uncertainty in Artificial Intelligence, volume 18, pages 536–543, August 2002.\n[10] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley, Department of Statistics, 2003.\n[11] M. J. Wainwright and M. I. Jordan. Semidefinite relaxations for approximate inference on graphs with cycles. Technical Report UCB/CSD-3-1226, UC Berkeley, January 2003.\n[12] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. Technical Report TR2001-22, Mitsubishi Electric Research Labs, January 2002.", "award": [], "sourceid": 2438, "authors": [{"given_name": "Michael", "family_name": "Jordan", "institution": null}, {"given_name": "Martin", "family_name": "Wainwright", "institution": null}]}