{"title": "Exactness of Approximate MAP Inference in Continuous MRFs", "book": "Advances in Neural Information Processing Systems", "page_first": 2332, "page_last": 2340, "abstract": "Computing the MAP assignment in graphical models is generally intractable. As a result, for discrete graphical models, the MAP problem is often approximated using linear programming relaxations. Much research has focused on characterizing when these LP relaxations are tight, and while they are relatively well-understood in the discrete case, only a few results are known for their continuous analog. In this work, we use graph covers to provide necessary and sufficient conditions for continuous MAP relaxations to be tight. We use this characterization to give simple proofs that the relaxation is tight for log-concave decomposable and log-supermodular decomposable models. We conclude by exploring the relationship between these two seemingly distinct classes of functions and providing specific conditions under which the MAP relaxation can and cannot be tight.", "full_text": "Exactness of Approximate MAP Inference in\n\nContinuous MRFs\n\nDepartment of Computer Science\n\nUniversity of Texas at Dallas\n\nNicholas Ruozzi\n\nRichardson, TX 75080\n\nAbstract\n\nComputing the MAP assignment in graphical models is generally intractable. As a\nresult, for discrete graphical models, the MAP problem is often approximated us-\ning linear programming relaxations. Much research has focused on characterizing\nwhen these LP relaxations are tight, and while they are relatively well-understood\nin the discrete case, only a few results are known for their continuous analog.\nIn this work, we use graph covers to provide necessary and suf\ufb01cient conditions\nfor continuous MAP relaxations to be tight. We use this characterization to give\nsimple proofs that the relaxation is tight for log-concave decomposable and log-\nsupermodular decomposable models. 
We conclude by exploring the relationship between these two seemingly distinct classes of functions and providing specific conditions under which the MAP relaxation can and cannot be tight.\n\n1 Introduction\n\nGraphical models are a popular modeling tool for both discrete and continuous distributions. We are commonly interested in one of two inference tasks in graphical models: finding the most probable assignment (a.k.a., MAP inference) and computing marginal distributions. These problems are NP-hard in general, and a variety of approximate inference schemes are used in practice.\n\nIn this work, we will focus on approximate MAP inference. For discrete state spaces, linear programming relaxations of the MAP problem (specifically, the MAP LP) are quite common [1; 2; 3]. These relaxations replace global marginalization constraints with a collection of local marginalization constraints. Wald and Globerson [4] refer to these as local consistency relaxations (LCRs). The advantage of LCRs is that they are often much easier to specify and to optimize over (e.g., by using a message-passing algorithm such as loopy belief propagation (LBP)). However, the analogous relaxations for continuous state spaces may not be compactly specified and can lead to an unbounded number of constraints (except in certain special cases). To overcome this problem, further relaxations have been proposed [5; 4]. By construction, each of these further relaxations can only be tight if the initial LCR was tight. As a result, there are compelling theoretical and algorithmic reasons to investigate when LCRs are tight.\n\nAmong the most well-studied continuous models are the Gaussian graphical models. For this class of models, it is known that the continuous MAP relaxation is tight when the corresponding inverse covariance matrix is positive definite and scaled diagonally dominant (a special case of the so-called log-concave decomposable models) [4; 6; 7]. In addition, LBP is known to converge to the correct solution for Gaussian graphical models and log-concave decomposable models that satisfy a scaled diagonal dominance condition [8; 9]. While much of the prior work in this domain has focused on log-concave graphical models, in this work, we provide a general necessary and sufficient condition for the continuous MAP relaxation to be tight. This condition mirrors the known results for the discrete case and is based on the notion of graph covers: the MAP LP is tight if and only if the optimal solution to the MAP problem is an upper bound on the MAP solution over any graph cover, appropriately scaled. This characterization will allow us to understand when the MAP relaxation is tight for more general models.\n\nApart from this characterization theorem, the primary goal of this work is to move towards a uniform treatment of the discrete and continuous cases; they are not as different as they may initially appear. To this end, we explore the relationship between log-concave decomposable models and log-supermodular decomposable models (introduced here in the continuous case). Log-supermodular models provide an example of continuous graphical models for which the MAP relaxation is tight, but the objective function is not necessarily log-concave. These two concepts have analogs in discrete state spaces. In particular, log-concave decomposability is related to log-concave closures of discrete functions, and log-supermodular decomposability is a known condition which guarantees that the MAP LP is exact in the discrete setting. 
We prove a number of results that highlight the similarities and differences between these two concepts as well as a general condition under which the MAP relaxation corresponding to a pairwise twice continuously differentiable model cannot be tight.\n\n2 Prerequisites\n\nLet f : X^n → R≥0 be a non-negative function where X is the set of possible assignments of each variable. A function f factors with respect to a hypergraph G = (V, A) if there exist potential functions fi : X → R≥0 for each i ∈ V and fα : X^|α| → R≥0 for each α ∈ A such that\n\nf(x1, . . . , xn) = ∏_{i∈V} fi(xi) ∏_{α∈A} fα(xα).\n\nThe hypergraph G together with the potential functions {fi}_{i∈V} and {fα}_{α∈A} define a graphical model. We are interested in computing sup_{x∈X^n} f^G(x). In general, this MAP inference task is NP-hard, but in practice, local message-passing algorithms based on approximations from statistical physics, such as LBP, produce reasonable estimates in many settings. Much effort has been invested into understanding when LBP solves the MAP problem. In this section, we briefly review approximate MAP inference in the discrete setting (i.e., when X is a finite set). For simplicity and consistency, we will focus on log-linear models as in [4]. Given a vector of sufficient statistics φi(xi) ∈ R^k for each i ∈ V and xi ∈ X and a parameter vector θi ∈ R^k, we will assume that fi(xi) = exp(⟨θi, φi(xi)⟩). Similarly, given a vector of sufficient statistics φα(xα) for each α ∈ A and xα ∈ X^|α| and a parameter vector θα, we will assume that fα(xα) = exp(⟨θα, φα(xα)⟩). 
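To make the factorization and the log-linear parameterization concrete, here is a small self-contained sketch; the toy model, its parameters, and the helper names are my own illustration, not from the paper:

```python
import numpy as np
from itertools import product

# A hypothetical pairwise model on three binary variables with factors {0,1} and {1,2}.
# With indicator sufficient statistics, <theta_i, phi_i(x_i)> is a table lookup,
# so theta plays the role of a table of log-potentials.
theta_i = {0: np.array([0.5, -0.2]),
           1: np.array([0.0, 0.3]),
           2: np.array([-0.1, 0.1])}
theta_a = {(0, 1): np.array([[0.4, -0.4], [-0.4, 0.4]]),
           (1, 2): np.array([[0.2, 0.0], [0.0, 0.2]])}

def log_f(x):
    """log f(x) = sum_i <theta_i, phi_i(x_i)> + sum_alpha <theta_alpha, phi_alpha(x_alpha)>."""
    val = sum(theta_i[i][x[i]] for i in theta_i)
    val += sum(theta_a[a][x[a[0]], x[a[1]]] for a in theta_a)
    return float(val)

def map_by_enumeration():
    """Brute-force MAP: exponential in n, feasible only for tiny discrete models."""
    return max(product([0, 1], repeat=3), key=log_f)
```

Brute-force enumeration is exactly what the relaxations in the next section are designed to avoid.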
We will write φ(x) to represent the concatenation of the individual sufficient statistics and θ to represent the concatenation of the parameters. The objective function can then be expressed as f^G(x) = exp(⟨θ, φ(x)⟩).\n\n2.1 The MAP LP relaxation\n\nThe MAP problem can be formulated in terms of mean parameters [10].\n\nsup_{x∈X^n} log f(x) = sup_{μ∈M} ⟨θ, μ⟩,   M = {μ ∈ R^m : ∃τ ∈ Δ s.t. E_τ[φ(x)] = μ}\n\nwhere Δ is the space of all densities over X^n and M is the set of all realizable mean parameters. In general, M is a difficult object to compactly describe and to optimize over. As a result, one typically constructs convex outer bounds on M that are more manageable. In the case that X is finite, one such outer bound is given by the MAP LP. For each i ∈ V and k ∈ X, define φi(xi)_k ≜ 1_{xi=k}. Similarly, for each α ∈ A and k ∈ X^|α|, define φα(xα)_k ≜ 1_{xα=k}. With this choice of sufficient statistics, M is equivalent to the set of all marginal distributions over the individual variables and elements of A that arise from some joint probability distribution. The MAP LP is obtained by replacing M with a relaxation, ML, that only enforces local consistency constraints; this set of constraints is known as the local marginal polytope. 
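For a single binary edge, the local consistency relaxation can be written out and solved explicitly; a minimal sketch with my own toy parameters, assuming SciPy is available:

```python
import numpy as np
from scipy.optimize import linprog

# Variables: mu_i(0), mu_i(1), mu_j(0), mu_j(1), mu_ij(00), mu_ij(01), mu_ij(10), mu_ij(11)
theta_i = np.array([0.0, 1.0])             # hypothetical unary parameters
theta_j = np.array([0.0, 0.5])
theta_ij = np.array([1.0, 0.0, 0.0, 1.0])  # rewards agreement; index 2*xi + xj

c = -np.concatenate([theta_i, theta_j, theta_ij])  # linprog minimizes, so negate

A_eq = np.array([
    [-1, 0, 0, 0, 1, 1, 0, 0],   # sum_xj mu_ij(0, xj) = mu_i(0)
    [0, -1, 0, 0, 0, 0, 1, 1],   # sum_xj mu_ij(1, xj) = mu_i(1)
    [0, 0, -1, 0, 1, 0, 1, 0],   # sum_xi mu_ij(xi, 0) = mu_j(0)
    [0, 0, 0, -1, 0, 1, 0, 1],   # sum_xi mu_ij(xi, 1) = mu_j(1)
    [0, 0, 0, 0, 1, 1, 1, 1],    # normalization of mu_ij
])
b_eq = np.array([0, 0, 0, 0, 1])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
lp_value = -res.fun

# On a tree (here, a single edge) the relaxation is tight:
brute = max(theta_i[xi] + theta_j[xj] + theta_ij[2 * xi + xj]
            for xi in (0, 1) for xj in (0, 1))
```

Here `lp_value` and `brute` coincide because a single edge is a tree; the paper's question is when this tightness survives on loopy (hyper)graphs.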
ML = { μ ≥ 0 : Σ_{xα\{i}} μα(xα) = μi(xi) for all α ∈ A, i ∈ α, xi ∈ X; Σ_{xi} μi(xi) = 1 for all i ∈ V }\n\nThe approximate MAP problem is then to compute max_{μ∈ML} ⟨θ, μ⟩.\n\nFigure 1: An example of a graph cover of a factor graph: (a) a hypergraph, G; (b) one possible 2-cover of G. The nodes in the cover are labeled for the node that they copy in the base graph.\n\n2.2 Graph covers\n\nIn this work, we are interested in understanding when this relaxation is tight (i.e., when does sup_{μ∈ML} ⟨θ, μ⟩ = sup_{x∈X^n} log f(x)). For discrete MRFs, the MAP LP is known to be tight in a variety of different settings [11; 12; 13; 14]. Two different theoretical tools are often used to investigate the tightness of the MAP LP: duality and graph covers. Duality has been particularly useful in the design of convergent and correct message-passing schemes that solve the MAP LP [1; 15; 2; 16]. Graph covers provide a theoretical framework for understanding when and why message-passing algorithms such as belief propagation fail to solve the MAP problem [17; 18; 3].\n\nDefinition 2.1. A graph H covers a graph G = (V, E) if there exists a graph homomorphism h : H → G such that for all vertices i ∈ G and all j ∈ h⁻¹(i), h maps the neighborhood ∂j of j in H bijectively to the neighborhood ∂i of i in G.\n\nIf a graph H covers a graph G, then H looks locally the same as G. In particular, local message-passing algorithms such as LBP have difficulty distinguishing a graph and its covers. If h(j) = i, then we say that j ∈ H is a copy of i ∈ G. 
Further, H is said to be an M-cover of G if every vertex of G has exactly M copies in H.\n\nThis definition can be easily extended to hypergraphs. Each hypergraph G can be represented in factor graph form: create a node in the factor graph for each vertex (called variable nodes) and each hyperedge (called factor nodes) of G. Each factor node is connected via an edge in the factor graph to the variable nodes on which the corresponding hyperedge depends. For an example of a 2-cover, see Figure 1.\n\nTo any M-cover H = (V^H, A^H) of G given by the homomorphism h, we can associate a collection of potentials: the potential at node i ∈ V^H is equal to f_{h(i)}, the potential at node h(i) ∈ G, and for each β ∈ A^H, we associate the potential f_{h(β)}. In this way, we can construct a function f^H : X^{M|V|} → R≥0 such that f^H factorizes over H. We will say that the graphical model H is an M-cover of the graphical model G whenever H is an M-cover of G and f^H is chosen as described above. It will be convenient in the sequel to write f^H(x^H) = f^H(x^1, . . . , x^M) where x^m_i is the mth copy of variable i ∈ V.\n\nThere is a direct correspondence between μ ∈ ML and assignments on graph covers. This correspondence is the basis of the following theorem.\n\nTheorem 2.2 (Ruozzi and Tatikonda [3]).\n\nsup_{μ∈ML} ⟨θ, μ⟩ = sup_M sup_{H∈C^M(G)} sup_{x^H} (1/M) log f^H(x^H)\n\nwhere C^M(G) is the set of all M-covers of G.\n\nTheorem 2.2 claims that the optimal value of the MAP LP is equal to the supremum over all MAP assignments over all graph covers, appropriately scaled. 
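The covering condition can be checked mechanically; a small sketch of a 2-cover of a factor graph, where the node names and the straight/crossed edge encoding are my own, not the paper's notation:

```python
# Every variable i gets copies (i,0),(i,1); every factor gets copies (f,0),(f,1);
# each base edge (f,i) is lifted either "straight" (same copy index) or "crossed".
base_factors = {'a': (0, 1), 'b': (1, 2), 'c': (0, 2)}  # hypothetical pairwise factors

def lift(crossed):
    """crossed: dict factor -> bool; returns the edge set of the 2-cover."""
    edges = set()
    for f, scope in base_factors.items():
        for m in (0, 1):
            for i in scope:
                vm = 1 - m if crossed[f] else m  # crossed edges swap copies
                edges.add(((f, m), (i, vm)))
    return edges

cover = lift({'a': False, 'b': False, 'c': True})

# Covering property: each factor copy sees exactly one copy of each variable in
# its base scope, so the cover "looks locally the same" as the base graph.
for f, scope in base_factors.items():
    for m in (0, 1):
        nbrs = {v for (fc, v) in cover if fc == (f, m)}
        assert sorted(i for (i, _) in nbrs) == sorted(scope)
```

Lifting all edges straight yields two disjoint copies of the base graph; crossing at least one edge can produce a connected cover that is genuinely different from G.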
In particular, the proof of this result shows that, under mild conditions, there exists an M-cover H of G and an assignment x^H such that (1/M) log f^H(x^H) = sup_{μ∈ML} ⟨θ, μ⟩.\n\n3 Continuous MRFs\n\nIn this section, we will describe how to extend the previous results from discrete to continuous MRFs (i.e., X = R) using graph covers. The relaxation that we consider here is the appropriate extension of the MAP LP where each of the sums is replaced by an integral [4].\n\nML = { μ : ∃ densities τi, τα s.t. ∫ τα(xα) dxα\{i} = τi(xi) for all α ∈ A, i ∈ α, xi ∈ X; μi = E_{τi}[φi] for all i ∈ V; μα = E_{τα}[φα] for all α ∈ A }\n\nOur goal is to understand under what conditions this continuous relaxation is tight. Wald and Globerson [4] have approached this problem by introducing a further relaxation of ML which they call the weak local consistency relaxation (weak LCR). They provide conditions under which the weak LCR (and hence the above relaxation) is tight. In particular, they show that weak LCR is tight for the class of log-concave decomposable models. In this work, we take a different approach. We first prove the analog of Theorem 2.2 in the continuous case and then we show that the known conditions that guarantee tightness of the continuous relaxation are simple consequences of this general theorem.\n\nTheorem 3.1.\n\nsup_{μ∈ML} ⟨θ, μ⟩ = sup_M sup_{H∈C^M(G)} sup_{x^H} (1/M) log f^H(x^H)\n\nwhere C^M(G) is the set of all M-covers of G.\n\nThe proof of Theorem 3.1 is conceptually straightforward, albeit technical, and can be found in Appendix A. 
The proof approximates the expectations in ML as expectations with respect to simple functions, applies the known results for finite spaces, and takes the appropriate limit. Like its discrete counterpart, Theorem 3.1 provides necessary and sufficient conditions for the continuous relaxation to be tight. In particular, for the relaxation to be tight, the optimal solution on any M-cover, appropriately scaled, cannot exceed the value of the optimal solution of the MAP problem over G.\n\n3.1 Tightness of the MAP relaxation\n\nTheorem 3.1 provides necessary and sufficient conditions for the tightness of the continuous relaxation. However, checking that the maximum value attained on any M-cover is bounded by the Mth power of the maximum value over the base graph, in and of itself, appears to be a daunting task. In this section, we describe two families of graphical models for which this condition is easy to verify: the log-concave decomposable functions and the log-supermodular decomposable functions. Log-concave decomposability has been studied before, particularly in the case of Gaussian graphical models. Log-supermodularity with respect to graphical models, however, appears to have been primarily studied in the discrete case.\n\n3.1.1 Log-concave decomposability\n\nA function f : R^n → R≥0 is log-concave if f(x)^λ f(y)^{1−λ} ≤ f(λx + (1−λ)y) for all x, y ∈ R^n and all λ ∈ [0, 1]. If f can be written as a product of log-concave potentials over a hypergraph G, we say that f is log-concave decomposable over G.\n\nTheorem 3.2. If f is log-concave decomposable, then sup_x log f(x) = sup_{μ∈ML} ⟨θ, μ⟩.\n\nProof. By log-concave decomposability, for any M-cover H of G,\n\nf^H(x^1, . . . , x^M) ≤ f^G((x^1 + ··· + x^M)/M)^M,\n\nwhich we obtain by applying the definition of log-concavity separately to each of the M copies of the potential functions for each node and factor of G. As a result, sup_{x^1,...,x^M} f^H(x^1, . . . , x^M) ≤ sup_x f^G(x)^M. The proof of the theorem then follows by applying Theorem 3.1.\n\nWald and Globerson [4] provide a different proof of Theorem 3.2 by exploiting duality and the weak LCR.\n\n3.1.2 Log-supermodular decomposability\n\nLog-supermodular functions have played an important role in the study of discrete graphical models, and log-supermodularity arises in a number of classical correlation inequalities (e.g., the FKG inequality). For log-supermodular decomposable models, the MAP LP is tight and the MAP problem can be solved exactly in polynomial time [19; 20]. In the continuous case, log-supermodularity is defined analogously to the discrete case. That is, f : R^n → R≥0 is log-supermodular if f(x)f(y) ≤ f(x ∧ y)f(x ∨ y) for all x, y ∈ R^n, where x ∨ y is the componentwise maximum of the vectors x and y and x ∧ y is the componentwise minimum. Continuous log-supermodular functions are sometimes said to be multivariate totally positive of order two [21]. We will say that a graphical model is log-supermodular decomposable if f can be factorized as a product of log-supermodular potentials.\n\nFor any collection of vectors x^1, . . . , x^k ∈ R^n, let z^i(x^1, . . . , x^k) be the vector whose jth component is the ith largest element of x^1_j, . . . , x^k_j for each j ∈ {1, . . . , n}.\n\nTheorem 3.3. If f is log-supermodular decomposable, then sup_x log f(x) = sup_{μ∈ML} ⟨θ, μ⟩.\n\nProof. By log-supermodular decomposability, for any M-cover H of G,\n\nf^H(x^1, . . . , x^M) ≤ ∏_{m=1}^M f^G(z^m(x^1, . . . , x^M)).\n\nAgain, this follows by repeatedly applying the definition of log-supermodularity separately to each of the M copies of the potential functions for each node and factor of G. As a result, sup_{x^1,...,x^M} f^H(x^1, . . . , x^M) ≤ sup_{x^1,...,x^M} ∏_{m=1}^M f^G(x^m). The proof of the theorem then follows by applying Theorem 3.1.\n\n4 Log-supermodular decomposability vs. log-concave decomposability\n\nAs discussed above, log-concave decomposable and log-supermodular decomposable models are both examples of continuous graphical models for which the MAP relaxation is tight. These two classes are not equivalent: twice continuously differentiable functions are supermodular if and only if all off-diagonal elements of the Hessian matrix are non-negative. Contrast this with twice continuously differentiable concave functions, where the Hessian matrix must be negative semidefinite. In particular, this means that log-supermodular functions can be multimodal. In this section, we explore the relationship between log-supermodularity and log-concavity.\n\n4.1 Gaussian MRFs\n\nWe begin with the case of Gaussian graphical models, i.e., pairwise graphical models given by\n\nf(x) ∝ exp(−(1/2) x^T Ax + b^T x) = ∏_{i∈V} exp(−(1/2) Aii xi^2 + bi xi) ∏_{(i,j)∈E} exp(−Aij xi xj)\n\nfor some symmetric positive definite matrix A ∈ R^{n×n} and vector b ∈ R^n. Here, f factors over the graph G corresponding to the non-zero entries of the matrix A.\n\nGaussian graphical models are a relatively well-studied class of continuous graphical models. In fact, sufficient conditions for the convergence and correctness of Gaussian belief propagation (GaBP) are known for these models. Specifically, GaBP converges to the optimal solution if the positive definite matrix A is walk-summable, scaled diagonally dominant, or log-concave decomposable [22; 7; 8; 9]. 
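Scaled diagonal dominance can be tested numerically without searching for the weight vector w explicitly; the sketch below uses a standard H-matrix characterization (the spectral-radius test is my assumption, not stated in the paper):

```python
import numpy as np

def is_scaled_diag_dominant(A, tol=1e-9):
    """Check existence of w > 0 with |A_ii| w_i > sum_{j != i} |A_ij| w_j.

    Standard characterization (assumed here): such a w exists iff the spectral
    radius of D^{-1} R is < 1, where D = diag(|A_ii|) and R = |A| - D.
    """
    D = np.diag(np.abs(np.diag(A)))
    R = np.abs(A) - D
    rho = max(abs(np.linalg.eigvals(np.linalg.solve(D, R))))
    return bool(rho < 1 - tol)

A_sdd = np.array([[1.0, 0.4], [0.4, 1.0]])        # positive definite and s.d.d.
A_not = np.ones((3, 3)) * 0.6 + np.eye(3) * 0.4   # positive definite (eigs 2.2, 0.4, 0.4), not s.d.d.
```

The second matrix is a useful running example: it is positive definite, so the Gaussian MAP problem on the base graph is well posed, yet it fails scaled diagonal dominance.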
These conditions are known to be equivalent [23; 6].\n\nDefinition 4.1. Γ ∈ R^{n×n} is scaled diagonally dominant if ∃w ∈ R^n, w > 0 such that |Γii| wi > Σ_{j≠i} |Γij| wj.\n\nIn addition, the following theorem provides a characterization of scaled diagonal dominance (and hence log-concave decomposability) in terms of graph covers for these models.\n\nTheorem 4.2 (Ruozzi and Tatikonda [6]). Let A be a symmetric positive definite matrix. The following are equivalent.\n\n1. A is scaled diagonally dominant.\n2. All covers of A are positive definite.\n3. All 2-covers of A are positive definite.\n\nThe proof of this theorem constructs a specific 2-cover whose covariance matrix has negative eigenvalues whenever the matrix A is positive definite but not scaled diagonally dominant. The joint distribution corresponding to this 2-cover is not bounded from above, so the optimal value of the MAP relaxation is +∞ as per Theorem 3.1.\n\nFor Gaussian graphical models, log-concave decomposability and log-supermodular decomposability are related: every positive definite, log-supermodular decomposable model is log-concave decomposable, and every positive definite, log-concave decomposable model is a signed version of some positive definite, log-supermodular decomposable Gaussian graphical model. This follows from the following simple lemma.\n\nLemma 4.3. A symmetric positive definite matrix A is scaled diagonally dominant if and only if the matrix B such that Bii = Aii for all i and Bij = −|Aij| for all i ≠ j is positive definite.\n\nIf A is positive definite and scaled diagonally dominant, then the model is log-concave decomposable. In contrast, the model would be log-supermodular decomposable if all of the off-diagonal elements of A were negative, independent of the diagonal. 
In particular, the diagonal could have both positive and negative elements, meaning that f(x) could be either log-concave, log-convex, or neither. As log-convex quadratic forms do not correspond to normalizable Gaussian graphical models, the log-convex case appears to be less interesting, as the MAP problem is unbounded from above. However, the situation is entirely different for constrained (over some convex set) log-quadratic maximization. As an example, consider a box-constrained log-quadratic maximization problem where the matrix A has all negative off-diagonal entries. Such a model is always log-supermodular decomposable. Hence, the MAP relaxation is tight, but the model is not necessarily log-concave.\n\n4.2 Pairwise twice differentiable MRFs\n\nAll of the results from the previous section can be extended to general twice continuously differentiable functions over pairwise graphical models (i.e., |α| = 2 for all α ∈ A). In this section, unless otherwise specified, assume that all models are pairwise.\n\nTheorem 4.4. If log f(x) is strictly concave and twice continuously differentiable, the following are equivalent.\n\n1. ∇²(log f)(x) is scaled diagonally dominant for all x.\n2. ∇²(log f^H)(x^H) is negative definite for every graph cover H of G and every x^H.\n3. ∇²(log f^H)(x^H) is negative definite for every 2-cover H of G and every x^H.\n\nThe equivalence of 1-3 in Theorem 4.4 follows from Theorem 4.2.\n\nCorollary 4.5. If ∇²(log f)(x) is scaled diagonally dominant for all x, then the continuous MAP relaxation is tight.\n\nCorollary 4.6. If f is log-concave decomposable over a pairwise graphical model and strictly log-concave, then ∇²(log f)(x) is scaled diagonally dominant for all x.\n\nWhether or not log-concave decomposability is equivalent to the other conditions listed in the statement of Theorem 4.4 remains an open question (though we conjecture that this is the case). 
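The 2-cover constructions behind Theorems 4.2 and 4.4 can be made concrete in the Gaussian case; a numerical sketch, where the block-matrix form of the all-crossed 2-cover is my own rendering:

```python
import numpy as np

# For a pairwise Gaussian with precision matrix A, the 2-cover that "crosses"
# every edge has precision [[D, F], [F, D]] with D = diag(A) and F = A - D.
A = np.ones((3, 3)) * 0.6 + np.eye(3) * 0.4   # positive definite, not scaled diag. dominant
D = np.diag(np.diag(A))
F = A - D
cover = np.block([[D, F], [F, D]])

# For this symmetric block structure, eigs(cover) = eigs(D + F) U eigs(D - F),
# i.e., the spectrum of A together with the spectrum of 2D - A.
base_pd = bool(np.linalg.eigvalsh(A).min() > 0)        # MAP on G is bounded
cover_pd = bool(np.linalg.eigvalsh(cover).min() > 0)   # fails: MAP on the 2-cover is unbounded
```

Since the base model is bounded but some 2-cover is not, Theorem 3.1 says the continuous MAP relaxation for this model cannot be tight, which is exactly the mechanism in the proof of Theorem 4.2.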
Similar ideas can be extended to general twice continuously differentiable functions.\n\nTheorem 4.7. Suppose log f(x) is twice continuously differentiable with a maximum at x∗. Let Bij = |∇²(log f)(x∗)ij| for all i ≠ j and Bii = ∇²(log f)(x∗)ii. If f admits a pairwise factorization over G and B has both positive and negative eigenvalues, then the continuous MAP relaxation is not tight.\n\nProof. If B has both positive and negative eigenvalues, then there exists a 2-cover H of G such that ∇²(log f^H)(x∗, x∗) has both positive and negative eigenvalues. As a result, the lift of x∗ to the 2-cover H is a saddle point of f^H. Consequently, f^H(x∗, x∗) < sup_{x^H} f^H(x^H). By Theorem 3.1, the continuous MAP relaxation cannot be tight.\n\nThis negative result is quite general. If −∇²(log f) is positive definite but not scaled diagonally dominant at any global optimum, then the MAP relaxation is not tight. In particular, this means that all log-supermodular decomposable functions that meet the conditions of the theorem must be s.d.d. at their optima.\n\nAlgorithmically, Moallemi and Van Roy [9] argued that belief propagation converges for models that are log-concave decomposable and scaled diagonally dominant. It is unknown whether or not a similar convergence argument applies to log-supermodular decomposable functions.\n\n4.3 Concave closures\n\nMany of the tightness results in the discrete case can be seen as a specific case of the continuous results described above. Again, suppose that X ⊂ R is a finite set.\n\nDefinition 4.8. 
The concave closure of a function g : X^n → R ∪ {−∞} at x ∈ R^n is given by\n\nḡ(x) = sup{ Σ_{y∈X^n} λ(y)g(y) : Σ_y λ(y) = 1, Σ_y λ(y)y = x, λ(y) ≥ 0 }.\n\nEquivalently, the concave closure ḡ is the smallest concave function such that g(x) ≤ ḡ(x) for all x. A function and its concave closure must necessarily have the same maximum. Computing the concave (or convex) closure of a function is NP-hard in general, but it can be efficiently computed for certain special classes of discrete functions. In particular, when X = {0, 1} and log f is supermodular, its concave closure can be computed in polynomial time as it is equal to the Lovász extension of log f. The Lovász extension has a number of interesting properties. Most notably, it is linear (the Lovász extension of a sum of functions is equal to the sum of the Lovász extensions). Define the log-concave closure of f to be f̂(x) = exp(ḡ(x)), where g = log f. As a result, if f is log-supermodular decomposable, then f̂ is log-concave decomposable.\n\nTheorem 4.9. If f̂ = ∏_{i∈V} f̂i ∏_{α∈A} f̂α, then sup_{x∈X^n} f(x) = sup_{μ∈ML} ⟨θ, μ⟩.\n\nThis theorem is a direct consequence of Theorem 3.2. For example, the tightness results of Bayati et al. [11] and Sanghavi et al. [14] (and indeed many others) can be seen as a special case of this theorem. Even when |X| is not finite, the concave closure can be similarly defined, and the theorem holds in this case as well. Given the characterization in the discrete case, this suggests that there could be a, possibly deep, connection between log-concave closures and log-supermodular decomposability.\n\n5 Discussion\n\nWe have demonstrated that the same necessary and sufficient condition based on graph covers for the tightness of the MAP LP in the discrete case translates seamlessly to the continuous case. This characterization allowed us to provide simple proofs of the tightness of the MAP relaxation for log-concave decomposable and log-supermodular decomposable models. While the proof of Theorem 3.1 is nontrivial, it provides a powerful tool to reason about the tightness of MAP relaxations. We also explored the intricate relationship between log-concave and log-supermodular decomposability in both the discrete and continuous cases, which provided intuition about when the MAP relaxation can or cannot be tight for pairwise graphical models.\n\nA Proof of Theorem 3.1\n\nThe proof of this theorem proceeds in two parts. First, we will argue that\n\nsup_{μ∈ML} ⟨θ, μ⟩ ≥ sup_M sup_{H∈C^M(G)} sup_{x^H} (1/M) log f^H(x^H).\n\nTo see this, fix an M-cover, H, of G via the homomorphism h and consider any assignment x^H. Construct the mean parameters μ' ∈ ML as follows:\n\nτi(xi) = (1/M) Σ_{j∈V(H):h(j)=i} δ(x^H_j − xi),   μ'_i = ∫ τi(xi) φi(xi) dxi\nτα(xα) = (1/M) Σ_{β∈A(H):h(β)=α} δ(x^H_β − xα),   μ'_α = ∫ τα(xα) φα(xα) dxα\n\nHere, δ(·) is the Dirac delta function¹. 
This implies that\n\n(1/M) log f^H(x^H) = ⟨θ, μ'⟩ ≤ sup_{μ∈ML} ⟨θ, μ⟩.\n\nFor the other direction, fix some μ' ∈ ML such that μ' is generated by the vector of densities τ. We will prove the result for locally consistent probability distributions with bounded support. The result for arbitrary τ will then follow by constructing sequences of these distributions that converge (in measure) to τ. For simplicity, we will assume that each potential function is strictly positive². Consider the space [−t, t]^|V| for some positive integer t. We will consider local probability distributions that are supported on subsets of this space. That is, supp(τi) ⊆ [−t, t] for each i and supp(τα) ⊆ [−t, t]^|α| for each α. For a fixed positive integer s, divide the interval [−t, t] into 2^{s+1}t intervals of size 1/2^s and let S_k denote the kth interval. This partitioning divides [−t, t]^|V| into disjoint cubes of volume 1/2^{s|V|}. The distribution τ can be approximated as a sequence of distributions τ¹, τ², . . . as follows. 
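The discretization step that follows can be pictured numerically: approximate a bounded-support density by a piecewise-constant one on bins of width 1/2^s and watch its mean parameters converge as s grows. A sketch with my own choice of density and sufficient statistics, not the paper's:

```python
import numpy as np

t = 4  # support [-t, t], divided into 2^{s+1} t intervals S_k of width 1/2^s

def tau(x):
    """An (unnormalized) truncated Gaussian density on [-t, t]."""
    return np.exp(-0.5 * x ** 2)

def approx_moments(s):
    """Mean parameters E[phi], phi(x) = (x, x^2), under the binned density tau^s."""
    edges = np.linspace(-t, t, 2 ** (s + 1) * t + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    p = tau(mids) * np.diff(edges)   # bin masses of the piecewise-constant tau^s
    p /= p.sum()
    return np.array([p @ mids, p @ mids ** 2])

# Moments of the discretized densities approach those of a fine reference grid.
errors = [np.abs(approx_moments(s) - approx_moments(10)).max() for s in (1, 3, 5)]
```

As s increases the approximate mean parameters converge, which is the limit used in the proof to pass from the discrete MAP LP back to the continuous relaxation.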
Define a vector of approximate densities τ^s by setting\n\nτ^s_i(x'_i) ≜ 2^s ∫_{S_k} τi(xi) dxi if x'_i ∈ S_k, and 0 otherwise\nτ^s_α(x'_α) ≜ 2^{|α|s} ∫_{∏_{kj:j∈α} S_{kj}} τα(xα) dxα if x'_α ∈ ∏_{kj:j∈α} S_{kj}, and 0 otherwise\n\nWe have τ^s → τ, ∫_{[−t,t]} τ^s_i(xi) φi(xi) dxi → μ'_i for each i ∈ V(G), and ∫_{[−t,t]^|α|} τ^s_α(xα) φα(xα) dxα → μ'_α for each α ∈ A(G).\n\nThe continuous MAP relaxation for local probability distributions of this form can be expressed in terms of discrete variables over X = {1, . . . , 2^{s+1}t}. To see this, define μ^s_i(zi) = ∫_{S_{zi}} τ^s_i(xi) dxi for each zi ∈ {1, . . . , 2^{s+1}t} and μ^s_α(zα) = ∫_{S_{zα}} τ^s_α(xα) dxα for each zα ∈ {1, . . . , 2^{s+1}t}^|α|. The corresponding MAP LP objective, evaluated at μ^s, is then\n\nΣ_{i∈V} Σ_{zi} μ^s_i(zi) ∫_{S_{zi}} 2^s log fi(xi) dxi + Σ_{α∈A} Σ_{zα} μ^s_α(zα) ∫_{S_{zα}} 2^{|α|s} log fα(xα) dxα.   (1)\n\nThis MAP LP objective corresponds to a discrete graphical model that factors over the hypergraph G, with potential functions corresponding to the above integrals over the partitions indexed by the vector z:\n\ng^s(z) ∝ ∏_{i∈V(G)} exp(∫_{S_{zi}} 2^s log fi(xi) dxi) ∏_{α∈A(G)} exp(∫_{S_{zα}} 2^{|α|s} log fα(xα) dxα) = ∏_{i∈V(G)} exp(∫_{S_z} 2^{|V(G)|s} log fi(xi) dx) ∏_{α∈A(G)} exp(∫_{S_z} 2^{|V(G)|s} log fα(xα) dx)\n\nEvery assignment selects a single cube indexed by z. The value of the objective is calculated by averaging log f over the cube indexed by z. As a result, max_z g^s(z) ≤ sup_x f(x) and, for any M-cover H of G, max_{z^1:M} g^{H,s}(z^1, . . . , z^M) ≤ sup_{x^1:M} f^H(x^1, . . . , x^M). As this upper bound holds for any fixed s, it must also hold for any vector of distributions that can be written as a limit of such distributions. Now, by applying Theorem 2.2 for the discrete case, ⟨θ, μ'⟩ = lim_{s→∞} ⟨θ, μ^s⟩ ≤ sup_M sup_{H∈C^M(G)} sup_{x^H} (1/M) log f^H(x^H) as desired. To finish the proof, observe that any Riemann integrable density can be arbitrarily well approximated by densities of this form as t → ∞.\n\n¹In order to make this precise, we would need to use Lebesgue integration or take a sequence of probability distributions over the space R^{M|V|} that arbitrarily well-approximate the desired assignment x^H.\n\n²The same argument will apply in the general case, but each of the local distributions must be contained in the support of the corresponding potential function (i.e., supp(τi) ⊆ supp(fi)) for the integrals to exist.\n\nReferences\n\n[1] A. Globerson and T. S. Jaakkola. Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. In Proc. 21st Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, 2007.\n\n[2] T. Werner. A linear programming approach to max-sum problem: A review. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(7):1165-1179, 2007.\n\n[3] N. Ruozzi and S. Tatikonda. Message-passing algorithms: Reparameterizations and splittings. IEEE Transactions on Information Theory, 59(9):5860-5881, Sept. 2013.\n\n[4] Y. Wald and A. Globerson. 
Tightness results for local consistency relaxations in continuous MRFs. In Proc. 30th Uncertainty in Artificial Intelligence (UAI), Quebec City, Quebec, Canada, 2014.

[5] T. P. Minka. Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 362–369, 2001.

[6] N. Ruozzi and S. Tatikonda. Message-passing algorithms for quadratic minimization. Journal of Machine Learning Research, 14:2287–2314, 2013.

[7] D. M. Malioutov, J. K. Johnson, and A. S. Willsky. Walk-sums and belief propagation in Gaussian graphical models. Journal of Machine Learning Research, 7:2031–2064, 2006.

[8] C. C. Moallemi and B. Van Roy. Convergence of min-sum message passing for quadratic optimization. Information Theory, IEEE Transactions on, 55(5):2413–2423, May 2009.

[9] C. C. Moallemi and B. Van Roy. Convergence of min-sum message-passing for convex optimization. Information Theory, IEEE Transactions on, 56(4):2041–2050, April 2010.

[10] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

[11] M. Bayati, C. Borgs, J. Chayes, and R. Zecchina. Belief propagation for weighted b-matchings on arbitrary graphs and its relation to linear programs with integer solutions. SIAM Journal on Discrete Mathematics, 25(2):989–1011, 2011.

[12] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? In Computer Vision - ECCV 2002, pages 65–81. Springer, 2002.

[13] S. Sanghavi, D. M. Malioutov, and A. S. Willsky. Belief propagation and LP relaxation for weighted matching in general graphs. Information Theory, IEEE Transactions on, 57(4):2203–2212, April 2011.

[14] S. Sanghavi, D. Shah, and A. S. Willsky. Message passing for maximum weight independent set. Information Theory, IEEE Transactions on, 55(11):4822–4834, Nov. 2009.

[15] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky. MAP estimation via agreement on (hyper)trees: Message-passing and linear programming. Information Theory, IEEE Transactions on, 51(11):3697–3717, Nov. 2005.

[16] D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, and T. Jaakkola. Tightening LP relaxations for MAP using message-passing. In 24th Conference in Uncertainty in Artificial Intelligence, pages 503–510. AUAI Press, 2008.

[17] P. O. Vontobel. Counting in graph covers: A combinatorial characterization of the Bethe entropy function. Information Theory, IEEE Transactions on, Jan. 2013.

[18] P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes. CoRR, abs/cs/0512078, 2005.

[19] S. Iwata, L. Fleischer, and S. Fujishige. A strongly polynomial-time algorithm for minimizing submodular functions. Journal of the ACM, 1999.

[20] A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346–355, 2000.

[21] S. Karlin and Y. Rinott. Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. Journal of Multivariate Analysis, 10(4):467–498, 1980.

[22] Y. Weiss and W. T. Freeman. Correctness of belief propagation in Gaussian graphical models of arbitrary topology. Neural Comput., 13(10):2173–2200, Oct. 2001.

[23] D. M. Malioutov. Approximate inference in Gaussian graphical models. Ph.D. thesis, EECS, MIT, 2008.