{"title": "Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 2017, "page_last": 2025, "abstract": "We propose a new approach to the analysis of Loopy Belief Propagation (LBP) by establishing a formula that connects the Hessian of the Bethe free energy with the edge zeta function. The formula has a number of theoretical implications on LBP. It is applied to give a sufficient condition that the Hessian of the Bethe free energy is positive definite, which shows non-convexity for graphs with multiple cycles. The formula clarifies the relation between the local stability of a fixed point of LBP and local minima of the Bethe free energy. We also propose a new approach to the uniqueness of LBP fixed point, and show various conditions of uniqueness.", "full_text": "Graph Zeta Function in the Bethe Free Energy and\n\nLoopy Belief Propagation\n\nYusuke Watanabe\n\nKenji Fukumizu\n\nThe Institute of Statistical Mathematics\n\nThe Institute of Statistical Mathematics\n\n10-3 Midori-cho, Tachikawa\n\nTokyo 190-8562, Japan\nwatay@ism.ac.jp\n\n10-3 Midori-cho, Tachikawa\n\nTokyo 190-8562, Japan\n\nfukumizu@ism.ac.jp\n\nAbstract\n\nWe propose a new approach to the analysis of Loopy Belief Propagation (LBP) by\nestablishing a formula that connects the Hessian of the Bethe free energy with the\nedge zeta function. The formula has a number of theoretical implications on LBP.\nIt is applied to give a suf\ufb01cient condition that the Hessian of the Bethe free energy\nis positive de\ufb01nite, which shows non-convexity for graphs with multiple cycles.\nThe formula clari\ufb01es the relation between the local stability of a \ufb01xed point of\nLBP and local minima of the Bethe free energy. 
We also propose a new approach\nto the uniqueness of the LBP fixed point, and show various conditions of uniqueness.\n\n1 Introduction\n\nPearl\u2019s belief propagation [1] provides an efficient method for exact computation in the inference\nwith probabilistic models associated to trees. As an extension to general graphs allowing cycles,\nthe Loopy Belief Propagation (LBP) algorithm [2] has been proposed, showing successful performance\nin various problems such as computer vision and error correcting codes.\n\nOne of the interesting theoretical aspects of LBP is its connection with the Bethe free energy [3]. It\nis known, for example, that the fixed points of LBP correspond to the stationary points of the Bethe free\nenergy. Nonetheless, many of the properties of LBP such as exactness, convergence and stability are\nstill unclear, and further theoretical understanding is needed.\n\nThis paper theoretically analyzes LBP by establishing a formula asserting that the determinant of\nthe Hessian of the Bethe free energy equals the reciprocal of the edge zeta function up to a positive\nfactor. This formula yields a variety of results on the properties of LBP such as stability and\nuniqueness, since the zeta function has a direct link with the dynamics of LBP as we show.\n\nThe first application of the formula is the condition for the positive definiteness of the Hessian of\nthe Bethe free energy. The Bethe free energy is not necessarily convex, which causes unfavorable\nbehaviors of LBP such as oscillation and multiple fixed points. Thus, clarifying the region where\nthe Hessian is positive definite is an important problem. 
Unlike the previous approaches which\nconsider the global structure of the Bethe free energy such as [4, 5], we focus on the local structure.\nNamely, we provide a simple sufficient condition that determines the positive definite region: if all\nthe correlation coefficients of the pseudomarginals are smaller in absolute value than a value given by a characteristic\nof the graph, the Hessian is positive definite. Additionally, we show that the Hessian always has a\nnegative eigenvalue around the boundary of the domain if the graph has at least two cycles.\n\nSecond, we clarify a relation between the local stability of an LBP fixed point and the local structure\nof the Bethe free energy. Such a relation is not necessarily obvious, since LBP is not the gradient\ndescent of the Bethe free energy. In this line of studies, Heskes [6] shows that a locally stable fixed\npoint of LBP is a local minimum of the Bethe free energy. It is thus interesting to ask which local\nminima of the Bethe free energy are stable or unstable fixed points of LBP. We answer this question\nby elucidating the conditions of the local stability of LBP and the positive definiteness of the Bethe\nfree energy in terms of the eigenvalues of a matrix, which appears in the graph zeta function.\n\nFinally, we discuss the uniqueness of the LBP fixed point by developing a differential topological result\non the Bethe free energy. The result shows that the determinant of the Hessian at the fixed points,\nwhich appears in the formula of the zeta function, must satisfy a strong constraint. As a consequence,\nin addition to the known result on the one-cycle case, we show that the LBP fixed point is unique\nfor any connected graph with two cycles whose interactions are not equivalent to an attractive model, without restricting the strength of interactions.\n\n2 Loopy belief propagation algorithm and the Bethe free energy\n\nThroughout this paper, G = (V, E) is a connected undirected graph with V the vertices and E the\nundirected edges. 
The cardinalities of V and E are denoted by N and M, respectively.\nIn this article we focus on binary variables, i.e., xi \u2208 {\u00b11}. Suppose that the probability distribution\nover the set of variables x = (xi)_{i\u2208V} is given by the following factorization form with respect to G:\n\np(x) = (1/Z) \u220f_{ij\u2208E} \u03c8ij(xi, xj) \u220f_{i\u2208V} \u03c8i(xi),    (1)\n\nwhere Z is a normalization constant and \u03c8ij and \u03c8i are positive functions given by \u03c8ij(xi, xj) =\nexp(Jijxixj) and \u03c8i(xi) = exp(hixi) without loss of generality.\nIn various applications, the computation of the marginal distributions pi(xi) := \u2211_{x\\{xi}} p(x) and\npij(xi, xj) := \u2211_{x\\{xi,xj}} p(x) is required, though the exact computation is intractable for large\ngraphs. If the graph is a tree, they are efficiently computed by Pearl\u2019s belief propagation algorithm\n[1]. Even if the graph has cycles, it is empirically known that the direct application of this algorithm,\ncalled Loopy Belief Propagation (LBP), often gives a good approximation.\nLBP is a message passing algorithm. For each directed edge, a message vector \u00b5i\u2192j(xj) is assigned\nand initialized arbitrarily. The update rule of messages is given by\n\n\u00b5new_{i\u2192j}(xj) \u221d \u2211_{xi} \u03c8ji(xj, xi) \u03c8i(xi) \u220f_{k\u2208Ni\\j} \u00b5k\u2192i(xi),    (2)\n\nwhere Ni is the neighborhood of i \u2208 V. The order of edges in the update is arbitrary. In this paper\nwe consider parallel update, that is, all edges are updated simultaneously. If the messages converge\nto a fixed point {\u00b5\u221e_{i\u2192j}}, the approximations of pi(xi) and pij(xi, xj) are calculated by the beliefs,\n\nbi(xi) \u221d \u03c8i(xi) \u220f_{k\u2208Ni} \u00b5\u221e_{k\u2192i}(xi),\nbij(xi, xj) \u221d \u03c8ij(xi, xj) \u03c8i(xi) \u03c8j(xj) \u220f_{k\u2208Ni\\j} \u00b5\u221e_{k\u2192i}(xi) \u220f_{k\u2208Nj\\i} \u00b5\u221e_{k\u2192j}(xj),    (3)\n\nwith normalization \u2211_{xi} bi(xi) = 1 and \u2211_{xi,xj} bij(xi, xj) = 1. 
From (2) and (3), the constraints\nbij(xi, xj) > 0 and \u2211_{xj} bij(xi, xj) = bi(xi) are automatically satisfied.\n\nWe introduce the Bethe free energy as a tractable approximation of the Gibbs free energy. The\nexact distribution (1) is characterized by a variational problem p(x) = argmin_{p\u0302} FGibbs(p\u0302), where\nthe minimum is taken over all probability distributions on (xi)_{i\u2208V} and FGibbs(p\u0302) is the Gibbs free\nenergy defined by FGibbs(p\u0302) = KL(p\u0302||p) \u2212 log Z. Here KL(p\u0302||p) = \u222b p\u0302 log(p\u0302/p) is the Kullback-Leibler\ndivergence from p\u0302 to p. Note that FGibbs(p\u0302) is a convex function of p\u0302.\nIn the Bethe approximation, we confine the above minimization to the distributions of the form\nb(x) \u221d \u220f_{ij\u2208E} bij(xi, xj) \u220f_{i\u2208V} bi(xi)^{1\u2212di}, where di := |Ni| is the degree and the constraints\nbij(xi, xj) > 0, \u2211_{xi,xj} bij(xi, xj) = 1 and \u2211_{xj} bij(xi, xj) = bi(xi) are satisfied. A set\n{bi(xi), bij(xi, xj)} satisfying these constraints is called a set of pseudomarginals. For computational\ntractability, we modify the Gibbs free energy to the objective function called the Bethe free energy:\n\nF(b) := \u2212 \u2211_{ij\u2208E} \u2211_{xi,xj} bij(xi, xj) log \u03c8ij(xi, xj) \u2212 \u2211_{i\u2208V} \u2211_{xi} bi(xi) log \u03c8i(xi)\n        + \u2211_{ij\u2208E} \u2211_{xi,xj} bij(xi, xj) log bij(xi, xj) + \u2211_{i\u2208V} (1 \u2212 di) \u2211_{xi} bi(xi) log bi(xi).    (4)\n\nThe domain of the objective function F is the set of pseudomarginals. The function F does not\nnecessarily have a unique minimum. The outcome of this modified variational problem is the same\nas that of LBP [3]. 
To put it more precisely, there is a one-to-one correspondence between the set\nof stationary points of the Bethe free energy and the set of fixed points of LBP.\nIt is more convenient to work with minimal parameters, the mean mi = Ebi[xi] and the correlation\n\u03c7ij = Ebij[xixj]. Then we have an effective parametrization of pseudomarginals:\n\nbij(xi, xj) = (1/4)(1 + mixi + mjxj + \u03c7ijxixj),    bi(xi) = (1/2)(1 + mixi).    (5)\n\nThe Bethe free energy (4) is rewritten as\n\nF({mi, \u03c7ij}) = \u2212 \u2211_{ij\u2208E} Jij\u03c7ij \u2212 \u2211_{i\u2208V} himi + \u2211_{ij\u2208E} \u2211_{xi,xj} \u03b7((1 + mixi + mjxj + \u03c7ijxixj)/4) + \u2211_{i\u2208V} (1 \u2212 di) \u2211_{xi} \u03b7((1 + mixi)/2),    (6)\n\nwhere \u03b7(x) := x log x. The domain of F is written as\n\nL(G) := { {mi, \u03c7ij} \u2208 R^{N+M} | 1 + mixi + mjxj + \u03c7ijxixj > 0 for all ij \u2208 E and xi, xj = \u00b11 }.\n\nThe Hessian of F, which consists of the second derivatives with respect to {mi, \u03c7ij}, is a square\nmatrix of size N + M and denoted by \u22072F. This is considered to be a matrix-valued function on\nL(G). Note that, from (6), \u22072F does not depend on Jij and hi.\n\n3 Zeta function and Hessian of Bethe free energy\n\n3.1 Zeta function and Ihara\u2019s formula\n\nFor each undirected edge of G, we make a pair of oppositely directed edges, which form a set of\ndirected edges E\u20d7. Thus |E\u20d7| = 2M. For each directed edge e \u2208 E\u20d7, o(e) \u2208 V is the origin of e and\nt(e) \u2208 V is the terminus of e. For e \u2208 E\u20d7, the inverse edge is denoted by \u0113, and the corresponding\nundirected edge by [e] = [\u0113] \u2208 E.\nA closed geodesic in G is a sequence (e1, . . . , ek) of directed edges such that t(ei) =\no(e_{i+1}) and ei \u2260 \u0113_{i+1} for i \u2208 Z/kZ. 
Two closed geodesics are said to be equivalent if one is\nobtained by cyclic permutation of the other. An equivalence class of closed geodesics is called a\nprime cycle if it is not a repeated concatenation of a shorter closed geodesic. Let P be the set of\nprime cycles of G. For given weights u = (ue)_{e\u2208E\u20d7}, the edge zeta function [7, 8] is defined by\n\n\u03b6G(u) := \u220f_{p\u2208P} (1 \u2212 g(p))^{\u22121},    g(p) := ue1 \u00b7\u00b7\u00b7 uek  for p = (e1, . . . , ek),\n\nwhere ue \u2208 C is assumed to be sufficiently small for convergence. This is an analogue of the\nRiemann zeta function, which is represented by the product over all the prime numbers.\nExample 1. If G is a tree, which has no prime cycles, \u03b6G(u) = 1. For the 1-cycle graph CN of\nlength N, the prime cycles are (e1, e2, . . . , eN) and (\u0113N, \u0113N\u22121, . . . , \u01131), and thus \u03b6CN(u) = (1 \u2212 \u220f_{l=1}^{N} uel)^{\u22121} (1 \u2212 \u220f_{l=1}^{N} u\u0113l)^{\u22121}. Except for these two types of graphs, the number of prime cycles is\ninfinite.\n\nIt is known that the edge zeta function has the following simple determinant formula, which gives\nanalytic continuation to the whole C^{2M}. Let C(E\u20d7) be the set of functions on the directed edges.\nWe define a matrix on C(E\u20d7), which is determined by the graph G, by\n\nMe,e\u2032 := 1 if e \u2260 \u0113\u2032 and o(e) = t(e\u2032), and Me,e\u2032 := 0 otherwise.    (7)\n\nTheorem 1 ([8], Theorem 3).\n\n\u03b6G(u) = det(I \u2212 UM)^{\u22121},    (8)\n\nwhere U is a diagonal matrix defined by Ue,e\u2032 := ue\u03b4e,e\u2032.\n\nWe need to show another determinant formula of the edge zeta function, which is used in the proof\nof theorem 3. We leave the proof of theorem 2 to the supplementary material.\nTheorem 2 (Multivariable version of Ihara\u2019s formula). Let C(V) be the set of functions on V. 
We\ndefine two linear operators on C(V) by\n\n(D\u0302f)(i) := ( \u2211_{e\u2208E\u20d7, t(e)=i} ue u\u0113 / (1 \u2212 ue u\u0113) ) f(i),    (\u00c2f)(i) := \u2211_{e\u2208E\u20d7, t(e)=i} ( ue / (1 \u2212 ue u\u0113) ) f(o(e)),    (9)\n\nwhere f \u2208 C(V). Then we have\n\n\u03b6G(u)^{\u22121} = det(I \u2212 UM) = det(I + D\u0302 \u2212 \u00c2) \u220f_{[e]\u2208E} (1 \u2212 ue u\u0113).    (10)\n\nIf we set ue = u for all e \u2208 E\u20d7, the edge zeta function is called the Ihara zeta function [9] and\ndenoted by \u03b6G(u). In this single variable case, theorem 2 is reduced to Ihara\u2019s formula [10]:\n\n\u03b6G(u)^{\u22121} = det(I \u2212 uM) = (1 \u2212 u^2)^M det(I + (u^2/(1 \u2212 u^2)) D \u2212 (u/(1 \u2212 u^2)) A),    (11)\n\nwhere D is the degree matrix and A is the adjacency matrix defined by\n\n(Df)(i) := di f(i),    (Af)(i) := \u2211_{e\u2208E\u20d7, t(e)=i} f(o(e)),    f \u2208 C(V).\n\n3.2 Main formula\n\nTheorem 3 (Main Formula). The following equality holds at any point of L(G):\n\n\u03b6G(u)^{\u22121} = det(I \u2212 UM) = det(\u22072F) \u220f_{ij\u2208E} \u220f_{xi,xj=\u00b11} bij(xi, xj) \u220f_{i\u2208V} \u220f_{xi=\u00b11} bi(xi)^{1\u2212di} 2^{2N+4M},    (12)\n\nwhere bij and bi are given by (5) and\n\nui\u2192j := (\u03c7ij \u2212 mimj) / (1 \u2212 mj^2).    (13)\n\nProof. (The detail of the computation is given in the supplementary material.)\nFrom (6), it is easy to see that the (E,E)-block of the Hessian is a diagonal matrix given by\n\n\u22022F/\u2202\u03c7ij\u2202\u03c7kl = \u03b4ij,kl (1/4) ( 1/(1 + mi + mj + \u03c7ij) + 1/(1 \u2212 mi + mj \u2212 \u03c7ij) + 1/(1 + mi \u2212 mj \u2212 \u03c7ij) + 1/(1 \u2212 mi \u2212 mj + \u03c7ij) ).\n\nUsing this diagonal block, we erase the (V,E)-block and (E,V)-block of the Hessian. 
In other words,\nwe choose a square matrix X such that det X = 1 and\n\nX^T (\u22072F) X = [ Y, 0 ; 0, (\u22022F/\u2202\u03c7ij\u2202\u03c7kl) ].\n\nAfter the computation given in the supplementary material, we see that\n\n(Y)i,j = 1/(1 \u2212 mi^2) + \u2211_{k\u2208Ni} (\u03c7ik \u2212 mimk)^2 / ( (1 \u2212 mi^2)(1 \u2212 mi^2 \u2212 mk^2 + 2mimk\u03c7ik \u2212 \u03c7ik^2) )  if i = j,\n(Y)i,j = \u2212Ai,j (\u03c7ij \u2212 mimj) / (1 \u2212 mi^2 \u2212 mj^2 + 2mimj\u03c7ij \u2212 \u03c7ij^2)  otherwise.    (14)\n\nFrom uj\u2192i = (\u03c7ij \u2212 mimj)/(1 \u2212 mi^2), it is easy to check that IN + D\u0302 \u2212 \u00c2 = Y W, where \u00c2 and D\u0302 are defined in\n(9) and W is a diagonal matrix defined by Wi,j := \u03b4i,j(1 \u2212 mi^2). Therefore,\n\ndet(I \u2212 UM) = det(Y) \u220f_{i\u2208V} (1 \u2212 mi^2) \u220f_{[e]\u2208E} (1 \u2212 ue u\u0113) = R.H.S. of (12).\n\nFor the left equality, theorem 2 is used.\n\nTheorem 3 shows that the determinant of the Hessian of the Bethe free energy is essentially equal to\ndet(I \u2212 UM), the reciprocal of the edge zeta function. Since the matrix UM has a direct connection\nwith LBP as seen in section 5, the above formula yields many consequences shown in the rest of\nthe paper.\n\n4 Application to positive definiteness conditions\n\nThe convexity of the Bethe free energy is an important issue, as it guarantees uniqueness of the fixed\npoint. Pakzad et al [11] and Heskes [5] derive sufficient conditions of convexity and show that the\nBethe free energy is convex for trees and graphs with one cycle. 
In this section, instead of such\nglobal structure, we shall focus on the local structure of the Bethe free energy as an application of the\nmain formula.\nFor a given square matrix X, Spec(X) \u2282 C denotes the set of eigenvalues (spectrum), and \u03c1(X) the\nspectral radius of X, i.e., the maximum of the modulus of the eigenvalues.\nTheorem 4. Let M be the matrix given by (7). For given {mi, \u03c7ij} \u2208 L(G), U is defined by (13).\nThen,\n\nSpec(UM) \u2282 C \\ R_{\u22651} =\u21d2 \u22072F is a positive definite matrix at {mi, \u03c7ij}.\n\nProof. We define mi(t) := mi and \u03c7ij(t) := t\u03c7ij + (1 \u2212 t)mimj. Then {mi(t), \u03c7ij(t)} \u2208 L(G)\nand {mi(1), \u03c7ij(1)} = {mi, \u03c7ij}. For t \u2208 [0, 1], we define U(t) and \u22072F(t) in the same way by\n{mi(t), \u03c7ij(t)}. We see that U(t) = tU. Since Spec(UM) \u2282 C \\ R_{\u22651}, we have det(I \u2212 tUM) \u2260 0\nfor all t \u2208 [0, 1]. From theorem 3, det(\u22072F(t)) \u2260 0 holds on this interval. Using (14) and \u03c7ij(0) =\nmi(0)mj(0), we can check that \u22072F(0) is positive definite. Since the eigenvalues of \u22072F(t) are\nreal and continuous with respect to t, the eigenvalues of \u22072F(1) must be positive reals.\n\nWe define the symmetrization of ui\u2192j and uj\u2192i by\n\n\u03b2i\u2192j = \u03b2j\u2192i := (\u03c7ij \u2212 mimj) / {(1 \u2212 mi^2)(1 \u2212 mj^2)}^{1/2} = Covbij[xi, xj] / {Varbi[xi] Varbj[xj]}^{1/2}.    (15)\n\nThus, ui\u2192juj\u2192i = \u03b2i\u2192j\u03b2j\u2192i. Since \u03b2i\u2192j = \u03b2j\u2192i, we sometimes abbreviate \u03b2i\u2192j as \u03b2ij. From\nthe final expression, we see that |\u03b2ij| < 1. 
Define diagonal matrices Z and B by (Z)e,e\u2032 := \u03b4e,e\u2032 (1 \u2212 m_{t(e)}^2)^{1/2} and (B)e,e\u2032 := \u03b4e,e\u2032\u03b2e respectively. Then we have ZUMZ^{\u22121} = BM, because\n\n(ZUMZ^{\u22121})e,e\u2032 = (1 \u2212 m_{t(e)}^2)^{1/2} ue (M)e,e\u2032 (1 \u2212 m_{o(e)}^2)^{\u22121/2} = \u03b2e (M)e,e\u2032.\n\nTherefore Spec(UM) = Spec(BM).\nThe following corollary gives a more explicit condition of the region where the Hessian is positive\ndefinite in terms of the correlation coefficients of the pseudomarginals.\nCorollary 1. Let \u03b1 be the Perron Frobenius eigenvalue of M and define L_{\u03b1^{\u22121}}(G) := { {mi, \u03c7ij} \u2208\nL(G) | |\u03b2e| < \u03b1^{\u22121} for all e \u2208 E\u20d7 }. Then, the Hessian \u22072F is positive definite on L_{\u03b1^{\u22121}}(G).\nProof. Since |\u03b2e| < \u03b1^{\u22121}, we have \u03c1(BM) < \u03c1(\u03b1^{\u22121}M) = 1 ([12] Theorem 8.1.18). Therefore\nSpec(BM) \u2229 R_{\u22651} = \u2205, and the claim follows from theorem 4 since Spec(UM) = Spec(BM).\n\nAs is seen from (11), \u03b1^{\u22121} is the distance from the origin to the nearest pole of Ihara\u2019s zeta \u03b6G(u).\nFrom example 1, we see that \u03b6G(u) = 1 for a tree G and \u03b6CN(u) = (1 \u2212 u^N)^{\u22122} for a 1-cycle graph\nCN. Therefore \u03b1^{\u22121} is \u221e and 1 respectively. In these cases, L_{\u03b1^{\u22121}}(G) = L(G) and F is a strictly\nconvex function on L(G), because |\u03b2e| < 1 always holds. This reproduces the results shown in [11].\nIn general, using theorem 8.1.22 of [12], we have min_{i\u2208V} di \u2212 1 \u2264 \u03b1 \u2264 max_{i\u2208V} di \u2212 1.\nTheorem 3 is also useful to show non-convexity.\nCorollary 2. Let {mi(t) := 0, \u03c7ij(t) := t} \u2208 L(G) for t < 1. Then we have\n\nlim_{t\u21921} det(\u22072F(t)) (1 \u2212 t)^{M+N\u22121} = \u22122^{\u2212M\u2212N+1} (M \u2212 N) \u03ba(G),    (16)\n\nwhere \u03ba(G) is the number of spanning trees in G. In particular, F is never convex on L(G) for any\nconnected graph with at least two linearly independent cycles, i.e. M \u2212 N \u2265 1.\nProof. The equation (16) is obtained by Hashimoto\u2019s theorem [13], which gives the u \u2192 1 limit\nof the Ihara zeta function. (See supplementary material for the detail.) 
If M \u2212 N \u2265 1, the right\nhand side of (16) is negative. As t \u2192 1, i.e., as the point approaches {mi = 0, \u03c7ij = 1} on the boundary of L(G), the determinant of the\nHessian diverges to \u2212\u221e. Therefore the Hessian is not positive definite near the point.\n\nSummarizing the results in this section, we conclude that F is convex on L(G) if and only if G is a\ntree or a graph with one cycle. To the best of our knowledge, this is the first proof of this fact.\n\n5 Application to stability analysis\n\nIn this section we discuss the local stability of LBP and the local structure of the Bethe free energy\naround an LBP fixed point. Heskes [6] shows that a locally stable fixed point of sufficiently damped\nLBP is a local minimum of the Bethe free energy. The converse is not necessarily true in general, and\nwe will elucidate the gap between these two properties.\n\nFirst, we regard the LBP update as a dynamical system. Since the model is binary, each message\n\u00b5i\u2192j(xj) is parametrized by one parameter, say \u03b7i\u2192j. The state of the LBP algorithm is expressed\nby \u03b7 = (\u03b7e)_{e\u2208E\u20d7} \u2208 C(E\u20d7), and the update rule (2) is identified with a transform T on C(E\u20d7),\n\u03b7new = T(\u03b7). Then, the set of fixed points of LBP is {\u03b7\u221e \u2208 C(E\u20d7) | T(\u03b7\u221e) = \u03b7\u221e}.\nA fixed point \u03b7\u221e is called locally stable if LBP starting with a point sufficiently close to \u03b7\u221e converges to \u03b7\u221e. 
The local stability is determined by the linearization T\u2032(\u03b7\u221e) around the fixed point. As is\ndiscussed in [14], \u03b7\u221e is locally stable if and only if Spec(T\u2032(\u03b7\u221e)) \u2282 {\u03bb \u2208 C | |\u03bb| < 1}.\nTo suppress oscillatory behaviors of LBP, damping of update T\u03f5 := (1 \u2212 \u03f5)T + \u03f5I is sometimes\nuseful, where 0 \u2264 \u03f5 < 1 is a damping strength and I is the identity. A fixed point is locally stable\nwith some damping if and only if Spec(T\u2032(\u03b7\u221e)) \u2282 {\u03bb \u2208 C | Re \u03bb < 1}.\n\nThere are many representations of the linearization (derivative) of the LBP update (see [14, 15]); we\nchoose a good coordinate following Furtlehner et al [16]. In section 4 of [16], they transform messages\nas \u00b5i\u2192j \u2192 \u00b5i\u2192j/\u00b5\u221e_{i\u2192j} and functions as \u03c8ij \u2192 bij/(bibj) and \u03c8i \u2192 bi, where \u00b5\u221e_{i\u2192j} is\nthe message of the fixed point. This changes only the representations of messages and functions,\nand does not affect LBP essentially. This transformation causes T\u2032(\u03b7\u221e) \u2192 P T\u2032(\u03b7\u221e) P^{\u22121} with an\ninvertible matrix P. Using this transformation, we see that the following fact holds. (See supplementary\nmaterial for the detail.)\nTheorem 5 ([16], Proposition 4.5). Let ui\u2192j be given by (3), (5) and (13) at an LBP fixed point \u03b7\u221e.\nThe derivative T\u2032(\u03b7\u221e) is similar to UM, i.e. UM = P T\u2032(\u03b7\u221e) P^{\u22121} with an invertible matrix P.\nSince det(I \u2212 T\u2032(\u03b7\u221e)) = det(I \u2212 UM), the formula in theorem 3 implies a direct link between\nthe linearization T\u2032(\u03b7\u221e) and the local structure of the Bethe free energy. 
From theorem 4, we have\nthat a fixed point of LBP is a local minimum of the Bethe free energy if Spec(T\u2032(\u03b7\u221e)) \u2282 C \\ R_{\u22651}.\nIt is now clear that the conditions for positive definiteness, local stability of damped LBP and local\nstability of undamped LBP are given in terms of the set of eigenvalues, C \\ R_{\u22651}, {\u03bb \u2208 C | Re \u03bb < 1}\nand {\u03bb \u2208 C | |\u03bb| < 1} respectively. A locally stable fixed point of sufficiently damped LBP is a\nlocal minimum of the Bethe free energy, because {\u03bb \u2208 C | Re \u03bb < 1} is included in C \\ R_{\u22651}. This\nreproduces Heskes\u2019s result [6]. Moreover, we see the gap between the locally stable fixed points\nwith some damping and the local minima of the Bethe free energy: if Spec(T\u2032(\u03b7\u221e)) is included in\nC \\ R_{\u22651} but not in {\u03bb \u2208 C | Re \u03bb < 1}, the fixed point is a local minimum of the Bethe free energy\nthough it is not a locally stable fixed point of LBP with any damping.\n\nIt is interesting to ask under which condition a local minimum of the Bethe free energy is a stable\nfixed point of (damped) LBP. While we do not know a complete answer, for an attractive model,\nwhich is defined by Jij \u2265 0, the following theorem implies that if a stable fixed point becomes\nunstable by changing Jij and hi, the corresponding local minimum also disappears.\nTheorem 6. Let us consider continuously parametrized attractive models {\u03c8ij(t), \u03c8i(t)}, e.g. t\nis a temperature: \u03c8ij(t) = exp(t^{\u22121}Jijxixj) and \u03c8i(t) = exp(t^{\u22121}hixi). For given t, run the LBP\nalgorithm and find a (stable) fixed point. If we continuously change t and see the LBP fixed point\nbecomes unstable across t = t0, then the corresponding local minimum of the Bethe free energy\nbecomes a saddle point across t = t0.\nProof. 
From (3), we see bij(xi, xj) \u221d exp(Jijxixj + \u03b8ixi + \u03b8jxj) for some \u03b8i and \u03b8j. From\nJij \u2265 0, we have Covbij[xi, xj] = \u03c7ij \u2212 mimj \u2265 0, and thus ui\u2192j \u2265 0. When the LBP fixed point\nbecomes unstable, the Perron Frobenius eigenvalue of UM goes over 1, which means det(I \u2212 UM)\ncrosses 0. From theorem 3 we see that det(\u22072F) changes from positive to negative at t = t0.\n\nTheorem 6 extends theorem 2 of [14], which discusses only the case of vanishing local fields hi = 0\nand the trivial fixed point (i.e. mi = 0).\n\n6 Application to uniqueness of LBP fixed point\n\nThe uniqueness of the LBP fixed point is a concern of many studies, because the property guarantees that\nLBP finds the global minimum of the Bethe free energy if it converges. The major approaches to the\nuniqueness are to consider an equivalent minimax problem [5], to use the contraction property of LBP dynamics\n[17, 18], and to use the theory of Gibbs measure [19]. We will propose a different, differential\ntopological approach to this problem.\n\nIn our approach, in combination with theorem 3, the following theorem is the basic apparatus.\nTheorem 7. If det\u22072F(q) \u2260 0 for all q \u2208 (\u2207F)^{\u22121}(0) then\n\n\u2211_{q:\u2207F(q)=0} sgn(det\u22072F(q)) = 1,    where sgn(x) := 1 if x > 0 and sgn(x) := \u22121 if x < 0.\n\nWe call each summand, which is +1 or \u22121, the index of F at q.\nNote that the set (\u2207F)^{\u22121}(0), which is the set of stationary points of the Bethe free energy, coincides with\nthe fixed points of LBP. The above theorem asserts that the sum of indexes of all the fixed points\nmust be one. As a consequence, the number of the fixed points of LBP is always odd. 
Note also that\nthe index is a local quantity, while the assertion expresses the global structure of the function F.\n\nFor the proof of theorem 7, we prepare two lemmas. The proof of lemma 1 is shown in the supplementary\nmaterial. Lemma 2 is a standard result in differential topology, and we refer to [20] theorem\n13.1.2 and comments in p.104 for the proof.\nLemma 1. If a sequence {qn} \u2282 L(G) converges to a point q\u2217 \u2208 \u2202L(G), then \u2225\u2207F(qn)\u2225 \u2192 \u221e,\nwhere \u2202L(G) is the boundary of L(G) \u2282 R^{N+M}.\nLemma 2. Let M1 and M2 be compact, connected and orientable manifolds with boundaries.\nAssume that the dimensions of M1 and M2 are the same. Let f : M1 \u2192 M2 be a smooth map\nsatisfying f(\u2202M1) \u2282 \u2202M2. For a regular value p \u2208 M2, i.e. det(\u2207f(q)) \u2260 0 for all q \u2208 f^{\u22121}(p),\nwe define the degree of the map f by deg f := \u2211_{q\u2208f^{\u22121}(p)} sgn(det\u2207f(q)). Then deg f does not\ndepend on the choice of a regular value p \u2208 M2.\nSketch of proof. Define a map \u03a6 : L(G) \u2192 R^{N+M} by \u03a6 := \u2207F + (h, J). Note that \u03a6 does not\ndepend on h and J as seen from (6). Then it is enough to prove\n\n\u2211_{q\u2208\u03a6^{\u22121}((h, J))} sgn(det\u2207\u03a6(q)) = \u2211_{q\u2208\u03a6^{\u22121}(0)} sgn(det\u2207\u03a6(q)),    (17)\n\nbecause \u03a6^{\u22121}(0) has a unique element {mi = 0, \u03c7ij = 0}, at which \u22072F is positive definite, and\nthe right hand side of (17) is equal to one. Define a sequence of manifolds {Cn} by Cn := {q \u2208\nL(G) | \u2211_{ij\u2208E} \u2211_{xi,xj} \u2212log bij \u2264 n}, which increasingly converges to L(G). Take K > 0 and \u03f5 > 0\nto satisfy K \u2212 \u03f5 > \u2225(h, J)\u2225. From lemma 1, for sufficiently large n0, we have \u03a6^{\u22121}(0), \u03a6^{\u22121}((h, J)) \u2282 Cn0\nand \u03a6(\u2202Cn0) \u2229 B0(K) = \u2205, where B0(K) is the closed ball of radius K at the origin. 
Let \u03a0\u03f5 :\nR^{N+M} \u2192 B0(K) be a smooth map that is the identity on B0(K \u2212 \u03f5), monotonically increasing on\n\u2225x\u2225, and \u03a0\u03f5(x) = (K/\u2225x\u2225) x for \u2225x\u2225 \u2265 K. We obtain a map \u03a6\u0303 := \u03a0\u03f5 \u25e6 \u03a6 : Cn0 \u2192 B0(K) such that\n\u03a6\u0303(\u2202Cn0) \u2282 \u2202B0(K). Applying lemma 2 yields (17).\n\nIf we can guarantee that the index of every fixed point is +1 in advance of running LBP, we conclude\nthat the fixed point of LBP is unique. We have the following a priori information for \u03b2.\nLemma 3. Let \u03b2ij be given by (15) at any fixed point of LBP. Then |\u03b2ij| \u2264 tanh(|Jij|) and\nsgn(\u03b2ij) = sgn(Jij) hold.\nProof. From (3), we see that bij(xi, xj) \u221d exp(Jijxixj + \u03b8ixi + \u03b8jxj) for some \u03b8i and\n\u03b8j. With (15) and straightforward computation, we obtain \u03b2ij = sinh(2Jij) (cosh(2\u03b8i) +\ncosh(2Jij))^{\u22121/2} (cosh(2\u03b8j) + cosh(2Jij))^{\u22121/2}. The bound is attained when \u03b8i = 0 and \u03b8j = 0.\n\nFrom theorem 7 and lemma 3, we can immediately obtain the uniqueness condition in [18], though\nthe stronger contractive property is proved under the same condition in [18].\n\nCorollary 3 ([18]). If \u03c1(J M) < 1, then the fixed point of LBP is unique, where J is a diagonal\nmatrix defined by Je,e\u2032 = tanh(|Je|)\u03b4e,e\u2032.\nProof. Since |\u03b2ij| \u2264 tanh(|Jij|), we have \u03c1(BM) \u2264 \u03c1(J M) < 1. ([12] Theorem 8.1.18.) Then\ndet(I \u2212 BM) = det(I \u2212 UM) > 0 implies that the index of any LBP fixed point must be +1.\n\nFigure 1: Graph of Example 2.\n\nFigure 2: Graph G\u0302.\n\nFigure 3: Two other types.\n\nIn the proof of the above corollary, we only used the bound of modulus. In the following case of\ncorollary 4, we can utilize the information of signs. 
To state the corollary, we need a terminology.\nThe interactions {Jij, hi} and {J\u2032ij, h\u2032i} are said to be equivalent if there exists (si) \u2208 {\u00b11}^V such\nthat J\u2032ij = Jijsisj and h\u2032i = hisi. Since an equivalent model is obtained by the gauge transformation\nxi \u2192 xisi, the uniqueness property of LBP for equivalent models is unchanged.\nCorollary 4. If the number of linearly independent cycles of G is two (i.e. M \u2212 N + 1 = 2), and the\ninteraction is not equivalent to an attractive model, then the LBP fixed point is unique.\n\nThe proof is shown in the supplementary material. We give an example to illustrate the outline.\nExample 2. Let V := {1, 2, 3, 4} and E := {12, 13, 14, 23, 34}. The interactions are given by\narbitrary {hi} and {\u2212J12, J13, J14, J23, J34} with Jij \u2265 0. See figure 1. It is enough to check that\ndet(I \u2212 BM) > 0 for arbitrary 0 \u2264 \u03b213, \u03b223, \u03b214, \u03b234 < 1 and \u22121 < \u03b212 \u2264 0. Since the prime\ncycles of G bijectively correspond to those of G\u0302 (in figure 2), we have det(I \u2212 BM) = det(I \u2212 B\u0302M\u0302),\nwhere \u03b2\u0302e1 = \u03b212\u03b223, \u03b2\u0302e2 = \u03b213, and \u03b2\u0302e3 = \u03b214\u03b234. We see that\n\ndet(I \u2212 B\u0302M\u0302) = (1 \u2212 \u03b2\u0302e1\u03b2\u0302e2 \u2212 \u03b2\u0302e1\u03b2\u0302e3 \u2212 \u03b2\u0302e2\u03b2\u0302e3 \u2212 2\u03b2\u0302e1\u03b2\u0302e2\u03b2\u0302e3)(1 \u2212 \u03b2\u0302e1\u03b2\u0302e2 \u2212 \u03b2\u0302e1\u03b2\u0302e3 \u2212 \u03b2\u0302e2\u03b2\u0302e3 + 2\u03b2\u0302e1\u03b2\u0302e2\u03b2\u0302e3) > 0.\n\nIn other cases,\nwe can reduce to the graph G\u0302 or the graphs in figure 3 similarly (see the supplementary material).\n\nFor attractive models, the fixed point of the LBP is not necessarily unique.\nFor graphs with multiple cycles, all the existing results on uniqueness make assumptions that essentially\nupper-bound |Jij|. In contrast, corollary 4 applies to arbitrary strength of interactions if the\ngraph has two cycles and the interactions are not attractive. 
It is noteworthy that, from corollary 2,\nthe Bethe free energy is non-convex in the situation of corollary 4, while the fixed point is unique.\n\n7 Concluding remarks\n\nFor binary pairwise models, we show the connection between the edge zeta function and the Bethe\nfree energy in theorem 3, in the proof of which the multi-variable version of Ihara\u2019s formula (theorem\n2) is essential. After the initial submission of this paper, we found that theorem 3 can be extended to a\nmore general class of models including multinomial models and Gaussian models represented by\narbitrary factor graphs. We will discuss the extended formula and its applications in a future paper.\n\nSome recent studies on LBP have suggested the importance of the zeta function. In the context of\nLDPC codes, an important application of LBP, Koetter et al. [21, 22] show the connection\nbetween pseudo-codewords and the edge zeta function. For LBP on the Gaussian graphical\nmodel, Johnson et al. [23] give a zeta-like product formula for the partition function. While these are\nnot directly related to our work, pursuing hidden connections is an interesting future research topic.\n\nAcknowledgements\n\nThis work was supported in part by Grant-in-Aid for JSPS Fellows 20-993 and Grant-in-Aid for\nScientific Research (C) 19500249.\n\nReferences\n\n[1] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, CA, 1988.\n\n[2] K. Murphy, Y. Weiss, and M.I. Jordan. Loopy belief propagation for approximate inference: An empirical study. Proc. of Uncertainty in AI, 15:467\u2013475, 1999.\n\n[3] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Generalized belief propagation. Adv. in Neural Information Processing Systems, 13:689\u201395, 2001.\n\n[4] Y. Weiss. Correctness of Local Probability Propagation in Graphical Models with Loops. Neural Computation, 12(1):1\u201341, 2000.\n\n[5] T. Heskes. 
On the uniqueness of loopy belief propagation fixed points. Neural Computation, 16(11):2379\u20132413, 2004.\n\n[6] T. Heskes. Stable fixed points of loopy belief propagation are minima of the Bethe free energy. Adv. in Neural Information Processing Systems, 15, pages 343\u2013350, 2002.\n\n[7] K. Hashimoto. Zeta functions of finite graphs and representations of p-adic groups. Automorphic forms and geometry of arithmetic varieties, 15:211\u2013280, 1989.\n\n[8] H.M. Stark and A.A. Terras. Zeta functions of finite graphs and coverings. Advances in Mathematics, 121(1):124\u2013165, 1996.\n\n[9] Y. Ihara. On discrete subgroups of the two by two projective linear group over p-adic fields. Journal of the Mathematical Society of Japan, 18(3):219\u2013235, 1966.\n\n[10] H. Bass. The Ihara-Selberg zeta function of a tree lattice. Internat. J. Math, 3(6):717\u2013797, 1992.\n\n[11] P. Pakzad and V. Anantharam. Belief propagation and statistical physics. Conference on Information Sciences and Systems, (225), 2002.\n\n[12] R.A. Horn and C.R. Johnson. Matrix analysis. Cambridge University Press, 1990.\n\n[13] K. Hashimoto. On zeta and L-functions of finite graphs. Internat. J. Math, 1(4):381\u2013396, 1990.\n\n[14] J.M. Mooij and H.J. Kappen. On the properties of the Bethe approximation and loopy belief propagation on binary networks. Journal of Statistical Mechanics: Theory and Experiment, (11):P11012, 2005.\n\n[15] S. Ikeda, T. Tanaka, and S. Amari. Information geometry of turbo and low-density parity-check codes. IEEE Transactions on Information Theory, 50(6):1097\u20131114, 2004.\n\n[16] C. Furtlehner, J.M. Lasgouttes, and A. De La Fortelle. Belief propagation and Bethe approximation for traffic prediction. INRIA RR-6144, Arxiv preprint physics/0703159, 2007.\n\n[17] A.T. Ihler, J.W. Fisher, and A.S. Willsky. Loopy belief propagation: Convergence and effects of message errors. 
Journal of Machine Learning Research, 6(1):905\u2013936, 2006.\n\n[18] J.M. Mooij and H.J. Kappen. Sufficient Conditions for Convergence of the Sum-Product Algorithm. IEEE Transactions on Information Theory, 53(12):4422\u20134437, 2007.\n\n[19] S. Tatikonda and M.I. Jordan. Loopy belief propagation and Gibbs measures. Uncertainty in AI, 18:493\u2013500, 2002.\n\n[20] B.A. Dubrovin, A.T. Fomenko, S.P. Novikov, and R.G. Burns. Modern Geometry: Methods and Applications: Part 2: the Geometry and Topology of Manifolds. Springer-Verlag, 1985.\n\n[21] R. Koetter, W.C.W. Li, P.O. Vontobel, and J.L. Walker. Pseudo-codewords of cycle codes via zeta functions. IEEE Information Theory Workshop, pages 6\u201312, 2004.\n\n[22] R. Koetter, W.C.W. Li, P.O. Vontobel, and J.L. Walker. Characterizations of pseudo-codewords of (low-density) parity-check codes. Advances in Mathematics, 213(1):205\u2013229, 2007.\n\n[23] J.K. Johnson, V.Y. Chernyak, and M. Chertkov. Orbit-Product Representation and Correction of Gaussian Belief Propagation. Proceedings of the 26th International Conference on Machine Learning, (543), 2009.", "award": [], "sourceid": 420, "authors": [{"given_name": "Yusuke", "family_name": "Watanabe", "institution": null}, {"given_name": "Kenji", "family_name": "Fukumizu", "institution": null}]}