{"title": "Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation", "book": "Advances in Neural Information Processing Systems", "page_first": 873, "page_last": 880, "abstract": null, "full_text": "Counting Solution Clusters in Graph Coloring\n\nProblems Using Belief Propagation\n\nLukas Kroc\n\nAshish Sabharwal\n\nBart Selman\n\nDepartment of Computer Science\n\nCornell University, Ithaca NY 14853-7501, U.S.A.\n{kroc,sabhar,selman}@cs.cornell.edu \u2217\n\nAbstract\n\nWe show that an important and computationally challenging solution space feature\nof the graph coloring problem (COL), namely the number of clusters of solutions,\ncan be accurately estimated by a technique very similar to one for counting the\nnumber of solutions. This cluster counting approach can be naturally written in\nterms of a new factor graph derived from the factor graph representing the COL\ninstance. Using a variant of the Belief Propagation inference framework, we can\nef\ufb01ciently approximate cluster counts in random COL problems over a large range\nof graph densities. We illustrate the algorithm on instances with up to 100, 000\nvertices. Moreover, we supply a methodology for computing the number of clus-\nters exactly using advanced techniques from the knowledge compilation literature.\nThis methodology scales up to several hundred variables.\n\n1 Introduction\n\nMessage passing algorithms, in particular Belief Propagation (BP), have been very successful in\nef\ufb01ciently computing interesting properties of succinctly represented large spaces, such as joint\nprobability distributions. Recently, these techniques have also been applied to compute properties\nof discrete spaces, in particular, properties of the space of solutions of combinatorial problems. 
For example, for propositional satisfiability (SAT) and graph coloring (COL) problems, marginal probability information about the uniform distribution over solutions (or similar combinatorial objects) has been the key ingredient in the success of BP-like algorithms. Most notably, the survey propagation (SP) algorithm utilizes this information to solve very large hard random instances of these problems [3, 11].

* This work was supported by IISI, Cornell University (AFOSR grant FA9550-04-1-0151), DARPA (REAL grant FA8750-04-2-0216), and NSF (grant 0514429).

Earlier work on random ensembles of Constraint Satisfaction Problems (CSPs) has shown that the computationally hardest instances occur near phase boundaries, where instances go from having many globally satisfying solutions to having no solution at all (a "solution-focused picture"). In recent years, this picture has been refined, and a key factor in determining the hardness of instances for search (or sampling) algorithms was found to be the question: how are the solutions spatially distributed within the search space? This has made the structure of the solution space in terms of its clustering properties a key factor in determining the performance of combinatorial search methods (a "cluster-focused picture"). Can BP-like algorithms be used to provide such cluster-focused information? For example, how many clusters are there in a solution space? How big are the clusters? How are they organized? Answers to such questions will shed further light on our understanding of these hard combinatorial problems and lead to better algorithmic approaches for reasoning about them, be it for finding one solution or answering queries of probabilistic inference about the set of solutions. The study of the solution space geometry has indeed been the focus of a number of recent papers [e.g.
1, 2, 3, 7, 9, 11], especially by the statistical physics community, which has developed extensive theoretical tools to analyze such spaces under certain structural assumptions and large size limits. We provide a purely combinatorial method for counting the number of clusters, which is applicable even to small size problems and can be approximated very well by message passing techniques.

Solutions can be thought of as 'neighbors' if they differ in the value of one variable, and the transitive closure of the neighbor relation defines clusters in a natural manner. Counting the number of clusters is a challenging problem. To begin with, it is not even clear what the best succinct way to represent clusters is. One relatively crude but useful way is to represent a cluster by the set of 'backbone' variables in that cluster, i.e., variables that take a fixed value in all solutions within the cluster. Interestingly, while it is easy (polynomial time) to verify whether a variable assignment is indeed a solution of a CSP, the same check is much harder for a candidate cluster represented by the set of its backbone variables.

We propose one of the first scalable methods for estimating the number of clusters of solutions of graph coloring problems using a belief propagation like algorithm. While the naive method, based on enumeration of solutions and pairwise distances, scales to graph coloring problems with 50 or so nodes, and a recently proposed local search based method provides estimates up to a few hundred node graphs [7], our approach, being based on BP, easily provides fast estimates for graphs with 100,000 nodes. We validate the accuracy of our approach by also providing a fairly non-trivial exact counting method for clusters, utilizing advanced knowledge compilation techniques. Our approach works with the factor graph representation of the graph coloring problem.
Yedidia et al. [12] showed that if one can write the so-called "partition function", Z, for a quantity of interest in a factor graph with non-negative weights, then there is a fairly mechanical variational method derivation that yields belief propagation equations for estimating Z. Under certain assumptions, we derive a partition function style quantity, Z(-1), to count the number of clusters. We then use the variational method to obtain BP equations for estimating Z(-1). Our experiments with random graph coloring problems show that Z(-1) itself is an extremely accurate estimate of the number of clusters, and so is its approximation, ZBP(-1), obtained from our BP equations.

2 Preliminaries

The graph coloring problem can be expressed in the form of a factor graph, a bipartite graph with two kinds of nodes. The variable nodes, ~x = (x_1, ..., x_n), represent the variables in the problem (n vertices to be colored) with their discrete domain Dom = {c_1, ..., c_k} (k colors). The factor nodes, α, ..., with associated factor functions f_α, ..., represent the constraints of the problem (no two adjacent vertices may have the same color). Each factor function is a Boolean function with arguments ~x_α (a subset of variables from ~x) and range {0, 1}, and evaluates to 1 if and only if (iff) the associated constraint is satisfied. An edge connects a variable x_i with factor f_α iff the variable appears in the constraint represented by the factor node, which we denote by i ∈ α. In the graph coloring problem, each factor function has exactly two variables.

In the factor graph representation, each variable assignment ~x is thought of as having a weight equal to the product of the values that all factors evaluate to.
We denote this product by F(~x) := ∏_α f_α(~x_α). In our case, the weight of an assignment ~x is 1 if all of the factors have value 1, and 0 otherwise. The assignments with weight 1 correspond precisely to legal colorings, or solutions to the problem. The number of solutions can thus be expressed as the weighted sum across all possible assignments. We denote this quantity by Z, the so-called partition function:

$$Z := \sum_{\vec x \in \mathrm{Dom}^n} F(\vec x) = \sum_{\vec x \in \mathrm{Dom}^n} \prod_\alpha f_\alpha(\vec x_\alpha) \qquad (1)$$

We define the solution space of a graph coloring problem to be the set of all its legal colorings. Two legal colorings (or solutions) are called neighbors if they differ in the color of one vertex.

Definition 1 (Solution Cluster). A set of solutions C ⊆ S of a solution space S is a cluster if it is a maximal subset such that any two solutions in C can be connected by a sequence from C where consecutive solutions are neighbors.

In other words, clusters are connected components of the "solution graph" which has solutions as nodes and an edge between two solutions if they differ in the value of exactly one variable.

3 A Partition Function Style Expression for Counting Clusters

In this section we consider a method for estimating the number of solution clusters of a graph coloring problem. We briefly describe the concepts here; a more in-depth treatment, including formal results, may be found in [8]. First let us extend the definition of the function F so that it may be evaluated on an extended domain DomExt := P({c_1, ..., c_k}) \ {∅}, where c_1, ..., c_k are the k domain values (colors) of each of the problem variables, and P is the power set operator (so |DomExt| = 2^k - 1).
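Before moving to the extended domain, note that Z in (1) can be checked by direct enumeration on tiny instances. A minimal brute-force sketch (the triangle graph with k = 3 colors is our illustrative choice, not an instance from the paper):

```python
from itertools import product

def count_solutions(n, edges, k):
    """Brute-force partition function Z of Eq. (1): the number of
    assignments on which every edge factor evaluates to 1."""
    z = 0
    for x in product(range(k), repeat=n):
        # F(x) is the product of edge factors; it is 1 iff every
        # edge has endpoints of different colors
        if all(x[i] != x[j] for i, j in edges):
            z += 1
    return z

# Triangle graph, k = 3: the 6 proper colorings are exactly the
# permutations of the three colors.
print(count_solutions(3, [(0, 1), (1, 2), (0, 2)], 3))  # → 6
```

The same enumeration underlies everything that follows; only the weight attached to each assignment changes.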
Each generalized assignment ~y ∈ DomExt^n thus associates a (non-empty) set of values with each original variable, defining a hypercube in the search space for F. We generalize F and f_α to this extended domain in the natural way, F'(~y) := ∏_{~x ∈ ~y} F(~x) and f'_α(~y_α) := ∏_{~x_α ∈ ~y_α} f_α(~x_α), where the relation ∈ is applied point-wise, as will be the case with any relational operators used on vectors in this text. This means that F' evaluates to 1 on a hypercube iff F evaluates to 1 on all points within that hypercube.

Let us first assume that the solution space we work with decomposes into a set of separated hypercubes, so clusters correspond exactly to the hypercubes; by separated hypercubes, we mean that points in one hypercube differ from points in others in at least two values. E.g., ~y_1 = ({c_1}, {c_1}, {c_1}) and ~y_2 = ({c_2}, {c_3}, {c_1, c_2}) are separated hypercubes in three dimensions. This allows us to develop a surprisingly simple expression for counting the number of clusters, and we will later see that the same expression applies with high precision also to solution spaces of much more complex instances of graph coloring problems. Consider the indicator function χ(~y) for the property that ~y ∈ DomExt^n is a maximal solution hypercube contained in the solution space:

$$\chi(\vec y) := \underbrace{F'(\vec y)}_{\vec y \text{ is legal}} \cdot \underbrace{\prod_i \prod_{v_i \notin y_i} \Big(1 - F'\big(\vec y[y_i \leftarrow y_i \cup \{v_i\}]\big)\Big)}_{\text{no point-wise generalization is legal}}$$

Here ~y[y_i ← y'_i] denotes the substitution of y'_i into y_i in ~y.
Note that if the solution clusters are in fact hypercubes, then variable values that can be "extended" independently can also be extended all at once, that is, F'(~y[y_i ← y_i ∪ {v_i}]) = 1 and F'(~y[y_j ← y_j ∪ {v_j}]) = 1 implies F'(~y[y_i ← y_i ∪ {v_i}, y_j ← y_j ∪ {v_j}]) = 1. Moreover, any F'(~y[y_i ← y_i ∪ {v_i}]) = 1 implies F'(~y) = 1. Using these observations, χ(~y) can be reformulated by factoring out the product as follows. Here #o(~y) denotes the number of odd-size elements of ~y, and #e(~y) the number of even-size ones.

$$\chi(\vec y) = F'(\vec y)\bigg(\sum_{\vec y' \in (\mathcal P(\mathrm{Dom}))^n \setminus \vec y} (-1)^{\#o(\vec y')} \underbrace{\prod_i \prod_{v_i \in y'_i} F'\big(\vec y[y_i \leftarrow y_i \cup \{v_i\}]\big)}_{=\,F'(\vec y \,\cup\, \vec y')\text{ by hypercube assumption}}\bigg) \overset{\vec z := \vec y \cup \vec y'}{=} \sum_{\vec z \supseteq \vec y} (-1)^{\#o(\vec z \setminus \vec y)} F'(\vec z) = (-1)^{\#e(\vec y)} \sum_{\vec z \supseteq \vec y} (-1)^{\#e(\vec z)} F'(\vec z)$$

Finally, to count the number of maximal hypercubes fitting into the set of solutions, we sum the indicator function χ(~y) across all vectors ~y ∈ DomExt^n:

$$\sum_{\vec y} \chi(\vec y) = \sum_{\vec y} (-1)^{\#e(\vec y)} \sum_{\vec z \supseteq \vec y} (-1)^{\#e(\vec z)} F'(\vec z) = \sum_{\vec z} (-1)^{\#e(\vec z)} F'(\vec z) \Big(\sum_{\emptyset \notin \vec y \subseteq \vec z} (-1)^{\#e(\vec y)}\Big) = \sum_{\vec z} (-1)^{\#e(\vec z)} F'(\vec z) \prod_i \underbrace{\sum_{\emptyset \neq y_i \subseteq z_i} (-1)^{\delta_e(y_i)}}_{=1} = \sum_{\vec z} (-1)^{\#e(\vec z)} F'(\vec z)$$

The expression above is important for our study, and we denote it by Z(-1):

$$Z_{(-1)} := \sum_{\vec z \in \mathrm{DomExt}^n} (-1)^{\#e(\vec z)} F'(\vec z) = \sum_{\vec y \in \mathrm{DomExt}^n} (-1)^{\#e(\vec y)} \prod_\alpha f'_\alpha(\vec y_\alpha) \qquad (2)$$

The notation Z(-1) is chosen to emphasize its relatedness to the partition function (1) denoted by Z, and indeed the two expressions differ only in the (-1) term.
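Equation (2) is also directly computable by enumeration on small instances. A minimal sketch (the triangle with k = 3 is an illustrative assumption; for a binary disequality factor, F' = 1 exactly when the two endpoint color sets are disjoint, which follows from the definition of F'):

```python
from itertools import product

def z_minus_one(n, edges, k):
    """Brute-force Eq. (2) for a k-coloring instance: sum over all
    vectors of non-empty color sets y of (-1)^{#e(y)} * F'(y), where
    F'(y) = 1 iff every point of the hypercube y is a proper coloring."""
    nonempty = [frozenset(c for c in range(k) if bits >> c & 1)
                for bits in range(1, 2 ** k)]
    total = 0
    for y in product(nonempty, repeat=n):
        # F'(y) = 1 iff every edge gets disjoint color sets
        if all(not (y[i] & y[j]) for i, j in edges):
            even = sum(1 for s in y if len(s) % 2 == 0)  # #e(y)
            total += (-1) ** even
    return total

# Triangle, k = 3: the solution space is 6 isolated proper colorings
# (6 separated, trivial hypercubes), and indeed Z(-1) = 6.
print(z_minus_one(3, [(0, 1), (1, 2), (0, 2)], 3))  # → 6
```

For a single-edge (tree) instance with 3 colors the same routine returns 0 even though there is one cluster, matching the tree-structured caveat discussed in Section 6.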
It is easily seen that if the solution space consists of a set of separated hypercubes, then Z(-1) exactly captures the number of clusters (each separated hypercube is a cluster). Surprisingly, this number is remarkably accurate even for random coloring problems, as we will see in Section 6, Figure 1.

4 Exact Computation of the Number of Clusters and Z(-1)

Obtaining the exact number of clusters for reasonable size problems is crucial for evaluating our proposed approach based on Z(-1) and the corresponding BP equations to follow in Section 5. A naive way is to explicitly enumerate all solutions, compute their pairwise Hamming distances, and infer the cluster structure. Not surprisingly, this method does not scale well because the number of solutions typically grows exponentially as the number of variables of the graph coloring problem increases. We discuss here a much more scalable approach that uses two advanced techniques to this effect: decomposable negation normal form (DNNF) and binary decision diagrams (BDDs). Our method scales to graph coloring problems with a few hundred variables (see experimental results) for computing both the exact number of clusters and the exact value of Z(-1).

Both DNNF [6] and BDDs [4] are graph based data structures that have proven to be very effective in "knowledge compilation", i.e., in converting a 0-1 function F into a (potentially exponentially long, but often reasonably sized) standard form from which various interesting properties of F can be inferred easily, often in linear time in the size of the DNNF formula or BDD. For our purposes, we use DNNF to succinctly represent all solutions of F and a set of BDDs to represent solution clusters that we create as we traverse the DNNF representation.
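The naive baseline just described (enumerate solutions, link pairs at Hamming distance 1, count connected components) is easy to state precisely. A minimal sketch on a toy instance (the triangle with 3 colors is our illustrative assumption; union-find is one of several reasonable component-counting choices):

```python
from itertools import product

def num_clusters_naive(n, edges, k):
    """Enumerate all proper k-colorings and count connected components
    of the solution graph (solutions adjacent iff Hamming distance 1)."""
    sols = [x for x in product(range(k), repeat=n)
            if all(x[i] != x[j] for i, j in edges)]
    index = {x: i for i, x in enumerate(sols)}
    parent = list(range(len(sols)))

    def find(a):  # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # link each solution to every single-vertex recoloring that is
    # itself a solution
    for x in sols:
        for i in range(n):
            for c in range(k):
                nb = x[:i] + (c,) + x[i + 1:]
                if nb in index:
                    parent[find(index[x])] = find(index[nb])
    return len({find(i) for i in range(len(sols))})

# Triangle with 3 colors: every proper coloring is frozen, so each of
# the 6 solutions is its own cluster.
print(num_clusters_naive(3, [(0, 1), (1, 2), (0, 2)], 3))  # → 6
```

This enumerative check is what limits the baseline to roughly 50-node graphs; the DNNF/BDD construction below replaces the explicit solution list with an implicit one.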
The only relevant details for us of these two representations are the following: (1) a DNNF formula is an acyclic directed graph with variables and their negations at the leaves and two kinds of internal nodes, "or" and "and"; "or" nodes split the set of solutions such that they differ in the value of the variable labeling the node but otherwise have identical variables; "and" nodes partition the space into disjoint sets of variables; (2) BDDs represent arbitrary sets of solutions and support efficient intersection and projection (onto a subset of variables) operations on these sets.

We use the compiler c2d [5] to obtain the DNNF form for F. Since c2d works on Boolean formulas and our F often has non-Boolean domains, we first convert F to a Boolean function F' using a unary encoding, i.e., by replacing each variable x_i of F with domain size t with t Boolean variables x'_{i,j}, 1 ≤ j ≤ t, respecting the semantics: x_i = j iff x'_{i,j} = 1. In order to ensure that F and F' have a similar cluster structure of solutions, we relax the usual condition that only one of x'_{i,1}, ..., x'_{i,t} may be 1, thus effectively allowing the original x_i to take multiple values simultaneously. This yields a generalized function: the domains of the variables of F' correspond to the power sets of the domains of the respective variables of F. This generalization has the following useful property: if two solutions ~x(1) and ~x(2) are neighbors in the solution space of F, then the corresponding solutions ~x'(1) and ~x'(2) are in the same cluster in the solution space of F'.

Computing the number of clusters. Given F', we run c2d on it to obtain an implicit representation of all solutions as a DNNF formula F''. Next, we traverse F'' from the leaf nodes up, creating clusters as we go along.
Specifically, with each node U of F'', we associate a set S_U of BDDs, one for each cluster in the sub-formula contained under U. The set of BDDs for the root node of F'' then corresponds precisely to the set of solution clusters of F', and thus of F. These BDDs are computed as follows. If U is a leaf node of F'', it represents a Boolean variable or its negation, and S_U consists of the single one-node BDD corresponding to this Boolean literal. If U is an internal node of F'' labeled with the variable x_U and with children L and R, the set of BDDs S_U is computed as follows. If U is an "or" node, then we consider the union S_L ∪ S_R of the two sets of BDDs and merge any two of these BDDs if they are adjacent, i.e., have two solutions that are neighbors in the solution space (since the DNNF form guarantees that the BDDs in S_L and S_R already must differ in the value of the variable x_U labeling U, the adjacency check is equivalent to testing whether the two BDDs, with x_U projected out, have a solution in common; this is a straightforward projection and intersection operation for BDDs); in the worst case, this leads to |S_L| + |S_R| cluster BDDs in S_U. Similarly, if U is an "and" node, then S_U is constructed by considering the cross product {b_L ∧ b_R | b_L ∈ S_L, b_R ∈ S_R} of the two sets of BDDs and merging adjacent resulting BDDs as before; in the worst case, this leads to |S_L| · |S_R| cluster BDDs in S_U.

Evaluating Z(-1). The exact value of Z(-1) on F' can also be evaluated easily once we have the DNNF representation F''. In fact, as is reflected in our experimental results, evaluation of Z(-1) is a much more scalable process than counting clusters because it requires a simple traversal of F'' without the need for maintaining BDDs.
With each node U of F'', we associate a value V_U which equals precisely the difference between the number of solutions below U with an even number of positive literals and those with an odd number of positive literals; Z(-1) then equals (-1)^N times the value thus associated with the root node of F''. These values are computed bottom-up as follows. If U is a leaf node labeled with a positive (or negative) literal, then V_U = -1 (or 1, resp.). If U is an "or" node with children L and R, then V_U = V_L + V_R. This works because L and R have identical variables. Finally, if U is an "and" node with children L and R, then V_U = V_L V_R. This last computation works because L and R are on disjoint sets of variables and because of the following observation. Suppose L has V^e_L solutions with an even number of positive literals and V^o_L solutions with an odd number of positive literals; similarly for R. Then

$$V_U = (V^e_L V^e_R + V^o_L V^o_R) - (V^e_L V^o_R + V^o_L V^e_R) = (V^e_L - V^o_L)(V^e_R - V^o_R) = V_L V_R.$$

5 Belief Propagation Inference for Clusters

We present a version of the Belief Propagation algorithm that allows us to deal with the alternating signs of Z(-1). The derivation follows closely the one given by Yedidia et al. [12] for standard BP, i.e., we will write equations for a stationary point of the KL divergence of two sequences (not necessarily probability distributions in our case). Since the Z(-1) expression involves both positive and negative terms, we must appropriately generalize some of the steps.

Given a function p(~y) (the target function, with real numbers as its range) on DomExt^n that is known up to a normalization constant but with unknown marginal sums, we seek a function b(~y) (the trial function) to approximate p(~y), such that b's marginal sums are known.
The target function p(~y) is defined as p(~y) := (1/Z(-1)) · (-1)^{#e(~y)} ∏_α f'_α(~y_α). We adopt previously used notation [12]: ~y_α are values in ~y of variables that appear in factor (i.e., vertex) f'_α; ~y_{-i} are values of all variables in ~y except y_i. The marginal sums can be extended in a similar way to allow for any number of variables fixed in ~y, specified by the subscript. When convenient, we treat the symbol α as a set of indices of variables in f'_α, to be able to index them. We begin by listing the assumptions used in the derivation, both the ones that are used in the "standard" BP, and two additional ones needed for the generalization. An assumption on b(~y) is legitimate if the corresponding condition holds for p(~y).

Assumptions: The standard assumptions, present in the derivation of standard BP [12], are:

• Marginalization: b_i(y_i) = Σ_{~y_{-i}} b(~y) and b_α(~y_α) = Σ_{~y_{-α}} b(~y). This condition is legitimate, but cannot be enforced with a polynomial number of constraints. Moreover, it might happen that the solution found by BP does not satisfy it, which is a known problem with BP [10].

• Normalization: Σ_{y_i} b_i(y_i) = Σ_{~y_α} b_α(~y_α) = 1. This is legitimate and explicitly enforced.

• Consistency: ∀α, i ∈ α, y_i: b_i(y_i) = Σ_{~y_{α\i}} b_α(~y_α). This is legitimate and explicitly enforced.

• Tree-like decomposition: says that the weights b(~y) of each configuration can be obtained from the marginal sums as follows (d_i is the degree of the variable node y_i in the factor graph): |b(~y)| = ∏_α |b_α(~y_α)| / ∏_i |b_i(y_i)|^{d_i - 1}. (The standard assumption is without the absolute values.)
This assumption is not legitimate, and it is built-in, i.e., it is used in the derivation of the BP equations.

To appropriately handle the signs of b(~y) and p(~y), we have two additional assumptions. These are necessary for the BP derivation applicable to Z(-1), but not for the standard BP equations.

• Sign-correspondence: For all configurations ~y, b(~y) and p(~y) have the same sign (zero, being a singular case, is treated as having a positive sign). This is a built-in assumption and legitimate.

• Sign-alternation: b_i(y_i) is negative iff |y_i| is even, and b_α(~y_α) is negative iff #e(~y_α) is odd. This is also a built-in assumption, but not necessarily legitimate; whether or not it is legitimate depends on the structure of the solution space of a particular problem.

The Sign-alternation assumption can be viewed as an application of the inclusion-exclusion principle, and is easy to illustrate on a graph coloring problem with only two colors. In this case, if F'(~y) = 1, then y_i = {c_1} means that y_i can have color 1, y_i = {c_2} that y_i can have color 2, and y_i = {c_1, c_2} that y_i can have both colors. The third event is included in the first two, and its probability must thus appear with a negative sign if the sum of probabilities is to be 1.

Kullback-Leibler divergence: The KL-divergence is traditionally defined for probability distributions, for sequences of non-negative terms in particular. We need a more general measure, as our sequences p(~y) and b(~y) have alternating signs. But using the Sign-correspondence assumption, we observe that the usual definition of KL-divergence is still applicable, since the term in the logarithm is non-negative:

$$D(b \,\|\, p) := \sum_{\vec y \in \mathrm{DomExt}^n} b(\vec y) \log \frac{b(\vec y)}{p(\vec y)} = \sum_{\vec y \in \mathrm{DomExt}^n} b(\vec y) \log \frac{|b(\vec y)|}{|p(\vec y)|}.$$

Moreover, the following Lemma shows that the two properties of KL-divergence that make it suitable for distance-minimization are still valid.

Lemma 1. Let b(.) and p(.) be (possibly negative) weight functions on the same domain D, with the property that they agree on signs for all states (i.e., ∀~y ∈ D: sign(b(~y)) = sign(p(~y))), and that they sum to the same constant (i.e., Σ_~y b(~y) = Σ_~y p(~y) = c). Then the KL-divergence D(b ‖ p) satisfies D(b ‖ p) ≥ 0 and D(b ‖ p) = 0 ⇔ b ≡ p.

The proof is essentially identical to the equivalent statement made about KL-divergence of probability distributions. We omit it here for lack of space.

Minimizing D(b ‖ p): We write p(~y) = sign(p(~y)) · |p(~y)|, and analogously for b(~y). This allows us to isolate the signs, and the minimization follows exactly the steps of the standard BP derivation, namely we write a set of equations characterizing stationary points of D(b ‖ p). At the end, using the Sign-alternation assumption, we are able to implant the signs back.

BP equations: The resulting modified BP updates (denoted BP(-1)) are, for y_i ∈ DomExt:

$$n_{i \to \alpha}(y_i) = \prod_{\beta \ni i,\ \beta \neq \alpha} m_{\beta \to i}(y_i) \qquad (3)$$

$$m_{\alpha \to i}(y_i) \propto \sum_{\vec y_{\alpha \setminus i} \in \mathrm{DomExt}^{|\alpha|-1}} f'_\alpha(\vec y_\alpha) \prod_{j \in \alpha \setminus i} (-1)^{\delta(|y_j|\ \text{is even})}\, n_{j \to \alpha}(y_j) \qquad (4)$$

(Almost equivalent to standard BP, except for the (-1) term.)
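To make Equations (3)-(4) concrete, here is one synchronous update on a toy instance. The triangle graph, the uniform initialization, and all helper names are our own illustrative scaffolding, not code from the paper:

```python
from math import prod

K = 3
DOM_EXT = [frozenset(c for c in range(K) if bits >> c & 1)
           for bits in range(1, 2 ** K)]   # the 7 non-empty color sets
EDGES = [(0, 1), (1, 2), (0, 2)]           # one binary factor per edge

def f_ext(ya, yb):
    # extended factor f': 1 iff the hypercube contains only proper
    # edge colorings, i.e. the two color sets are disjoint
    return 1.0 if not (ya & yb) else 0.0

# factor-to-variable messages m_{alpha->i}, uniform initialization
m = {(a, i): {y: 1.0 for y in DOM_EXT} for a in EDGES for i in a}

def n_msg(i, alpha):
    # Eq. (3): product over the other factors beta containing i
    return {y: prod(m[(b, i)][y] for b in EDGES if i in b and b != alpha)
            for y in DOM_EXT}

def update(alpha, i):
    # Eq. (4): sum out the other variable j of the binary factor,
    # with the extra (-1)^{delta(|y_j| is even)} sign
    (j,) = [v for v in alpha if v != i]
    n_j = n_msg(j, alpha)
    raw = {y: sum(f_ext(y, yj) * (-1) ** (len(yj) % 2 == 0) * n_j[yj]
                  for yj in DOM_EXT)
           for y in DOM_EXT}
    s = sum(raw.values())
    return {y: v / s for y, v in raw.items()}  # normalize (the "∝")

new_m = {(a, i): update(a, i) for a in EDGES for i in a}
# The full set {0,1,2} has no disjoint partner, so its message is 0:
print(new_m[((0, 1), 0)][frozenset({0, 1, 2})])  # → 0.0
```

Note that, unlike standard BP, the raw messages can carry negative terms; the normalization here is a plain division by the signed sum, one simple reading of the proportionality in (4).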
One would iterate these equations from a suitable starting point to find a fixed point, and then obtain the beliefs b_i(y_i) and b_α(~y_α) (i.e., estimates of marginal sums) using the Sign-alternation assumption and the standard BP relations:

$$b_i(y_i) \propto (-1)^{\delta(|y_i|\ \text{is even})} \prod_{\alpha \ni i} m_{\alpha \to i}(y_i) \qquad b_\alpha(\vec y_\alpha) \propto (-1)^{\#e(\vec y_\alpha)} f'_\alpha(\vec y_\alpha) \prod_{i \in \alpha} n_{i \to \alpha}(y_i) \qquad (5)$$

To approximately count the number of clusters in large problems for which exact cluster count or exact Z(-1) evaluation is infeasible, we employ the generic BP(-1) scheme derived above. We substitute the extended factors f'(~y_α) into Equations (3) and (4), iterate from a random initial starting point to find a fixed point, and then use Equations (5) to compute the beliefs. The actual estimate of Z(-1) is obtained with the standard BP formula (with signs properly taken care of), where d_i is the degree of the variable node y_i in the factor graph:

$$\log Z_{\mathrm{BP}(-1)} := -\sum_\alpha \sum_{\vec y_\alpha} b_\alpha(\vec y_\alpha) \log |b_\alpha(\vec y_\alpha)| + \sum_i (d_i - 1) \sum_{y_i} b_i(y_i) \log |b_i(y_i)| \qquad (6)$$

6 Experimental Evaluation

We empirically evaluate the accuracy of our Z(-1) and ZBP(-1) approximations on an ensemble of random graph 3-coloring instances. The results are discussed in this section.

Z(-1) vs. the number of clusters. The left panel of Figure 1 compares the number of clusters (on the x-axis, log-scale) with Z(-1) (on the y-axis, log-scale) for 2,500 colorable random 3-COL instances on graphs with 20, 50, and 100 vertices with average vertex degree ranging between 1.0 and 4.7 (the threshold for 3-colorability). As can be seen, the Z(-1) expression captures the number of clusters almost exactly.
The inaccuracies come mostly from low graph density regions; in all instances we tried with density > 3.0, the Z(-1) expression was exact. We remark that although uncolorable instances were not considered in this comparison, for them Z(-1) = 0 = num-clusters by construction.

It is worth noting that for tree-structured graphs (with more than one vertex), the Z(-1) expression gives 0 for any k ≥ 3 colors although there is exactly one solution cluster. Moreover, given a disconnected graph with at least one tree component, Z(-1) also evaluates to 0, as it is the product of Z(-1) values over the different components. We have thus removed all tree components from the generated graphs prior to computing Z(-1); tree components are easily identified, and removing them does not change the number of clusters. For low graph densities, there are still some instances for which Z(-1) evaluates to 0; these instances are not visible in Figure 1 due to the log-log scale. In fact, all our instances with fewer than 5 clusters have Z(-1) = 0. This is because of other substructures for which Z(-1) evaluates to 0, e.g., chordless cycles of length not divisible by 3 (for k = 3 coloring) with attached trees. These structures, however, become rare as the density increases.

[Figure 1: Left: Z(-1) vs. number of clusters in random 3-COL problems with 20, 50 and 100 vertices, and average vertex degree between 1.0 and 4.7. Right: cluster marginals vs. Z(-1)-marginals for one instance of a random 3-COL problem with 100 vertices.]

[Figure 2: Average ZBP(-1) and Z(-1) for 3-COL vs. average vertex degrees for small and large random graphs.]

Z(-1) marginals vs. cluster marginals. For a given problem instance, we can define the cluster marginal of a variable x_i to be the fraction of solution clusters in which x_i only appears with one particular value (i.e., x_i is a backbone of the cluster). Since Z(-1) counts the number of clusters well, it is natural to ask whether it is also possible to obtain the marginals information from it. Indeed, Z(-1) does provide an estimate of the cluster marginals, and we call them Z(-1)-marginals. Recall that the semantics of factors in the extended domain is such that a variable can assume a set of values only if every value in the set yields a solution to the problem. This extends to the Z(-1) estimate of the number of clusters, and one can therefore use the principle of inclusion-exclusion to compute the number of clusters where a variable can only assume one particular value. The definition of Z(-1) conveniently provides for correct signs, and the number of clusters where x_i is fixed to v_i is thus estimated by Σ_{y_i ∋ v_i} Z(-1)(y_i), where Z(-1)(y_i) is the marginal sum of Z(-1). The Z(-1)-marginal is obtained by dividing this quantity by Z(-1).

The right panel of Figure 1 shows the results on one random 3-COL problem with 100 vertices. The plot shows cluster marginals and Z(-1)-marginals for one color; the points correspond to individual variables. The Z(-1)-marginals are close to perfect.
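On toy instances both quantities can be computed exactly. A brute-force sketch (the triangle with 3 colors is our illustrative assumption; there, every cluster is a single frozen solution, so the true cluster marginal reduces to a fraction over solutions):

```python
from fractions import Fraction
from itertools import product

K, N = 3, 3
EDGES = [(0, 1), (1, 2), (0, 2)]            # triangle
NONEMPTY = [frozenset(c for c in range(K) if b >> c & 1)
            for b in range(1, 2 ** K)]

def sign_weight(y):
    # (-1)^{#e(y)} * F'(y); F' = 1 iff every edge gets disjoint color sets
    if not all(not (y[i] & y[j]) for i, j in EDGES):
        return 0
    return (-1) ** sum(len(s) % 2 == 0 for s in y)

# marginal sums Z(-1)(y_0), and the total Z(-1)
z_marg = {y0: sum(sign_weight((y0,) + rest)
                  for rest in product(NONEMPTY, repeat=N - 1))
          for y0 in NONEMPTY}
z_total = sum(z_marg.values())

# Z(-1)-marginal for "x_0 is fixed to color 0": sum over sets containing 0
est = Fraction(sum(v for y0, v in z_marg.items() if 0 in y0), z_total)

# true cluster marginal: each proper coloring of the triangle is its own
# cluster, so this is the fraction of solutions with x_0 = 0
sols = [x for x in product(range(K), repeat=N)
        if all(x[i] != x[j] for i, j in EDGES)]
true = Fraction(sum(x[0] == 0 for x in sols), len(sols))

print(est, true)  # → 1/3 1/3
```

Here the estimate is exact, as expected for a solution space of separated (trivial) hypercubes; on general instances the two quantities can differ, as discussed next.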
This is a typical situation, although it is important to mention that Z(-1)-marginals are not always correct, or even non-negative. They are merely an estimate of the true cluster marginals, and how well they work depends on the solution space structure at hand. They are exact if the solution space decomposes into separated hypercubes and, as the figure shows, remarkably accurate also for random coloring instances.

The number of clusters vs. ZBP(-1). Figure 3 depicts a comparison between ZBP(-1) and Z(-1) for the 3-COL problem on colorable random graphs of various sizes and graph densities. It compares Z(-1) (on the x-axis, log-scale) with ZBP(-1) (y-axis, log-scale) for 1,300 colorable 3-COL instances on random graphs with 50, 100, and 200 vertices, with average vertex degree ranging from 1.0 to 4.7. The plots show that BP is quite accurate in estimating Z(-1) for individual instances, which in turn captures the number of clusters. Instances which are not 3-colorable are not shown, and BP in general incorrectly estimates a non-zero number of clusters for them.

Estimates on very large graphs and for various graph densities. Figure 2 shows similar data from a different perspective: what is shown is a rescaled average estimate of the number of clusters (y-axis) for average vertex degrees 1.0 to 4.7 (x-axis). The average is taken across different colorable instances of a given size, and the rescaling assumes that the number of clusters = exp(|V| · Σ), where Σ is a constant independent of the number of vertices [3]. The three curves show, respectively, BP's estimate for graphs with 100,000 vertices, BP's estimate for graphs with 100 vertices, and Z(-1) for the same graphs of size 100. The averages are computed across 3,000 instances of the small graphs, and only 10 instances of the large ones, where the instance-to-instance variability is practically non-existent.
[Figure 3: ZBP(−1) compared to Z(−1) for the 3-COL problem on random graphs with 50, 100, and 200 vertices and average vertex degree in the range 1.0 to 4.7. Three log-log panels (|V| = 50, 100, 200) plot ZBP(−1) against Z(−1) over the range 1e+00 to 1e+09.]

The fact that the curves nicely overlay shows that BP(−1) computes Z(−1) very accurately on average for colorable instances (where we can compare it with exact values), and that the estimate remains accurate for large problems. Note that the Survey Propagation algorithm developed by Braunstein et al. [3] also aims at computing the number of certain clusters in the solution space. However, SP counts only the number of clusters of a "typical size", and would show non-zero values in Figure 2 only for average vertex degrees between 4.42 and 4.7. Our algorithm counts clusters of all sizes, and is very accurate over the entire range of graph densities.

7 Conclusion

We discuss a purely combinatorial construction for estimating the number of solution clusters in graph coloring problems with very high accuracy. The technique uses a hypercube-based inclusion-exclusion argument coupled with solution counting, and lends itself to an application of a modified belief propagation algorithm. This way, the number of clusters in huge random graph coloring instances can be accurately and efficiently estimated.
Our preliminary investigation has revealed that it is possible to use combinatorial arguments to formally prove that the cluster counts estimated by Z(−1) are exact on certain kinds of solution spaces (not necessarily only for graph coloring). We hope that such insights and the cluster-focused picture will lead to new techniques for solving hard combinatorial problems and for bounding solvability transitions in random problem ensembles.

References

[1] D. Achlioptas and F. Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. In 38th STOC, pages 130–139, Seattle, WA, May 2006.

[2] J. Ardelius, E. Aurell, and S. Krishnamurthy. Clustering of solutions in hard satisfiability problems. J. Statistical Mechanics, P10012, 2007.

[3] A. Braunstein, R. Mulet, A. Pagnani, M. Weigt, and R. Zecchina. Polynomial iterative algorithms for coloring and analyzing random graphs. Physical Review E, 68:036702, 2003.

[4] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions on Computers, 35(8):677–691, 1986.

[5] A. Darwiche. New advances in compiling CNF into decomposable negation normal form. In 16th European Conf. on AI, pages 328–332, Valencia, Spain, Aug. 2004.

[6] A. Darwiche. Decomposable negation normal form. J. ACM, 48(4):608–647, 2001.

[7] A. Hartmann, A. Mann, and W. Radenback. Clusters and solution landscapes for vertex-cover and SAT problems. In Workshop on Physics of Distributed Systems, Stockholm, Sweden, May 2008.

[8] L. Kroc, A. Sabharwal, and B. Selman. Counting solution clusters of combinatorial problems using belief propagation, 2008. (in preparation).

[9] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborova. Gibbs states and the set of solutions of random constraint satisfaction problems. PNAS, 104(25):10318–10323, June 2007.

[10] D. Mackay, J. Yedidia, W. Freeman, and Y. Weiss. A conversation about the Bethe free energy and sum-product, 2001. URL citeseer.ist.psu.edu/mackay01conversation.html.

[11] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.

[12] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.