{"title": "Comparing Beliefs, Surveys, and Random Walks", "book": "Advances in Neural Information Processing Systems", "page_first": 49, "page_last": 56, "abstract": null, "full_text": "Comparing Beliefs, Surveys and Random Walks\n\nErik Aurell\nSICS, Swedish Institute of Computer Science\nP.O. Box 1263, SE-164 29 Kista, Sweden\nand Dept. of Physics, KTH Royal Institute of Technology\nAlbaNova SCFAB SE-106 91 Stockholm, Sweden\neaurell@sics.se\n\nUri Gordon and Scott Kirkpatrick\nSchool of Engineering and Computer Science\nHebrew University of Jerusalem\n91904 Jerusalem, Israel\n{guri,kirk}@cs.huji.ac.il\n\nAbstract\n\nSurvey propagation is a powerful technique from statistical physics that has been applied to solve the 3-SAT problem both in principle and in practice. We give, using only probability arguments, a common derivation of survey propagation, belief propagation and several interesting hybrid methods. We then present numerical experiments which use WSAT (a widely used random-walk based SAT solver) to quantify the complexity of 3-SAT formulae as a function of their parameters, both as randomly generated and after simplification guided by survey propagation. Some properties of WSAT which have not previously been reported make it an ideal tool for this purpose: its mean cost is proportional to the number of variables in the formula (at a fixed ratio of clauses to variables) in the easy-SAT regime and slightly beyond, and its behavior in the hard-SAT regime appears to reflect the underlying structure of the solution space that has been predicted by replica symmetry-breaking arguments. An analysis of the tradeoffs between the various methods of search for satisfying assignments shows WSAT to be far more powerful than has been appreciated, and suggests some interesting new directions for practical algorithm development.\n\n1 Introduction\n\nRandom 3-SAT is a classic problem in combinatorics, 
at the heart of computational complexity studies and a favorite testing ground for both exactly analyzable and heuristic solution methods which are then applied to a wide variety of problems in machine learning and artificial intelligence. It consists of an ensemble of randomly generated logical expressions, each depending on N Boolean variables x_i, and constructed by taking the AND of M clauses. Each clause a consists of the OR of 3 \"literals\" y_{i,a}. y_{i,a} is taken to be either x_i or ¬x_i at random with equal probability, and the three values of the index i in each clause are distinct. Conversely, the neighborhood of a variable x_i is V_i, the set of all clauses in which x_i or ¬x_i appears. For each such random formula, one asks whether there is some set of x_i values for which the formula evaluates to TRUE. The ratio α = M/N controls the difficulty of this decision problem, and predicts the answer with high accuracy, at least as both N and M tend to infinity with their ratio held constant. At small α, solutions are easily found, while for sufficiently large α there are almost certainly no satisfying configurations of the x_i, and compact proofs of this fact can be constructed. Between these limits lies a complex, spin-glass-like phase transition, at which the cost of analyzing the problem with either exact or heuristic methods explodes.\n\nA recent series of papers drawing upon the statistical mechanics of disordered materials has not only clarified the nature of this transition, but also led to a thousand-fold increase in the size of the concrete problems that can be solved [1, 2, 3]. This paper provides a derivation of the new methods using nothing more complex than probabilities, suggests some generalizations, and reports numerical experiments that disentangle the contributions of the several component heuristics employed. 
For two related discussions, see [4, 5].\n\nAn iterative \"belief propagation\" [6] (BP) algorithm for K-SAT can be derived to evaluate the probability, or \"belief\", that a variable will take the value TRUE in variable configurations that satisfy the formula considered. To calculate this, we first define a message (\"transport\") sent from a variable to a clause:\n\n t_{i→a} is the probability that variable x_i satisfies clause a.\n\nIn the other direction, we define a message (\"influence\") sent from a clause to a variable:\n\n i_{a→i} is the probability that clause a is satisfied by another variable than x_i.\n\nIn 3-SAT, where clause a depends on variables x_i, x_j and x_k, BP gives the following iterative update equation for its influence:\n\n i_{a→i}^(l) = t_{j→a}^(l) + t_{k→a}^(l) - t_{j→a}^(l) t_{k→a}^(l)   (1)\n\nThe BP update equations for the transport t_{i→a} involve the products of influences acting on a variable from the clauses which surround x_i, forming its \"cavity\" V_i, sorted by which literal (x_i or ¬x_i) appears in the clause:\n\n A0_i = Π_{b∈V_i, y_{i,b}=x_i} i_{b→i}  and  A1_i = Π_{b∈V_i, y_{i,b}=¬x_i} i_{b→i}   (2)\n\nThe update equations are then\n\n t_{i→a}^(l) = i_{a→i}^(l-1) A1_i / (i_{a→i}^(l-1) A1_i + A0_i)  if y_{i,a} = x_i\n t_{i→a}^(l) = i_{a→i}^(l-1) A0_i / (i_{a→i}^(l-1) A0_i + A1_i)  if y_{i,a} = ¬x_i   (3)\n\nThe superscripts (l) and (l-1) denote iteration. The probabilistic interpretation is the following: suppose we have i_{b→i}^(l) for all clauses b connected to variable i. Each of these clauses can either be satisfied by another variable (with probability i_{b→i}^(l)), or not be satisfied by another variable (with probability 1 - i_{b→i}^(l)) and then have to be satisfied by variable i itself. If we set variable x_i to 0, then some clauses are satisfied by ¬x_i, and some have to be satisfied by other variables. The probability that they are all satisfied is Π_{b≠a, y_{i,b}=x_i} i_{b→i}^(l). Similarly, if x_i is set to 1, then all these clauses b are satisfied with probability Π_{b≠a, y_{i,b}=¬x_i} i_{b→i}^(l). The products in (3) can therefore be interpreted as joint probabilities of independent events. Variable x_i can be 0 or 1 in a solution if the clauses in which x_i appears are either satisfied directly by x_i itself, or by other variables. Hence\n\n Prob(x_i = 0) = A0_i / (A0_i + A1_i)  and  Prob(x_i = 1) = A1_i / (A0_i + A1_i)   (4)\n\nA BP-based decimation scheme results from fixing the variables with largest probability to be either true or false. We then recalculate the beliefs for the reduced formula, and repeat.\n\nTo arrive at SP we introduce a modified system of beliefs: every variable falls into one of three classes: TRUE in all solutions (1); FALSE in all solutions (0); and TRUE in some and FALSE in other solutions (free). The message from a clause to a variable (an influence) is then the same as in BP above. Although we will again only need to keep track of one message from a variable to a clause (a transport), it is convenient to first introduce three ancillary messages:\n\n T̂_{i→a}(1) is the probability that variable x_i is true in clause a in all solutions;\n\n T̂_{i→a}(0) is the probability that variable x_i is false in clause a in all solutions;\n\n T̂_{i→a}(free) is the probability that variable x_i is true in clause a in some solutions and false in others.\n\nNote that there are here three transports for each directed link i → a, from a variable to a clause, in the graph. As in BP, these numbers will be functions of the influences from clauses to variables in the preceding update step. Taking the incoming influences again to be independent, we have\n\n T̂_{i→a}^(l)(free) ∝ Π_{b∈V_i, b≠a} i_{b→i}^(l-1)\n T̂_{i→a}^(l)(0) + T̂_{i→a}^(l)(free) ∝ Π_{b∈V_i, b≠a, y_{i,b}=x_i} i_{b→i}^(l-1)   (5)\n T̂_{i→a}^(l)(1) + T̂_{i→a}^(l)(free) ∝ Π_{b∈V_i, b≠a, y_{i,b}=¬x_i} i_{b→i}^(l-1)\n\nThe proportionality indicates that the probabilities are to be normalized. 
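As a concrete illustration, the BP updates (1)-(3) can be written as a short routine. This is our own sketch, not code from the paper: the clause representation as (variable, sign) pairs and the name bp_sweep are our assumptions, and normalization and convergence handling are omitted.

```python
def bp_sweep(clauses, t):
    """One synchronous BP sweep for 3-SAT (illustrative sketch).

    clauses: list of clauses, each a list of (i, sign) pairs,
             sign = +1 if the literal is x_i, -1 if it is NOT x_i.
    t: dict (i, a) -> current transport t_{i->a}.
    Returns the updated transports."""
    # Influence update, eq. (1): probability that clause a is
    # satisfied by a variable other than i.
    infl = {}
    for a, clause in enumerate(clauses):
        for i, _ in clause:
            p, q = [t[(j, a)] for j, _s in clause if j != i]
            infl[(a, i)] = p + q - p * q
    # Cavity products, eq. (2): A0 over clauses where x_i appears
    # positively, A1 over clauses where it appears negated.
    A0, A1 = {}, {}
    for a, clause in enumerate(clauses):
        for i, sign in clause:
            A0.setdefault(i, 1.0)
            A1.setdefault(i, 1.0)
            if sign > 0:
                A0[i] *= infl[(a, i)]
            else:
                A1[i] *= infl[(a, i)]
    # Transport update, eq. (3).
    new_t = {}
    for a, clause in enumerate(clauses):
        for i, sign in clause:
            opp = A1[i] if sign > 0 else A0[i]
            same = A0[i] if sign > 0 else A1[i]
            num = infl[(a, i)] * opp
            new_t[(i, a)] = num / (num + same)
    return new_t
```

In a practical solver the sweep would be iterated to a fixed point and guarded against zero denominators; the sketch above shows only the message algebra.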
We see that the structure is quite similar to that in BP, but we can make it closer still by introducing t_{i→a} with the same meaning as in BP. In SP it will then, as the case might be, be equal to T̂_{i→a}(free) + T̂_{i→a}(0) or T̂_{i→a}(free) + T̂_{i→a}(1). That gives (compare (3)):\n\n t_{i→a}^(l) = i_{a→i}^(l-1) A1_i / (i_{a→i}^(l-1) A1_i + A0_i - A1_i A0_i)  if y_{i,a} = x_i\n t_{i→a}^(l) = i_{a→i}^(l-1) A0_i / (i_{a→i}^(l-1) A0_i + A1_i - A1_i A0_i)  if y_{i,a} = ¬x_i   (6)\n\nThe update equation for the influence i_{a→i} is the same in SP as in BP, i.e. one uses (1) in SP as well. Similarly to (4), decimation now removes the most fixed variable, i.e. the one with the largest absolute value of (A0_i - A1_i)/(A0_i + A1_i - A1_i A0_i). Given the complexity of the original derivation of SP [1, 2], it is remarkable that the SP scheme can be interpreted as a type of belief propagation in another belief system, and even more remarkable that the final iteration formulae differ so little.\n\nA modification of SP which we will consider in the following is to interpolate between BP (ρ = 0) and SP (ρ = 1)¹ by considering the equations\n\n t_{i→a}^(l) ∝ i_{a→i}^(l-1) A1_i / (i_{a→i}^(l-1) A1_i + A0_i - ρ A1_i A0_i)  if y_{i,a} = x_i\n t_{i→a}^(l) ∝ i_{a→i}^(l-1) A0_i / (i_{a→i}^(l-1) A0_i + A1_i - ρ A1_i A0_i)  if y_{i,a} = ¬x_i   (7)\n\nWe do not have an interpretation of the intermediate cases of ρ as belief systems.\n\nFigure 1: Dependence of decimation depth on the interpolation parameter ρ. (Fraction of sites remaining after decimation vs. α = M/N from 3.5 to 4.4, for ρ = 0, 0.95, 1 and 1.05.)\n\n2 The Phase Diagram of 3-SAT\n\nEarly work on developing 3-SAT heuristics discovered that as α is increased, the problem changes from being easy to solve to extremely hard, then again relatively easy when the formulae are almost certainly UNSAT. 
It was natural to expect that a sharp phase boundary between SAT and UNSAT phases in the limit of large N accompanies this observed \"easy-hard-easy\" transition, and the finite-size scaling results of [7] confirmed this. Their work placed the transition at about α = 4.2. Monasson and Zecchina [8] soon showed, using the replica method from statistical mechanics, that the phase transition to be expected had unusual characteristics, including \"frozen variables\" and a highly nonuniform distribution of solutions, making search difficult. Recent technical advances have made it possible to use simpler cavity mean-field methods to pinpoint the SAT/UNSAT boundary at α = 4.267 and suggest that the \"hard-SAT\" region in which the solution space becomes inhomogeneous begins at about α = 3.92. These calculations also predicted a specific solution structure (termed 1-RSB for \"one-step replica symmetry-breaking\") [1, 2] in which the satisfiable configurations occur in large clusters, maximally separated from each other. Two types of frozen variables are predicted: one set which take the same value in all clusters, and a second set whose value is fixed within a particular cluster. The remaining variables are \"paramagnetic\" and can take either value in some of the states of a given cluster. A careful analysis of the 1-RSB solution has subsequently shown that this extreme structure is only stable above α = 4.15. Between 3.92 and 4.15, a wider range of cluster sizes and a wide range of inter-cluster Hamming distances are expected [9]. As a result, we expect the values α = 3.9, 4.15 and 4.267 to separate regions in which the nature of the 3-SAT decision problem is distinctly different.\n\n ¹This interpolation has also been considered and implemented by R. 
Zecchina and co-workers.\n\n\"Survey-induced decimation\" consists of using SP to determine the variable most likely to be frozen, then setting that variable to the indicated frozen value, simplifying the formula as a result, updating the SP calculation, and repeating the process. For α < 3.9 we expect SP to discover that all spins are free to take on more than one value in some ground state, so no spins will be decimated. Above α = 3.9, SP ideally should identify frozen spins until all that remain are paramagnetic. The depth of decimation, or fraction of spins remaining when SP sees only paramagnetic spins, is thus an important characteristic. We show in Fig. 1 the fraction of spins remaining after survey-induced decimation for values of α from 3.85 to 4.35 in hundreds of formulae with N = 10,000. The error bars show the standard deviation, which becomes quite large for large values of α. To the left of α = 4.2, on the descending part of the curves, SP reaches a paramagnetic state and halts. On the right, or ascending, portion of the curves, SP stops by simply failing to converge.\n\nFig. 1 also shows how different BP and the hybrids between BP and SP are in their decimation behavior. We studied BP (ρ = 0), underrelaxed SP (ρ = 0.95), SP, and overrelaxed SP (ρ = 1.05). BP and underrelaxed SP do not reach a paramagnetic state, but continue until the formula breaks apart into clauses that have no variables shared between them. We see in Fig. 1 that BP stops working at roughly α = 3.9, the point at which SP begins to operate. The underrelaxed SP behaves like BP, but can be used well into the RSB region. On the rising parts of all four curves in Fig. 1, the scheme halted as the surveys ceased to converge. Overrelaxed SP in Fig. 
1 may give reasonable recommendations for simplification even on formulae which are likely to be UNSAT.\n\n3 Some Background on WSAT\n\nNext we consider WSAT, the random walk-based search routine used to finish the job of exhibiting a satisfying configuration after SP (or some other decimation advisor) has simplified the formula. The surprising power exhibited by SP has to some extent obscured the fact that WSAT is itself a very powerful tool for solving constraint satisfaction problems, and has been widely used for this. Its running time, expressed as the number of walk steps required for a successful search, is also useful as an informal definition of the complexity of a logical formula. Its history goes back to Papadimitriou's [10] observation that a subtly biased random walk would with high probability discover satisfying solutions of the simpler 2-SAT problem after, at worst, O(N^2) steps. His procedure was to start with an arbitrary assignment of values to the binary variables, then reverse the sign of one variable at a time using the following random process:\n\n select an unsatisfied clause at random;\n select at random a variable that appears in the clause;\n reverse that variable.\n\nThis procedure, sometimes called RWalkSAT, works because changing the sign of a variable in an unsatisfied clause always satisfies that clause and, at first, has no net effect on other clauses. It is much more powerful than was proven initially. Two recent papers [12, 13] have argued analytically and shown experimentally that RWalkSAT finds satisfying configurations of the variables after a number of steps that is proportional to N for values of α up to roughly 2.7, after which this cost increases exponentially with N.\n\nThe second trick in WSAT was introduced by Kautz and Selman [11]. 
They also choose an unsatisfied clause at random, but then reverse one of the \"best\" variables, selected at random, where \"best\" is defined as causing the fewest satisfied clauses to become unsatisfied. For robustness, they mix this greedy move with random moves as used in RWalkSAT, recommending an equal mixture of the two types of moves. Barthel et al. [13] used these two moves in numerical experiments, but found little improvement over RWalkSAT.\n\nFigure 2: (a) Median of WSAT cost per variable in 3-SAT as a function of α. (b) Variance of WSAT cost, scaled by N. (Curves for N = 1000, 2000, 5000, 10000 and 20000.)\n\nThere is a third trick in the most often used variant of WSAT, introduced slightly later [14]. If any variable in the selected unsatisfied clause can be reversed without causing any other clauses to become unsatisfied, this \"free\" move is immediately accepted and no further exploration is required. Since we shall show that WSAT works well above α = 2.7, this third move apparently gives WSAT its extra power. Although these moves were chosen by the authors of WSAT after considerable experiment, we have no insight into why they should be the best choices.\n\nIn Fig. 2a, we show the median number of random walk steps per variable taken by the standard version of WSAT to solve 3-SAT formulae at values of α ranging from 0.5 to 4.3 and for formulae of sizes ranging from N = 1000 to N = 20000. The cost of WSAT remains linear in N well above α = 3.9. WSAT cost distributions were collected on at least 1000 cases at each point. 
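Putting the three move types together, the WSAT inner loop can be sketched as follows. This is our own reconstruction for illustration, not the released WSAT code; the clause representation as (variable, sign) pairs and the noise parameter p_random (an equal mixture corresponds to 0.5) are our assumptions.

```python
import random

def wsat_step(clauses, assign, p_random=0.5):
    """One WSAT flip (illustrative sketch).

    clauses: list of clauses, each a list of (var, sign) pairs,
             sign = +1 for x, -1 for NOT x.
    assign: dict var -> bool, modified in place.
    Returns False when every clause is already satisfied."""
    def satisfied(c):
        return any(assign[v] == (s > 0) for v, s in c)

    unsat = [c for c in clauses if not satisfied(c)]
    if not unsat:
        return False
    clause = random.choice(unsat)           # pick a random unsatisfied clause

    def break_count(v):
        # Clauses that flipping v would newly unsatisfy: those currently
        # satisfied only by v's literal.
        n = 0
        for c in clauses:
            sat_lits = [u for u, s in c if assign[u] == (s > 0)]
            if sat_lits == [v]:
                n += 1
        return n

    cvars = [v for v, _ in clause]
    bc = {v: break_count(v) for v in cvars}
    free = [v for v in cvars if bc[v] == 0]
    if free:                                # third trick: "free" move, always taken
        v = random.choice(free)
    elif random.random() < p_random:        # RWalkSAT move: purely random variable
        v = random.choice(cvars)
    else:                                   # greedy move: break as few clauses as possible
        best = min(bc.values())
        v = random.choice([u for u in cvars if bc[u] == best])
    assign[v] = not assign[v]
    return True
```

A solver would call wsat_step in a loop until it returns False or a step budget is exhausted; recomputing break counts from scratch as above is O(M) per flip, whereas production implementations keep incremental counters.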
Since the distributions are asymmetric, with strong tails extending to higher cost, it is not obvious that WSAT cost is, in the statistical mechanics language, self-averaging, or concentrated about a well-defined mean value which dominates the distribution as N → ∞. To test this, we calculated higher moments of the WSAT cost distribution and found that they scale with simple powers of N. For example, in Fig. 2b, we show that the variance of the WSAT cost per variable, scaled up by N, is a well-defined function of α up to almost 4.2. The third and fourth moments of the distribution (not shown) are also constant when multiplied by N and by N^2, respectively. The WSAT cost per variable is thus given by a distribution which concentrates with increasing N in exactly the way that a process governed by the usual laws of large numbers is expected to behave, even though the typical cost increases by six orders of magnitude as we move from the trivial cases to the critical regime.\n\nA detailed analysis of the cost distributions which we observed will be published elsewhere, but we conclude that the median cost of solving 3-SAT using the WSAT random walk search, as well as the mean cost if that is well-defined, remains linear in N up to α = 4.15, coincidentally the onset of 1-RSB. In the 1-RSB regime, the WSAT cost per variable distributions shift to higher values as N increases, and an exponential increase in cost with N is likely. Is 4.15 really the endpoint for WSAT's linearity, or will the search cost per variable converge at still larger values of N which we could not study? We define a rough estimate of N_onset(α), by study of the cumulative distributions of WSAT cost, as the value of N for a given α above which the distributions cross at a fixed percentile. Plotting log(N_onset) against log(4.15 - α) in Fig. 
3, we find strong indication that 4.15 is indeed an asymptote for WSAT.\n\nFigure 3: Size N at which WSAT cost is linear in N, as a function of 4.15 - α.\n\nFigure 4: WSAT cost, before and after SP-guided decimation. (Median WalkSAT cost vs. α, for N = 1000 to 20000.)\n\n4 Practical Aspects of SP + WSAT\n\nThe power of SP comes from its use to guide decimation by identifying spins which can be frozen while minimally reducing the number of solutions that can be constructed. To assess the complexity of the reduced formulae that decimation guided in this way produces, we compare, in Fig. 4, the median number of WSAT steps required to find a satisfying configuration of the variables before and after decimation. To a rough approximation, we can say that SP caps the cost of finding a solution at what it would be at the entry to the critical regime. There are two factors: the reduction in the number of variables that have to be searched, and the reduction of the distance the random walk must traverse when it is restricted to a single cluster of solutions. In Fig. 4 the solid lines show the WSAT costs divided by N, the original number of variables in each formula. If we instead divide the WSAT cost after decimation by the number of variables remaining, the complexity measure that we obtain is only a factor of two larger, as shown by the dotted lines. The relative cost of running WSAT without benefit of decimation is 3-4 decades larger.\n\nWe measured the actual compute time consumed in survey propagation and in WSAT. For this we used the Zecchina group's version 1.3 survey propagation code, and the copy of WSAT (H. Kautz's release 35, see [15]) that they have also employed. All programs were run on a Pentium IV Xeon 3GHz dual processor server with 4GB of memory, and only one processor busy. 
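The bookkeeping behind each decimation step is mechanical: once SP has selected a variable and a value, clauses satisfied by the fixed literal are deleted, and the opposite literal is struck from the clauses where it appears. A minimal self-contained sketch (not the Zecchina group's code; clauses are represented as lists of (variable, sign) pairs, with sign = +1 for x and -1 for NOT x):

```python
def decimate(clauses, var, value):
    """Fix `var` to boolean `value` and simplify the formula (sketch).

    Returns (reduced_clauses, contradiction); contradiction is True if
    simplification produced an empty, unsatisfiable clause."""
    reduced_clauses = []
    for c in clauses:
        if any(v == var and (s > 0) == value for v, s in c):
            continue                      # satisfied by the fixed literal: drop it
        reduced = [(v, s) for v, s in c if v != var]
        if not reduced:                   # all literals eliminated: UNSAT
            return reduced_clauses, True
        reduced_clauses.append(reduced)
    return reduced_clauses, False
```

In the pipeline described in this section, such a step would be applied once per variable chosen by SP, with the surveys then updated on the reduced formula before the next choice.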
We compare timings from runs on the same 100 formulas with N = 10000 and α = 4.1 and 4.2 (the formulas are simply extended slightly for the second case). In the first case, the 100 formulas were solved using WSAT alone in 921 seconds. Using SP to guide decimation one variable at a time, with the survey updates performed locally around each modified variable, the same 100 formulas required 6218 seconds to solve, of which only 31 seconds were spent in WSAT.\n\nWhen we increase α to 4.2, the situation is reversed. Running WSAT on 100 formulas with N = 10000 required 27771 seconds on the same servers, and would have taken even longer if about half of the runs had not been stopped by a cutoff without producing a satisfying configuration. In contrast, the same 100 formulas were solved by SP followed by WSAT in 10420 seconds, of which only 300 seconds were spent in WSAT. The cost of SP does not scale linearly with N, but appears to scale as N^2 in this regime. We solved 100 formulas with N = 20000 using SP followed by WSAT in 39643 seconds, of which 608 seconds were spent in WSAT. The cost of running SP to decimate roughly half the spins has quadrupled, while the cost of the final WSAT runs remained proportional to N.\n\nDecimation must stop short of the paramagnetic state at the highest values of α, to avoid having SP fail to converge. In those cases we found that WSAT could sometimes find satisfying configurations if started slightly before this point. We also explored partial decimation as a means of reducing the cost of WSAT just below the 1-RSB regime, but found that decimation of small fractions of the variables caused the WSAT running times to be highly unpredictable, in many cases increasing strongly. As a result, partial decimation does not seem to be a useful approach.\n\n5 Conclusions and future work\n\nThe SP and related algorithms are quite new, so programming improvements may modify the practical conclusions of the previous section. 
However, a more immediate target for future work could be the WSAT algorithm itself. Further directing its random choices to incorporate the insights gained from BP and SP might make it an effective algorithm even closer to the SAT/UNSAT transition.\n\nAcknowledgments\n\nWe have enjoyed discussions of this work with members of the replica and cavity theory community, especially Riccardo Zecchina, Alfredo Braunstein, Marc Mézard, Rémi Monasson and Andrea Montanari. This work was performed in the framework of EU/FP6 Integrated Project EVERGROW (www.evergrow.org), and in part during a Thematic Institute supported by the EXYSTENCE EU/FP5 network of excellence. E.A. acknowledges support from the Swedish Science Council. S.K. and U.G. are partially supported by a US-Israeli Binational Science Foundation grant.\n\nReferences\n\n[1] Mézard M., Parisi G. & Zecchina R. (2002) Analytic and Algorithmic Solutions of Random Satisfiability Problems. Science 297:812-815.\n\n[2] Mézard M. & Zecchina R. (2002) The random K-satisfiability problem: from an analytic solution to an efficient algorithm. Phys. Rev. E 66:056126.\n\n[3] Braunstein A., Mézard M. & Zecchina R. (2002) Survey propagation: an algorithm for satisfiability. arXiv:cs.CC/0212002.\n\n[4] Parisi G. (2003) On the probabilistic approach to the random satisfiability problem. Proc. SAT 2003 and arXiv:cs.CC/0308010v1.\n\n[5] Braunstein A. & Zecchina R. (2004) Survey Propagation as Local Equilibrium Equations. arXiv:cond-mat/0312483 v5.\n\n[6] Pearl J. (1988) Probabilistic Reasoning in Intelligent Systems, 2nd Edition, Morgan Kaufmann.\n\n[7] Kirkpatrick S. & Selman B. (1994) Critical Behaviour in the Satisfiability of Random Boolean Expressions. Science 264:1297-1301.\n\n[8] Monasson R. & Zecchina R. (1997) Statistical mechanics of the random K-Sat problem. Phys. Rev. E 56:1357-1361.\n\n[9] Montanari A., Parisi G. & Ricci-Tersenghi F. 
(2003) Instability of one-step replica-symmetry-broken phase in satisfiability problems. arXiv:cond-mat/0308147.\n\n[10] Papadimitriou C.H. (1991) In FOCS 1991, p. 163.\n\n[11] Selman B. & Kautz H.A. (1993) In Proc. AAAI-93, pp. 46-51.\n\n[12] Semerjian G. & Monasson R. (2003) Phys. Rev. E 67:066103.\n\n[13] Barthel W., Hartmann A.K. & Weigt M. (2003) Phys. Rev. E 67:066104.\n\n[14] Selman B., Kautz H. & Cohen B. (1996) Local Search Strategies for Satisfiability Testing. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 26.\n\n[15] http://www.cs.washington.edu/homes/kautz/walksat/\n", "award": [], "sourceid": 2575, "authors": [{"given_name": "Erik", "family_name": "Aurell", "institution": null}, {"given_name": "Uri", "family_name": "Gordon", "institution": null}, {"given_name": "Scott", "family_name": "Kirkpatrick", "institution": null}]}