{"title": "Streamlining Variational Inference for Constraint Satisfaction Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 10556, "page_last": 10566, "abstract": "Several algorithms for solving constraint satisfaction problems are based on survey propagation, a variational inference scheme used to obtain approximate marginal probability estimates for variable assignments. These marginals correspond to how frequently each variable is set to true among satisfying assignments, and are used to inform branching decisions during search; however, marginal estimates obtained via survey propagation are approximate and can be self-contradictory. We introduce a more general branching strategy based on streamlining constraints, which sidestep hard assignments to variables. We show that streamlined solvers consistently outperform decimation-based solvers on random k-SAT instances for several problem sizes, shrinking the gap between empirical performance and theoretical limits of satisfiability by 16.3% on average for k = 3, 4, 5, 6.", "full_text": "Streamlining Variational Inference for Constraint\n\nSatisfaction Problems\n\nAditya Grover, Tudor Achim, Stefano Ermon\n\n{adityag, tachim, ermon}@cs.stanford.edu\n\nComputer Science Department\n\nStanford University\n\nAbstract\n\nSeveral algorithms for solving constraint satisfaction problems are based on survey\npropagation, a variational inference scheme used to obtain approximate marginal\nprobability estimates for variable assignments. These marginals correspond to\nhow frequently each variable is set to true among satisfying assignments, and are\nused to inform branching decisions during search; however, marginal estimates\nobtained via survey propagation are approximate and can be self-contradictory.\nWe introduce a more general branching strategy based on streamlining constraints,\nwhich sidestep hard assignments to variables. 
We show that streamlined solvers\nconsistently outperform decimation-based solvers on random k-SAT instances\nfor several problem sizes, shrinking the gap between empirical performance and\ntheoretical limits of satis\ufb01ability by 16.3% on average for k = 3, 4, 5, 6.\n\n1\n\nIntroduction\n\nConstraint satisfaction problems (CSP), such as boolean satis\ufb01ability (SAT), are useful modeling\nabstractions for many arti\ufb01cial intelligence and machine learning problems, including planning [13],\nscheduling [27], and logic-based probabilistic modeling frameworks such as Markov Logic Net-\nworks [30]. More broadly, the ability to combine constraints capturing domain knowledge with\nstatistical reasoning has been successful across diverse areas such as ontology matching, information\nextraction, entity resolution, and computer vision [15, 4, 32, 29, 33]. Solving a CSP involves \ufb01nding\nan assignment to the variables that renders all of the problem\u2019s constraints satis\ufb01ed, if one exists.\nSolvers that explore the search space exhaustively do not scale since the state space is exponential\nin the number of variables; thus, the selection of branching criteria for variable assignments is the\ncentral design decision for improving the performance of these solvers [5].\nAny CSP can be represented as a factor graph, with variables as nodes and the constraints between\nthese variables (known as clauses in the SAT case) as factors. With such a representation, we can\ndesign branching strategies by inferring the marginal probabilities of each variable assignment.\nIntuitively, the variables with more extreme marginal probability for a particular value are more likely\nto assume that value across the satisfying assignments to the CSP. 
In fact, if we had access to an oracle that could perform exact inference, we could trivially branch on variable assignments with non-zero marginal probability and efficiently find solutions (if any exist) to hard CSPs such as SAT in time linear in the number of variables. In practice, however, exact inference is intractable for even moderately sized CSPs, and approximate inference techniques are essential for obtaining estimates of marginal probabilities.
Variational inference is at the heart of many such approximate inference techniques. The key idea is to cast inference over an intractable joint distribution as an optimization problem over a family of tractable approximations to the true distribution [6, 34, 38]. Several such approximations exist, e.g., mean field, belief propagation, etc. In this work, we focus on survey propagation. Inspired by statistical physics, survey propagation is a message-passing algorithm that corresponds to belief propagation in a "lifted" version of the original CSP and underlies many state-of-the-art solvers for random CSPs [24, 22, 21].

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: Factor graph for a 3-SAT instance with 5 variables (circles) and 3 clauses (squares). A solid (dashed) edge between a clause and a variable indicates that the clause contains the variable as a positive (negative) literal. This instance corresponds to (¬xi ∨ xk ∨ ¬xl) ∧ (xi ∨ xj ∨ ¬xk) ∧ (xk ∨ xl ∨ ¬xm), with the clauses a, b, c listed in order.

Existing branching rules for survey propagation iteratively pick variables with the most confident marginals and fix their values (by adding unary constraints on these variables) in a process known as decimation. 
This heuristic works well in practice, but suffers from high variance in the success of branching: the unary constraints leave the survey inspired decimation algorithm unable to recover in the event that a contradictory assignment (i.e., one that cannot be completed to form a satisfying assignment) is made. Longer branching predicates, defined over multiple variables, have lower variance and are more effective both in theory and practice [14, 1, 2, 36, 19, 18].
In this work, we introduce improved branching heuristics for survey propagation by extending this idea to CSPs; namely, we show that branching on more complex predicates than single-variable constraints greatly improves survey propagation's ability to find solutions to CSPs. Appealingly, these more complex, multi-variable predicates, which we refer to as streamlining constraints, can be easily implemented as additional factors (not necessarily unary anymore) in message-passing algorithms such as survey propagation. For this reason, branching on more complex predicates is a natural extension to survey propagation.
Using these new branching heuristics, we develop an algorithm and empirically benchmark it on families of random CSPs. Random CSPs exhibit sharp phase transitions between satisfiable and unsatisfiable instances and are an important model for analyzing the average hardness of CSPs, both in theory and practice [25, 26]. In particular, we consider two such CSPs: k-SAT, where constraints are restricted to disjunctions involving exactly k (possibly negated) variables [3], and XORSAT, which replaces the disjunctions of k-SAT with XOR constraints of fixed length. 
On both these problems, our proposed algorithm outperforms the competing survey inspired decimation algorithm that branches based on just single variables, increasing solver success rates.

2 Preliminaries

Every CSP can be encoded as a boolean SAT problem expressed in Conjunctive Normal Form (CNF), and we will use this representation for the remainder of this work. Let V and C denote index sets for n Boolean variables and m clauses respectively. A literal is a variable or its negation; a clause is a disjunction of literals. A CNF formula F is a conjunction of clauses, and is written as (l11 ∨ ... ∨ l1k1) ∧ ... ∧ (lm1 ∨ ... ∨ lmkm). Each (lj1 ∨ ... ∨ ljkj) is a clause with kj literals. For notational convenience, the variables will be indexed with letters i, j, k, ..., and the clauses will be indexed with letters a, b, c, .... Each variable i is Boolean, taking values xi ∈ {0, 1}. A formula is satisfiable if there exists an assignment to the variables such that all the clauses are satisfied, where a clause is satisfied if at least one literal evaluates to true.
Any SAT instance can be represented as an undirected graphical model where each clause corresponds to a factor, and is connected to the variables in its scope. Given an assignment to the variables in its scope, a factor evaluates to 1 if the corresponding clause evaluates to True, and 0 otherwise. The corresponding joint probability distribution is uniform over the set of satisfying assignments. An example factor graph illustrating the use of our notation is given in Figure 1.

k-SAT formulas are ones where all clauses (lj1 ∨ ... ∨ ljkj) have exactly k literals, i.e., kj = k for j = 1, ..., m. Random k-SAT instances are generated by choosing each literal's variable and negation independently and uniformly at random in each of the m clauses. 
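The random k-SAT generation procedure just described can be sketched as follows (a minimal sketch using DIMACS-style signed integers for literals; sampling k distinct variables per clause is our assumption, one common convention):

```python
import random

def random_ksat(n, m, k, seed=0):
    """Generate a random k-SAT instance over variables 1..n with m clauses.

    Each clause samples k distinct variables uniformly at random and
    negates each one independently with probability 1/2. Literals are
    DIMACS-style signed integers: v stands for x_v, -v for its negation.
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses
```

For a target constraint density α, one would set m = int(α * n), e.g., random_ksat(n, int(4.2 * n), 3) for 3-SAT near the phase transition.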
It has been shown that these instances exhibit a distinctive behavior: for large enough k, the probability of an instance having a solution undergoes a phase transition as a function of the constraint density α = m/n, for a problem with m clauses and n variables. These instances exhibit a sharp crossover at a threshold density αs(k): they are almost always satisfiable below this threshold, and they become unsatisfiable for larger constraint densities [12, 10]. Empirically, random instances with constraint density close to the satisfiability threshold are difficult to solve [23].

2.1 Survey propagation
The base algorithm used in many state-of-the-art solvers for constraint satisfaction problems such as random k-SAT is survey inspired decimation [7, 24, 16, 23]. The algorithm employs survey propagation, a message passing procedure that computes approximate single-variable marginal probabilities for use in a decimation procedure. Our approach uses the same message passing procedure, and we review it here for completeness.
Survey propagation is an iterative procedure for estimating variable marginals in a factor graph. In the context of a factor graph corresponding to a Boolean formula, these marginals represent approximately the probability of a variable taking on a particular assignment when sampling uniformly from the set of satisfying assignments of the formula. Survey propagation considers three kinds of assignments for a variable: 0, 1, or unconstrained (denoted by ∗). 
A high value for the marginal corresponding to either of the first two assignments indicates that the variable assuming that particular value makes it likely for the overall formula to be satisfiable, whereas a high value for the unconstrained marginal indicates that satisfiability is likely regardless of the variable assignment.
In order to estimate these marginals from a factor graph, we follow a message passing protocol where we first compute survey messages for each edge in the graph. There are two kinds of survey messages: messages {η_i→a}, i ∈ V, a ∈ C(i), from variable nodes i to clauses a, and messages {η_a→i}, a ∈ C, i ∈ V(a), from clauses to variables. These messages can be interpreted as warnings of unsatisfiability.

1. If we let V(a) be the set of variables appearing in clause a, then the message sent from a clause a to variable i, η_a→i, is intuitively the probability that all variables in V(a)\{i} are in the state that violates clause a. Hence, clause a is issuing a warning to variable i.
2. The reverse message from variable i to clause a for some value xi, η_i→a, is interpreted as the probability of variable i assuming the value xi that violates clause a.

As shown in Algorithm 1, the messages from factors (clauses) to variables η_a→i are initialized randomly [Line 2] and updated until a predefined convergence criterion is met [Lines 5-7]. Once the messages converge to η*_a→i, we can estimate the approximate marginals μi(0), μi(1), μi(∗) for each variable i. In case survey propagation does not converge even after repeated runs, or a contradiction is found, the algorithm outputs UNSAT. The message passing updates SP-Update [Line 6] and the marginalization procedure Marginalize [Line 9] are deferred to Appendix A for ease of presentation. 
We refer the reader to [24] and [7] for a detailed analysis of the algorithm and connections to statistical physics.

Algorithm 1 SurveyInspiredDecimation(V, C)
1: Initialize V ← V and C ← C
2: Initialize messages {η_a→i}, a ∈ C, i ∈ V(a), at random
3: while Σ_i |μi(0) − μi(1)| > ε do
4:   ▷ Message passing inference
5:   repeat
6:     {η_a→i} ← SP-Update(V, C, {η_a→i})
7:   until convergence to {η*_a→i}
8:   for i = 1, ..., |V| do
9:     μi(0), μi(1), μi(∗) ← Marginalize(V, C, {η*_a→i})
10:  end for
11:  ▷ Branching (Decimation)
12:  Choose i* ← arg max_{i ∈ V} |μi(0) − μi(1)|
13:  Set y* ← arg max_{y ∈ {0,1}} μ_{i*}(y)
14:  ▷ Simplification
15:  Update V, C ← UnitPropagate(V, C ∪ {x_{i*} = y*})
16: end while
17: return LocalSearch(V, C)

2.2 Decimation and Simplification
The magnetization of a variable i, defined as M(i) := |μi(0) − μi(1)|, is used as a heuristic bias to determine how constrained the variable is to take a particular value. The magnetization is at most one, which occurs when either of the marginals is one, and at least zero, which occurs when the estimated marginals are equal.¹ The decimation procedure involves setting the variable(s) with the highest magnetization(s) to their most likely values based on the relative magnitude of μi(0) vs. μi(1) [Lines 12-13].

¹Other heuristic biases are also possible. For instance, [23] use the bias 1 − min(μi(1), μi(0)).

The algorithm then branches on these variable assignments and simplifies the formula by unit propagation [Line 15]. In unit propagation, we recursively iterate over all the clauses in which the decimated variable appears. If the polarity of the variable in a literal matches its assignment, the clause is satisfied and hence the corresponding clause node and all its incident variable edges are removed from the factor graph. 
If the polarity in the literal does not match the assignment, only the\nedge originating from this particular variable node incident to the clause node is removed from the\ngraph. For example, setting variable k to 0 in Figure 1 leads to removal of edges incident to k from a\nand c, as well as all outgoing edges from b (because b is satis\ufb01ed).\n\n2.3 Survey Inspired Decimation\n\nThe full iterative process of survey propagation (on the simpli\ufb01ed graph from the previous iteration)\nfollowed by decimation is continued until a satisfying assignment is found, or a stopping condition is\nreached beyond which the instance is assumed to be suf\ufb01ciently easy for local search using a standard\nalgorithm such as WalkSAT [31]. Note that when the factor graph is a tree and survey propagation\nconverges to the exact warning message probabilities, Algorithm 1 is guaranteed to select good\nvariables to branch on and to \ufb01nd a solution (assuming one exists).\nHowever, the factor graphs for CSPs are far from tree-like in practice and thus, the main factor\naffecting the success of survey inspired decimation is the quality of the estimated marginals. If\nthese estimates are inaccurate, it is possible that the decimation procedure chooses to \ufb01x variables in\ncontradictory con\ufb01gurations. To address this issue, we propose to use streamlining constraints.\n\n3 Streamlining survey propagation\n\nCombinatorial optimization algorithms critically depend on good heuristics for deciding where\nto branch during search [5]. Survey propagation provides a strong source of information for the\ndecimation heuristic. As discussed above, the approximate nature of message-passing implies that\nthe \u201csignal\" might be misleading. 
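For concreteness, the unit propagation subroutine of Section 2.2 can be sketched as follows (a minimal sketch over a list-based CNF representation of our own choosing, not the authors' implementation):

```python
def unit_propagate(clauses, lit):
    """Assert literal `lit` (e.g. 5 for x5 = 1, -5 for x5 = 0) and simplify.

    Clauses are lists of nonzero signed ints, DIMACS-style. Returns the
    simplified clause list, or None if a conflict (empty clause or
    contradictory forced assignment) arises.
    """
    queue = [lit]
    assigned = set()
    while queue:
        l = queue.pop()
        if -l in assigned:              # contradictory forced assignments
            return None
        if l in assigned:
            continue
        assigned.add(l)
        new_clauses = []
        for c in clauses:
            if l in c:                  # clause satisfied: drop it
                continue
            reduced = [x for x in c if x != -l]   # remove falsified literal
            if not reduced:             # empty clause: conflict
                return None
            if len(reduced) == 1:       # unit clause: propagate recursively
                queue.append(reduced[0])
            new_clauses.append(reduced)
        clauses = new_clauses
    return clauses
```

On the Figure 1 instance with i, j, k, l, m numbered 1-5, asserting x_k = 0 (literal -3) drops clause b and shortens clauses a and c, matching the example above.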
We now describe a more effective way to use the information from survey propagation.
Whenever we have a combinatorial optimization problem over X = {0, 1}^n and wish to find a solution s ∈ S ⊆ X, we may augment the original feasibility problem with constraints that partition the state space X into disjoint subspaces and recursively search the resulting subproblems. Such partitioning constraints can significantly simplify search by exploiting the structure of the solution set S and are known as streamlining constraints [17]. Good streamlining constraints strike a balance between yielding significant shrinkage of the search space and safely avoiding reductions in the solution density of the resulting subproblems. Partitioning the space based on the value of a single variable (as in decimation) performs well on the former at the cost of the latter. We therefore introduce a different constraining strategy that strives to achieve a more balanced trade-off.

3.1 Streamlining constraints for constraint satisfaction problems
The success of survey inspired decimation relies on the fact that the marginals carry some signal about the likely assignments of variables. However, the factor graph becomes more dense as the constraint density approaches the phase transition threshold, making it harder for survey propagation to converge in practice. This suggests that the marginals might provide a weaker signal to the decimation procedure in early iterations. Instead of selecting a variable to freeze in some configuration as in decimation, e.g., xi = 1, we propose a strictly more general streamlining approach in which we use disjunction constraints between subsets of highly magnetized variables, e.g., (xi ∨ xj) = 1.
The streamlined constraints can cut out smaller regions of the search space while still making use of the magnetization signal. 
For instance, introducing a disjunction constraint between any pair\nof variables reduces the state-space by a factor of 4/3 (since three out of four possible variable\nassignments satisfy the clause), in contrast to the decimation procedure in Algorithm 1 which reduces\nthe state space by a factor of 2. Intuitively, when branching with a length-2 clause such as (xi _ xj)\nwe make an (irreversible) mistake only if we guess the value of both variables wrong. Decimation can\nalso be seen as a special case of streamlining for the same choice of literal. To see why, we note that\nin the above example the acceptable variable assignments for decimation (xi, xj) = {(1, 0), (1, 1)}\nare a subset of the valid assignments for streamlining (xi, xj) = {(1, 0), (1, 1), (0, 1)}.\nThe success of the streamlining constraints is strongly governed by the literals selected for participat-\ning in these added disjunctions. Disjunctions could in principle involve any number of literals, and\nlonger disjunctions result in more conservative branching rules. But there are diminishing returns\nwith increasing length, and so we restrict ourselves to disjunctions of length at most two in this paper.\nLonger clauses can in principle be handled by the inference procedure used by message-passing\nalgorithms, and we leave an exploration of this extension to future work.\n\n3.2 Survey Inspired Streamlining\nThe pseudocode for survey inspired streamlining is given in Algorithm 2. The algorithm replaces the\ndecimation step of survey inspired decimation with a streamlining procedure that adds disjunction\nconstraints to the original formula [Line 16], thereby making the problem increasingly constrained\nuntil the search space can be ef\ufb01ciently explored by local search.\nFor designing disjunctions, we consider candidate variables with the highest magnetizations, similar\nto decimation. 
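The counting argument above is easy to verify by brute-force enumeration (a toy check, not part of the solver):

```python
from itertools import product

# All four assignments to the pair (x_i, x_j).
assignments = list(product([0, 1], repeat=2))

# Streamlining with (x_i ∨ x_j) keeps 3 of the 4 assignments,
# shrinking the search space by a factor of 4/3.
streamlined = [a for a in assignments if a[0] or a[1]]
assert len(streamlined) == 3

# Decimation with x_i = 1 keeps 2 of the 4, a factor-of-2 reduction.
decimated = [a for a in assignments if a[0]]
assert len(decimated) == 2

# Decimation is a special case: every assignment it admits is also
# admitted by the streamlining constraint.
assert set(decimated) <= set(streamlined)
```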
If a variable i is selected, the polarity of the literal containing the variable is positive if μi(1) > μi(0) and negative otherwise [Lines 12-15].
Disjunctions use the signal from the survey propagation messages without overcommitting to a particular variable assignment too early (as in decimation). Specifically, without loss of generality, if we are given marginals μi(1) > μi(0) and μj(1) > μj(0) for variables i and j, the new update adds the streamlining constraint xi ∨ xj to the problem instead of overcommitting by constraining i or j to its most likely state. This approach leverages the signal from survey propagation, namely that it is unlikely for ¬xi ∧ ¬xj to be true, while also allowing for the possibility that one of the two marginals may have been estimated incorrectly. As long as streamlined constraints and decimation use the same bias signal (such as magnetization) for ranking candidate variables, adding streamlined constraints through the above procedure is guaranteed not to degrade performance compared with the decimation strategy, in the following sense.
Proposition 1. Let F be a formula under consideration for satisfiability, Fd be the formula obtained after one round of survey inspired decimation, and Fs be the formula obtained after one round of survey inspired streamlining. If Fd is satisfiable, then so is Fs.

Proof. Because unit propagation is sound, the formula obtained after one round of survey inspired decimation is satisfiable if and only if (F ∧ ℓ_{i*}) is satisfiable, where the literal ℓ_{i*} denotes either x_{i*} or ¬x_{i*}. By construction, the formula obtained after one round of streamlining is F ∧ (ℓ_{i*} ∨ ℓ_{j*}). It is clear that if (F ∧ ℓ_{i*}) is satisfiable, so is F ∧ (ℓ_{i*} ∨ ℓ_{j*}). 
Clearly, the converse need not be true.
3.3 Algorithmic design choices
A practical implementation of survey inspired streamlining requires setting some design hyperparameters. These hyperparameters have natural interpretations, as discussed below.

Algorithm 2 SurveyInspiredStreamlining(V, C, T)
1: Initialize V ← V and C ← C
2: Initialize messages {η_a→i}, a ∈ C, i ∈ V(a), at random
3: while Σ_i |μi(0) − μi(1)| > ε do
4:   repeat
5:     {η_a→i} ← SP-Update(V, C, {η_a→i})
6:   until convergence to {η*_a→i}
7:   for i = 1, ..., |V| do
8:     μi(0), μi(1), μi(∗) ← Marginalize(V, C, {η*_a→i})
9:   end for
10:  if t < T then
11:    ▷ Add Streamlining Constraints
12:    Choose i* ← arg max_{i ∈ V} |μi(0) − μi(1)|
13:    Choose j* ← arg max_{i ∈ V, i ≠ i*} |μi(0) − μi(1)|
14:    Set y* ← arg max_{y ∈ {0,1}} μ_{i*}(y)
15:    Set w* ← arg max_{y ∈ {0,1}} μ_{j*}(y)
16:    C ← C ∪ {x_{i*} = y* ∨ x_{j*} = w*}
17:  else
18:    Choose i* ← arg max_{i ∈ V} |μi(0) − μi(1)|
19:    Set y* ← arg max_{y ∈ {0,1}} μ_{i*}(y)
20:    V, C ← UnitPropagate(V, C ∪ {x_{i*} = y*})
21:  end if
22: end while
23: return LocalSearch(V, C)

Disjunction pairing. Survey inspired decimation scales to large instances by taking the top R variables as decimation candidates at every iteration instead of a single candidate (Line 13 in Algorithm 1). The parameter R is usually set as a certain fraction of the total number of variables n in the formula, e.g., 1%. For the streamlining constraints, we take the top 2·R variables, and pair the variables with the highest and lowest magnetizations as a disjunction constraint. We remove these variables from the candidate list, repeating until we have added R disjunctions to the original set of constraints. 
For instance, if v1, ..., v2R are our top decimation candidates (with signs) in a particular round, we add the constraints (v1 ∨ v2R) ∧ (v2 ∨ v2R−1) ∧ ... ∧ (vR ∨ vR+1). Our procedure for scaling to the top R decimation candidates ensures that Proposition 1 holds, because survey inspired decimation would have added (v1) ∧ (v2) ∧ ... ∧ (vR) instead.
Other pairing mechanisms are possible, for example (v1 ∨ vR+1) ∧ (v2 ∨ vR+2) ∧ ... ∧ (vR ∨ v2R). Our choice is motivated by the observation that v2R is the variable we are least confident about; we therefore choose to pair it with the one we are most confident about (v1). We have found our pairing scheme to perform slightly better in practice.
Constraint threshold. We maintain a streamlining constraint counter for every variable, which is incremented each time the variable participates in a streamlining constraint. When the counter reaches the constraint threshold, we no longer consider the variable as a candidate in any of the subsequent rounds. This ensures that no single variable dominates the constrained search space.
Iteration threshold. The iteration threshold T determines how many rounds of streamlining constraints are performed. While streamlining constraints smoothly guide search to a solution cluster, the trade-off being made is in the complexity of the graph. With every round of added streamlining constraints, the number of edges in the graph increases, which leads to a higher chance of survey propagation failing to converge. 
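The pairing scheme described under "Disjunction pairing" can be sketched as follows (a minimal sketch; we assume `candidates` is a list of 2R signed literals already sorted from highest to lowest magnetization):

```python
def pair_disjunctions(candidates):
    """Pair the top-2R candidate literals into R streamlining disjunctions.

    candidates: a list of 2R signed literals sorted by decreasing
    magnetization. The most confident literal is paired with the least
    confident one, the second most confident with the second least
    confident, and so on inward.
    """
    R = len(candidates) // 2
    return [(candidates[r], candidates[2 * R - 1 - r]) for r in range(R)]
```

For R = 2 and candidates [v1, v2, v3, v4], this yields the disjunctions (v1 ∨ v4) and (v2 ∨ v3), matching the example above.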
To sidestep this failure mode, we perform T rounds of streamlining before switching to decimation.

4 Empirical evaluation
We evaluate streamlining constraints for random k-SAT instances for k = {3, 4, 5, 6} with n = {5 × 10^4, 4 × 10^4, 3 × 10^4, 10^4} variables respectively, and constraint densities close to the theoretical predictions of the phase transitions for satisfiability.

Figure 2: Random k-SAT solver rates (with 95% confidence intervals) for k ∈ {3, 4, 5, 6}, for varying constraint densities α. The red line denotes the theoretical prediction for the phase transition of satisfiability. Survey inspired streamlining (SIS) drastically outperforms survey inspired decimation (SID) for all values of k.

4.1 Solver success rates

In the first set of experiments, we compare survey inspired streamlining (SIS) with survey inspired decimation (SID). In line with [7], we fix R = 0.01n, and each success rate is the fraction of 100 instances solved for every combination of α and k considered. The constraint threshold is fixed to 2. The iteration threshold T is a hyperparameter set as follows. We generate a set of 20 random k-SAT instances for every α and k. For these 20 "training" instances, we compute the empirical solver success rates varying T over {10, 20, ..., 100}. The best performing value of T on these train instances is chosen for testing on 100 fresh instances. All results are reported on the test instances.
Results. As shown in Figure 2, the streamlining constraints have a major impact on the solver success rates. 
Besides the solver success rates, we compare the algorithmic thresholds which we de\ufb01ne to\nbe the largest constraint density for which the algorithm achieves a success rate greater than 0.05.\nThe algorithmic thresholds are pushed from 4.25 to 4.255 for k = 3, 9.775 to 9.8 for k = 4, 20.1 to\n20.3 for k = 5, and 39 to 39.5 for k = 6, shrinking the gap between the algorithmic thresholds and\ntheoretical limits of satis\ufb01ability by an average of 16.3%. This is signi\ufb01cant as there is virtually no\nperformance overhead in adding streamlining constraints.\nDistribution of failure modes. Given a satis\ufb01able instance, solvers based on survey propagation\ncould fail for two reasons. First, the solver could fail to converge during message passing. Second,\nthe local search procedure invoked after simpli\ufb01cation of the original formula could timeout which is\nlikely to be caused due to a pathological simpli\ufb01cation that prunes away most (or even all) of the\nsolutions. In our experiments, we \ufb01nd that the percentage of failures due to local search timeouts in\nSID and SIS are 36% and 24% respectively (remaining due to non-convergence of message passing).\nThese observations can be explained by observing the effect of decimation and streamlining on the\ncorresponding factor graph representation of the random k-SAT instances. Decimation simpli\ufb01es the\nfactor graph as it leads to the deletion of variable and factor nodes, as well as the edges induced by\nthe deleted nodes. This typically reduces the likelihood of non-convergence of survey propagation\nsince the graph becomes less \u201cloopy\u201d, but could lead to overcon\ufb01dent (incorrect) branching decisions\nespecially in the early iterations of survey propagation. On the other hand, streamlining takes smaller\nsteps in reducing the search space (as opposed to decimation) and hence are less likely to make\ninconsistent variable assignments. 
However, a potential pitfall is that these constraints add factor nodes that make the graph more dense, which could affect the convergence of survey propagation.

Figure 3: Marginal prediction calibration (blue) and sampled solution distances (green) during a solver run on 3-SAT with 5000 variables, α = 4.15, T = 90.

Figure 4: Top: Correlation between magnetization and estimated marginal probabilities for the same problem instance as we add streamlining constraints. Bottom: Histogram of variable magnetizations. As streamlining constraints are added, the average confidence of assignments increases.

4.2 Solution cluster analysis

Figures 3 and 4 reveal the salient features of survey inspired streamlining as it runs on an instance of 3-SAT with a constraint density of α = 4.15, which is below the best achievable density but is known to be above the clustering threshold αd(3) ≈ 3.86. The iteration threshold T was fixed to 90. At each iteration of the algorithm we use SampleSAT [35] to sample 100 solutions of the streamlined formula. Using these samples we estimate the marginal probabilities of all variables, i.e., the fraction of solutions where a given variable is set to true. We use these marginal probabilities to estimate the marginal prediction calibration, i.e., the frequency with which a variable that survey propagation predicts has magnetization at least 0.9 has an estimated marginal at least as high as the prediction.
The increase in marginal prediction calibration during the course of the algorithm (Figure 3, blue curve) suggests that the streamlining constraints are selecting branches that preserve most of the solutions. 
This might be explained by the decrease in the average Hamming distance between pairs of sampled solutions over the course of the run (green curve). This decrease indicates that the streamlining constraints are guiding survey propagation to a subset of the full set of solution clusters. Over time, the algorithm is also finding more extreme magnetizations, as shown in the bottom three histograms of Figure 4 at iterations 0, 50, and 95. Because magnetization is used as a proxy for how reliably one can branch on a given variable, this indicates that the algorithm is getting more and more confident about which variables it is "safe" to branch on. The top plots of Figure 4 show the empirical marginal of each variable versus the survey propagation magnetization. These demonstrate that overall the survey propagation estimates are becoming more and more risk-averse: by picking variables with high magnetization to branch on, it will only select variables with (estimated) marginals close to one.

Figure 5: (a, b) Random k-SAT solver rates (with 95% confidence intervals) for k ∈ {5, 6}, testing integration with Dimetheus. 
(c) XORSAT solver rates (with 95% confidence intervals).

4.3 Integration with downstream solvers

The survey inspired streamlining algorithm provides an easy "black-box" integration mechanism with other solvers. By adding streamlining constraints in the first few iterations as a preprocessing routine, the algorithm carefully prunes the search space, yielding a modified formula that can be subsequently fed to any external downstream solver. We tested this procedure with Dimetheus [16], a competitive ensemble solver that won two recent iterations of the SAT competitions in the random k-SAT category. We fixed the hyperparameters to the ones used previously. We did not find any statistically significant change in performance for k = 3, 4; however, we observe significant improvements in solver rates for higher k (Figures 5a, 5b).

4.4 Extension to other constraint satisfaction problems

The survey inspired streamlining algorithm can in principle be applied to any CSP. Another commonly studied class of CSPs is XORSAT. An XORSAT formula is expressed as a conjunction of XOR constraints of a fixed length. Here, we consider constraints of length 2. An XOR operation ⊕ between any two variables can be converted to a conjunction of disjunctions by noting that xi ⊕ xj = (¬xi ∨ ¬xj) ∧ (xi ∨ xj), and hence any XORSAT formula can be expressed in CNF form. Figure 5c shows the improvements in performance due to streamlining. While the phase transition is not as sharp as the ones observed for random k-SAT (in both theory and practice [11, 28]), including streamlining constraints improves solver performance.

5 Conclusion

Variational inference algorithms based on survey propagation achieve impressive performance for constraint satisfaction problems when employing the decimation heuristic.
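The length-2 XOR encoding from Section 4.4 is mechanical enough to sketch directly. The following minimal illustration uses DIMACS-style signed-integer literals (a positive integer i denotes xi, a negative integer its negation); the function names and this convention are assumptions for illustration, not part of the released solver.

```python
def xor2_to_cnf(i, j):
    """Encode the constraint x_i XOR x_j as two CNF clauses, using the identity
    x_i XOR x_j = (not x_i OR not x_j) AND (x_i OR x_j).
    Literals follow the DIMACS convention: -i denotes the negation of x_i."""
    return [[-i, -j], [i, j]]

def xorsat_to_cnf(xor_constraints):
    """Translate a 2-XORSAT formula, given as a list of variable-index pairs,
    into a flat list of CNF clauses."""
    clauses = []
    for i, j in xor_constraints:
        clauses.extend(xor2_to_cnf(i, j))
    return clauses

# x1 XOR x2 becomes (not x1 OR not x2) AND (x1 OR x2):
print(xorsat_to_cnf([(1, 2)]))  # [[-1, -2], [1, 2]]
```

Each XOR constraint thus contributes exactly two clauses, so the CNF encoding grows only linearly in the number of constraints.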
We explored cases where decimation failed, motivating a new branching procedure based on streamlining constraints over disjunctions of literals. Using these constraints, we developed survey inspired streamlining, an improved algorithm for solving CSPs via variational approximations. Empirically, we demonstrated improvements over the decimation heuristic on random CSPs that exhibit sharp phase transitions, for a wide range of constraint densities. Our solver is publicly available at https://github.com/ermongroup/streamline-vi-csp.
An interesting direction for future work is to integrate streamlining constraints with backtracking. Backtracking expands the search space, and hence introduces a computational cost, but typically improves statistical performance. Similar to the backtracking procedure proposed for decimation [23], we could backtrack (delete) streamlining constraints that are unlikely to render the original formula satisfiable during later iterations of survey propagation. A second direction is to perform survey propagation on clusters of variables and to use the joint marginals of the clustered variables to decide which streamlining constraints to add. The current approach makes the simplifying assumption that the variable magnetizations are independent of each other; performing survey propagation on clusters of variables could greatly improve variable selection while incurring only a moderate computational cost.
Finally, it would be interesting to extend the proposed algorithm to constraint satisfaction problems arising in real-world applications involving combinatorial optimization, such as planning, scheduling, and probabilistic inference [20, 8, 9, 39, 37].

Acknowledgments

This research was supported by NSF (#1651565, #1522054, #1733686) and FLI. AG is supported by a Microsoft Research PhD Fellowship and a Stanford Data Science Scholarship. We are grateful to Neal Jean for helpful comments on early drafts.

References

[1] T. Achim, A. Sabharwal, and S. Ermon. Beyond parity constraints: Fourier analysis of hash functions for inference. In International Conference on Machine Learning, 2016.

[2] D. Achlioptas and P. Jiang. Stochastic integration via error-correcting codes. In Uncertainty in Artificial Intelligence, 2015.

[3] D. Achlioptas and Y. Peres. The threshold for random k-SAT is 2^k log 2 − O(k). Journal of the American Mathematical Society, 17(4):947–973, 2004.

[4] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction for the web. In International Joint Conference on Artificial Intelligence, 2007.

[5] A. Biere, M. Heule, H. van Maaren, and T. Walsh. Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications, vol. 185, 2009.

[6] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877, 2017.

[7] A. Braunstein, M. Mézard, and R. Zecchina. Survey propagation: An algorithm for satisfiability. Random Structures & Algorithms, 27(2):201–226, 2005.

[8] M.
Chen, Z. Zhou, and C. J. Tomlin. Multiplayer reach-avoid games via low dimensional solutions and maximum matching. In American Control Conference, 2014.

[9] M. Chen, Z. Zhou, and C. J. Tomlin. Multiplayer reach-avoid games via pairwise outcomes. IEEE Transactions on Automatic Control, 62(3):1451–1457, 2017.

[10] A. Coja-Oghlan and K. Panagiotou. The asymptotic k-SAT threshold. Advances in Mathematics, 288:985–1068, 2016.

[11] H. Daudé and V. Ravelomanana. Random 2-XORSAT at the satisfiability threshold. In Latin American Symposium on Theoretical Informatics, pages 12–23. Springer, 2008.

[12] J. Ding, A. Sly, and N. Sun. Proof of the satisfiability conjecture for large k. In Symposium on Theory of Computing, 2015.

[13] M. B. Do and S. Kambhampati. Planning as constraint satisfaction: Solving the planning graph by compiling it into CSP. Artificial Intelligence, 132(2):151–182, 2001.

[14] S. Ermon, C. P. Gomes, A. Sabharwal, and B. Selman. Low-density parity constraints for hashing-based discrete integration. In International Conference on Machine Learning, 2014.

[15] J. Euzenat, P. Shvaiko, et al. Ontology Matching, volume 333. Springer, 2007.

[16] O. Gableske, S. Müelich, and D. Diepold. On the performance of CDCL-based message passing inspired decimation using ρσPMP-i. Pragmatics of SAT, POS, 2013.

[17] C. Gomes and M. Sellmann. Streamlined constraint reasoning. In Principles and Practice of Constraint Programming, pages 274–289. Springer, 2004.

[18] C. P. Gomes, W. J. van Hoeve, A. Sabharwal, and B. Selman. Counting CSP solutions using generalized XOR constraints. In AAAI Conference on Artificial Intelligence, 2007.

[19] A. Grover and S. Ermon. Variational Bayes on Monte Carlo steroids. In Advances in Neural Information Processing Systems, 2016.

[20] J. Hromkovič.
Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics. Springer Science & Business Media, 2013.

[21] L. Kroc, A. Sabharwal, and B. Selman. Message-passing and local heuristics as decimation strategies for satisfiability. In Symposium on Applied Computing, 2009.

[22] E. Maneva, E. Mossel, and M. J. Wainwright. A new look at survey propagation and its generalizations. Journal of the ACM, 54(4):17, 2007.

[23] R. Marino, G. Parisi, and F. Ricci-Tersenghi. The backtracking survey propagation algorithm for solving random k-SAT problems. Nature Communications, 7:12996, 2016.

[24] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.

[25] D. Mitchell, B. Selman, and H. Levesque. Hard and easy distributions of SAT problems. In AAAI Conference on Artificial Intelligence, 1992.

[26] M. Molloy. Models for random constraint satisfaction problems. SIAM Journal on Computing, 32(4):935–949, 2003.

[27] W. Nuijten. Time and Resource Constrained Scheduling: A Constraint Satisfaction Approach. PhD thesis, TUE: Department of Mathematics and Computer Science, 1994.

[28] B. Pittel and G. B. Sorkin. The satisfiability threshold for k-XORSAT. Combinatorics, Probability and Computing, 25(2):236–268, 2016.

[29] H. Ren, R. Stewart, J. Song, V. Kuleshov, and S. Ermon. Adversarial constraint learning for structured prediction. In International Joint Conference on Artificial Intelligence, 2018.

[30] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 62(1):107–136, 2006.

[31] B. Selman, H. A. Kautz, and B. Cohen. Noise strategies for improving local search. In AAAI Conference on Artificial Intelligence, 1994.

[32] P. Singla and P. Domingos. Entity resolution with Markov logic.
In International Conference on Data Mining, 2006.

[33] R. Stewart and S. Ermon. Label-free supervision of neural networks with physics and domain knowledge. In AAAI Conference on Artificial Intelligence, 2017.

[34] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.

[35] W. Wei, J. Erenrich, and B. Selman. Towards efficient sampling: Exploiting random walk strategies. In AAAI Conference on Artificial Intelligence, 2004.

[36] S. Zhao, S. Chaturapruek, A. Sabharwal, and S. Ermon. Closing the gap between short and long XORs for model counting. In AAAI Conference on Artificial Intelligence, 2016.

[37] Z. Zhou, J. Ding, H. Huang, R. Takei, and C. Tomlin. Efficient path planning algorithms in reach-avoid problems. Automatica, 89:28–36, 2018.

[38] Z. Zhou, P. Mertikopoulos, N. Bambos, S. Boyd, and P. W. Glynn. Stochastic mirror descent in variationally coherent optimization problems. In Advances in Neural Information Processing Systems, 2017.

[39] Z. Zhou, M. P. Vitus, and C. J. Tomlin. Convexity verification for a hybrid chance constrained method in stochastic control problems. In American Control Conference, 2014.