NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4384
Title: Learning Local Search Heuristics for Boolean Satisfiability

Reviewer 1

Originality: People have tried local search for some combinatorial optimization problems before, but I could not find any references applying it to Boolean satisfiability in particular. Curriculum training when a measure of difficulty is available is not new.

Clarity: I found the paper clear.

Quality: The authors take a first step toward replacing CDCL SAT solvers by having a model decide which variable to flip next. The findings support the claims, although the evaluation could have used slightly more difficult instances (e.g., ones that WalkSAT does not solve; a minimal sketch of WalkSAT is given below for reference).

Significance: The value of SAT solving lies in finding solutions to real-world problems (plus human modelling). Industrial instances were neither trained on nor evaluated, so it is difficult for me to assess significance.
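For context on the WalkSAT baseline mentioned above: its inner loop picks a random unsatisfied clause, then with probability p flips a random variable from that clause, and otherwise flips the clause variable with the lowest break count (the number of currently satisfied clauses the flip would falsify). A minimal Python sketch, my own illustration rather than the authors' or the canonical implementation:

    import random

    def walksat(clauses, n_vars, p=0.5, max_flips=10_000, seed=0):
        # clauses: list of tuples of DIMACS-style literals (v or -v, 1-indexed)
        rng = random.Random(seed)
        assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused

        def sat(lit):  # is this literal true under the current assignment?
            return assign[abs(lit)] == (lit > 0)

        def break_count(v):  # satisfied clauses that flipping v would falsify
            assign[v] = not assign[v]
            broken = sum(1 for c in clauses
                         if v in map(abs, c) and not any(sat(l) for l in c))
            assign[v] = not assign[v]
            return broken

        for _ in range(max_flips):
            unsat = [c for c in clauses if not any(sat(l) for l in c)]
            if not unsat:
                return assign  # satisfying assignment found
            clause = rng.choice(unsat)
            if rng.random() < p:
                v = abs(rng.choice(clause))  # noise step: random variable
            else:                            # greedy step: minimize breaks
                v = min((abs(l) for l in clause), key=break_count)
            assign[v] = not assign[v]
        return None  # flip budget exhausted; instance may still be satisfiable

For example, walksat([(1, -2, 3), (-1, 2), (2, 3)], n_vars=3) returns a satisfying assignment. This sketch recomputes break counts naively; real implementations cache them per variable.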

Reviewer 2

1) Ablation studies. It would be really interesting to know which algorithmic choices were necessary to get the reported results, e.g.: What about longer unrolls of the graph net, T > 2? How important is it to hard-code that the policy flips random variables of unsatisfied clauses with probability p = 1/2?

2) Details on the curriculum. I did not find details on what criterion was used to decide when to ratchet up the task difficulty during curriculum learning. Could the authors clarify (or did I miss it somewhere)?

3) No value function? I'm surprised to see that the authors used straight-up REINFORCE without even a value baseline (as explained in the appendix), because in my experience this almost always makes a big difference in learning speed. Have the authors tried just adding a different "head" to the graph net torso that predicts the expected return, and using that as a baseline? (A minimal sketch of such a head follows this list.)

4) Figure 4: It would be interesting to see the WalkSAT baseline results in this graph.

5) Lines 228-231: Why is the policy trained on multiple episodes of the same problem? Is this to reduce the computational overhead of the initial satisfiability check with MiniSat? I'm asking because attempting to solve the same problem multiple times can cause strong correlations in the policy gradient. It might give a significant boost to write feasible instances into a replay buffer and then sample random formulas afresh for each gradient update.
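On point 3, the suggested baseline head is a small code change. A minimal PyTorch sketch, assuming a plain MLP torso standing in for the paper's graph net (the class and function names here are hypothetical):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolicyWithValue(nn.Module):
        # Shared torso; one head outputs action logits, the other a scalar value.
        def __init__(self, in_dim, hidden, n_actions):
            super().__init__()
            self.torso = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.policy_head = nn.Linear(hidden, n_actions)
            self.value_head = nn.Linear(hidden, 1)

        def forward(self, x):
            h = self.torso(x)
            return self.policy_head(h), self.value_head(h).squeeze(-1)

    def reinforce_with_baseline_loss(logits, value, action, ret):
        # REINFORCE with a learned baseline: detach the value so the baseline
        # reduces variance without biasing the policy gradient; the value head
        # itself is trained by regression toward the observed return.
        logp = torch.distributions.Categorical(logits=logits).log_prob(action)
        policy_loss = -(logp * (ret - value.detach())).mean()
        value_loss = F.mse_loss(value, ret)
        return policy_loss + 0.5 * value_loss

The detach is the key detail: the baseline only enters the policy loss as a constant, which is what keeps the gradient estimator unbiased while cutting its variance.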

Reviewer 3

This work is original in its use of deep reinforcement learning and graph neural networks to learn novel search control heuristics for SAT solving. While the techniques used are not novel themselves, the application domain is. The authors do a good job of surveying related work in this area and situating their contributions in this landscape.

The paper is well-written, and I found it very easy to follow the details of the proposed approach and the authors' results.

Technically, the work presented is solid, though I have a few comments and suggestions. First, while I take the authors' point about not trying to compete on solution time with state-of-the-art SLS solvers that have been engineered and fine-tuned for decades, I think it would be worth comparing the performance of their algorithm to that of survey propagation (SP). SP, besides being one of the strongest SLS solvers on random 3-SAT, actually uses a message-passing approach as well, and thus would make for a particularly interesting comparison with the authors' GNN-based algorithm (a schematic of the SP update is given at the end of this review).

I laud the authors' direct and honest consideration of the strengths and drawbacks of their approach, and I appreciate that they embrace the scientific interest of this work (over simply trying to advance the state of the art in solver performance). Still, I don't think they should be coy about reporting other things that may be of interest to the broader community: for example, solution time, the gap to SP, etc. Such performance numbers, even if they compare unfavorably to WalkSAT or SP, would give the community an idea of what the current gap is, and a way to measure future progress in this direction.

Generalization from rand_3 (lines 287--297): interestingly, my intuition reading this section was the exact opposite! My feeling is that a heuristic that works well on formulas with no structure would *not* translate well to domains with structure; indeed, the SAT contests traditionally have three "tracks" for this reason: random instances, "crafted" instances (like the graph problems), and industrial instances that come from applications. I would consider reframing this discussion.

Table 4 raises an intriguing question: how does this divide in performance between WalkSAT and the GNN approach grow as the problem size increases? For example, it may be that the GNN+SLS combination zooms in on the solution faster, so the learned heuristic might even beat WalkSAT on time for large enough instances, where WalkSAT's meandering may lead to long runtimes.

Finally, a couple of minor comments:
- The caption for Figure 3 could be clearer (which curve is which?).
- Table 4: are we only looking at flips recommended by the heuristic, or are random flips included as well?
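For readers who want a pointer on the SP comparison suggested above: schematically (following Braunstein, Mezard & Zecchina, 2005; the notation below is paraphrased from memory, so treat it as a sketch and consult the paper for the authoritative form), SP iterates "surveys" eta_{a->i}, the probability that clause a warns variable i that i is needed to satisfy it:

    \eta_{a\to i} = \prod_{j \in V(a)\setminus\{i\}}
        \frac{\Pi^{u}_{j\to a}}{\Pi^{u}_{j\to a} + \Pi^{s}_{j\to a} + \Pi^{0}_{j\to a}}

    \Pi^{u}_{j\to a} = \Big[ 1 - \prod_{b \in V^{u}_{a}(j)} (1 - \eta_{b\to j}) \Big]
                       \prod_{b \in V^{s}_{a}(j)} (1 - \eta_{b\to j})

    \Pi^{s}_{j\to a} = \Big[ 1 - \prod_{b \in V^{s}_{a}(j)} (1 - \eta_{b\to j}) \Big]
                       \prod_{b \in V^{u}_{a}(j)} (1 - \eta_{b\to j})

    \Pi^{0}_{j\to a} = \prod_{b \in V(j)\setminus\{a\}} (1 - \eta_{b\to j})

Here V^{s}_{a}(j) (resp. V^{u}_{a}(j)) denotes the clauses other than a in which variable j appears with the same (resp. opposite) sign as in a. The usual pipeline runs these updates to a fixed point, fixes the most polarized variables (decimation), and finishes the simplified formula with a local search such as WalkSAT, which is what makes SP a natural message-passing point of comparison for the GNN.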