NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:5575
Title:G2SAT: Learning to Generate SAT Formulas

Reviewer 1

In this paper, a deep generative model that learns to generate SAT formulae similar to a set of formulae provided as input for the training time is proposed. The approach is novel as far as I can tell. The paper is well written and provides sufficient level of details. The experimental evaluation is brief, but shows promise. indeed, G2SAT seems to generate CNFs which are similar to those provided as input, and which can be used as additional benchmarks. In the training, 10 smallest SATLIB/SAT competitions benchmarks are used, resulting in CNFs with 82 to 1122 variables and 327 to 4555 clauses. Would G2SAT scale to using larger instances? In the generation of new instances, are these of similar size to the instances used in the training time? I did not find a mention about this. Furthermore, it would be interesting to know about the the time wise scaling of G2SAT (this is essential to demonstrate the stated efficiency of G2SAT). How long does it take to generate new instances? Does this scale to larger instances? Pros: - A way to generate SAT formulas with similar properties as the formulas provided as input the the training of the model - Experiments suggest that the instances generated indeed resemble the original ones provided as input in terms of several measures Cons: - No mention of how does the approach scale - No access to code during reviewing ======================== After response period: The author's response clarified many of the concerns raised in the reviews. The additional results provided should definitely be included in the paper.

Reviewer 2

The procedure for generating SAT instances is original as far as the reviewer knows (although the reviewer is not very familiar with related literature). The proposed method seems technically sound. The empirical evaluation seems fair: the results are compared to several relevant baselines from the literature and the generated SAT instances seems to align better with the data they are trying to mimic on most of the metrics, including the ranking of industrial vs random SAT solver performance. It would be nice to see actual run-time numbers for the SAT solvers - even though the argument in lines 263-264 suggests that post-processing can significantly change these numbers, the reviewer thinks that the results should be still useful to the reader. The application of tuning SAT solver hyperparameters seems useful. It is unclear how much learning contributes to the success of the method - comparison of variations of the method with a higher or lower capacity GCN would be useful to gauge how much learning occurs on the rather small dataset of SAT instances. The paper is clearly written, although the reviewer didn't find exact description of the networks and optimization procedures used for training in the submission. The reviewer thinks that the paper explores an interesting problem at the intersection of SAT solving and machine learning, and future work is likely to build on the ideas presented in the paper.

Reviewer 3

That these types of techniques are really essential for "developing better and faster SAT solvers" is quite questionable, since the SAT community has been making quite good progress in pushing solvers forward and still are, and the benchmark situation has considerably improved over the years. Nevertheless, from an academic perspective building generators for real-world like formulas in interesting. Perhaps the biggest issue with the work is in that it does not really seem to scale. The authors use the 10 *smallest* "real-world" instances in their experiments. Small instances are really not interesting from the perspective of SAT solver development, as most real-world instances are quite huge. In my opinion the authors would need to address at the very least the scalability issues to warrant publication at a major venue. Regarding the comparison of graph statistics of the generated instances, I do not see what to make of these. It is unclear to me to what extent the instances are *different* from the instances started from. In fact, one could also interpret the fact that solvers rankings do not change as being an artifact of the potential fact that the instances do not really change much. The authors should convincingly argue that the generated instances would be in some sense really (in an interesting way) also *different* from the original instances (slightly confusing a formula can be done without any machine learning, most definitely!) The "application" of "developing better SAT solvers" seems somewhat absurd to me, as I do not understand what we could really take away from the presented observations, and what is the actual supposed role of the generation technique presented to this. Finally there appears to be very closely related work presented at SoCS'19: The authors should cite this work and explain their own contributions in relations to that work.