NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 2323 Learning dynamic polynomial proofs

### Reviewer 1

The work seems to instantiate the general setting of combining ML/RL with proof guidance. The instantiation however seems quite different from existing theorem provers and applied to an area typically not considered by the core ATP community. Perhaps the discussion of related work could mention and compare with theorem proving in algebra with ATPs such as Otter, Prover9, Waldmeister and E, and the work on their ML-based guidance. It is far from correct that ML-based guidance focuses on tactical systems. Most of the work so far has been done on tableau system and saturation-style systems (leanCoP/rlCoP, E/ENIGMA) that learn guidance of elementary inference rules in their calculi. Some of the ideas could be explained in a more high-level way. For example, the intuition behind the reward function. The evaluation could also be made more convincing by including more problems than just stable set. --- Reply: I have read the reviews and the authors feedback. I will keep my score - the method and results are interesting and worth publishing. I have some additional questions/comments that could be taken into account in the final version: 1. The architecture of the neural net and alternative ML methods (raised also by Reviewer 3): 1a. Nonlinearity is not an automatic argument for neural nets. For example, gradient-boosted trees have been used successfully in a number of nonlinear and symbolic domains, also involving RL. 1b. Comparison/discussion of the custom architecture used here with graph neural nets (GNN) would be interesting. Also because GNNs have been used in neural SAT solving (cited here), and SAT could be in used to attack the stable set problem. 2. Would a neural-SAT/RL architecture be a viable alternative for stable set? 3. In general, more ablation studies showing the influence of the neural/RL hyperparameters would be useful. 4. ATP References: The common inspiration for the recent multitude of ML-guided tactical systems is Gauthier's TacticToe - http://arxiv.org/abs/1804.00596 .

### Reviewer 2

As described above, I am quite positive about the paper. I consider the proposed dynamic proof technique as a valuable contribution for several problems in optimization and machine learning. The paper considers an optimization problem for which solvers are dominated by static approaches. Of course, for static approach, we may always construct examples for which the worst case run-time is attained. On the contrary, the dynamic approach can take a shortcut to the final result. The main drawback of the approach is the following doubt: Does the proposed dynamic approach always find a solution (e.g. a fully verifiable proof)? ------------------ I have read all reviews and the authors responses. The answers support the good quality of the paper. I recommend to accept it.

### Reviewer 3

The authors tackle the general problem of searching for proofs of polynomial inequalities, which includes many computational problems of interest. They design a ML agent to search for dynamic semi-algebraic proofs; this appears to be the first successful attempt of its kind. The agent is a DQN whose neural network architecture handles polynomial inequalities in a way that takes into account the symmetries of the problem, and is trained in an unsupervised environment. The authors demonstrate its effectiveness on the maximum stable set problem (NP-hard in general), compared to existing static methods, and based on experiments, argue for the efficiency of searching for a dynamic rather than a static proof. A dynamic proof is also more interpretable, and more similar to a human-constructed proof. This is complementary to work on learning strategies for automated theorem proving. More precisely, the agent works as follows. It contains a memory of polynomials that are known to be nonnegative, and those known to be zero. At each time step $t$, the agent solves a LP to find the current best bound $\gamma_t$ certified using a linear combination of polynomials in the memory. It uses the DQN to select an action (multiply one of the polynomials in memory by $x_i$ or $1-x_i$). The reward is the relative improvement at each step, which requires no supervision. The architecture impose two symmetries: (1) that the Q-function is invariant to the order of polynomials in memory, and (2) that it is invariant to relabeling of variables. The authors make a large improvement over previous methods. Personally I find this line of work very promising. The paper also has a clear exposition on both the theoretical and engineering aspects, especially on explaining static vs. dynamic proofs. Questions and suggestions: + The maximum stable set problem is the only problem considered in the paper. Have the authors tried their method on other problems and found improvements? (The fact that static proofs of level l cannot certify bounds better than n/l seems to make this problem particularly favorable for showing the success of the dynamic method. What about other problems where there is no clear bound like this?) + How essential is it that the agent uses a neural network? How much of the improvement is due to using the reinforcement learning framework (perhaps with a simple classifier), vs. due to specifically using a neural network? + I would very much like the source code to be made available. --- Reply: Thanks for the authors' response. It is good to hear of promising results for other combinatorial problems despite the differences in distribution.