Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
The paper proposes a new end-to-end method that can solve a range of combinatorial optimization (CO) problems. Although such RL approaches have been proposed before, this paper devises a novel two-stage approach, and extensive experiments demonstrate its effectiveness.

Weaknesses:
1. The improvement of the proposed method over existing RL methods is not impressive.
2. To compare the different methods fairly against OR-Tools and the RL baselines, the time and computational cost should be reported in detail.

Comment after feedback: The authors have addressed the concerns about running time. However, since applying RL to combinatorial optimization is not new, the lack of comparison with existing RL methods makes the paper less convincing. Relevant prior work includes:
- Reinforcement Learning for Solving the Vehicle Routing Problem (Nazari et al.)
- Attention, Learn to Solve Routing Problems! (Kool et al.)
- Exact Combinatorial Optimization with Graph Convolutional Neural Networks (Gasse et al.)
- Learning Combinatorial Optimization Algorithms over Graphs (Dai et al.)
Originality: The work proposes an interesting approach: a factorized policy, trained with RL, that iteratively improves the current solution until convergence is achieved. Using RL for combinatorial optimization is not new, even though the specific idea that the authors propose differs from previous works that I am aware of.

Quality: I believe the work is technically sound. It uses standard policy gradient theorems and the well-known actor-critic framework.

Clarity: The paper is clearly written with sufficient experimental details. On the other hand, I would like to see more about the connection to local search and some further improvements (see point 5 below).

Significance: Given that (1) solving NP-hard combinatorial problems is a very important task and (2) the experimental results outperform standard solvers, I think that the current work is fairly significant due to its practicality.

UPDATE: I have read both the rebuttal and the other reviews. I appreciate the authors' feedback, particularly with regard to adding more information on the connection to local search and the wider applicability of this work. For this reason, I am willing to increase my score to 7. I would just encourage the authors to ensure that, if accepted, the camera-ready clarifies all points raised in the rebuttal.
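Since the review above appeals to "standard policy gradient theorems and the well-known actor-critic framework," a minimal sketch of that update may help. Everything here is invented for illustration (a toy two-armed bandit with hand-picked rewards, a scalar running-average critic); the paper's actual networks, states, and rewards are far richer:

```python
import math
import random

random.seed(0)
logits = [0.0, 0.0]     # actor parameters: one logit per action
baseline = 0.0          # critic: running estimate of expected reward
rewards = [0.2, 1.0]    # toy deterministic reward for each arm (assumed)

for _ in range(2000):
    # Softmax policy over the two arms.
    z = [math.exp(l) for l in logits]
    probs = [v / sum(z) for v in z]
    a = 0 if random.random() < probs[0] else 1
    r = rewards[a]
    # Advantage = reward minus critic baseline (variance reduction).
    advantage = r - baseline
    for i in range(2):
        # Gradient of log softmax: indicator(i == a) - probs[i].
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += 0.1 * advantage * grad   # policy-gradient ascent
    baseline += 0.05 * (r - baseline)         # critic update

print(logits[1] > logits[0])  # the higher-reward arm should dominate
```

The baseline subtraction is what makes this "actor-critic" rather than plain REINFORCE: the critic does not change the expected gradient, only its variance.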
After rebuttal: The discussion of the method's applicability in the rebuttal is convincing to me. I upgrade my score to 7.

This paper proposes a learning-based approach for combinatorial optimization problems. Starting from an initial complete solution, local rewriting updates are applied to the solution iteratively. In each rewriting step, a local region and an updating rule are picked to update the solution, and two networks trained by reinforcement learning pick the local regions and the updating rules. Experiments on expression simplification, job scheduling, and vehicle routing are conducted to validate the effectiveness of the proposed approach.

My main concern is that the applicable range of the proposed method is unclear. I think the local rewriting approach rests on several assumptions: (1) a complete and legal solution can be easily initialized; (2) reasonable local regions and updating rules can be manually defined; (3) rewards are densely distributed over solutions. It seems this approach may not work on problems where we need to find a specific target and the state transition is therefore important. For example, in theorem proving, I don't think it could work to rewrite the proofs directly if we regard proofs as the expected solutions, as stated in line 63. Also, how to design local regions and updating rules may be important yet ambiguous for specific problems. I agree that local rewriting should be applicable to a wide range of problems, but I think this range and the underlying assumptions should be formally and explicitly clarified in the paper, for example by characterizing the targeted problems in the introduction. This would help clarify the contributions of this work and explain the choice of tasks in the experiments.
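The rewriting loop this review summarizes (pick a local region, pick an updating rule, apply, repeat from a complete initial solution) can be sketched as follows. This is only an illustrative stand-in: `pick_region` and `pick_rule` are hand-coded greedy heuristics replacing the paper's two learned networks, and the "problem" is trivial adjacent-swap sorting, not the paper's tasks:

```python
def cost(solution):
    """Toy objective: number of inverted adjacent pairs (0 = sorted)."""
    return sum(a > b for a, b in zip(solution, solution[1:]))

def pick_region(solution):
    # Stand-in for the learned region-picking network: greedily return
    # the first out-of-order adjacent pair, or None once converged.
    for i in range(len(solution) - 1):
        if solution[i] > solution[i + 1]:
            return i
    return None

def pick_rule(region):
    # Stand-in for the learned rule-picking network: the only rewriting
    # rule here is "swap the adjacent pair at the chosen region".
    def swap(sol):
        new = list(sol)
        new[region], new[region + 1] = new[region + 1], new[region]
        return new
    return swap

def local_rewrite(solution, max_steps=1000):
    """Iteratively rewrite a complete initial solution: pick a region,
    pick a rule, apply it, and repeat until no region improves."""
    current = list(solution)
    for _ in range(max_steps):
        region = pick_region(current)
        if region is None:  # no improving region left: converged
            break
        current = pick_rule(region)(current)
    return current

print(local_rewrite([4, 2, 5, 1, 3]))  # -> [1, 2, 3, 4, 5]
```

Even in this toy form, the review's assumptions are visible: the loop needs a complete legal starting solution, a hand-designed region/rule space, and a cost that gives dense feedback after every rewrite.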