NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 2620 End to end learning and optimization on graphs

### Reviewer 1

Usually research on graph embeddings will first learn general-purpose graph embeddings. Then the learned embeddings will be used to solve various tasks regarding the graph. This paper notices that if we combine embedding learning and task optimization in an end-to-end pipeline, better results can be achieved for the tasks. The proposed method makes intuitive sense. Instead of learning general embeddings, the method basically learns task-specific embeddings. It is not surprising that better empirical results can be achieved. But I would argue that the proposed method sacrifices the ability to learn general-purpose embeddings. The new embeddings not longer capture general properties of the graph, but just properties related to whatever optimization task at hand, such as clustering. Also, because the embeddings are no longer general-purpose, I wonder whether it is still necessary to have the explicit graph embedding layer. Would it be possible to just have a single system specifically designed for the task optimization?

### Reviewer 2

In general, the idea of this paper is pretty simple and neat. Although there are closely related ideas in the literature (and the authors cited those), this paper delivers a completely differentiable approach to the problems it is to solve. - As this paper has promised, it only works for those combinatorial problems that involve a hidden grouping or (its dual) partitioning. Many other combinatorial problems on graphs remain untouched. - They rely on efficient computation of the expected training loss, $\ell$. Although one can compute such loss for a group of problems, it is not likely to compute such an expectation in every case. It is not clear how one can approximate such differentials. - Line 243: no need for the Trace. - Line 267: the 2-approximation is not optimal, as it is (1 - 1/e) for Facility Location. - The intuitions on Section 5 are nice but are not backed up with mathematical reasoning. It is hard to reason about DNNs in general, though.

### Reviewer 3

This work builds on a strong foundation of related work; but does so in a novel way. The paper is very clearly written, which is much appreciated! As for quality, the authors perform a thorough analysis comparing against a variety of expert baselines on a variety of data sets. Their supplemental material is very comprehensive. As for significance, NeurIPS attendees will be interested in the pipeline presented and the novel incorporation of a differentiable proxy graph optimization; but they might not be as interested in the problem of graph optimization more broadly.