The paper proposes new theoretical results on the universal approximation property of graph convolutional neural networks and trains such networks to (approximately) solve optimization problems defined on graphs, in particular those arising from discretizations of PDEs. The solver is trained directly from the model energy. The reviewers recognized the paper as making an interesting contribution and meeting the quality standards. The authors are invited to submit the final version incorporating the rebuttal points, addressing all minor revision issues, and including the literature connections mentioned below. Demonstrating the applicability boundaries by studying failure cases would also be highly appreciated.

Some additional comments from the discussion:

The applicability of the method seems quite limited: the model must already be known (not learned, in contrast to the common way of applying ML). Interestingly, the work demonstrates solving constrained problems by introducing the constraints as penalties during learning (a minimal sketch of this setup is given below). A natural concern is that for more complex problems the learning will be dominated by trying to satisfy the constraints, since the cost is taken in expectation. Generalization to other topologies, for problems harder than just smoothing, has also not been shown.

Perhaps the method may find more trustworthy applications in learning preconditioners or starting solutions for existing solvers with guarantees. Simple fixed-point iteration solvers can look very similar to a GNN (see the Jacobi sketch below). Learning a preconditioner would have the advantage of keeping the guarantees of the solver while speeding it up on average.

On the literature side, the idea is not completely novel. In vision, unsupervised stereo and optical flow networks have been learned from hand-engineered total variation cost models (trained to minimize the cost without knowing the optimal solution). In [Vogel and Pock: "A Primal Dual Network for Low-Level Vision Problems" (2017), Appendix Section 3], the network is trained with the objective of minimizing the ROF functional (recalled below) for image restoration. These networks are not graph-based, but the key idea has nevertheless been used before. On the side of message-passing algorithms such as belief propagation, there have been many works training generalized solvers in a supervised setting, i.e., learning the inference (solver) directly, bypassing the model.
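To make the constraints-as-penalties point concrete, here is a minimal sketch (not the authors' code; the energy function, the toy path-graph problem, the network, and the penalty weight lam are all illustrative assumptions) of training a small graph network directly from a model energy, with boundary conditions enforced as a soft penalty and the cost minimized in expectation over sampled problem instances:

import torch
import torch.nn as nn

def dirichlet_energy(u, edges):
    # Model energy for smoothing: sum over edges of (u_i - u_j)^2.
    i, j = edges
    return ((u[i] - u[j]) ** 2).sum()

class TinyGNN(nn.Module):
    # One mean-aggregation step over neighbors, then a per-node MLP.
    def __init__(self, hidden=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x, edges):
        i, j = edges
        agg = torch.zeros_like(x)
        agg.index_add_(0, i, x[j])                      # sum neighbor features
        deg = torch.zeros_like(x)
        deg.index_add_(0, i, torch.ones_like(x[j]))
        agg = agg / deg.clamp(min=1.0)                  # mean over neighbors
        return self.mlp(torch.cat([x, agg], dim=-1)).squeeze(-1)

def sample_problem(n=32):
    # A path graph with random Dirichlet boundary values at both ends.
    k = torch.arange(n - 1)
    edges = (torch.cat([k, k + 1]), torch.cat([k + 1, k]))
    bc = torch.rand(2)                                  # boundary values
    return edges, bc

model, lam = TinyGNN(), 10.0                            # lam: assumed penalty weight
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    edges, bc = sample_problem()
    n = int(edges[0].max()) + 1
    x = torch.zeros(n, 1); x[0], x[-1] = bc[0], bc[1]   # encode BCs as node features
    u = model(x, edges)
    penalty = (u[0] - bc[0]) ** 2 + (u[-1] - bc[-1]) ** 2
    loss = dirichlet_energy(u, edges) + lam * penalty   # energy + soft constraint
    opt.zero_grad(); loss.backward(); opt.step()        # minimized in expectation

The concern noted above is visible in this structure: for harder energies, the lam * penalty term can dominate the gradient signal, so the network learns to satisfy the constraints before it learns to lower the energy.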
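On the remark that fixed-point solvers resemble GNNs: a Jacobi sweep for the same toy Laplace problem is exactly a neighbor-averaging update applied repeatedly, i.e., a message-passing layer with fixed, hand-derived weights instead of learned ones. A sketch under the same assumed path-graph setup:

import torch

def jacobi_step(u, edges, free_mask):
    # One Jacobi sweep for the graph Laplacian: each free node moves to the
    # mean of its neighbors -- structurally a GNN update with fixed weights.
    i, j = edges
    agg = torch.zeros_like(u)
    agg.index_add_(0, i, u[j])
    deg = torch.zeros_like(u)
    deg.index_add_(0, i, torch.ones_like(u[j]))
    u_new = agg / deg.clamp(min=1.0)
    return torch.where(free_mask, u_new, u)   # boundary nodes stay fixed

# Usage on a path graph with u[0] = 0, u[-1] = 1 (assumed toy setup):
n = 32
k = torch.arange(n - 1)
edges = (torch.cat([k, k + 1]), torch.cat([k + 1, k]))
u = torch.zeros(n); u[-1] = 1.0
free = torch.ones(n, dtype=torch.bool); free[0] = free[-1] = False
for _ in range(1000):
    u = jacobi_step(u, edges, free)
# u now approximates the linear interpolation between the boundary values.

This similarity is what suggests the preconditioner idea: replace the fixed averaging weights with learned ones, but wrap the result in an iteration whose convergence guarantees are retained.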
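For concreteness, the ROF (Rudin-Osher-Fatemi) functional minimized in the Vogel-Pock work is the standard total-variation restoration objective

\min_{u} \; \int_{\Omega} |\nabla u| \, dx \; + \; \frac{\lambda}{2} \int_{\Omega} (u - f)^2 \, dx,

where f is the noisy input image, u the restored image, and \lambda trades data fidelity against total variation; training a network to minimize this energy, rather than to match known optimal solutions, is the same key idea as in the present paper.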