__ Summary and Contributions__: The authors propose a novel linear-sized SDP for MAP inference in Potts models, which they also extend to a sampling strategy in the low-temperature limit. Experimental results on synthetic and real data sets are used to validate the proposed scheme.

__ Strengths__: The SDP construction appears novel to me and the problem of MAP estimation in such models is of significant interest.

__ Weaknesses__: - Experiments don't really compare against anything but AIS (which is a reasonable baseline). There are lots of other (approximate) inference techniques that should be compared against.
- For image segmentation, it might be good to place this in the context of the state of the art for this task (even though, of course, the goal isn't necessarily to beat it).
- The partition function approximation scheme only really makes sense to me very close to the zero-temperature limit. I don't think this approach would be especially interesting beyond that. For comparison, how does importance sampling do on these problems with, say, a naive mean-field fit plus uniform as the proposal? I'd be surprised if it were too different. Again, I think it is important to have more baselines here.
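The baseline suggested above is simple to state concretely. Below is a minimal, hypothetical sketch (not the paper's method) of an unbiased importance-sampling estimate of Z for a tiny chain Potts model, using a mixture of a naive mean-field fit and the uniform distribution as the proposal; the model sizes, potentials, and helper names are invented for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Tiny chain Potts model: n variables, k states, random unary/pairwise potentials.
n, k, beta = 6, 3, 1.0
unary = rng.normal(size=(n, k))
pair = rng.normal(size=(n - 1, k, k))

def log_ptilde(x):
    """Unnormalized log-probability of a configuration x (sequence of states)."""
    s = sum(unary[i, x[i]] for i in range(n))
    s += sum(pair[i, x[i], x[i + 1]] for i in range(n - 1))
    return beta * s

# Exact partition function by enumeration (feasible only for tiny models).
Z_exact = sum(np.exp(log_ptilde(x)) for x in itertools.product(range(k), repeat=n))

# Naive mean-field fit: coordinate ascent on per-variable marginals q_i.
q = np.full((n, k), 1.0 / k)
for _ in range(50):
    for i in range(n):
        logits = beta * unary[i].copy()
        if i > 0:
            logits += beta * (q[i - 1] @ pair[i - 1])   # E_q over left neighbor
        if i < n - 1:
            logits += beta * (pair[i] @ q[i + 1])       # E_q over right neighbor
        logits -= logits.max()
        q[i] = np.exp(logits) / np.exp(logits).sum()

# Proposal: 50/50 mixture of the mean-field product distribution and uniform.
eps = 0.5
m = 20000
xs = np.empty((m, n), dtype=int)
for j in range(m):
    if rng.random() < eps:
        xs[j] = rng.integers(0, k, size=n)                    # uniform component
    else:
        xs[j] = [rng.choice(k, p=q[i]) for i in range(n)]     # mean-field component

# Importance weights w = ptilde(x) / q_mix(x); their mean is unbiased for Z.
logq = np.array([
    np.log(eps * k ** (-n) + (1 - eps) * np.prod(q[np.arange(n), x]))
    for x in xs
])
logw = np.array([log_ptilde(x) for x in xs]) - logq
Z_hat = np.exp(logw).mean()
print(Z_exact, Z_hat)  # the IS estimate should be close to the exact value
```

The uniform component keeps the proposal strictly positive everywhere, which is what makes the estimator unbiased even when the mean-field fit misses modes.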

__ Correctness__: The presented results appear to be correct, though I did not check all of the details in the supplementary material.

__ Clarity__: The paper is easy to read but is a bit too colloquial in parts.

__ Relation to Prior Work__: The paper is well-positioned with respect to the SDP approximations of interest, but could probably use some additional references on approximate inference in Potts models.

__ Reproducibility__: Yes

__ Additional Feedback__: With more detailed experimental results, I would be happier to increase my score. The experiments are, in my opinion, a little on the low end for NeurIPS.

__ Summary and Contributions__: The authors propose an approximation to the MAP state and the partition function of k-state undirected models, based on SDP relaxation and importance sampling. To this end, a new alternate relaxation to the MAP problem is proposed which reduces the number of constraints to be linear in the number of variables.

__ Strengths__: + Accurate MAP inference and partition function approximation are among the hardest and most important problems in ML and AI
+ The formal presentation of the proposed method is very good and it is easy to follow the authors' reasoning
+ Experiments show that the proposed method delivers very good quality and is faster than competing methods

__ Weaknesses__: - Theoretical guarantees which are provided for the approximate MAP-state and the partition function are very weak (compared to other state-of-the-art methods)
- Some important related work has not been cited
- The core contribution relies mostly on existing ideas
First, the proposed approach for solving the MAP problem is novel and very interesting. However, the last step employs a randomized rounding procedure of unknown quality. It should be possible to prove something here, but no such result is provided in the paper. Other works in this field provide much deeper insights into the problem structure. See, e.g.,
David A. Sontag, Talya Meltzer, Amir Globerson, Tommi S. Jaakkola, Yair Weiss: Tightening LP Relaxations for MAP using Message Passing. UAI 2008: 503-510
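For concreteness, the rounding step in question follows the classical max k-cut recipe (Frieze–Jerrum style): sample one random Gaussian direction per label and assign each variable the label whose direction its SDP vector correlates with most. The sketch below is a hypothetical illustration; `round_sdp_solution` and the random stand-in for the SDP output are invented names, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def round_sdp_solution(V, k, rng):
    """Frieze-Jerrum style rounding: V is an (n, d) matrix whose rows are the
    unit vectors from the SDP solution; each variable receives the label of
    the random Gaussian direction its vector correlates with most."""
    d = V.shape[1]
    G = rng.normal(size=(k, d))          # one random direction per label
    return np.argmax(V @ G.T, axis=1)    # label_i = argmax_j <v_i, g_j>

# Stand-in for an SDP solution: n random unit vectors in R^d.
n, d, k = 8, 5, 3
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

labels = round_sdp_solution(V, k, rng)
print(labels)  # an assignment in {0, ..., k-1}^n
```

For the classical relaxation, the geometry of the vectors yields an approximation ratio for this rounding; the point of the criticism above is that no analogous bound is proven for the new relaxation.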
Second, the proposed approach for approximating the partition function guarantees that the estimate is asymptotically unbiased. While this is nice, even the simplest approximation methods for the partition function deliver unbiased estimates. State-of-the-art methods provide bounds on the variance or the approximation error. See, e.g.,
Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman: Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization. ICML (2) 2013: 334-342

__ Correctness__: All results are theoretically sound and are carefully derived. The same holds for the experimental evaluation.

__ Clarity__: The paper is very well written and it is easy to follow the authors' reasoning.

__ Relation to Prior Work__: Some important prior work on discrete integration by hashing and on quadrature-based methods is not cited, but the authors promised to include it in the camera-ready version.

__ Reproducibility__: Yes

__ Additional Feedback__: The authors frequently mention a "general k-state MRF". But in fact, a "general" MRF usually means that higher-order factors are supported. The authors should use the term "pairwise k-state MRF" instead, to make clear that their method does not support higher-order factors.

__ Summary and Contributions__: The paper presents a new semidefinite programming relaxation that can be used for inference with multiclass (Potts model) MRFs. The approach is based on the classical k-way max-cut approximation algorithm that uses SDP relaxation and randomized rounding.
The algorithm presented here uses a similar randomized rounding procedure but gives an alternative convex relaxation that is more efficient when the number of variables is higher than the number of values/labels/states.

__ Strengths__: The new SDP relaxation is useful because it only uses O(n) constraints, where n is the number of random variables. In contrast, the standard SDP relaxation of max k-way cut uses O(n^2) constraints. Therefore the new relaxation can be much more efficient when n is much larger than k, as in image segmentation.

__ Weaknesses__: It does not appear that the theoretical guarantees of the classical k-way max cut algorithm translate to this new SDP relaxation.
The new SDP relaxation appears to be weaker than the original one. Although the discrete optimization problem (5) is identical to the discrete optimization problem (8), the relaxation of (5) is not identical to the relaxation of (8).
It appears that a feasible solution to the relaxation of (5) leads to a feasible solution to the relaxation of (8), but not vice versa, so the original k-way SDP relaxation is tighter than the new one.
The importance sampling approach is a heuristic, and the theoretical result (an unbiased estimate of the partition function) is not surprising, since this can be achieved with a generic proposal distribution. The mixing of the randomized rounding with uniform sampling is an ad hoc heuristic that points to shortcomings of the approach.
The empirical results are anecdotal and there does not appear to be discussion of running time in the image segmentation experiments. I am not convinced the method is useful in practice -- together with the lack of strong theoretical results I find the approach promising but not convincing.

__ Correctness__: The specific claims appear to be correct, except in the intro at line 26, where the authors incorrectly state that an exponentially large number of configurations implies hardness of optimization. Obviously this is wrong, as optimization can be done efficiently, for example, for tree-structured MRFs via dynamic programming. Similarly, the authors claim that computing Z is #P-complete *because* there is an exponential number of terms (line 31). Although the problem is #P-complete, this does not follow simply from the number of terms to be summed (see, for example, the matrix-tree theorem). So the authors are sloppy with their complexity theory.
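The counterexample mentioned above is easy to make concrete: on a chain (a tree-structured MRF), max-product dynamic programming finds the exact MAP configuration in O(n k^2) time despite the k^n configurations. The sketch below is an illustrative toy (random potentials, invented sizes), with a brute-force check.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

# Chain MRF: n variables, k states, random unary and pairwise potentials.
n, k = 7, 3
unary = rng.normal(size=(n, k))
pair = rng.normal(size=(n - 1, k, k))

def score(x):
    """Total (log) potential of a configuration x."""
    return sum(unary[i, x[i]] for i in range(n)) + \
           sum(pair[i, x[i], x[i + 1]] for i in range(n - 1))

# Max-product dynamic programming (Viterbi): O(n k^2) despite k^n configurations.
msg = unary[0].copy()                      # best score ending in each state of var 0
back = np.zeros((n, k), dtype=int)
for i in range(1, n):
    cand = msg[:, None] + pair[i - 1]      # (k, k): previous state x next state
    back[i] = np.argmax(cand, axis=0)
    msg = np.max(cand, axis=0) + unary[i]

# Backtrack the optimal assignment.
x = np.zeros(n, dtype=int)
x[-1] = int(np.argmax(msg))
for i in range(n - 1, 0, -1):
    x[i - 1] = back[i, x[i]]

# Brute force over all k^n configurations for verification.
best_brute = max(itertools.product(range(k), repeat=n), key=score)
print(score(tuple(x)), score(best_brute))  # the two MAP scores coincide
```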

__ Clarity__: Generally yes, but I found the discussion of "theoretical guarantees" to be misleading, and some of the discussion in the paper needs to be clarified. For example, the authors point out that the classical k-way max-cut approximation method has strong theoretical guarantees on the quality of the approximation, but do not point out that their new relaxation does not have the same guarantees.

__ Relation to Prior Work__: yes

__ Reproducibility__: Yes

__ Additional Feedback__: I have read the authors response and that did not change my assessment. I think this is a fine piece of work but the paper needs to be much more clear about the theoretical guarantees of the approach.