Export Reviews, Discussions, Author Feedback and Meta-Reviews

Paper ID:	183
Title:	Smooth and Strong: MAP Inference with Linear Convergence

Current Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)

Overview: This paper studies the benefits of augmenting the linear programming relaxation of the maximum a posteriori (MAP) inference problem in graphical models with a quadratic term, which makes the objective strongly convex.

Such augmented formulations are derived from both the original primal and the original dual formulations, and in each case the resulting primal-dual relationship is studied.

Prior work has mostly focused on smoothing the LP formulation using a softmax/entropy term, with a few notable exceptions, such as [5], [17] and [18].

Unlike those previous approaches, which employ a quadratic term in the sub-problems of either a *proximal* or an *alternating direction* scheme, the present manuscript adds the quadratic smoothing term directly to the objective.

In some sense, this is a naive approach: compared to proximal or alternating direction schemes, convergence to the global optimum of the original problem is no longer guaranteed, and the approximation quality depends directly on the strength of the augmentation term.

The aforementioned approaches adjust this strength adaptively and thus find the global optimum, at least in the limit.
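To make the dependence on the augmentation strength concrete, here is a minimal sketch of the simplest possible case, assuming a single variable whose marginal lives on one probability simplex, with hypothetical unary scores `scores` and augmentation strength `gamma`. This toy omits the local-polytope coupling constraints of the paper's general formulation. In this case, maximizing `scores·x - (gamma/2)||x||^2` over the simplex equals the Euclidean projection of `scores/gamma` onto the simplex. Because `||x||^2 <= 1` on the simplex, the loss in the original linear objective is at most `gamma/2`, and for small `gamma` the LP vertex is recovered exactly. The helper names are illustrative only.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                      # sort entries in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]  # last index kept in the support
    theta = (css[rho] - 1.0) / (rho + 1)      # shift that makes entries sum to 1
    return np.maximum(v - theta, 0.0)

def augmented_argmax(scores, gamma):
    """Hypothetical single-node toy: argmax of scores.x - (gamma/2)||x||^2 on the simplex."""
    return project_simplex(scores / gamma)

# Hypothetical unary scores for one variable with four labels.
scores = np.array([1.0, 0.5, -0.2, 0.9])
lp_value = scores.max()  # the unaugmented LP optimum sits at a vertex
for gamma in [10.0, 1.0, 0.1, 0.01]:
    x = augmented_argmax(scores, gamma)
    loss = lp_value - scores @ x  # suboptimality w.r.t. the original linear objective
    print(f"gamma={gamma:<5} x={np.round(x, 3)} LP loss={loss:.4f} (bound {gamma / 2})")
```

Large `gamma` spreads mass over all labels (a smoother, more strongly convex problem), while small `gamma` approaches the LP vertex; this is the trade-off the direct augmentation makes without adaptive adjustment.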

However, the formulations derived from this direct augmentation are very insightful and reveal several interesting connections to other formulations.

Moreover, the authors demonstrate in several experiments (both synthetic and real-world) that a particular optimization scheme for the primal form of the augmented dual (based on block-coordinate Frank-Wolfe) compares very favorably to other practical schemes, provided the strength of the quadratic penalty is chosen appropriately.

The Frank-Wolfe method has recently seen numerous useful applications in machine learning, and this is yet another interesting example.

Positive points:
+ The paper is exceptionally well written and organized. Related work is cited, and the authors do an excellent job of connecting the various existing approaches and putting them into perspective.
+ The suggested optimization scheme (FW) seems practical. It is reasonably efficient and has the advantage that only maximization (as opposed to max-marginalization, or even softmax-marginalization) is needed, which allows the use of certain structured potentials.
+ The experimental evaluation is unusually thorough for a paper of this type, even including inference experiments on real-world learned structured prediction models.
+ The paper seems to be technically correct as far as I can tell.
+ Some of the derivations are novel to my knowledge, and the proposed optimization scheme employs the recently suggested block-coordinate Frank-Wolfe method for simplex-constrained quadratic optimization in a novel context.
+ Table 1 is extremely helpful in connecting the dots.
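As a rough illustration of why a Frank-Wolfe scheme needs only maximization, the sketch below runs plain (not block-coordinate) Frank-Wolfe with exact line search on a hypothetical single-simplex toy, `min_x -scores·x + (gamma/2)||x||^2`. It is not the authors' algorithm: it assumes one block, no coupling constraints, and illustrative names throughout. The point it shows is that the linear minimization oracle over the simplex reduces to an argmax over the current gradient-adjusted scores, i.e. picking a single best vertex, rather than any marginalization.

```python
import numpy as np

def frank_wolfe_simplex(scores, gamma, iters=2000, tol=1e-12):
    """Hypothetical toy: Frank-Wolfe for min_x -scores.x + (gamma/2)||x||^2 on the simplex.

    Single-block sketch (not block-coordinate FW). The only problem-specific
    step is an argmax over labels, which serves as the linear oracle.
    """
    x = np.full(scores.size, 1.0 / scores.size)  # start at the barycenter
    for _ in range(iters):
        grad = -scores + gamma * x
        i = int(np.argmax(-grad))  # linear oracle: best vertex e_i of the simplex
        d = -x                     # new array holding -x
        d[i] += 1.0                # direction e_i - x
        gap = -grad @ d            # Frank-Wolfe duality gap, upper-bounds f(x) - f*
        if gap <= tol:
            break
        # Exact line search: f(x + t d) is a 1-D quadratic in t; clip t to [0, 1].
        step = min(1.0, gap / (gamma * (d @ d)))
        x = x + step * d
    return x

# Usage with hypothetical unary scores for one variable with three labels.
scores = np.array([1.0, 0.5, 0.8])
x = frank_wolfe_simplex(scores, gamma=2.0)
print(np.round(x, 4))
```

Each iterate stays a convex combination of simplex vertices, so feasibility is maintained without any projection; the duality gap doubles as a certificate of suboptimality.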

Negative points:
- From a convex optimization point of view, the proposed scheme (FW) is less advanced than those of [5], [17], and [18].
- In fact, as the authors point out, their augmented formulation is not entirely novel either, but is very closely related to the sub-problem in [17] (or to the one in [5], if the primal is augmented directly). As such, some of the derivations and bounds in the manuscript build on material developed in previous work.

A comment: The quadratic programming relaxation suggested by Kumar [4] is also a simplex-constrained convex quadratic program.

It might be interesting to extend the discussion in the paper to highlight which similarities, if any, exist between the soft-constrained primal formulation suggested in this manuscript and Kumar's convex QP relaxation, in particular from the point of view of approximation quality. (Note that adding a quadratic augmentation term, as the present manuscript suggests, also yields an approximation that differs from the original LP relaxation.)