Paper ID: 4676

Title: Graph Structured Prediction Energy Networks

This paper addresses weaknesses in structured prediction energy networks, which estimate a scalar score for a set of structured variables, by allowing multiple separate factors to be added together to produce such a score. The extension is non-trivial: it requires different inference algorithms to (locally) optimize the corresponding inference problem while satisfying constraints on marginals shared across factors. It also unifies existing work at the intersection of deep learning and structured prediction. The extensive experiments are a strong point of the paper, and care has been taken to provide good comparisons. The OCR datasets are constructed to show the benefits of modeling the structured output. Should the x-axis label of Fig 1a be “iterations” instead of “training examples”? The variability in test performance across optimization algorithms in Fig 1a is somewhat alarming. Does the choice of inference optimization method become a “black art” in this approach? Despite this, GSPEN performs at least comparably to the comparison methods on image tagging and better than them in the other experiments. Overall, this is an impressive paper that addresses key gaps in structured prediction for neural network/deep learning methods, which lie between making explicit structural assumptions and allowing flexibility.

Building on the structured SVM, the authors combine structured prediction with hinge-loss learning; the result is a novel model, Graph Structured Prediction Energy Networks. Overall the model is novel and the theory is mostly solid. However, I have some concerns about the inference part.
1. Marginal polytope. Relaxing the marginal polytope is always tricky in structured prediction. A loose relaxation may yield an efficient algorithm but a poor-quality solution, while a tight relaxation is often unaffordable in terms of complexity. In the inference algorithm, it is not clear which relaxation of M is used.
2. Running time. Inference in a structured prediction model can be slow. Can the authors provide detailed running times for the proposed model on each application?
====== After Rebuttal ======
In the rebuttal, the authors still fail to give a detailed description of the local marginal polytope. It is true that GSPEN allows the practitioner to select a structured prediction algorithm (associated with some relaxation of M). In practice, however, the choice of relaxation is tricky. The authors claim that they use the local marginal polytope in [43], but [43] also contains different versions of the local marginal polytope, e.g. the Bethe approximation and the Kikuchi approximation. It is also important to note that different approximations may lead to different time complexities for the inference algorithm. A tighter relaxation (e.g. the Kikuchi approximation) may cost more per iteration but require fewer iterations to reach a higher accuracy than a looser relaxation (e.g. the Bethe approximation). For this reason, the complexity provided in the rebuttal might be too rough.
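For reference, one common way to write the first-order (pairwise) local marginal polytope discussed in the comments above is sketched below. The notation is an assumption made here for illustration and is not taken from the paper or from [43]: $b_i$ and $b_{ij}$ denote node and edge pseudo-marginals on a graph with edge set $E$, and $\mathcal{M}$ denotes the exact marginal polytope.

```latex
\mathcal{M}_L = \Big\{\, b \ge 0 \;:\;
    \sum_{y_i} b_i(y_i) = 1 \quad \forall i, \qquad
    \sum_{y_j} b_{ij}(y_i, y_j) = b_i(y_i), \quad
    \sum_{y_i} b_{ij}(y_i, y_j) = b_j(y_j) \quad \forall (i,j) \in E \,\Big\},
\qquad \mathcal{M} \subseteq \mathcal{M}_L .
```

The inclusion holds with equality on tree-structured graphs. On graphs with cycles, $\mathcal{M}_L$ is a strict outer bound, which is why the tightness of the relaxation, and the cost of enforcing it, depends on which regions are chosen.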

+ This paper is clearly motivated. It aims to address a limitation of structured output prediction methods, which struggle to explicitly model correlations between a large number of output variables. To this end, the paper makes an incremental methodological contribution by extending a recent approach, Structured Prediction Energy Networks, by Belanger and McCallum (refs. [4,5]).
+ The paper demonstrably advances the state of the art. The authors compare inference and learning settings and report results on the tasks of optical character recognition, image tagging, multilabel classification, and named entity recognition.
- While the empirical results support the claims of this paper, the claims are not supported by theoretical analysis. The GSPEN model description in Section 3.1 seems rather ad hoc; no intuition or theoretical insights are given.
- The strengths and weaknesses of the GSPEN model are not evaluated. Since GSPEN is an extension of the SPEN model, it would be helpful to better understand the settings in which GSPEN is preferred over SPEN and vice versa.