Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Originality: Many, if not all, of the techniques have been previously proposed. However, to the best of my knowledge, combining these techniques to scale energy-based models to modern deep network architectures is a novel contribution. Energy-based models are far less popular than other forms of generative models. While some recent energy-based approaches exist (e.g. autoregressive energy machines), this paper demonstrates a new degree of empirical success.

Quality: The paper is thorough. The authors present a comprehensive set of experiments, with multiple downstream applications. Various metrics are evaluated for quantitative comparison in each case. This is above and beyond what is expected. The one aspect that I feel is lacking is a larger discussion of the drawbacks of this approach. In the supplementary material, the authors discuss the training time and the inability to evaluate log-likelihoods. It would be helpful if some of this discussion also appeared in the main paper.

Clarity: Overall, the paper is clear. The discussion of energy-based models and training techniques (Section 3) is clear and helpful. Some additional technical details would have been useful. For instance, the paper would benefit from a comparison of the number of parameters, sampling time, training time, etc. Some details regarding the experiments were also unclear, particularly the set-up and training procedure for the robot hand trajectory experiments. I'm also not very familiar with the adversarial-examples literature, and I found the adversarial robustness section (4.3) somewhat difficult to follow.

Significance: The empirical demonstration of good generative performance with energy-based models on modern image datasets is significant. It may help to re-open this family of models as another option for generative modeling. Currently, these models are not very popular within the community.
The other most significant aspect of this paper is the demonstration of improved out-of-distribution detection as compared with other methods. This is a recently discovered phenomenon, and the fact that these models do not suffer as much as other families of models could open directions for further study.

--- Update: I'm satisfied with the author response and have increased my score to 9.
Originality: EBMs are a relatively less popular class of generative models compared to other model families such as VAEs or GANs. Although EBMs themselves are not new, the authors proposed the use of Langevin dynamics for training as well as a sample replay buffer for stabilization, which is an original contribution. The authors' insights into tips and tricks for stabilizing training were also a nice combination of existing techniques.

Quality: The extensiveness of the empirical results demonstrates the high quality of the paper. The authors showed that their approach outperformed or was comparable to state-of-the-art GAN and autoregressive models (e.g. SNGAN) based on FID/IS, achieved SOTA on adversarial robustness, was able to perform trajectory modeling, did well on an online learning task, and showed experiments with compositional generation.

Clarity: The paper was relatively clear and easy to follow, barring minor typos/grammatical errors (a few of which I have listed in the Improvements section).

Significance: I expect that the contributions of this paper will be very beneficial to the generative modeling community and will open up a new avenue of research exploring the uses of EBMs in generative modeling, so this paper has high significance.

------------------------------- UPDATE: I appreciate the authors' responses to my questions, particularly with regard to highlighting the role of the online learning experiments in the paper and tying everything together with a better discussion. I will keep my decision to accept.
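The training scheme this review describes (negative samples drawn via Langevin-dynamics MCMC, with a replay buffer of past samples used to initialize new chains) can be sketched roughly as follows. This is an illustrative toy implementation on a simple quadratic energy, not the authors' code; the names `langevin_sample` and `ReplayBuffer` and all hyperparameter values are my own assumptions:

```python
import numpy as np

def langevin_sample(energy_grad, x, n_steps=60, step_size=0.01,
                    noise_scale=0.005, rng=None):
    """Run a Langevin-dynamics chain: x <- x - step * dE/dx + Gaussian noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    for _ in range(n_steps):
        x = x - step_size * energy_grad(x) \
              + noise_scale * rng.standard_normal(x.shape)
    return x

class ReplayBuffer:
    """Stores past chain states so new chains can resume from them,
    letting the effective chain length grow over training (a stabilization trick)."""
    def __init__(self, capacity, sample_shape, reinit_prob=0.05, rng=None):
        self.capacity = capacity
        self.sample_shape = sample_shape
        self.reinit_prob = reinit_prob
        self.rng = np.random.default_rng(1) if rng is None else rng
        self.samples = []

    def init_batch(self, batch_size):
        """Draw chain initializations: mostly stored samples, occasionally fresh noise."""
        batch = []
        for _ in range(batch_size):
            if not self.samples or self.rng.random() < self.reinit_prob:
                batch.append(self.rng.uniform(-1.0, 1.0, self.sample_shape))
            else:
                batch.append(self.samples[self.rng.integers(len(self.samples))])
        return np.stack(batch)

    def store(self, batch):
        """Push finished chain states, evicting the oldest beyond capacity."""
        self.samples.extend(list(batch))
        self.samples = self.samples[-self.capacity:]
```

For the toy energy E(x) = ½‖x‖², the gradient is simply x, so chains started far from the origin drift toward it; in the paper's setting the gradient would instead come from backpropagation through a deep energy network.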
Quantitative evaluation of mode coverage using experiments on augmented MNIST, similar to Metz et al. (2017, Unrolled Generative Adversarial Networks), would better show the quality of the trained energy function than qualitative evaluations.

Why, in the inpainting and salt-and-pepper experiments, are some of the recovered images completely different from the given images? For example, in the last row, the recovered plane is a different plane. Moreover, it seems that the recovered images have saturated colors (white backgrounds on rows three and five, and saturated blue on the last row). Although the code is not provided with the submission, I found the repository online and experimented with the provided trained model and inference procedure. I think the saturated colors are an important problem with the proposed algorithm, and I would like to hear the authors' comments on that.

The previous methods by Xie et al. (2016, A Theory of Generative ConvNet) and Ingraham et al. (2019, Learning Protein Structure with a Differentiable Simulator), which use similar Langevin dynamics for sampling from energy-based models, should be addressed in the paper.

Formatting issues/typos:
-Some figures and tables do not have a figure or table number associated with them, while they are referred to in the text by their numbers.
-The column title says "Salt and paper" instead of "Salt and pepper".

=== I've read and considered the author feedback and the other reviews.