Review for NeurIPS paper: Strictly Batch Imitation Learning by Energy-based Distribution Matching

NeurIPS 2020

Strictly Batch Imitation Learning by Energy-based Distribution Matching

Meta Review

The reviewers unanimously agree that the paper makes a nice contribution to imitation learning in the batch setting. That said, the paper has two major weaknesses:

1. Lemma 1 is incorrect. During the discussion, the reviewers expressed confidence that the authors understand the mistake and know how to address it (see, e.g., the post-rebuttal update of R4). We therefore recommend acceptance on the condition that the authors take this issue seriously, correct the technical mistake, and remove any incorrect or misleading claims associated with it.

2. Learning a model and then performing occupancy matching in the learned model is a simple and likely very strong baseline, which the paper does not compare against. The authors are strongly encouraged to add this comparison in the camera-ready version.

On a related note, while the algorithm uses only (s,a) pairs as data, trajectory data is often available, from which one can extract (s,a,r,s') tuples. In fact, one of the baselines (VDICE [41]) does require (s,a,r,s') data, and so does the model-based baseline mentioned above. The kind of data available in practice should be discussed more clearly early in the paper, when the problem is set up.
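To illustrate the data-availability point, here is a minimal sketch of how a logged trajectory could yield both (s,a) pairs and (s,a,r,s') tuples. The names `Transition`, `extract_transitions`, and `extract_state_action_pairs`, and the trajectory layout (one more state than actions, one reward per action), are assumptions made for illustration and do not come from the paper or its baselines.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Transition(NamedTuple):
    """One (s, a, r, s') tuple; field names are hypothetical."""
    s: int
    a: int
    r: float
    s_next: int


def extract_transitions(
    states: Sequence[int],
    actions: Sequence[int],
    rewards: Sequence[float],
) -> List[Transition]:
    """Turn one trajectory into (s, a, r, s') tuples.

    Assumes the trajectory stores T+1 states, T actions, and T rewards.
    """
    if len(states) != len(actions) + 1 or len(rewards) != len(actions):
        raise ValueError("expected T+1 states, T actions, and T rewards")
    return [
        Transition(states[t], actions[t], rewards[t], states[t + 1])
        for t in range(len(actions))
    ]


def extract_state_action_pairs(
    states: Sequence[int],
    actions: Sequence[int],
) -> List[Tuple[int, int]]:
    """Keep only (s, a) pairs, discarding rewards and next states."""
    return list(zip(states[: len(actions)], actions))


# Toy trajectory: s0 -a0-> s1 -a1-> s2
states, actions, rewards = [0, 1, 2], [1, 0], [0.0, 1.0]
transitions = extract_transitions(states, actions, rewards)
pairs = extract_state_action_pairs(states, actions)
```

The (s,a) view is a strict projection of the (s,a,r,s') view, which is why the choice of data format affects which baselines can be run on the same demonstrations.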