NeurIPS 2020

Toward the Fundamental Limits of Imitation Learning


Meta Review

The paper presents a theoretical study of imitation learning, with a focus on behavior cloning (BC) in the no-interaction setting (where the dataset is fixed and no further data collection is possible). The analysis centers on minimax bounds, yielding best-case and worst-case results. The results are interesting (even if some of the theorems seem natural to hold) and contribute to the understanding of the purely 'offline'/no-interaction setting, which is an active topic of research in the NeurIPS community. The reviewers all agreed that the setting is important, that the paper is well written, and that the theoretical results are new and interesting (even if there was some discussion about the strength of the bounds). There was a discussion among the reviewers regarding the derived bounds (and how informative they are in practice) that did not end in complete agreement (I summarize below). This leaves the paper in a somewhat borderline position. I am, however, of the opinion that, while the criticisms raised by Reviewer #1 are valid, the paper makes enough of a contribution to be of interest to a large part of the NeurIPS community. As a result, I recommend acceptance.

Reasoning for the decision, and some points that should be clarified in the final version:

1) The presented bounds are worst-case and best-case analyses and are stated in terms of the state-space size |S|. I agree with Reviewer #1 that this is limiting, and I think it should be mentioned more prominently in the paper: e.g., in the section establishing that BC is optimal in terms of worst-case performance and that even DAgger cannot improve upon this worst-case bound, it would be good to explicitly note that this does not entail that no improvement is possible in expectation. Despite this, I think the results in Section 4, while intuitive, are interesting to the community: they neatly unify existing results and, together with the lower bounds from Section 6, give optimality results for BC that neither I nor the reviewers had seen proven before. Regardless of whether one believes these bounds to be very strong, I do believe they will encourage new research and serve as a good reference for the community.

2) The treatment of the active setting is somewhat less thorough. This part could be slightly de-emphasized to give the rest of the paper more room to breathe (especially the part on using a transition model, which has interesting results but is fairly short).

3) It should be clarified that MIMIC-MD is not a computationally efficient algorithm, and that developing an efficient procedure from the general strategy it outlines is an important step for future work. I do not think the algorithm's inefficiency hampers the paper in any way, but it should be stated clearly.