NeurIPS 2020

Online Planning with Lookahead Policies

Meta Review

This paper sparked a lot of discussion and an extreme spread in review scores. Concerns were raised about the significance of the results and their potential practical usefulness, especially regarding the computational aspects of the algorithm, which were not deemed sufficiently addressed in either the paper or the author response. For instance, the paper lacks numerical experiments (e.g., examples) showing how such algorithms behave in practice, which could have aided intuitive understanding and helped avoid uncertainties. However, I want to give this paper the benefit of the doubt, mainly because the results include analyses of the approximate cases, which greatly expands the scope of settings to which the analysis applies. Furthermore, I believe real-time planning is an interesting direction for algorithm development. Even if the current paper may not have proposed the best possible algorithm for that case, I believe similar algorithms (e.g., with cheaper compute for finding suitable h-lookahead policies) could be developed by others in future work, inspired by, and learning from, this paper. In short, I hope the research community will benefit from being able to read and discuss this work, which is why I am recommending that this paper be accepted.