NeurIPS 2020

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Meta Review

The paper focuses on efficiently exploring MDPs with high-dimensional state representations by combining an unsupervised algorithm for learning a low-dimensional representation with solving the problem in this low-dimensional space. The paper is largely theoretical and shows that, under certain conditions, near-optimal policies can be found with sample complexity polynomial in the number of latent states. The reviewers mostly agreed on the following points. The paper is considered well-written, and it presents theoretically strong results that are sound, novel, and non-trivial. As weaknesses, the reviewers mentioned the lack of empirical results in more realistic settings and the restrictive assumptions. The reviewers raised a few doubts about the paper, which were mostly addressed in the rebuttal (e.g., whether the improvement is due to exploration, how a discrete set of latent states is obtained, and the absence of error bars). Everything considered, and in accordance with the reviewers' opinions, I recommend acceptance of this paper. I would like to ask the authors to make sure the clarifications given in the rebuttal are reflected in the final version of the manuscript.