NeurIPS 2020

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces


Meta Review

This paper studies the exploration problem in episodic reinforcement learning with kernel and neural network function approximation. The authors propose a novel algorithm, an optimistic variant of least-squares value iteration (LSVI) in which the standard LSVI solution is augmented with a bonus function that drives exploration. They derive regret bounds for this algorithm in two settings: reproducing kernel Hilbert space (RKHS) function approximation and overparameterized neural networks analyzed through the neural tangent kernel (NTK). The technical contribution appears solid. Some reviewers raised concerns about the assumptions underlying the analysis, in particular the assumption that the Bellman optimality update lies in the RKHS. There was a rather long discussion between two reviewers, one of whom was more negative about this assumption and its implications. Although the AC and senior AC agree with some of the points raised by that reviewer, the work is nonetheless novel and has the potential to encourage further research in this direction. Overall, we recommend acceptance as a poster.
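
For concreteness, the optimistic LSVI update summarized above can be sketched in standard kernel ridge-regression notation; this is a minimal sketch under assumed notation (the symbols $\lambda$, $\beta$, $k$, and the episode index $\tau$ are illustrative, not taken from the paper). At each step $h$, the fitted action-value is the ridge-regression solution over the RKHS $\mathcal{H}$, and optimism comes from adding a posterior-standard-deviation-style bonus before truncating at the horizon $H$:

% Schematic optimistic LSVI update at step h (illustrative notation):
% \hat{f}_h is the kernel ridge-regression fit to the backed-up targets
% y_h^tau, and b_h is an exploration bonus; Q_h is truncated at H.
\begin{align*}
  \hat{f}_h &= \operatorname*{arg\,min}_{f \in \mathcal{H}}
    \sum_{\tau} \bigl( f(s_h^{\tau}, a_h^{\tau}) - y_h^{\tau} \bigr)^2
    + \lambda \lVert f \rVert_{\mathcal{H}}^2,
  \qquad
  y_h^{\tau} = r_h^{\tau} + \max_{a} Q_{h+1}\bigl(s_{h+1}^{\tau}, a\bigr), \\
  Q_h(s, a) &= \min\Bigl\{ \hat{f}_h(s, a) + \beta\, b_h(s, a),\; H \Bigr\},
  \qquad
  b_h(z) = \lambda^{-1/2}
    \sqrt{ k(z, z) - k_h(z)^{\top} \bigl( K_h + \lambda I \bigr)^{-1} k_h(z) },
\end{align*}

where $z = (s, a)$, $k_h(z)$ is the vector of kernel evaluations between $z$ and the past state-action pairs at step $h$, and $K_h$ is the corresponding Gram matrix. Roughly speaking, the NTK setting reuses the same scheme with the kernel induced by a wide neural network near initialization.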