NeurIPS 2020

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Meta Review

The paper presents new results on the convergence of TD and Q-learning when the action-value function is represented by overparameterized neural networks. The theoretical contribution is seen as solid. The weaknesses described by the reviewers are not major and can be addressed in a minor revision. I therefore recommend accepting this paper.