NeurIPS 2020

Zap Q-Learning With Nonlinear Function Approximation

Meta Review

The reviewers are generally supportive of the paper. They have provided some very useful feedback, and I highly encourage the authors to incorporate it. Primarily, it would be ideal to complete the paper reorganization as discussed; explain the limitations of the assumption that the iterates are bounded; provide a toy example in which the boundedness assumption is not on its own enough to prevent divergence of Q-learning (i.e., even under that assumption, Q-learning diverges but Zap-Q does not); and, finally, sweep over the parameters in the empirical comparison (even if that means the outcome is less positive for Zap-Q).