NeurIPS 2020

A new convergent variant of Q-learning with linear function approximation

Meta Review

This paper presents a new objective and a DQN-like algorithm that optimises for it. As in prior work (GTD, TDC, GQ), the algorithm is shown to be convergent under linear function approximation. Because the objective differs from the standard one, the paper could have better characterised the quality of the fixed point the new algorithm converges to; this is discussed in detail only for the special case of a diagonal feature covariance matrix. The author response did not alleviate this concern, and it remains unclear whether the new algorithm offers major benefits over existing related work. The experiments were deemed somewhat insufficient to fully convince the reviewers on this point. Overall, the reviewers found the paper interesting and therefore recommend (though not unanimously) accepting it. The authors are encouraged to provide further analysis and/or empirical validation in the final version of the paper.