NeurIPS 2020

On the Convergence of Smooth Regularized Approximate Value Iteration Schemes

Meta Review

This paper analyzes two popular techniques, Q-value smoothing (with target networks) and entropy regularization, through the lens of approximate dynamic programming. The analysis provides theoretical insights that help explain their empirical success. After the author feedback and discussion, all reviewers agree that this is a meaningful contribution to a better understanding of existing RL algorithms. This is thus a clear "Accept" decision. That being said, I would like to ask the authors to add a discussion of the recent, closely related work "Leverage the Average: an Analysis of Regularization in RL", which is currently missing from the references.