NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 3096
Title: Variance Reduced Policy Evaluation with Smooth Function Approximation

The main contribution of this paper is a method for solving the finite-sum minimax problem that arises in off-line policy evaluation with nonlinear function approximation. The minimax problem is nonconvex in the primal variable and strongly concave in the dual variable, and the paper proposes a single time-scale algorithm that finds an approximate stationary point. Although the paper does not address the fully stochastic TD learning problem, the progress on the finite-sum off-line version is meaningful.
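
To make the problem structure concrete, here is a minimal sketch of single time-scale gradient descent-ascent on a toy finite-sum objective that is nonconvex in the primal variable and strongly concave in the dual variable. This is not the paper's algorithm: it uses full gradients with no variance reduction, and the objective, the data `A` and `B`, the step sizes, and the function names are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy data (a_i, b_i), not taken from the paper.
A = [0.5, 1.0, 1.5, 2.0]
B = [0.2, 0.4, 0.6, 0.7]


def grads(theta, w):
    """Gradients of the assumed toy objective
        L(theta, w) = (1/n) * sum_i [ w * (tanh(a_i * theta) - b_i) - 0.5 * w**2 ],
    which is nonconvex in theta (primal) and 1-strongly concave in w (dual).
    """
    n = len(A)
    # dL/dtheta: derivative of tanh(a * theta) is a * (1 - tanh(a * theta)**2).
    g_theta = sum(w * a * (1.0 - math.tanh(a * theta) ** 2) for a in A) / n
    # dL/dw: mean residual minus w.
    g_w = sum(math.tanh(a * theta) - b for a, b in zip(A, B)) / n - w
    return g_theta, g_w


def gda(theta=0.0, w=0.0, alpha=0.05, beta=0.5, steps=5000):
    """Simultaneous gradient descent (primal) and ascent (dual).

    Both step sizes are constants with a fixed ratio, so neither variable
    is run on an asymptotically slower time scale than the other.
    """
    for _ in range(steps):
        g_theta, g_w = grads(theta, w)
        theta, w = theta - alpha * g_theta, w + beta * g_w
    return theta, w


if __name__ == "__main__":
    theta, w = gda()
    g_theta, g_w = grads(theta, w)
    print("theta:", theta, "w:", w)
    print("gradient magnitudes:", abs(g_theta), abs(g_w))
```

In this toy instance the inner maximization has a closed form, so the iterate can be checked against stationarity directly by evaluating both partial gradients at the returned point.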