Paper ID:6386
Title:Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

The argumentation defending the proposed approach, and the numerical evaluation of its performance on realistic examples, are convincing. Although the reviewers ultimately agree that NeurIPS might not be the best venue for this work because it has almost no theoretical component, I recommend giving it a chance on the strength of its other qualities. If the paper is ultimately rejected, I recommend that the authors follow the suggestions in the reviews and either resubmit to a more specialized conference or consider adding a theoretical analysis (which can be expected to be rather involved).