NeurIPS 2019
Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
The paper addresses exploration in actor-critic methods, where the authors identify two main problems: pessimistic under-exploration and directional uninformedness. The authors propose using UCB-style upper and lower bounds derived from the uncertainty of the value function. All reviewers appreciated the intuitive idea and the exhaustive evaluation of the approach. The results were also considered very promising, and the authors provided additional ablation studies with their rebuttal. All reviewers agreed that the paper is a valuable contribution to the field of reinforcement learning.