Reviews: Better Exploration with Optimistic Actor Critic

The paper addresses exploration in actor-critic methods, where the authors identify two main problems: pessimistic under-exploration and directional uninformedness. The authors propose to use UCB-style upper and lower bounds based on the uncertainty of the value function. All reviewers appreciated the intuitive idea and the exhaustive evaluation of the approach. The results were also considered very promising, and the authors provided additional ablation studies with their rebuttal. The reviewers agreed unanimously that the paper is a valuable contribution to the field of reinforcement learning.

Paper ID:	1035
Title:	Better Exploration with Optimistic Actor Critic