Reviews: Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

The paper introduces a new method for hyperparameter optimization in policy gradient methods, "HOOF" (hyperparameter optimization on the fly). Across the reviews and the discussion, all reviewers appreciated the method's sample efficiency, computational efficiency, and performance relative to simple baselines. All three reviewers recommended acceptance: R2 gave the strongest endorsement, R3 gave a borderline accept, and R1 gave a score of 7, although the tone of that review was somewhat equivocal, with statements such as "I am ultimately lukewarm overall because of questions I have about how to characterize the idea and whether/how it is really better."

Paper ID:	2589
Title:	Fast Efficient Hyperparameter Tuning for Policy Gradient Methods