This paper shows how to automatically optimize the hyper-parameters of RL algorithms (specifically IMPALA here) by gradient descent while the agent is learning. Initial reviews were mixed, with all reviewers seeing it as a borderline paper, but trending towards rejection. However, after taking the author feedback into account and discussing the pros and cons of the submission, a consensus towards acceptance emerged. Everyone (myself included) agrees that although this work is mostly incremental, it convincingly demonstrates that hyper-parameter optimization is possible on a wide range of RL tasks. This is a meaningful contribution, given how challenging and computationally expensive the hyper-parameters of RL algorithms can be to tune.