This paper proposes a method for learning shaping rewards in RL to improve learning. The authors clearly explain the problem and their method, and the experimental results show that the method works as intended. I expect the authors to update the final draft of their manuscript to include the additional experiments provided in the author response, and to reference and discuss the relation of their method to the crucial prior work suggested by the reviewers, in particular "Human-level performance in 3D multiplayer games with population-based reinforcement learning", which also performs bi-level optimisation of shaping rewards.