All of the reviewers maintained their score of 6. However, multiple reviewers considered lowering their scores because they were disappointed by the author response and disheartened by R1's comments that the authors had not addressed some of the comments from the previous round of reviewing. While this paper has flaws, particularly in the formulation and the experimental results, it may open up a new research direction of using meta-learned objectives to accelerate off-policy RL, a point that was particularly appreciated by R3. As such, I think that the paper should be accepted. Nonetheless, the authors are strongly encouraged to read carefully through all of the reviewers' comments (including the new comments in the updated reviews) and revise the paper to address the concerns.