Review for NeurIPS paper: Deep Reinforcement and InfoMax Learning

NeurIPS 2020

Deep Reinforcement and InfoMax Learning

Meta Review

This paper proposes applying noise contrastive estimation to future-state prediction as an auxiliary task for RL agents. The authors explain their formulation clearly and show through toy experiments that it works as intended. There are some empirical performance improvements in simple continual learning settings and on Procgen. The author response contains very useful ablation studies and connections to prior work, which I hope the authors will consider adding to the final draft. It also acknowledges the suggestion to move the theory sections to make the exposition clearer.