NeurIPS 2020

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning


Meta Review

The paper introduces BIRD, a model-based RL algorithm based on differentiable planning (SVG-like). A key aspect of BIRD is a mutual information (MI) term in the loss function, which encourages similarity between the imagined data and the real observations. Reviewers generally liked this paper, although there were some concerns about the extent of its novelty, especially compared to Dreamer. I summarize some of the concerns here; they should be addressed in the revised version of this work. Please refer to the reviews for more detail, and revise your paper by incorporating their comments.

- This paper has some similarities to Dreamer. If we expand the MI term, the main difference from Dreamer would be the presence of a policy entropy term. It is important that the authors expand on this and clearly state what differentiates this work from Dreamer.
- The number of runs (3) in the experiments is too small. There is a large overlap between confidence intervals, and in some cases it is difficult to say whether this algorithm is better than the alternatives. Please increase the number of runs to a much larger value, such as 10 or 20.
- The authors are encouraged to include results on other high-dimensional tasks.