Review for NeurIPS paper: Model-based Policy Optimization with Unsupervised Model Adaptation

NeurIPS 2020

Model-based Policy Optimization with Unsupervised Model Adaptation

Meta Review

This paper proposes a method for training and adapting learnt environment models for use in policy optimisation. The authors clearly explain the motivation for the method and the shortcomings of current model-based reinforcement learning (MBRL), and propose a way to adapt the model to minimise distribution mismatch. The experimental results clearly show the method's benefit in continuous control environments.