All four knowledgeable referees support acceptance on the strength of the contributions: mainly the idea of multimodal curiosity and its instantiation, which can mitigate some issues of certain future-prediction-based curiosity approaches, and the promising results obtained. I also recommend acceptance. However, the referees raised important concerns about parts of the paper before the author response. These were clarified in the rebuttal, but the clarifications are currently missing from the paper. The authors should carefully revise the paper to address these concerns (the couch potato issue, a discussion of environments in which the proposed approach would fail, and baselines).