This is a nice contribution in that it combines several different approaches (efficient coding, neuroscience/neural modeling, MDPs) in a conceptually novel way (R1, R4, R5), with R4 commenting that it's likely to be of great impact to the wider community. On the other hand, R3 saw limited conceptual novelty and believes that some prior work on policy compression has been understated. In general, I'm inclined to agree with the other reviewers that the paper is fairly well-positioned with regard to prior work (R1). R4 praised the clarity of the writing, and the other reviewers had no issues with the presentation. R5 expressed concern that the results are mainly qualitative and not particularly novel, despite the novelty of the approach itself. One major point that came up among reviewers was the lack of a plausible method for learning. R1 argued that it's difficult to separate the two, and I share a concern about the applicability of the approach to more general problems that require learning (as R1 mentions, it's likely to be intractable). R4 didn't consider this within the scope of the current paper, but did ask for more discussion of how compression affects subsequent learning, and echoed R3's concerns about how to generalize to more complex tasks. It's not clear to me that these points were adequately addressed in the rebuttal, and I think these limitations should be discussed in the paper. Overall, the paper seems well-written and, on balance, provides an interesting perspective and set of results that link efficient coding with the MDP formalism, backed up by empirical neuroscientific data. Hence I am inclined to recommend acceptance.