NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center

### Reviewer 1

*Update after reading Author Response*

Thanks to the authors for addressing my concerns. After reading the other reviews and the author feedback, I believe the changes the authors propose to make will strengthen the paper. I believe the ideas in this paper have value, and it is a positive step to establish some theoretical results in this space. I am still concerned with the clarity of the paper, especially for the broader community that might not be very familiar with the disentangled representation literature. With that in mind, I revise my score upward to 6.

*Original*

Overall, while the paper attempts to tackle some interesting theoretical questions, it is held back by unclear proofs, limited and inconclusive experiments, and a lack of novel insight. The work here on its own does not meet the standard for publication. Below are some more specific comments.

There seems to be a fundamental issue here, which is the notion that a symmetry is an action that the agent can take, rather than a geometric property of the state space. It would be helpful to clarify exactly what notion of symmetry the group actions are supposed to capture.

The paragraph starting at line 112 is confusing. First, the examples of what the group action might be are a bit vague. Second, in line 117 it seems you require the group action to be the dynamics function? Why should the group action on the observation at time t be equal to the observation at time t+1? A symmetry encodes a geometric constraint or property about the state space; it shouldn't be interpreted as an action applied by an agent.

The statement of Theorem 1 is confusing. From your notation, it appears that world states w_i are themselves sets? How are they to be interpreted? What does it mean "using a training set T of still images"?

The detailed proof in the appendix is likewise confusing. There appear to be mismatches in notation (cf. line 17 in the appendix and line 122 in the main text: are we considering the cardinality of a world W_i or a world state w_i?). The reference to shuffling the order of states along an axis also doesn't make sense; there seem to be some major details missing here. The entire proof is difficult to evaluate since it's not at all clear what is being argued.

Theorem 2: what is the significance of this result?

The experimental results seem inconclusive about which approach is better, since, as the authors note, the task is so simple.

### Reviewer 2

This paper provides theoretical results showing that in order to learn disentangled representations grounded in symmetry transformations, as recently defined by Higgins et al. (2018), it is necessary to have access to the actions producing observation changes. It then proposes a new model architecture that is able to exploit this information to learn symmetry-based disentangled representations (SBDR). Finally, the paper demonstrates that disentangled representations improve sample efficiency for inverse model learning.

The work is a first step building on the recent theoretical paper by Higgins et al. (2018) that defines disentangling in terms of symmetry transformations. Hence, this work is an important step that others can build on. The submission is technically sound and provides both theoretical and empirical contributions. However, the empirical contributions are quite limited, since the authors only use one very simple dataset. I would have liked to see the approach evaluated on more challenging datasets too. That said, the paper is very well written, and I think it makes a significant enough contribution to the field to be published, as it provides a proof-of-principle demonstration of how to address a challenging problem (learning SBDR).

*Post author feedback*

Thank you for your detailed feedback. I leave my score unchanged given that it was favourable to start with.