NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID 6171: Variational Temporal Abstraction

Reviewer 1

Weakness: My doubts mainly concern the experiments section.
1. There is not enough quantitative evaluation of the model. The authors claim that the proposed framework can capture long-term temporal dependencies, and this ability should translate into a higher generative likelihood. However, the paper provides too little quantitative evaluation and comparison to back up this claim.
2. The latent space, especially the temporal-abstraction level, is not investigated enough. Since the proposed framework is meant to learn high-level hierarchical temporal structure, it would be interesting to traverse the temporal-abstraction latent variable and visualize what happens. Does it encode information different from that of the observation-abstraction level, or are the two entangled? Such an investigation would provide more insight into the learned hierarchical latent space.
3. A minor point: binary indicators with a Gumbel-softmax relaxation have been used in many previous works. However, since the approach works and forms only part of the contribution, I do not consider this a major issue.

Reviewer 2

1) My main point of criticism is the experimental validation of the proposed model.
1.1) Sec. 5.1: Bouncing balls (BBs) dataset
1.1.1) Testing the algorithm on a simple dataset is indeed good practice, but this version of BBs seems quite tailored to the algorithm, as the balls change color on collision. Does the segmentation still yield interpretable results without the color change?
1.1.2) There is no quantitative comparison to a reasonable baseline model (e.g., one matched for network size). Such a comparison is needed to convince the reader that the inference and learning algorithms are able to identify the model. It would also be good to see a sample from the baseline.
1.2) Sec. 5.2: 3D Maze
1.2.1) My main quibble with this experiment is that the true segmentation is almost explicitly given by the available actions: e.g., if TURN-LEFT is executed, a new segment is allocated. This points to the basic dilemma of hierarchical reinforcement learning: if I know good high-level options (here: always follow corridors to the next intersection), then learning the right high-level state abstraction is easy, and vice versa. Learning both at the same time is hard. I would be more convinced by these experiments if the authors ran, e.g., a model that is not conditioned on actions and checked whether the segmentations still coincide with intersections.
1.2.2) How is the baseline RSSM defined here? How much do the training curves vary across runs (let alone across hyperparameter settings)?
2) Smaller comments:
2.1) Sec. 2.3, l102-l103: This prior is quite odd, as the last segment is treated differently from the other ones. I do not see the reason for this design choice, since posterior inference does not make use of the maximum number of segments.
2.2) l135-l138: The assumption that the $m_t$ are independent under the posterior seems quite restrictive. Imagine that in the BBs dataset without color change it is hard to determine exactly where the change point (the collision) is, yet we can be very certain that there is only one. An independent posterior cannot represent this situation well.
2.3) l40-l41: There have clearly been earlier "stochastic sequence model(s) that discover(s) the temporal abstraction structure", e.g., any semi-Markov or Markov jump process model. However, I agree that this particular version with neural-network function approximators and amortized inference is novel and a worthwhile contribution.
2.4) The notation in eqns (1) and (2) looks somewhat broken; e.g., an $s^i$ seems to be missing on the left-hand side.
2.5) Below l81: This process is not exactly the same as the one in eqns (1) and (2), because here the length of the sub-sequence depends on the state, as in $p(m_t\vert s_t)$, and not just on $z_t$.
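To make point 2.2 concrete, here is a toy calculation under assumptions of my own, not taken from the paper: only two candidate boundary positions, and a hypothetical true posterior that puts probability 0.5 on "boundary at position 1 only" and 0.5 on "boundary at position 2 only". The helper names (`factorized`, `mass_not_exactly_one`, `reverse_kl`) are illustrative. A fully factorized Bernoulli posterior either matches the marginals and wastes half its mass on zero or two boundaries, or, when fit with KL(q || p) as in an ELBO, must commit to one location and lose the uncertainty.

```python
import itertools
import math

# Hypothetical toy setting: two candidate boundary positions, and the true
# posterior says "exactly one boundary, equally likely at either position".
TRUE_POSTERIOR = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0}


def factorized(q1, q2):
    """Joint distribution of two independent Bernoulli boundary indicators."""
    joint = {}
    for m1, m2 in itertools.product((0, 1), repeat=2):
        p1 = q1 if m1 else 1.0 - q1
        p2 = q2 if m2 else 1.0 - q2
        joint[(m1, m2)] = p1 * p2
    return joint


def mass_not_exactly_one(joint):
    """Probability assigned to zero boundaries or to two boundaries."""
    return joint[(0, 0)] + joint[(1, 1)]


def reverse_kl(q, p):
    """KL(q || p); infinite if q puts mass where p has none."""
    total = 0.0
    for config, q_val in q.items():
        if q_val == 0.0:
            continue
        if p[config] == 0.0:
            return math.inf
        total += q_val * math.log(q_val / p[config])
    return total


# Matching the true marginals (each position is a boundary w.p. 0.5) ...
matched = factorized(0.5, 0.5)
print(mass_not_exactly_one(matched))  # 0.5: half the mass is on invalid configs

# ... while any finite-KL factorized fit must avoid those configs entirely,
# so it commits to a single location and drops the uncertainty.
committed = factorized(1.0, 0.0)
print(reverse_kl(committed, TRUE_POSTERIOR))  # log 2, about 0.693
print(reverse_kl(matched, TRUE_POSTERIOR))    # inf
```

A structured posterior over boundaries (e.g., autoregressive in $m_t$) would not face this trade-off, which is why I would like to see the independence assumption justified or relaxed.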