{"title": "Variational Temporal Abstraction", "book": "Advances in Neural Information Processing Systems", "page_first": 11570, "page_last": 11579, "abstract": "We introduce a variational approach to learning and inference of temporally hierarchical structure and representation for sequential data. We propose the Variational Temporal Abstraction (VTA), a hierarchical recurrent state space model that can infer the latent temporal structure and thus perform the stochastic state transition hierarchically. We also propose to apply this model to implement the jumpy imagination ability in imagination-augmented agent-learning in order to improve the efficiency of the imagination. In experiments, we demonstrate that our proposed method can model 2D and 3D visual sequence datasets with interpretable temporal structure discovery and that its application to jumpy imagination enables more efficient agent-learning in a 3D navigation task.", "full_text": "Variational Temporal Abstraction\n\nTaesup Kim1,3,\u2020, Sungjin Ahn2\u2217, Yoshua Bengio1\u2217\n\n1Mila, Universit\u00e9 de Montr\u00e9al, 2Rutgers University, 3Kakao Brain\n\nAbstract\n\nWe introduce a variational approach to learning and inference of temporally hierar-\nchical structure and representation for sequential data. We propose the Variational\nTemporal Abstraction (VTA), a hierarchical recurrent state space model that can\ninfer the latent temporal structure and thus perform the stochastic state transition\nhierarchically. We also propose to apply this model to implement the jumpy imag-\nination ability in imagination-augmented agent-learning in order to improve the\nef\ufb01ciency of the imagination. 
In experiments, we demonstrate that our proposed method can model 2D and 3D visual sequence datasets with interpretable temporal structure discovery and that its application to jumpy imagination enables more efficient agent-learning in a 3D navigation task.\n\n1 Introduction\n\nDiscovering temporally hierarchical structure and representation in sequential data is key to many problems in machine learning. In particular, for an intelligent agent exploring an environment, it is critical to learn such spatio-temporal structure hierarchically because it can, for instance, enable efficient option-learning and jumpy future imagination, abilities critical to resolving the sample-efficiency problem (Hamrick, 2019). Without such temporal abstraction, imagination easily becomes inefficient; imagine a person planning a one-hour drive from her office to home while imagining the future at the scale of every second. There is also biological evidence that future imagination is a fundamental function of the human brain (Mullally & Maguire, 2014; Buckner, 2010), believed to be implemented via hierarchical coding in the grid cells (Wei et al., 2015).\n\nThere have been approaches to learning such hierarchical structure in sequences, such as the HMRNN (Chung et al., 2016). However, as a deterministic model, it has the main limitation that it cannot capture the stochastic nature prevailing in the data. This is a particularly critical limitation for imagination-augmented agents, because exploring various possible futures according to their uncertainty is what makes imagination meaningful in many cases. There have also been many probabilistic sequence models that can deal with the stochastic nature of sequential data (Chung et al., 2015; Krishnan et al., 2017; Fraccaro et al., 2017). 
However, unlike the HMRNN, these models cannot automatically discover the temporal structure in the data.\n\nIn this paper, we propose the Hierarchical Recurrent State Space Model (HRSSM), which combines the advantages of both worlds: it can discover the latent temporal structure (e.g., subsequences) while also modeling its stochastic state transitions hierarchically. For its learning and inference, we introduce a variational approximate inference approach to deal with the intractability of the true posterior. We also propose to apply the HRSSM to implement efficient jumpy imagination for imagination-augmented agents. We note that the proposed HRSSM is a generic generative sequence model that is not tied to the specific application to imagination-augmented agents but can be applied to any sequential data. In experiments, on 2D bouncing balls and 3D maze exploration, we show that the proposed model can model sequential data with interpretable temporal abstraction discovery. Then, we show that the model can be applied to improve the efficiency of imagination-augmented agent-learning.\n\n∗Equal advising, †work also done while visiting Rutgers University. Correspondence to taesup.kim@umontreal.ca and sungjin.ahn@rutgers.edu\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nThe main contributions of the paper are:\n\n1. We propose the Hierarchical Recurrent State Space Model (HRSSM), the first stochastic sequence model that discovers the temporal abstraction structure.\n\n2. We propose the application of the HRSSM to imagination-augmented agents so that they can perform efficient jumpy future imagination.\n\n3. 
In experiments, we showcase the temporal structure discovery and the benefit of the HRSSM for agent learning.\n\n2 Proposed Model\n\n2.1 Hierarchical Recurrent State Space Models\n\nIn our model, we assume that a sequence X = x_{1:T} = (x_1, ..., x_T) has a latent structure of temporal abstraction that can partition the sequence into N non-overlapping subsequences X = (X_1, ..., X_N). A subsequence X_i = x^i_{1:l_i} has length l_i such that T = \\sum_{i=1}^{N} l_i and L = {l_i}. Unlike previous works (Serban et al., 2017), we treat the number of subsequences N and the lengths of the subsequences L as discrete latent variables rather than given parameters. This makes our model discover the underlying temporal structure adaptively and stochastically.\n\nWe also assume that a subsequence X_i is generated from a temporal abstraction z_i and that an observation x_t has an observation abstraction s_t. The temporal abstraction and the observation abstraction form a hierarchy in which all observations in X_i are governed by the temporal abstraction z_i in addition to the local observation abstraction s_t. As a temporal model, both abstractions undergo temporal transitions. The transition of the temporal abstraction occurs only at the subsequence scale, while the observation transition is performed at every time step. This generative process can then be written as follows:\n\np(X, S, L, Z, N) = p(N) \\prod_{i=1}^{N} p(X_i, S_i | z_i, l_i) p(l_i | z_i) p(z_i | z