{"title": "A General Method for Amortizing Variational Filtering", "book": "Advances in Neural Information Processing Systems", "page_first": 7857, "page_last": 7868, "abstract": "We introduce the variational filtering EM algorithm, a simple, general-purpose method for performing variational inference in dynamical latent variable models using information from only past and present variables, i.e. filtering. The algorithm is derived from the variational objective in the filtering setting and consists of an optimization procedure at each time step. By performing each inference optimization procedure with an iterative amortized inference model, we obtain a computationally efficient implementation of the algorithm, which we call amortized variational filtering. We present experiments demonstrating that this general-purpose method improves inference performance across several recent deep dynamical latent variable models.", "full_text": "A General Method for\n\nAmortizing Variational Filtering\n\nJoseph Marino, Milan Cvitkovic, Yisong Yue\n\nCalifornia Institute of Technology\n\n{jmarino, mcvitkovic, yyue}@caltech.edu\n\nAbstract\n\nWe introduce the variational \ufb01ltering EM algorithm, a simple, general-purpose\nmethod for performing variational inference in dynamical latent variable models\nusing information from only past and present variables, i.e. \ufb01ltering. The algorithm\nis derived from the variational objective in the \ufb01ltering setting and consists of an op-\ntimization procedure at each time step. By performing each inference optimization\nprocedure with an iterative amortized inference model, we obtain a computationally\nef\ufb01cient implementation of the algorithm, which we call amortized variational\n\ufb01ltering. 
We present experiments demonstrating that this general-purpose method improves inference performance across several recent deep dynamical latent variable models.

1 Introduction

Complex tasks with time-series data, like audio comprehension or robotic manipulation, must often be performed online, where the model can only consider past and present information. Models for such tasks, e.g. Hidden Markov Models, frequently operate by inferring the hidden state of the world at each time step. This type of online inference procedure is known as filtering. Learning filtering models purely through supervised labels or rewards can be impractical, requiring massive collections of labeled data or significant efforts at reward shaping. In contrast, generative models can learn and infer hidden structure and states directly from data. Deep latent variable models [18, 27, 37], in particular, offer a promising direction; they infer latent representations using expressive deep networks, commonly using variational methods to perform inference [24]. Recent works have extended deep latent variable models to the time-series setting, e.g. [7, 12]. However, inference procedures for these dynamical models have been proposed on the basis of intuition rather than from a rigorous inference optimization perspective, potentially limiting performance.

We introduce variational filtering EM, an algorithm for performing filtering variational inference and learning that is rigorously derived from the variational objective. As detailed below, the variational objective in the filtering setting results in a sequence of inference optimization objectives, one at each time step. By initializing each of these inference optimization procedures from the corresponding prior distribution, a classic Bayesian prediction-update loop naturally emerges.
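To make the per-step optimization concrete, the prediction-update loop can be sketched for a linear-Gaussian decoder, where the free energy and its gradients are available in closed form. This is a minimal numpy sketch of the idea, not the paper's implementation; the decoder matrix `W`, the Gaussian forms, and the plain gradient-descent update rule are illustrative assumptions:

```python
import numpy as np

def neg_elbo(x, mu, logvar, W, prior_mu, prior_logvar):
    """Per-step free energy (negative ELBO) for x ~ N(W z, I) with diagonal
    Gaussian approximate posterior q(z_t) and prior; the expected
    reconstruction term is available in closed form for a linear decoder."""
    var, pvar = np.exp(logvar), np.exp(prior_logvar)
    recon = 0.5 * np.sum((x - W @ mu) ** 2) + 0.5 * np.sum(np.diag(W.T @ W) * var)
    kl = 0.5 * np.sum(prior_logvar - logvar + (var + (mu - prior_mu) ** 2) / pvar - 1.0)
    return recon + kl

def filtering_step(x, prior_mu, prior_logvar, W, n_iters=50, lr=0.05):
    """One inference optimization of the variational filtering EM algorithm:
    q(z_t) is initialized at the prior prediction (the 'prediction' step),
    then refined by gradient descent on the free energy (the 'update' step)."""
    mu, logvar = prior_mu.copy(), prior_logvar.copy()  # start from the prior
    pvar = np.exp(prior_logvar)
    for _ in range(n_iters):
        var = np.exp(logvar)
        grad_mu = -W.T @ (x - W @ mu) + (mu - prior_mu) / pvar
        grad_logvar = 0.5 * (np.diag(W.T @ W) * var + var / pvar - 1.0)
        mu -= lr * grad_mu
        logvar -= lr * grad_logvar
    return mu, logvar
```

Iterating `filtering_step` across time steps, with the prior parameters at each step predicted from the previously inferred latents, yields the full filtering procedure.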
This contrasts with existing filtering approaches for deep dynamical models, which use inference models that do not explicitly account for prior predictions during inference. By using iterative inference models [32], which overcome this limitation, we develop a computationally efficient implementation of the variational filtering EM algorithm, which we refer to as amortized variational filtering (AVF).

The main contributions of this paper are the variational filtering EM algorithm and its amortized implementation, AVF. This general-purpose filtering algorithm is widely applicable to dynamical latent variable models, as we demonstrate in our experiments. Moreover, the variational filtering EM algorithm is derived from the filtering variational objective, providing a solid theoretical framework for filtering inference. By precisely specifying the inference optimization procedure, this method takes a simple form compared to previous hand-designed methods. Using several deep dynamical latent variable models, we demonstrate that this filtering approach compares favorably against current methods across a variety of benchmark sequence data sets.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

2 Background

Section 2.1 provides the general form of a dynamical latent variable model. Section 2.2 covers variational inference. Deep latent variable models are often trained efficiently by amortizing inference optimization (Section 2.3).
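The distinction can be sketched abstractly: an iterative inference model maps the current posterior estimate and the gradient of the objective to an update, rather than mapping data directly to posterior parameters, so it can start from (and correct) a prior prediction. A minimal numpy sketch, where `update_fn` is a hypothetical stand-in for the learned inference network, not the paper's architecture:

```python
import numpy as np

def iterative_amortized_inference(mu0, logvar0, grad_fn, update_fn, n_iters=5):
    """Iterative amortized inference: starting from an initial estimate of the
    approximate posterior parameters, repeatedly apply a learned update
    computed from the current estimate and the gradient of the free energy."""
    lam = np.concatenate([mu0, logvar0])   # stacked variational parameters
    for _ in range(n_iters):
        grad = grad_fn(lam)                # gradient of free energy at current estimate
        lam = lam + update_fn(lam, grad)   # learned update (e.g. a small network)
    d = mu0.size
    return lam[:d], lam[d:]
```

In AVF, `mu0` and `logvar0` would be the prior prediction at each time step, so the inference model only needs to learn corrections to the prior rather than re-derive the posterior from scratch.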
Applying this technique to dynamical models is non-trivial, leading many prior works to use hand-designed amortized inference methods (Section 2.4).

2.1 Dynamical latent variable models

A sequence of T observations, x≤T, can be modeled using a dynamical latent variable model, pθ(x≤T, z≤T), which models the joint distribution between x≤T and a sequence of latent variables, z≤T, with parameters θ. It is typically assumed that pθ(x≤T, z≤T) can be factorized into conditional joint distributions at each step:

pθ(x≤T, z≤T) = ∏_{t=1}^{T} pθ(x_t, z_t | x_{<t}, z_{<t})
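Under this factorization, the joint log-probability of a sequence decomposes into a sum of per-step conditional terms. A minimal sketch of that decomposition, where `log_cond_fn` is a hypothetical stand-in for the model's per-step conditional log-density:

```python
def sequence_log_prob(x_seq, z_seq, log_cond_fn):
    """Joint log-probability of a dynamical latent variable model under the
    per-step factorization:
        log p(x<=T, z<=T) = sum_t log p(x_t, z_t | x_<t, z_<t)
    log_cond_fn receives the current observation and latent along with the
    full history of both before step t."""
    total = 0.0
    for t in range(len(x_seq)):
        total += log_cond_fn(x_seq[t], z_seq[t], x_seq[:t], z_seq[:t])
    return total
```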