{"title": "Learning Stable Deep Dynamics Models", "book": "Advances in Neural Information Processing Systems", "page_first": 11128, "page_last": 11136, "abstract": "Deep networks are commonly used to model dynamical systems, predicting how the state of a system will evolve over time (either autonomously or in response to control inputs). Despite the predictive power of these systems, it has been difficult to make formal claims about the basic properties of the learned systems. In this paper, we propose an approach for learning dynamical systems that are guaranteed to be stable over the entire state space. The approach works by jointly learning a dynamics model and Lyapunov function that guarantees non-expansiveness of the dynamics under the learned Lyapunov function. We show that such learning systems are able to model simple dynamical systems and can be combined with additional deep generative models to learn complex dynamics, such as video textures, in a fully end-to-end fashion.", "full_text": "Learning Stable Deep Dynamics Models\n\nGaurav Manek\n\nDepartment of Computer Science\n\nCarnegie Mellon University\n\ngmanek@cs.cmu.edu\n\nJ. Zico Kolter\n\nDepartment of Computer Science\n\nCarnegie Mellon University\n\nand Bosch Center for AI\nzkolter@cs.cmu.edu\n\nAbstract\n\nDeep networks are commonly used to model dynamical systems, predicting how\nthe state of a system will evolve over time (either autonomously or in response to\ncontrol inputs). Despite the predictive power of these systems, it has been dif\ufb01cult\nto make formal claims about the basic properties of the learned systems. In this\npaper, we propose an approach for learning dynamical systems that are guaranteed\nto be stable over the entire state space. The approach works by jointly learning\na dynamics model and Lyapunov function that guarantees non-expansiveness of\nthe dynamics under the learned Lyapunov function. 
We show that such learning systems are able to model simple dynamical systems and can be combined with additional deep generative models to learn complex dynamics, such as video textures, in a fully end-to-end fashion.\n\n1 Introduction\n\nThis paper deals with the task of learning (continuous time) dynamical systems. That is, given a state at time t, x(t) ∈ Rn, we want to model the time-derivative of the state\n\nẋ(t) ≡ (d/dt) x(t) = f(x(t))    (1)\n\nfor some function f : Rn → Rn. Modeling the time evolution of such dynamical systems (or their counterparts with control inputs ẋ(t) = f(x(t), u(t)) for u(t) ∈ Rm) is a foundational problem, with applications in reinforcement learning, control, forecasting, and many other settings. Owing to their representational power, neural networks have long been a natural choice for modeling the function f [7, 14, 13, 6]. However, when using a generic neural network to model dynamics in this setting, very little can be guaranteed about the behavior of the learned system. For example, it is extremely difficult to say anything about the stability properties of a learned model (informally, the tendency of the system to remain within some invariant bounded set). While some recent work has begun to consider stability properties of neural networks [5, 17, 19], it has typically done so by (“softly”) enforcing stability as an additional loss term on the training data. Consequently, these methods can say little about the stability of the system in unseen states.\n\nIn this paper, we propose an approach to learning neural network dynamics that are provably stable over the entirety of the state space. To do so, we jointly learn the system dynamics and a Lyapunov function. This stability is a hard constraint imposed upon the model: unlike recent approaches, we do not enforce stability via an imposed loss function but build it directly into the dynamics of the model (i.e., 
even a randomly initialized model in our proposed model class will be provably stable everywhere in state space). The key to this is the design of a proper Lyapunov function, based on input convex neural networks [1], which ensures global exponential stability to an equilibrium point while still allowing for expressive dynamics.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nUsing these methods, we demonstrate learning dynamics of physical models such as n-link pendulums, and show a substantial improvement over generic networks. We also show how such dynamics models can be integrated into larger network systems to learn dynamics over complex output spaces. In particular, we show how to combine the model with a variational auto-encoder (VAE) [11] to learn dynamic “video textures” [18].\n\n2 Background and related work\n\nStability of dynamical systems. Our work primarily considers the setting of autonomous dynamical systems ẋ(t) = f(x(t)) for x(t) ∈ Rn. (The methods are applicable to dynamics with control as well, but we focus on the autonomous case for simplicity of exposition.) Such a system is defined to be globally asymptotically stable (for simplicity, around the equilibrium point xe = 0) if we have x(t) → 0 as t → ∞ for any initial state x(0) ∈ Rn; f is locally asymptotically stable if the same holds but only for x(0) ∈ B where B is some bounded set containing the origin. Similarly, f is globally (locally, respectively) exponentially stable (i.e., converges to the equilibrium “exponentially quickly”) if\n\n‖x(t)‖₂ ≤ m‖x(0)‖₂ e−αt    (2)\n\nfor some constants m, α ≥ 0 for any x(0) ∈ Rn (B, respectively).\n\nThe area of Lyapunov theory [9, 12] establishes the connection between the various types of stability mentioned above and descent according to a particular type of function known as a Lyapunov function. 
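As a concrete illustration of the exponential-stability definition (a toy example of ours, not from the paper): for the scalar system ẋ = −x, the constants m = 1, α = 1 witness global exponential stability, which a forward-Euler rollout confirms at the grid times.

```python
import math

def euler_rollout(f, x0, h=0.01, steps=1000):
    # Forward-Euler integration of xdot = f(x).
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return xs

f = lambda x: -x          # a simple globally exponentially stable system
xs = euler_rollout(f, x0=3.0)

# Check the bound |x(t)| <= m * |x(0)| * exp(-alpha * t) with m = 1, alpha = 1
# at every grid time t = k * h.
m, alpha, h = 1.0, 1.0, 0.01
ok = all(abs(x) <= m * abs(xs[0]) * math.exp(-alpha * k * h) + 1e-12
         for k, x in enumerate(xs))
print(ok)
```

The bound holds at the grid points because (1 − h)^k ≤ e^(−hk); for a general system the discretization error would of course need more care.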
Specifically, let V : Rn → R be a continuously differentiable positive definite function, i.e., V(x) > 0 for x ≠ 0 and V(0) = 0. Lyapunov analysis says that f is stable (according to the different definitions above) if and only if we can find some function V as above such that the value of this function is decreasing along trajectories generated by f. Formally, this is the condition that the time derivative V̇(x(t)) < 0, i.e.,\n\nV̇(x(t)) ≡ (d/dt) V(x(t)) = ∇V(x)ᵀ (d/dt) x(t) = ∇V(x)ᵀ f(x(t)) < 0.    (3)\n\nThis condition must hold for all x(t) ∈ Rn or for all x(t) ∈ B to ensure global or local stability respectively. Similarly, f is globally exponentially stable if and only if there exists positive definite V such that\n\nV̇(x(t)) ≤ −αV(x(t)), with c1‖x‖₂² ≤ V(x) ≤ c2‖x‖₂².    (4)\n\nShowing that these conditions imply the various forms of stability is relatively straightforward, but showing the converse (that any stable system must obey this property for some V) is relatively more complex. In this paper, however, we are largely concerned with the “simpler” of these two directions, as our goal is to enforce conditions that ensure stability.\n\nStability of linear systems. For a linear system with matrix A:\n\nẋ(t) = Ax(t)    (5)\n\nit is well-established that the system is stable if and only if the real components of the eigenvalues of A are all strictly negative (Re(λi(A)) < 0). Equivalently, the same property can be shown via a positive definite quadratic Lyapunov function\n\nV(x) = xᵀQx    (6)\n\nfor Q ≻ 0. 
In this case, by Equation 4, the following ensures stability:\n\nV̇(x(t)) = x(t)ᵀAᵀQx(t) + x(t)ᵀQAx(t) ≤ −αx(t)ᵀQx(t),    (7)\n\ni.e., the system is stable if we can find a positive definite matrix Q ⪰ I such that AᵀQ + QA + αQ ⪯ 0 is negative semidefinite. Such bounds (and much more complex extensions) form the basis for using linear matrix inequalities (LMIs) as a method to ensure stability of linear dynamical systems. The methods also have applicability to non-linear systems, and several authors have used LMI analysis to learn non-linear dynamical systems by constraining the linearization of the systems to have global Lyapunov functions [10, 2, 20].\n\nThe point we want to emphasize from the above discussion, though, is that the task of learning even a stable linear dynamical system is not a convex problem. Although the constraints\n\nQ ⪰ I, AᵀQ + QA + αQ ⪯ 0    (8)\n\nare convex in A and Q separately, they are not convex in A and Q jointly. Thus, the problem of jointly learning a stable linear dynamical system and its corresponding Lyapunov function, even for the simple linear-quadratic setting, is not a convex optimization problem, and alternative techniques such as alternating minimization need to be employed instead. Alternatively, past work has also looked at different heuristics, such as approximately projecting a dynamics function A onto the (non-convex) stable set of matrices with eigenvalues Re(λi(A)) < 0 [3].\n\nStability of non-linear systems. For general non-linear systems, establishing stability via Lyapunov techniques is typically even more challenging. For the typical task here, which is that of establishing stability of some known dynamics ẋ(t) = f(x(t)), finding a suitable Lyapunov function is often more an art than a science. 
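While finding a Lyapunov function is hard in general, checking a candidate at sampled states is easy; a toy sanity check on a nonlinear system of our own choosing (not one from the paper), with V(x) = x1² + x2²:

```python
import itertools

# Assumed example dynamics f(x) = (-x1^3 - x2, x1 - x2^3); the cross terms
# cancel in grad(V)^T f, leaving -2*(x1^4 + x2^4) < 0 away from the origin.
def f(x1, x2):
    return (-x1**3 - x2, x1 - x2**3)

def vdot(x1, x2):
    # d/dt V(x(t)) = grad(V)^T f(x) = 2*x1*f1 + 2*x2*f2
    f1, f2 = f(x1, x2)
    return 2 * x1 * f1 + 2 * x2 * f2

grid = [i / 10 for i in range(-20, 21)]
samples = [(a, b) for a, b in itertools.product(grid, grid) if (a, b) != (0, 0)]
print(all(vdot(a, b) < 0 for a, b in samples))
```

Such a grid check is not a certificate, of course; sum-of-squares or interval methods are needed to turn it into a proof.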
Although some general techniques such as sum-of-squares certification [16, 15] provide general methods for certifying stability of, e.g., polynomial systems, these are often expensive and don’t easily scale to high dimensional systems.\n\nNotably, our proposed approach here is able to learn provably stable systems without solving this (generally hard) problem. Specifically, while it is difficult to find a Lyapunov function that certifies the stability of some known system, we exploit the fact that it is much easier to enforce some function to behave in a stable manner according to a Lyapunov function.\n\nLyapunov functions in deep learning. Finally, there has been a small set of recent work exploring the intersection of deep learning and Lyapunov analysis [5, 17, 19]. Although related to our work here, the approach in this past work is quite different. As is more common in the control setting, these papers try to learn neural-network-based Lyapunov functions for control policies, but in a way that enforces stability via a loss penalty. For instance, Richards et al. [17] optimize a loss function that encourages V̇(x) ≤ 0 for x in some training set. 
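The “soft” approach can be caricatured in a few lines (a sketch in our own notation, not the actual training code of any of the cited papers): the Lyapunov decrease is encouraged only at training points via a hinge penalty, so nothing is guaranteed elsewhere.

```python
# Hinge-style penalty encouraging Vdot(x) <= 0 only at sampled states x.
# vdot is assumed to return grad(V)(x)^T f(x) for the model at hand.
def soft_stability_penalty(vdot, states, margin=0.0):
    return sum(max(0.0, vdot(x) + margin) for x in states) / len(states)

# Example with a scalar model that contracts for |x| < 1 but expands outside:
vdot = lambda x: x**2 * (x**2 - 1)   # negative on (-1, 1), positive beyond
train = [-0.9, -0.5, 0.0, 0.5, 0.9]  # training set sees only the stable region
unseen = [1.5, 2.0]

print(soft_stability_penalty(vdot, train))   # zero penalty on the training set...
print(all(vdot(x) > 0 for x in unseen))      # ...yet the model expands at unseen states
```

The zero penalty alongside the expanding unseen states is exactly the failure mode the hard-constraint approach below is designed to rule out.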
In contrast, our work guarantees absolute stability everywhere in the state space, not just at a small set of points; but only for a simpler setting where the entire dynamics are to be learned (and hence can be “forced” to be stable) rather than a stabilizing controller for known dynamics.\n\n3 Joint learning of dynamics and Lyapunov functions\n\nThe intuition of the approach we propose in this paper is straightforward: instead of learning a dynamics function and attempting to separately verify its stability via a Lyapunov function, we propose to jointly learn a dynamics model and Lyapunov function, where the dynamics is inherently constrained to be stable (everywhere in the state space) according to the Lyapunov function.\n\nSpecifically, following the principles mentioned above, let f̂ : Rn → Rn denote a “nominal” dynamics model, and let V : Rn → R be a positive definite function: V(x) > 0 for x ≠ 0 and V(0) = 0. Then in order to (provably, globally) ensure that a dynamics function is stable, we can simply project f̂ such that it satisfies the condition\n\n∇V(x)ᵀ f̂(x) ≤ −αV(x),    (9)\n\ni.e., we define the dynamics\n\nf(x) = Proj( f̂(x), {f : ∇V(x)ᵀf ≤ −αV(x)} )\n     = f̂(x), if ∇V(x)ᵀf̂(x) ≤ −αV(x); otherwise f̂(x) − ∇V(x) (∇V(x)ᵀf̂(x) + αV(x)) / ‖∇V(x)‖₂²\n     = f̂(x) − ∇V(x) ReLU(∇V(x)ᵀf̂(x) + αV(x)) / ‖∇V(x)‖₂²    (10)\n\nwhere Proj(x; C) denotes the orthogonal projection of x onto the set C, and where the second equation follows from the analytical projection of a point onto a halfspace. 
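The projection in Equation (10) is just a halfspace projection and can be written directly. A minimal pure-Python sketch, with a fixed linear nominal model and the hand-coded Lyapunov function V(x) = ‖x‖² standing in for the learned networks:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def stable_f(f_hat, V, grad_V, alpha=0.1):
    # Project the nominal dynamics f_hat(x) onto the halfspace
    # {f : grad_V(x)^T f <= -alpha * V(x)}, as in Equation (10).
    def f(x):
        g = grad_V(x)
        relu = max(dot(g, f_hat(x)) + alpha * V(x), 0.0)
        scale = relu / dot(g, g)
        return [fi - gi * scale for fi, gi in zip(f_hat(x), g)]
    return f

# Stand-ins for the learned networks (our assumptions, not the paper's nets):
# a fixed linear nominal model and V(x) = ||x||^2 with grad_V(x) = 2x.
# At x = 0 both V and grad_V vanish; the conditions on V in Section 3.1
# handle that limit, which this sketch ignores.
A = [[0.5, 1.2], [-0.7, 0.3]]
f_hat = lambda x: [dot(row, x) for row in A]
V = lambda x: dot(x, x)
grad_V = lambda x: [2 * xi for xi in x]

f = stable_f(f_hat, V, grad_V, alpha=0.1)
x = [1.0, -2.0]
print(dot(grad_V(x), f(x)) <= -0.1 * V(x) + 1e-9)  # decrease condition holds by construction
```

By construction, after the projection ∇V(x)ᵀf(x) equals min(∇V(x)ᵀf̂(x), −αV(x)), so the decrease condition holds at every x regardless of the nominal model.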
As long as V is defined using automatic differentiation tools, it is straightforward to include the gradient ∇V terms into the definition of f, and our final network can be trained just like any other function. The general approach here is illustrated in Figure 1.\n\nFigure 1: We plot the trajectory and the contour of a Lyapunov function of a stable dynamical system and illustrate our method. Let g(x) = ∇V(x) ReLU(∇V(x)ᵀf̂(x) + αV(x)) / ‖∇V(x)‖₂². In the first case f̂(x) has a component g(x) not in the halfspace, which we subtract to obtain f(x). In the second case f̂(x) is already in the halfspace, so is returned unchanged.\n\n3.1 Properties of the Lyapunov function V\n\nAlthough the treatment above seems to make the problem of learning stable systems quite straightforward, the subtlety of the approach lies in the choice of the function V. Specifically, as mentioned previously, V needs to be positive definite, but additionally V needs to have no local optima except 0. This is due to the Lyapunov decrease condition: recall that we are attempting to guarantee stability to the equilibrium point x = 0, yet the decrease condition imposed upon the dynamics means that V is decreasing along trajectories of f. If V has a local optimum away from the origin, the dynamics can in theory get stuck in this location; this manifests itself by the ‖∇V(x)‖₂² term going to zero, which results in the dynamics becoming undefined at the optimum.\n\nTo enforce these conditions, we make the following design decisions regarding V:\n\nNo local optima. 
We represent V via an input-convex neural network (ICNN) function g [1], which enforces the condition that g(x) be convex in its inputs x. A fairly generic form of such networks is given by the recurrence\n\nz1 = σ0(W0 x + b0)\nzi+1 = σi(Ui zi + Wi x + bi), i = 1, . . . , k − 1\ng(x) ≡ zk    (11)\n\nwhere Wi are real-valued weights mapping from inputs to the i + 1 layer activations; Ui are positive weights mapping previous layer activations zi to the next layer; bi are real-valued biases; and σi are convex, monotonically non-decreasing non-linear activations, such as the ReLU or smooth variants. It is straightforward to show that with this formulation, g is convex in x [1], and indeed any convex function can be approximated by such networks [4].\n\nPositive definite. While the ICNN property can enforce that V have only a single global optimum, it does not necessarily enforce that this optimum be at x = 0. While one could fix this by e.g., removing the bias terms (but this imposes substantial limitations on the representable functions, which can no longer be arbitrary convex functions) or by shifting whatever global minimum exists to the origin (but this requires finding the global minimum during training, which itself is computationally expensive), we take an alternative approach and simply shift the function such that V(0) = 0, and add a small quadratic regularization term to ensure strict positive definiteness:\n\nV(x) = σk+1(g(x) − g(0)) + ε‖x‖₂²    (12)\n\nwhere σk+1 is a positive convex non-decreasing function with σk+1(0) = 0, g is the ICNN defined previously, and ε is a small constant. These terms together still enforce (strong) convexity and positive definiteness of V.\n\nContinuously differentiable. Although not always required, several of the conditions for Lyapunov stability are simplified if V is continuously differentiable. 
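A pure-Python sketch of the recurrence (11) and the shifted Lyapunov function (12). Plain ReLU activations and a summed final layer (a positive-weight combination, so convexity is preserved) are our simplifications; the paper uses a smoothed activation, introduced next.

```python
import random

relu = lambda t: max(t, 0.0)

def make_icnn(n_in, widths, rng):
    # W: unconstrained input weights; U: positive weights between layers,
    # as required by the recurrence (11) for convexity of g.
    Ws = [[[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(w)] for w in widths]
    Us = [None] + [[[rng.uniform(0, 1) for _ in range(widths[i])] for _ in range(widths[i + 1])]
                   for i in range(len(widths) - 1)]
    bs = [[rng.uniform(-1, 1) for _ in range(w)] for w in widths]
    def g(x):
        z = [relu(sum(wi * xi for wi, xi in zip(row, x)) + b)
             for row, b in zip(Ws[0], bs[0])]
        for W, U, b in zip(Ws[1:], Us[1:], bs[1:]):
            z = [relu(sum(ui * zi for ui, zi in zip(urow, z))
                      + sum(wi * xi for wi, xi in zip(wrow, x)) + bi)
                 for urow, wrow, bi in zip(U, W, b)]
        return sum(z)  # scalar output: last layer summed, for simplicity
    return g

rng = random.Random(0)
g = make_icnn(2, [8, 8], rng)
eps = 1e-3
V = lambda x: relu(g(x) - g([0.0, 0.0])) + eps * sum(xi * xi for xi in x)  # Equation (12)

# Midpoint convexity of g at random pairs, plus V(0) = 0 and V > 0 elsewhere.
pairs = [([rng.uniform(-3, 3), rng.uniform(-3, 3)], [rng.uniform(-3, 3), rng.uniform(-3, 3)])
         for _ in range(200)]
mid = lambda a, b: [(ai + bi) / 2 for ai, bi in zip(a, b)]
print(all(g(mid(a, b)) <= (g(a) + g(b)) / 2 + 1e-9 for a, b in pairs))
print(V([0.0, 0.0]) == 0.0 and V([0.3, -0.2]) > 0.0)
```

The midpoint check is only a numerical spot test; the convexity guarantee itself comes from the sign constraints on the U weights.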
To achieve this, rather than use ReLU activations,¹ we use a smoothed version that replaces the purely linear ReLU with a quadratic region in [0, d]:\n\nσ(x) = 0 if x ≤ 0;  x²/2d if 0 < x < d;  x − d/2 otherwise.    (13)\n\nAn illustration of this activation is shown in Figure 2.\n\nFigure 2: Smoothed ReLU, used to make our Lyapunov function continuously differentiable.\n\n(Optional) Warped input space. Although convexity ensures that the Lyapunov function have no local optima, this is a sufficient but not necessary condition, and indeed requiring a strongly convex Lyapunov function may impose too strict a requirement upon the learned dynamics. For this reason, the input to the ICNN function g(x) above can be optionally preceded by any continuously differentiable invertible function F : Rn → Rn, i.e., using\n\nV(x) = σk+1(g(F(x)) − g(F(0))) + ε‖x‖₂²    (14)\n\nas the Lyapunov function. Invertibility ensures that the sublevel sets of V (which are convex sets, by definition) map to contiguous regions of the composite function g ∘ F, thus ensuring that no local optima exist in this composed function.\n\nWith these conditions in place, we have the following result.\n\nTheorem 1. The dynamics defined by\n\nẋ = f(x)    (15)\n\ndefined by f from (10) and V from (12) or (14) are globally exponentially stable to the equilibrium point x = 0, for any (bounded weight) networks defining the f̂ and V functions.\n\nProof. The proof is straightforward, and relies on the properties of the networks created above. First, note that by our definitions we have, for some M,\n\nε‖x‖₂² ≤ V(x) ≤ M‖x‖₂²    (16)\n\nwhere the lower bound follows by definition and the fact that g is positive. 
The upper bound follows from the fact that the σ activation as defined is linear for large x and quadratic around 0. This fact in turn implies that V(x) behaves linearly as ‖x‖ → ∞, and is quadratic around the origin, so can be upper bounded by some quadratic M‖x‖₂².\n\nThe fact that V is continuously differentiable means that ∇V(x) (in f) is defined everywhere; bounds on ‖∇V(x)‖₂ for all x follow from the Lipschitz property of V, the fact that 0 ≤ σ′(x) ≤ 1, and the ε‖x‖₂² term:\n\nε‖x‖₂ ≤ ‖∇V(x)‖₂ ≤ Σ_{i=1}^{k} ( Π_{j=i}^{k} ‖Uj‖₂ ) ‖Wi‖₂    (17)\n\nwhere ‖·‖₂ denotes the operator norm when applied to a matrix. This implies that the dynamics are defined and bounded everywhere owing to the choice of function f̂.\n\nNow, consider some initial state x(0). 
The definition of f implies that\n\n(d/dt) V(x(t)) = ∇V(x)ᵀ (d/dt) x(t) = ∇V(x)ᵀ f(x) ≤ −αV(x(t)).    (18)\n\nIntegrating this equation gives the bound\n\nV(x(t)) ≤ V(x(0)) e−αt    (19)\n\nand applying the lower and upper bounds gives\n\nε‖x(t)‖₂² ≤ M‖x(0)‖₂² e−αt  ⟹  ‖x(t)‖₂ ≤ √(M/ε) ‖x(0)‖₂ e−αt/2    (20)\n\nas required for global exponential convergence.\n\n¹Note that the typical softplus smoothed approximation of the ReLU will not work for all purposes above, since we require an activation with σ(0) = 0.\n\nFigure 3: (left) Nominal dynamics f̂ for random network; (center) Convex positive definite Lyapunov function generated by random ICNN with constraints from Section 3.1; (right) Resulting stable dynamics f.\n\n4 Empirical results\n\nWe illustrate our technique on several example problems, first highlighting the (inherent) stability of the method for random networks, demonstrating learning on simple n-link pendulum dynamics, and finally learning high-dimensional stable latent space dynamics for dynamic video textures via a VAE model.\n\n4.1 Random networks\n\nAlthough we mention this only briefly, it is interesting to visualize the dynamics created by random networks according to our process, i.e., before any training at all. Because the dynamics models are inherently stable, these random networks lead to stable dynamics with interesting behaviors, illustrated in Figure 3. 
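The smoothed ReLU of Equation (13) is simple to implement; a quick numerical check of the properties the proof relies on (σ(0) = 0, continuity at the knots, and 0 ≤ σ′ ≤ 1), with d = 0.1 as an assumed value:

```python
d = 0.1  # width of the quadratic region (an assumed value, not from the paper)

def sigma(x):
    # Smoothed ReLU from Equation (13): zero, then quadratic on (0, d), then linear.
    if x <= 0:
        return 0.0
    if x < d:
        return x * x / (2 * d)
    return x - d / 2

def sigma_prime(x):
    # Derivative: 0, then x/d on (0, d), then 1 -- continuous and in [0, 1].
    if x <= 0:
        return 0.0
    if x < d:
        return x / d
    return 1.0

print(sigma(0.0) == 0.0)                       # softplus would fail this requirement
print(abs(sigma(d - 1e-12) - sigma(d)) < 1e-9) # continuity at x = d
xs = [i / 100 - 1 for i in range(300)]
print(all(0.0 <= sigma_prime(x) <= 1.0 for x in xs))
```

At x = d the quadratic piece gives d²/2d = d/2 and the linear piece gives d − d/2 = d/2, so both σ and σ′ match at the knot.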
Specifically, we let f̂ be defined by a 2-100-100-2 fully connected network, and V be a 2-100-100-1 ICNN, with both networks initialized via the default weights of PyTorch (the Kaiming uniform initialization [8]) and with the ICNN having its U weights further put through a softplus unit to make them positive.\n\n4.2 n-link pendulum\n\nNext we look at the ability of our approach to model a physically-based dynamical system, specifically the n-link pendulum. A damped, rigid n-link pendulum’s state x can be described by the angular position θi and angular velocity θ̇i of each link i. As before, f̂ is a 2n-100-100-2n network, and the Lyapunov function V is a 2n-60-60-1 ICNN with properties described in Section 3.1. Models are trained with pairs of data (x, ẋ) produced by the symbolic algebra solver sympy, using simulation code adapted from [21].\n\nIn Figure 4, we compare the simulated dynamics with the learned dynamics in the case of a simple damped pendulum (i.e. with n = 1), showing both the streamplot of the vector field and a single simulated trajectory, and draw a contour plot of the learned Lyapunov function. As seen, the system is able to learn dynamics that can accurately predict motion of the system even over long time periods.\n\nWe also evaluate the learned dynamics quantitatively varying n and the time horizon of simulation. Figure 5 presents the total error over time for the 8-link pendulum, and the average cumulative error over 1000 time steps for different values of n. While both the simple and our stable models show\n\nFigure 4: Dynamics of a simple damped pendulum. 
From left to right: the dynamics as simulated from first principles, the dynamics model f learned by our method, and the Lyapunov function V learned by our method (under which f is non-expansive).\n\nFigure 5: Error in predicting θ, θ̇ in 8-link pendulum at each timestep (left); and average error over 999 timesteps as the number of links in the pendulum increases (right).\n\nincreasing mean error at the start of the trajectory, our model is able to capture the contraction in the physical system (implied by conservation of energy) and in fact exhibits decreasing error towards the end of the simulation (the true and simulated dynamics are both stable). In comparison, the error in the simple model increases.\n\n4.3 Video Texture Generation\n\nFinally, we apply our technique to stable video texture generation, using a Variational Autoencoder (VAE) [11] to learn an encoding for images, and our stable network to learn a dynamics model in encoding-space. Given a sequence of frames (y0, y1, . . .), we feed the network the frame at time t and train it to reconstruct the frames at time t and t + 1. Specifically, we consider a VAE defined by the encoder e : Y → R2n giving mean and log-variance µt, log σt² = e(yt), latent state zt ∈ Rn ∼ N(µt, σt²), and decoder d : Rn → Y, yt ≈ d(zt). We train the network to minimize both the standard VAE loss (reconstruction error plus a KL divergence term) and the reconstruction loss of the next predicted state. We model the evolution of the latent dynamics as zt+1 ≈ f(zt), or more precisely yt+1 ≈ d(f(zt)). 
In other words, as illustrated in Figure 6, we train the full system to minimize\n\nminimize_{e,d,f̂,V}  Σ_{t=1}^{T−1} ( KL(N(µt, σt² I) ‖ N(0, I)) + Ez[ ‖d(zt) − yt‖₂² + ‖d(f(zt)) − yt+1‖₂² ] )    (21)\n\nWe train the model on pairs of successive frames sampled from videos. To generate video textures, we seed the dynamics model with the encoding of a single frame and numerically integrate the dynamics model to obtain a trajectory. The VAE decoder converts each step of the trajectory into a frame. In Figure 7, we present sample stable trajectories and frames produced by our network. For comparison, we also include an example trajectory and resulting frames when the dynamics are modelled without the stability constraint (i.e. letting f in the above loss be a generic neural network). For the naive\n\nFigure 6: Structure of our video texture generation network. 
The encoder e and decoder d form a Variational Autoencoder, and the stable dynamics model f is trained together with the decoder to predict the next frame in the video texture.\n\nFigure 7: Samples generated by our stable video texture networks, with associated trajectories above. The true latent space is 320-dimensional; we project the trajectories onto a two-dimensional plane for display. For comparison, we present the video texture generated using an unconstrained neural network in place of our stable dynamics model.\n\nmodel, the dynamics quickly diverge and produce a static image, whereas for our approach, we are able to generate different (stable) trajectories that keep generating realistic images over long time horizons.\n\n5 Conclusion\n\nIn this paper we proposed a method for learning stable non-linear dynamical systems defined by neural network architectures. The approach jointly learns a convex positive definite Lyapunov function along with dynamics constrained to be stable according to this function everywhere in the state space. We show that these models can be integrated into other deep architectures such as VAEs, and learn complex latent space dynamics in a fully end-to-end manner. Although we have focused here on the autonomous (i.e., uncontrolled) setting, the method opens several directions for future work, such as integration into dynamical systems for control or reinforcement learning settings. 
Having stable systems as a “primitive” can be useful in a large number of contexts, and combining these stable systems with the representational power of deep networks offers a powerful tool in modeling and controlling dynamical systems.\n\nReferences\n\n[1] Brandon Amos, Lei Xu, and J Zico Kolter. Input convex neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 146–155. JMLR.org, 2017.\n\n[2] Caroline Blocher, Matteo Saveriano, and Dongheui Lee. Learning stable dynamical systems using contraction theory. In 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pages 124–129. IEEE, 2017.\n\n[3] Byron Boots, Geoffrey J Gordon, and Sajid M Siddiqi. A constraint generation approach to learning stable linear dynamical systems. In Advances in Neural Information Processing Systems, pages 1329–1336, 2008.\n\n[4] Yize Chen, Yuanyuan Shi, and Baosen Zhang. Optimal control via neural networks: A convex approach. arXiv preprint arXiv:1805.11835, 2018.\n\n[5] Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. A Lyapunov-based approach to safe reinforcement learning. In Advances in Neural Information Processing Systems, pages 8092–8101, 2018.\n\n[6] Yarin Gal, Rowan McAllister, and Carl Edward Rasmussen. Improving PILCO with Bayesian neural network dynamics models. In Data-Efficient Machine Learning workshop, ICML, volume 4, 2016.\n\n[7] Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep Q-learning with model-based acceleration. In International Conference on Machine Learning, pages 2829–2838, 2016.\n\n[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. 
In Proceedings of the IEEE international conference\non computer vision, pages 1026\u20131034, 2015.\n\n[9] Hassan K Khalil and Jessy W Grizzle. Nonlinear systems, volume 3. Prentice hall Upper Saddle River, NJ,\n\n2002.\n\n[10] S Mohammad Khansari-Zadeh and Aude Billard. Learning stable nonlinear dynamical systems with\n\ngaussian mixture models. IEEE Transactions on Robotics, 27(5):943\u2013957, 2011.\n\n[11] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114,\n\n2013.\n\n[12] Joseph La Salle and Solomon Lefschetz. Stability by Liapunov\u2019s Direct Method with Applications by\n\nJoseph L Salle and Solomon Lefschetz, volume 4. Elsevier, 2012.\n\n[13] Nikhil Mishra, Pieter Abbeel, and Igor Mordatch. Prediction and control with temporal segment models.\nIn Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2459\u20132468.\nJMLR. org, 2017.\n\n[14] Anusha Nagabandi, Gregory Kahn, Ronald S Fearing, and Sergey Levine. Neural network dynamics\nfor model-based deep reinforcement learning with model-free \ufb01ne-tuning. In 2018 IEEE International\nConference on Robotics and Automation (ICRA), pages 7559\u20137566. IEEE, 2018.\n\n[15] Antonis Papachristodoulou and Stephen Prajna. On the construction of lyapunov functions using the sum\nof squares decomposition. In Proceedings of the 41st IEEE Conference on Decision and Control, 2002.,\nvolume 3, pages 3482\u20133487. IEEE, 2002.\n\n[16] Pablo A Parrilo. Structured semide\ufb01nite programs and semialgebraic geometry methods in robustness and\n\noptimization. PhD thesis, California Institute of Technology, 2000.\n\n[17] Spencer M Richards, Felix Berkenkamp, and Andreas Krause. The lyapunov neural network: Adaptive\n\nstability certi\ufb01cation for safe learning of dynamic systems. arXiv preprint arXiv:1808.00924, 2018.\n\n[18] Arno Sch\u00f6dl, Richard Szeliski, David H Salesin, and Irfan Essa. Video textures. 
In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 489–498. ACM Press/Addison-Wesley Publishing Co., 2000.\n\n[19] Andrew J Taylor, Victor D Dorobantu, Hoang M Le, Yisong Yue, and Aaron D Ames. Episodic learning with control Lyapunov functions for uncertain robotic systems. arXiv preprint arXiv:1903.01577, 2019.\n\n[20] Jonas Umlauft and Sandra Hirche. Learning stable stochastic nonlinear dynamical systems. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 3502–3510. JMLR.org, 2017.\n\n[21] Jake VanderPlas. Triple pendulum chaos! http://jakevdp.github.io/blog/2017/03/08/triple-pendulum-chaos/, Mar 2017.\n", "award": [], "sourceid": 5956, "authors": [{"given_name": "J. Zico", "family_name": "Kolter", "institution": "Carnegie Mellon University / Bosch Center for AI"}, {"given_name": "Gaurav", "family_name": "Manek", "institution": "Carnegie Mellon University"}]}