{"title": "Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 1130, "page_last": 1140, "abstract": "Spectral decomposition of the Koopman operator is attracting attention as a tool for the analysis of nonlinear dynamical systems. Dynamic mode decomposition is a popular numerical algorithm for Koopman spectral analysis; however, we often need to prepare nonlinear observables manually according to the underlying dynamics, which is not always possible since we may not have any a priori knowledge about them. In this paper, we propose a fully data-driven method for Koopman spectral analysis based on the principle of learning Koopman invariant subspaces from observed data. To this end, we propose minimization of the residual sum of squares of linear least-squares regression to estimate a set of functions that transforms data into a form in which the linear regression fits well. We introduce an implementation with neural networks and evaluate performance empirically using nonlinear dynamical systems and applications.", "full_text": "Learning Koopman Invariant Subspaces\n\nfor Dynamic Mode Decomposition\n\nNaoya Takeishi\u00a7, Yoshinobu Kawahara\u2020,\u2021, Takehisa Yairi\u00a7\n\n\u00a7Department of Aeronautics and Astronautics, The University of Tokyo\n\u2020The Institute of Scienti\ufb01c and Industrial Research, Osaka University\n\n{takeishi,yairi}@ailab.t.u-tokyo.ac.jp, ykawahara@sanken.osaka-u.ac.jp\n\n\u2021RIKEN Center for Advanced Intelligence Project\n\nAbstract\n\nSpectral decomposition of the Koopman operator is attracting attention as a tool\nfor the analysis of nonlinear dynamical systems. 
Dynamic mode decomposition\nis a popular numerical algorithm for Koopman spectral analysis; however, we\noften need to prepare nonlinear observables manually according to the underlying\ndynamics, which is not always possible since we may not have any a priori\nknowledge about them. In this paper, we propose a fully data-driven method for\nKoopman spectral analysis based on the principle of learning Koopman invariant\nsubspaces from observed data. To this end, we propose minimization of the residual\nsum of squares of linear least-squares regression to estimate a set of functions that\ntransforms data into a form in which the linear regression \ufb01ts well. We introduce\nan implementation with neural networks and evaluate performance empirically\nusing nonlinear dynamical systems and applications.\n\n1\n\nIntroduction\n\nA variety of time-series data are generated from nonlinear dynamical systems, in which a state evolves\naccording to a nonlinear map or differential equation. In summarization, regression, or classi\ufb01cation\nof such time-series data, precise analysis of the underlying dynamical systems provides valuable\ninformation to generate appropriate features and to select an appropriate computation method. In\napplied mathematics and physics, the analysis of nonlinear dynamical systems has received signi\ufb01cant\ninterest because a wide range of complex phenomena, such as \ufb02uid \ufb02ows and neural signals, can\nbe described in terms of nonlinear dynamics. A classical but popular view of dynamical systems\nis based on state space models, wherein the behavior of the trajectories of a vector in state space is\ndiscussed (see, e.g., [1]). Time-series modeling based on a state space is also common in machine\nlearning. 
However, when the dynamics are highly nonlinear, analysis based on state space models\nbecomes challenging compared to the case of linear dynamics.\nRecently, there is growing interest in operator-theoretic approaches for the analysis of dynamical\nsystems. Operator-theoretic approaches are based on the Perron\u2013Frobenius operator [2] or its adjoint,\ni.e., the Koopman operator (composition operator) [3], [4]. The Koopman operator de\ufb01nes the\nevolution of observation functions (observables) in a function space rather than state vectors in a state\nspace. Based on the Koopman operator, the analysis of nonlinear dynamical systems can be lifted\nto a linear (but in\ufb01nite-dimensional) regime. Consequently, we can consider modal decomposition,\nwith which the global characteristics of nonlinear dynamics can be inspected [4], [5]. Such modal\ndecomposition has been intensively used for scienti\ufb01c purposes to understand complex phenomena\n(e.g., [6]\u2013[9]) and also for engineering tasks, such as signal processing and machine learning. In fact,\nmodal decomposition based on the Koopman operator has been utilized in various engineering tasks,\nincluding robotic control [10], image processing [11], and nonlinear system identi\ufb01cation [12].\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fOne of the most popular algorithms for modal decomposition based on the Koopman operator is\ndynamic mode decomposition (DMD) [6], [7], [13]. An important premise of DMD is that the target\ndataset is generated from a set of observables that spans a function space invariant to the Koopman\noperator (referred to as Koopman invariant subspace). However, when only the original state vectors\nare available as the dataset, we must prepare appropriate observables manually according to the\nunderlying nonlinear dynamics. 
Several methods have been proposed to utilize such observables,\nincluding the use of basis functions [14] and reproducing kernels [15]. Note that these methods work\nwell only if appropriate basis functions or kernels are prepared; however, it is not always possible to\nprepare such functions if we have no a priori knowledge about the underlying dynamics.\nIn this paper, we propose a fully data-driven method for modal decomposition via the Koopman\noperator based on the principle of learning Koopman invariant subspaces (LKIS) from scratch using\nobserved data. To this end, we estimate a set of parametric functions by minimizing the residual sum\nof squares (RSS) of linear least-squares regression, so that the estimated set of functions transforms\nthe original data into a form in which the linear regression \ufb01ts well. In addition to the principle of\nLKIS, an implementation using neural networks is described. Moreover, we introduce empirical\nperformance of DMD based on the LKIS framework with several nonlinear dynamical systems and\napplications, which proves the feasibility of LKIS-based DMD as a fully data-driven method for\nmodal decomposition via the Koopman operator.\n\n2 Background\n2.1 Koopman spectral analysis\n\nWe focus on a (possibly nonlinear) discrete-time autonomous dynamical system\n\nxt+1 = f (xt), x \u2208 M,\n\nt \u2208 T = {0} \u222a N,\n\n(1)\nwhere M denotes the state space and (M, \u03a3, \u00b5) represents the associated probability space. In\ndynamical system (1), Koopman operator K [4], [5] is de\ufb01ned as an in\ufb01nite-dimensional linear\noperator that acts on observables g : M \u2192 R (or C), i.e.,\nKg(x) = g(f (x)),\n\n(2)\nwith which the analysis of nonlinear dynamics (1) can be lifted to a linear (but in\ufb01nite-dimensional)\nregime. Since K is linear, let us consider a set of eigenfunctions {\u03d51, \u03d52, . . .} of K with eigenvalues\n{\u03bb1, \u03bb2, . . 
.}, i.e., K\u03d5i = \u03bbi\u03d5i for i \u2208 N, where \u03d5 : M \u2192 C and \u03bb \u2208 C. Further, suppose that g can be expressed as a linear combination of this infinite set of eigenfunctions, i.e., g(x) = \u2211_{i=1}^\u221e \u03d5i(x)ci with a set of coefficients {c1, c2, . . .}. By repeatedly applying K to both sides of this equation, we obtain the following modal decomposition:\n\ng(xt) = \u2211_{i=1}^\u221e \u03bbi^t \u03d5i(x0)ci. (3)\n\nHere, the value of g is decomposed into a sum of Koopman modes wi = \u03d5i(x0)ci, each of which evolves over time with its frequency and decay rate respectively given by \u2220\u03bbi and |\u03bbi|, since \u03bbi is a complex value. The Koopman modes and their eigenvalues can be investigated to understand the dominant characteristics of complex phenomena that follow nonlinear dynamics. The above discussion can also be applied straightforwardly to continuous-time dynamical systems [4], [5].\nModal decomposition based on K, often referred to as Koopman spectral analysis, has been receiving attention in nonlinear physics and applied mathematics. In addition, it is a useful tool for engineering tasks including machine learning and pattern recognition; the spectra (eigenvalues) of K can be used as features of dynamical systems, the eigenfunctions are a useful representation of time-series for various tasks, such as regression and visualization, and K itself can be used for prediction and optimal control. 
Several methods have been proposed to compute modal decomposition based on K, such as generalized Laplace analysis [5], [16], the Ulam\u2013Galerkin method [17], and DMD [6], [7], [13]. DMD, which is reviewed in more detail in the next subsection, has received significant attention and been utilized in various data analysis scenarios (e.g., [6]\u2013[9]).\nNote that the Koopman operator and modal decomposition based on it can be extended to random dynamical systems actuated by process noise [4], [14], [18]. In addition, Proctor et al. [19], [20] discussed Koopman analysis of systems with control signals. In this paper, we primarily target autonomous deterministic dynamics (e.g., Eq. (1)) for the sake of presentation clarity.\n\n2.2 Dynamic mode decomposition and Koopman invariant subspace\n\nLet us review DMD, an algorithm for Koopman spectral analysis (further details are in the supplementary). Consider a set of observables {g1, . . . , gn} and let g = [g1 \u00b7\u00b7\u00b7 gn]T be a vector-valued observable. In addition, define two matrices Y0, Y1 \u2208 Rn\u00d7m generated by x0, f and g, i.e.,\n\nY0 = [g(x0) \u00b7\u00b7\u00b7 g(xm\u22121)] and Y1 = [g(f(x0)) \u00b7\u00b7\u00b7 g(f(xm\u22121))], (4)\n\nwhere m + 1 is the number of snapshots in the dataset. The core functionality of DMD algorithms is computing the eigendecomposition of matrix A = Y1Y0\u2020 [13], [21], where Y0\u2020 is the Moore\u2013Penrose pseudoinverse of Y0. The eigenvectors of A are referred to as dynamic modes, and they coincide with the Koopman modes if the corresponding eigenfunctions of K are in span{g1, . . . , gn} [21]. Alternatively (but nearly equivalently), the condition under which DMD works as a numerical realization of Koopman spectral analysis can be described as follows.\nRather than calculating the infinite-dimensional K directly, we can consider the restriction of K to a finite-dimensional subspace. 
Assume the observables are elements of L2(M, \u00b5). The Koopman invariant subspace is defined as G \u2282 L2(M, \u00b5) s.t. \u2200g \u2208 G, Kg \u2208 G. If G is spanned by a finite number of functions, then the restriction of K to G, which we denote K, becomes a finite-dimensional linear operator. In the sequel, we assume the existence of such G. If {g1, . . . , gn} spans G, then DMD\u2019s matrix A = Y1Y0\u2020 coincides with K \u2208 Rn\u00d7n asymptotically, wherein K is the realization of K with regard to the frame (or basis) {g1, . . . , gn}. For modal decomposition (3), the (vector-valued) Koopman modes are given by w and the values of the eigenfunctions are obtained by \u03d5 = zHg, where w and z are the right- and left-eigenvectors of K normalized such that wiHzj = \u03b4i,j [14], [21], and zH denotes the conjugate transpose of z.\nHere, an important problem in the practice of DMD arises, i.e., we often have no access to g that spans a Koopman invariant subspace G. In this case, for nonlinear dynamics, we must manually prepare adequate observables. Several researchers have addressed this issue; Williams et al. [14] leveraged a dictionary of predefined basis functions to transform original data, and Kawahara [15] defined Koopman spectral analysis in a reproducing kernel Hilbert space. Brunton et al. [22] proposed the use of observables selected in a data-driven manner [23] from a function dictionary. Note that, for these methods, we must select an appropriate function dictionary or kernel function according to the target dynamics. However, if we have no a priori knowledge about the dynamics, which is often the case, such existing methods cannot necessarily be applied successfully to nonlinear dynamics.\n\n3 Learning Koopman invariant subspaces\n3.1 Minimizing residual sum of squares of linear least-squares regression\nIn this paper, we propose a method to learn a set of observables {g1, . . .
, gn} that spans a Koopman invariant subspace G, given a sequence of measurements as the dataset. In the following, we summarize desirable properties for such observables, upon which the proposed method is constructed.\nTheorem 1. Consider a set of square-integrable observables {g1, . . . , gn}, and define a vector-valued observable g = [g1 \u00b7\u00b7\u00b7 gn]T. In addition, define a linear operator G whose matrix form is given as G = (\u222bM (g \u25e6 f)gH d\u00b5)(\u222bM ggH d\u00b5)\u2020. Then, \u2200x \u2208 M, g(f(x)) = Gg(x) if and only if {g1, . . . , gn} spans a Koopman invariant subspace.\nProof. If \u2200x \u2208 M, g(f(x)) = Gg(x), then for any \u02c6g = \u2211_{i=1}^n aigi \u2208 span{g1, . . . , gn},\n\nK\u02c6g = \u2211_{i=1}^n aigi(f(x)) = \u2211_{i=1}^n ai (\u2211_{j=1}^n Gi,j gj(x)) \u2208 span{g1, . . . , gn},\n\nwhere Gi,j denotes the (i, j)-element of G; thus, span{g1, . . . , gn} is a Koopman invariant subspace. On the other hand, if {g1, . . . , gn} spans a Koopman invariant subspace, there exists a linear operator K such that \u2200x \u2208 M, g(f(x)) = Kg(x); thus, \u222bM (g \u25e6 f)gH d\u00b5 = \u222bM KggH d\u00b5. Therefore, an instance of the matrix form of K is obtained in the form of G.\nAccording to Theorem 1, we should obtain g that makes g \u25e6 f \u2212 Gg zero. However, such problems cannot be solved with finite data because g is a function. Thus, we give the corresponding empirical risk minimization problem based on the assumption of ergodicity of f and the convergence property of the empirical matrix as follows.\nAssumption 1. 
For dynamical system (1), the time-average and space-average of a function g : M \u2192 R (or C) coincide as m \u2192 \u221e for almost all x0 \u2208 M, i.e.,\n\nlim_{m\u2192\u221e} (1/m) \u2211_{j=0}^{m\u22121} g(xj) = \u222bM g(x)d\u00b5(x), for almost all x0 \u2208 M.\n\nTheorem 2. Define Y0 and Y1 by Eq. (4) and suppose that Assumption 1 holds. If all modes are sufficiently excited in the data (i.e., rank(Y0) = n), then matrix A = Y1Y0\u2020 almost surely converges to the matrix form of linear operator G as m \u2192 \u221e.\nProof. From Assumption 1, (1/m)Y1Y0H and (1/m)Y0Y0H respectively converge to \u222bM (g \u25e6 f)gH d\u00b5 and \u222bM ggH d\u00b5 for almost all x0 \u2208 M. In addition, since the rank of Y0Y0H is always n, ((1/m)Y0Y0H)\u2020 converges to (\u222bM ggH d\u00b5)\u2020 as m \u2192 \u221e [24]. Consequently, as m \u2192 \u221e, A = ((1/m)Y1Y0H)((1/m)Y0Y0H)\u2020 almost surely converges to G, which is the matrix form of linear operator G.\nSince A = Y1Y0\u2020 is the minimum-norm solution of the linear least-squares regression from the columns of Y0 to those of Y1, we constitute the learning problem to estimate a set of functions that transforms the original data into a form in which the linear least-squares regression fits well. In particular, we minimize RSS, which measures the discrepancy between the data and the estimated regression model (i.e., linear least-squares in this case). We define the RSS loss as follows:\n\nLRSS(g; (x0, . . . , xm)) = ||Y1 \u2212 (Y1Y0\u2020)Y0||F^2, (5)\n\nwhich becomes zero when g spans a Koopman invariant subspace. If we implement a smooth parametric model on g, the local minima of LRSS can be found using gradient descent. 
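As a concrete illustration, the empirical RSS loss of Eq. (5) can be sketched with NumPy; the function name `rss_loss` and the toy snapshot matrices are ours, not part of the original method:

```python
import numpy as np

def rss_loss(Y0, Y1):
    """Empirical RSS loss of Eq. (5): || Y1 - (Y1 Y0^+) Y0 ||_F^2.

    Y0, Y1 are n-by-m snapshot matrices built from the observables g
    as in Eq. (4); Y0^+ denotes the Moore-Penrose pseudoinverse.
    """
    A = Y1 @ np.linalg.pinv(Y0)  # least-squares operator A = Y1 Y0^+
    return np.linalg.norm(Y1 - A @ Y0, "fro") ** 2

# When the snapshots are consistent with a single linear operator K
# (i.e., g spans an invariant subspace), the loss vanishes:
rng = np.random.default_rng(0)
Y0 = rng.standard_normal((3, 50))
K = rng.standard_normal((3, 3))
print(rss_loss(Y0, K @ Y0))  # ~0, up to round-off
```

For snapshots not generated by any single linear operator, the loss stays strictly positive, which is what gradient descent on the parameters of g drives down.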
We adopt g that achieves a local minimum of LRSS as a set of observables that spans (approximately) a Koopman invariant subspace.\n\n3.2 Linear delay embedder for state space reconstruction\n\nIn the previous subsection, we have presented an important part of the principle of LKIS, i.e., minimization of the RSS of linear least-squares regression. Note that, to define RSS loss (5), we need access to a sequence of the original states, i.e., (x0, . . . , xm) \u2208 Mm+1, as a dataset. In practice, however, we cannot necessarily observe full states x due to limited memory and sensor capabilities. In this case, only transformed (and possibly degenerated) measurements are available, which we denote y = \u03c8(x) with a measurement function \u03c8 : M \u2192 Rr. To define RSS loss (5) given only degenerated measurements, we must reconstruct the original states x from the actual observations y.\nHere, we utilize delay-coordinate embedding, which has been widely used for state space reconstruction in the analysis of nonlinear dynamics. Consider a univariate time-series (. . . , yt\u22121, yt, yt+1, . . . ), which is a sequence of degenerated measurements yt = \u03c8(xt). According to the well-known Takens\u2019 theorem [25], [26], a faithful representation of xt that preserves the structure of the state space can be obtained by \u02dcxt = [yt yt\u2212\u03c4 \u00b7\u00b7\u00b7 yt\u2212(d\u22121)\u03c4]T with some lag parameter \u03c4 and embedding dimension d if d is greater than 2 dim(x). For a multivariate time-series, embedding with non-uniform lags provides better reconstruction [27]. For example, when we have a two-dimensional time-series yt = [y1,t y2,t]T, an embedding with non-uniform lags is similar to \u02dcxt = [y1,t y1,t\u2212\u03c411 \u00b7\u00b7\u00b7 y1,t\u2212\u03c41d1 y2,t y2,t\u2212\u03c421 \u00b7\u00b7\u00b7 y2,t\u2212\u03c42d2]T with each value of \u03c4 and d. Several methods have been proposed for selection of \u03c4 and d [27]\u2013[29]; however, appropriate values may depend on the given application (attractor inspection, prediction, etc.).\nIn this paper, we propose to surrogate the parameter selection of the delay-coordinate embedding by learning a linear delay embedder from data. Formally, we learn embedder \u03c6 such that\n\n\u02dcxt = \u03c6(yt(k)) = W\u03c6 [ytT yt\u22121T \u00b7\u00b7\u00b7 yt\u2212k+1T]T, W\u03c6 \u2208 Rp\u00d7kr, (6)\n\nwhere p = dim(\u02dcx), r = dim(y), and k is a hyperparameter of maximum lag. We estimate weight W\u03c6 as well as the parameters of g by minimizing RSS loss (5), which is now defined using \u02dcx instead of x. Learning \u03c6 from data yields an embedding that is suitable for learning a Koopman invariant subspace. Moreover, we can impose L1 regularization on weight W\u03c6 to make it highly interpretable if necessary according to the given application.\n\nFigure 1: An instance of LKIS framework, in which g and h are implemented by MLPs.\n\n3.3 Reconstruction of original measurements\nSimple minimization of LRSS may yield trivial g, such as constant values. We should impose some constraints to prevent such trivial solutions. In the proposed framework, modal decomposition is first obtained in terms of learned observables g; thus, the values of g must be back-projected to the space of the original measurements y to obtain a physically meaningful representation of the dynamic modes. 
Therefore, we modify the loss function by employing an additional term such that the original measurements y can be reconstructed from the values of g by a reconstructor h, i.e., y \u2248 h(g(\u02dcx)). This term is given as follows:\n\nLrec(h, g; (\u02dcx0, . . . , \u02dcxm)) = \u2211_{j=0}^m ||yj \u2212 h(g(\u02dcxj))||^2, (7)\n\nand, if h is a smooth parametric model, this term can also be reduced using gradient descent. Finally, the objective function to be minimized becomes\n\nL(\u03c6, g, h; (y0, . . . , ym)) = LRSS(g, \u03c6; (\u02dcxk\u22121, . . . , \u02dcxm)) + \u03b1Lrec(h, g; (\u02dcxk\u22121, . . . , \u02dcxm)), (8)\n\nwhere \u03b1 is a parameter that controls the balance between LRSS and Lrec.\n\n3.4 Implementation using neural networks\n\nIn Sections 3.1\u20133.3, we introduced the main concepts of the LKIS framework, i.e., RSS loss minimization, learning the linear delay embedder, and reconstruction of the original measurements. Here, we demonstrate an implementation of the LKIS framework using neural networks.\nFigure 1 shows a schematic diagram of the implementation of the framework. We model g and h using multi-layer perceptrons (MLPs) with a parametric ReLU activation function [30]. Here, the sizes of the hidden layers of the MLPs are defined by the arithmetic means of the sizes of the input and output layers of the MLPs. Thus, the remaining tunable hyperparameters are k (maximum delay of \u03c6), p (dimensionality of \u02dcx), and n (dimensionality of g). To obtain g with dimensionality much greater than that of the original measurements, we found that it was useful to set k > 1 even when full-state measurements (e.g., y = x) were available.\nAfter estimating the parameters of \u03c6, g, and h, DMD can be performed normally by using the values of the learned g, defining the data matrices in Eq. 
(4), and computing the eigendecomposition of A = Y1Y0\u2020; the dynamic modes are obtained by w, and the values of the eigenfunctions are obtained by \u03d5 = zHg, where w and z are the right- and left-eigenvectors of A. See Section 2.2 for details.\nIn the numerical experiments described in Sections 5 and 6, we performed optimization using first-order gradient descent. To stabilize optimization, batch normalization [31] was imposed on the inputs of hidden layers. Note that, since RSS loss function (5) is not decomposable with regard to data points, convergence of stochastic gradient descent (SGD) cannot be shown straightforwardly. However, we empirically found that the non-decomposable RSS loss was often reduced successfully, even with mini-batch SGD. Let us show an example; the full-batch RSS loss (denoted L*RSS) under the updates of the mini-batch SGD is plotted in the rightmost panel of Figure 4. Here, L*RSS decreases rapidly and remains small. For SGD on non-decomposable losses, Kar et al. [32] provided guarantees for some cases; however, examining the behavior of more general non-decomposable losses under mini-batch updates remains an open problem.\n\n4 Related work\n\nThe proposed framework is motivated by the operator-theoretic view of nonlinear dynamical systems. In contrast, learning a generative (state-space) model for nonlinear dynamical systems directly has been actively studied in the machine learning and optimal control communities, on which we mention a\n\nFigure 3: (left) Data generated from system (9) and white Gaussian observation noise and (right) the estimated Koopman eigenvalues. 
LKIS-DMD\nsuccessfully identi\ufb01es the eigenvalues even with\nthe observation noise.\n\nFigure 2: (left) Data generated from system (9)\nand (right) the estimated Koopman eigenvalues.\nWhile linear Hankel DMD produces an inconsis-\ntent eigenvalue, LKIS-DMD successfully identi-\n\ufb01es \u03bb, \u00b5, \u03bb2, and \u03bb0\u00b50 = 1.\nfew examples. A classical but popular method for learning nonlinear dynamical systems is using an\nexpectation-maximization algorithm with Bayesian \ufb01ltering/smoothing (see, e.g., [33]). Recently,\nusing approximate Bayesian inference with the variational autoencoder (VAE) technique [34] to learn\ngenerative dynamical models has been actively researched. Chung et al. [35] proposed a recurrent\nneural network with random latent variables, Gao et al. [36] utilized VAE-based inference for neural\npopulation models, and Johnson et al. [37] and Krishnan et al. [38] developed inference methods for\nstructured models based on inference with a VAE. In addition, Karl et al. [39] proposed a method to\nobtain a more consistent estimation of nonlinear state space models. Moreover, Watter et al. [40]\nproposed a similar approach in the context of optimal control. Since generative models are intrinsically\naware of process and observation noises, incorporating methodologies developed in such studies to\nthe operator-theoretic perspective is an important open challenge to explicitly deal with uncertainty.\nWe would like to mention some studies closely related to our method. After the \ufb01rst submission of\nthis manuscript (in May 2017), several similar approaches to learning data transform for Koopman\nanalysis have been proposed [41]\u2013[45]. The relationships and relative advantages of these methods\nshould be elaborated in the future.\n\n5 Numerical examples\n\nIn this section, we provide numerical examples of DMD based on the LKIS framework (LKIS-DMD)\nimplemented using neural networks. 
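As a concrete sketch of the DMD step described in Section 3.4 (eigendecomposition of A = Y1Y0\u2020, dynamic modes from the right eigenvectors, eigenfunction values via the left eigenvectors), assuming snapshot matrices already built from a learned g; the function name `lkis_dmd` is ours:

```python
import numpy as np

def lkis_dmd(Y0, Y1):
    """Sketch of the DMD step on learned observables.

    Returns eigenvalues lam, dynamic modes W (right eigenvectors of
    A = Y1 Y0^+), and eigenfunction values Phi with
    Phi[i, t] = z_i^H g(x_t), where the rows of W^{-1} are the left
    eigenvectors z_i^H (normalized so that z_i^H w_j = delta_ij).
    """
    A = Y1 @ np.linalg.pinv(Y0)
    lam, W = np.linalg.eig(A)
    Phi = np.linalg.inv(W) @ Y0  # eigenfunction values along the trajectory
    return lam, W, Phi
```

A useful sanity check on noiseless linear snapshots: advancing the data one step multiplies each row of Phi by its eigenvalue, i.e., `np.linalg.inv(W) @ Y1` equals `np.diag(lam) @ Phi`.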
We conducted experiments on three typical nonlinear dynamical systems: a fixed-point attractor, a limit-cycle attractor, and a system with multiple basins of attraction. We show the results of comparisons with other recent DMD algorithms, i.e., Hankel DMD [46], [47], extended DMD [14], and DMD with reproducing kernels [15]. The detailed setups of the experiments discussed in this section and the next section are described in the supplementary.\nFixed-point attractor Consider a two-dimensional nonlinear map on xt = [x1,t x2,t]T:\n\nx1,t+1 = \u03bbx1,t, x2,t+1 = \u00b5x2,t + (\u03bb^2 \u2212 \u00b5)x1,t^2, (9)\n\nwhich has a stable equilibrium at the origin if \u03bb, \u00b5 < 1. The Koopman eigenvalues of system (9) include \u03bb and \u00b5, and the corresponding eigenfunctions are \u03d5\u03bb(x) = x1 and \u03d5\u00b5(x) = x2 \u2212 x1^2, respectively. \u03bb^i\u00b5^j is also an eigenvalue with corresponding eigenfunction \u03d5\u03bb^i\u03d5\u00b5^j. A minimal Koopman invariant subspace of system (9) is span{x1, x2, x1^2}, and the eigenvalues of the Koopman operator restricted to such subspace include \u03bb, \u00b5 and \u03bb^2. We generated a dataset using system (9) with \u03bb = 0.9 and \u00b5 = 0.5 and applied LKIS-DMD (n = 4), linear Hankel DMD [46], [47] (delay 2), and DMD with basis expansion by {x1, x2, x1^2}, which corresponds to extended DMD [14] with a right and minimal observable dictionary. The estimated Koopman eigenvalues are shown in Figure 2, wherein LKIS-DMD successfully identifies the eigenvalues of the target invariant subspace. In Figure 3, we show eigenvalues estimated using data contaminated with white Gaussian observation noise (\u03c3 = 0.1). 
The eigenvalues estimated by LKIS-DMD coincide with the true values even with the observation noise, whereas the results of DMD with basis expansion (i.e., extended DMD) are directly affected by the observation noise.\nLimit-cycle attractor We generated data from the limit cycle of the FitzHugh\u2013Nagumo equation\n\n\u02d9x1 = x1 \u2212 x1^3/3 \u2212 x2 + I, \u02d9x2 = c(x1 \u2212 bx2 + a), (10)\n\nwhere a = 0.7, b = 0.8, c = 0.08, and I = 0.8. Since trajectories in a limit-cycle are periodic, the (discrete-time) Koopman eigenvalues should lie near the unit circle. Figure 4 shows the eigenvalues\n\nFigure 4: The left four panels show the estimated Koopman eigenvalues on the limit-cycle of the FitzHugh\u2013Nagumo equation by LKIS-DMD, linear Hankel DMD, and kernel DMDs with polynomial and RBF kernels. The hyperparameters of each DMD are set to produce 16 eigenvalues. The rightmost plot shows the full-batch (size 2,000) loss under mini-batch (size 200) SGD updates along iterations. Non-decomposable part L*RSS decreases rapidly and remains small, even by SGD.\n\nFigure 5: (left) The continuous-time Koopman eigenvalues estimated by LKIS-DMD on the Duffing equation. (center) The true basins of attraction of the Duffing equation, wherein points in the blue region evolve toward (1, 0) and points in the red region evolve toward (\u22121, 0). Note that the stable manifold of the saddle point is not drawn precisely. (right) The values of the Koopman eigenfunction with a nearly zero eigenvalue computed by LKIS-DMD, whose level sets should correspond to the basins of attraction. There is rough agreement between the true boundary of the basins of attraction and the numerically computed boundary. 
The right two plots are best viewed in color.\n\nestimated by LKIS-DMD (n = 16), linear Hankel DMD [46], [47] (delay 8), and DMDs with reproducing kernels [15] (polynomial kernel of degree 4 and RBF kernel of width 1). The eigenvalues produced by LKIS-DMD agree well with those produced by kernel DMDs, whereas linear Hankel DMD produces eigenvalues that would correspond to rapidly decaying modes.\nMultiple basins of attraction Consider the unforced Duffing equation\n\n\u00a8x = \u2212\u03b4\u02d9x \u2212 x(\u03b2 + \u03b1x^2), x = [x \u02d9x]T, (11)\n\nwhere \u03b1 = 1, \u03b2 = \u22121, and \u03b4 = 0.5. States x following (11) evolve toward [1 0]T or [\u22121 0]T depending on which basin of attraction the initial value belongs to unless the initial state is on the stable manifold of the saddle. Generally, a Koopman eigenfunction whose continuous-time eigenvalue is zero takes a constant value in each basin of attraction [14]; thus, the contour plot of such an eigenfunction shows the boundary of the basins of attraction. We generated 1,000 episodes of time-series starting at different initial values uniformly sampled from [\u22122, 2]^2. The left plot in Figure 5 shows the continuous-time Koopman eigenvalues estimated by LKIS-DMD (n = 100), all of which correspond to decaying modes (i.e., negative real parts) and agree with the property of the data. The center plot in Figure 5 shows the true basins of attraction of (11), and the right plot shows the estimated values of the eigenfunction corresponding to the eigenvalue of the smallest magnitude. The surface of the estimated eigenfunction agrees qualitatively with the true boundary of the basins of attraction, which indicates that LKIS-DMD successfully identifies the Koopman eigenfunction.\n\n6 Applications\n\nThe numerical experiments in the previous section demonstrated the feasibility of the proposed method as a fully data-driven method for Koopman spectral analysis. 
Here, we introduce practical applications of LKIS-DMD.\nChaotic time-series prediction Prediction of a chaotic time-series has received significant interest in nonlinear physics. We would like to perform the prediction of a chaotic time-series using DMD, since DMD can be naturally utilized for prediction as follows. Since g(xt) is decomposed as g(xt) = \u2211_{i=1}^n \u03d5i(xt)ci and \u03d5 is obtained by \u03d5i(xt) = ziHg(xt), where zi is a left-eigenvector of K, the next step of g can be described in terms of the current step, i.e., g(xt+1) = \u2211_{i=1}^n \u03bbi(ziHg(xt))ci. In addition, in the case of LKIS-DMD, the values of g must be back-projected to y using the learned h.\n\nFigure 7: The top plot shows the raw time-series obtained by a far-infrared laser [50]. The other plots show the results of unstable phenomena detection, wherein the peaks should correspond to the occurrences of unstable phenomena.\n\nFigure 6: The left plot shows RMS errors from 1- to 30-step predictions, and the right plot shows a part of the 30-step prediction obtained by LKIS-DMD on (upper) the Lorenz-x series and (lower) the Rossler-x series.\n\nWe generated two types of univariate time-series by extracting the {x} series of the Lorenz attractor [48] and the Rossler attractor [49]. We simulated 25,000 steps for each attractor and used the first 10,000 steps for training, the next 5,000 steps for validation, and the last 10,000 steps for testing prediction accuracy. 
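The prediction rule sketched above (advance the observables linearly, then back-project) amounts to iterating the DMD matrix; a minimal sketch, in which the function name `predict_observables` is ours and the back-projection by the learned h is omitted:

```python
import numpy as np

def predict_observables(Y0, Y1, g_t, steps):
    """Multi-step prediction in observable space.

    Since g(x_{t+1}) = sum_i lambda_i (z_i^H g(x_t)) c_i is equivalent
    to g(x_{t+1}) = A g(x_t) with A = Y1 Y0^+, iterating A advances
    the learned observables; the learned reconstructor h (not modeled
    here) would map the result back to the measurements y.
    """
    A = Y1 @ np.linalg.pinv(Y0)
    g = np.asarray(g_t)
    for _ in range(steps):
        g = A @ g
    return g
```

On snapshots exactly generated by a linear operator, a `steps`-step prediction from the last column of Y0 reproduces the operator applied `steps` times.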
We examined the prediction accuracy of LKIS-DMD, a simple LSTM network, and linear Hankel DMD [46], [47], all of whose hyperparameters were tuned using the validation set. The prediction accuracy of every method and an example of the predicted series on the test set by LKIS-DMD are shown in Figure 6. As can be seen, the proposed LKIS-DMD achieves the smallest root-mean-square (RMS) errors in the 30-step prediction.

Unstable phenomena detection　One of the most popular applications of DMD is the investigation of the global characteristics of dynamics by inspecting the spatial distribution of the dynamic modes. In addition to the spatial distribution, we can investigate the temporal profiles of mode activations by examining the values of the corresponding eigenfunctions. For example, assume there is an eigenfunction $\varphi_{\lambda \ll 1}$ that corresponds to a discrete-time eigenvalue λ whose magnitude is considerably smaller than one. Such a small eigenvalue indicates a rapidly decaying (i.e., unstable) mode; thus, we can detect occurrences of unstable phenomena by observing the values of $\varphi_{\lambda \ll 1}$. We applied LKIS-DMD (n = 10) to a time-series generated by a far-infrared laser, which was obtained from the Santa Fe Time Series Competition Data [50]. We investigated the values of the eigenfunction $\varphi_{\lambda \ll 1}$ corresponding to the eigenvalue of the smallest magnitude. The original time-series and the values of $\varphi_{\lambda \ll 1}$ obtained by LKIS-DMD are shown in Figure 7. As can be seen, the activations of $\varphi_{\lambda \ll 1}$ coincide with sudden decays of the pulsation amplitudes. For comparison, we applied novelty/change-point detection using a one-class support vector machine (OC-SVM) [51] and direct density-ratio estimation by relative unconstrained least-squares importance fitting (RuLSIF) [52].
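The detection scheme described above, which monitors the eigenfunction paired with the smallest-magnitude eigenvalue, can be illustrated with standard least-squares DMD on synthetic data. The two modes, their eigenvalues (0.99 and 0.1), and the disturbance time t = 60 are all invented for this sketch; LKIS-DMD's learned observables are replaced by raw observables here.

```python
import numpy as np

# Synthetic observables: a slowly decaying mode plus a rapidly decaying
# ("unstable") mode that is re-excited by a disturbance at t = 60.
v_slow, v_fast = np.array([1.0, 0.2]), np.array([0.3, 1.0])
T = 120
a_slow, a_fast = np.zeros(T), np.zeros(T)
a_slow[0], a_fast[0] = 1.0, 0.5
for t in range(T - 1):
    a_slow[t + 1] = 0.99 * a_slow[t]
    a_fast[t + 1] = 0.10 * a_fast[t] + (1.0 if t + 1 == 60 else 0.0)
G = np.outer(v_slow, a_slow) + np.outer(v_fast, a_fast)  # shape (2, T)

# Fit the linear operator on a clean segment before the disturbance.
X, Y = G[:, :50], G[:, 1:51]
K = Y @ np.linalg.pinv(X)
lam, W = np.linalg.eig(K)
Z = np.linalg.inv(W)                   # rows are left eigenvectors z_i^H

i_unstable = np.argmin(np.abs(lam))    # the rapidly decaying mode
phi = np.abs(Z[i_unstable] @ G)        # |phi_i(x_t)| over time
detected = int(np.argmax(phi))         # peaks when the unstable mode activates
```

The time profile `phi` plays the role of $\varphi_{\lambda \ll 1}$ in the laser experiment: it stays near zero during regular behavior and spikes when the rapidly decaying mode is excited.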
We computed AUC, defining the sudden decays of the amplitudes as the points to be detected; the AUCs were 0.924, 0.799, and 0.803 for LKIS, OC-SVM, and RuLSIF, respectively.

7 Conclusion

In this paper, we have proposed a framework for learning Koopman invariant subspaces, which is a fully data-driven numerical algorithm for Koopman spectral analysis. In contrast to existing approaches, the proposed method learns (approximately) a Koopman invariant subspace entirely from the available data based on the minimization of RSS loss. We have shown empirical results for several typical nonlinear dynamics and application examples.

We have also introduced an implementation using multi-layer perceptrons; however, one possible drawback of such an implementation is the local optima of the objective function, which makes it difficult to assess the adequacy of the obtained results. Rather than using neural networks, the observables to be learned could be modeled by a sparse combination of basis functions as in [23], while still utilizing optimization based on the RSS loss. Another possible future research direction could be incorporating approximate Bayesian inference methods, such as the VAE [34]. The proposed framework is based on a discriminative viewpoint, but inference methodologies for generative models could be used to modify the proposed framework to explicitly consider uncertainty in the data.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Nos. JP15J09172, JP26280086, JP16H01548, and JP26289320.

References

[1] M. W. Hirsch, S. Smale, and R. L. Devaney, Differential equations, dynamical systems, and an introduction to chaos, 3rd ed. Academic Press, 2013.

[2] A. Lasota and M. C.
Mackey, Chaos, fractals, and noise: Stochastic aspects of dynamics, 2nd ed. Springer, 1994.

[3] B. O. Koopman, "Hamiltonian systems and transformation in Hilbert space," Proceedings of the National Academy of Sciences of the United States of America, vol. 17, no. 5, pp. 315–318, 1931.

[4] I. Mezić, "Spectral properties of dynamical systems, model reduction and decompositions," Nonlinear Dynamics, vol. 41, no. 1-3, pp. 309–325, 2005.

[5] M. Budišić, R. Mohr, and I. Mezić, "Applied Koopmanism," Chaos, vol. 22, p. 047510, 2012.

[6] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. S. Henningson, "Spectral analysis of nonlinear flows," Journal of Fluid Mechanics, vol. 641, pp. 115–127, 2009.

[7] P. J. Schmid, "Dynamic mode decomposition of numerical and experimental data," Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.

[8] J. L. Proctor and P. A. Eckhoff, "Discovering dynamic patterns from infectious disease data using dynamic mode decomposition," International Health, vol. 7, no. 2, pp. 139–145, 2015.

[9] B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz, "Extracting spatial-temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition," Journal of Neuroscience Methods, vol. 258, pp. 1–15, 2016.

[10] E. Berger, M. Sastuba, D. Vogt, B. Jung, and H. B. Amor, "Estimation of perturbations in robotic behavior using dynamic mode decomposition," Advanced Robotics, vol. 29, no. 5, pp. 331–343, 2015.

[11] J. N. Kutz, X. Fu, and S. L. Brunton, "Multiresolution dynamic mode decomposition," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 2, pp. 713–735, 2016.

[12] A. Mauroy and J.
Goncalves, "Linear identification of nonlinear systems: A lifting technique based on the Koopman operator," in Proceedings of the 2016 IEEE 55th Conference on Decision and Control, 2016, pp. 6500–6505.

[13] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic mode decomposition: Data-driven modeling of complex systems. SIAM, 2016.

[14] M. O. Williams, I. G. Kevrekidis, and C. W. Rowley, "A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition," Journal of Nonlinear Science, vol. 25, no. 6, pp. 1307–1346, 2015.

[15] Y. Kawahara, "Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis," in Advances in Neural Information Processing Systems, vol. 29, 2016, pp. 911–919.

[16] I. Mezić, "Analysis of fluid flows via spectral properties of the Koopman operator," Annual Review of Fluid Mechanics, vol. 45, pp. 357–378, 2013.

[17] G. Froyland, G. A. Gottwald, and A. Hammerlindl, "A computational method to extract macroscopic variables and their dynamics in multiscale systems," SIAM Journal on Applied Dynamical Systems, vol. 13, no. 4, pp. 1816–1846, 2014.

[18] N. Takeishi, Y. Kawahara, and T. Yairi, "Subspace dynamic mode decomposition for stochastic Koopman analysis," Physical Review E, vol. 96, no. 3, p. 033310, 2017.

[19] J. L. Proctor, S. L. Brunton, and J. N. Kutz, "Dynamic mode decomposition with control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142–161, 2016.

[20] ——, "Generalizing Koopman theory to allow for inputs and control," arXiv:1602.07647, 2016.

[21] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, "On dynamic mode decomposition: Theory and applications," Journal of Computational Dynamics, vol. 1, no. 2, pp. 391–421, 2014.

[22] S. L. Brunton, B. W. Brunton, J. L. Proctor, and J. N. Kutz, "Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control," PLoS ONE, vol. 11, no. 2, e0150171, 2016.

[23] S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proceedings of the National Academy of Sciences of the United States of America, vol. 113, no. 15, pp. 3932–3937, 2016.

[24] V. Rakočević, "On continuity of the Moore–Penrose and Drazin inverses," Matematički Vesnik, vol. 49, no. 3-4, pp. 163–172, 1997.

[25] F. Takens, "Detecting strange attractors in turbulence," in Dynamical Systems and Turbulence, Warwick 1980, ser. Lecture Notes in Mathematics, vol. 898, 1981, pp. 366–381.

[26] T. Sauer, J. A. Yorke, and M. Casdagli, "Embedology," Journal of Statistical Physics, vol. 65, no. 3-4, pp. 579–616, 1991.

[27] S. P. Garcia and J. S. Almeida, "Multivariate phase space reconstruction by nearest neighbor embedding with different time delays," Physical Review E, vol. 72, no. 2, p. 027205, 2005.

[28] Y. Hirata, H. Suzuki, and K. Aihara, "Reconstructing state spaces from multivariate data using variable delays," Physical Review E, vol. 74, no. 2, p. 026202, 2006.

[29] I. Vlachos and D. Kugiumtzis, "Nonuniform state-space reconstruction and coupling detection," Physical Review E, vol. 82, no. 1, p. 016207, 2010.

[30] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.

[31] S. Ioffe and C.
Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proceedings of the 32nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 37, 2015, pp. 448–456.

[32] P. Kar, H. Narasimhan, and P. Jain, "Online and stochastic gradient methods for non-decomposable loss functions," in Advances in Neural Information Processing Systems, vol. 27, 2014, pp. 694–702.

[33] Z. Ghahramani and S. T. Roweis, "Learning nonlinear dynamical systems using an EM algorithm," in Advances in Neural Information Processing Systems, vol. 11, 1999, pp. 431–437.

[34] D. P. Kingma and M. Welling, "Stochastic gradient VB and the variational auto-encoder," in Proceedings of the 2nd International Conference on Learning Representations, 2014.

[35] J. Chung, K. Kastner, L. Dinh, K. Goel, A. C. Courville, and Y. Bengio, "A recurrent latent variable model for sequential data," in Advances in Neural Information Processing Systems, vol. 28, 2015, pp. 2980–2988.

[36] Y. Gao, E. W. Archer, L. Paninski, and J. P. Cunningham, "Linear dynamical neural population models through nonlinear embeddings," in Advances in Neural Information Processing Systems, vol. 29, 2016, pp. 163–171.

[37] M. Johnson, D. K. Duvenaud, A. Wiltschko, R. P. Adams, and S. R. Datta, "Composing graphical models with neural networks for structured representations and fast inference," in Advances in Neural Information Processing Systems, vol. 29, 2016, pp. 2946–2954.

[38] R. G. Krishnan, U. Shalit, and D. Sontag, "Structured inference networks for nonlinear state space models," in Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017, pp. 2101–2109.

[39] M. Karl, M. Soelch, J. Bayer, and P.
van der Smagt, "Deep variational Bayes filters: Unsupervised learning of state space models from raw data," in Proceedings of the 5th International Conference on Learning Representations, 2017.

[40] M. Watter, J. Springenberg, J. Boedecker, and M. Riedmiller, "Embed to control: A locally linear latent dynamics model for control from raw images," in Advances in Neural Information Processing Systems, vol. 28, 2015, pp. 2746–2754.

[41] Q. Li, F. Dietrich, E. M. Bollt, and I. G. Kevrekidis, "Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator," Chaos, vol. 27, p. 103111, 2017.

[42] E. Yeung, S. Kundu, and N. Hodas, "Learning deep neural network representations for Koopman operators of nonlinear dynamical systems," arXiv:1708.06850, 2017.

[43] A. Mardt, L. Pasquali, H. Wu, and F. Noé, "VAMPnets: Deep learning of molecular kinetics," arXiv:1710.06012, 2017.

[44] S. E. Otto and C. W. Rowley, "Linearly-recurrent autoencoder networks for learning dynamics," arXiv:1712.01378, 2017.

[45] B. Lusch, J. N. Kutz, and S. L. Brunton, "Deep learning for universal linear embeddings of nonlinear dynamics," arXiv:1712.09707, 2017.

[46] H. Arbabi and I. Mezić, "Ergodic theory, dynamic mode decomposition and computation of spectral properties of the Koopman operator," SIAM Journal on Applied Dynamical Systems, vol. 16, no. 4, pp. 2096–2126, 2017.

[47] Y. Susuki and I. Mezić, "A Prony approximation of Koopman mode decomposition," in Proceedings of the 2015 IEEE 54th Conference on Decision and Control, 2015, pp. 7022–7027.

[48] E. N. Lorenz, "Deterministic nonperiodic flow," Journal of the Atmospheric Sciences, vol. 20, no. 2, pp. 130–141, 1963.

[49] O. E.
R\u00f6ssler, \u201cAn equation for continuous chaos,\u201d Physical Letters, vol. 57A, no. 5, pp. 397\u2013\n\n398, 1976.\n\n[50] A. S. Weigend and N. A. Gershenfeld, Eds., Time series prediction: Forecasting the future\n\nand understanding the past, ser. Santa Fe Institute Series. Westview Press, 1993.\n\n[51] S. Canu and A. Smola, \u201cKernel methods and the exponential family,\u201d Neurocomputing, vol.\n\n69, no. 7-9, pp. 714\u2013720, 2006.\n\n[52] S. Liu, M. Yamada, N. Collier, and M. Sugiyama, \u201cChange-point detection in time-series data\n\nby relative density-ratio estimation,\u201d Neural Networks, vol. 43, pp. 72\u201383, 2013.\n\n11\n\n\f", "award": [], "sourceid": 768, "authors": [{"given_name": "Naoya", "family_name": "Takeishi", "institution": "The University of Tokyo"}, {"given_name": "Yoshinobu", "family_name": "Kawahara", "institution": "Osaka University / RIKEN"}, {"given_name": "Takehisa", "family_name": "Yairi", "institution": "The University of Tokyo"}]}