{"title": "Metric on Nonlinear Dynamical Systems with Perron-Frobenius Operators", "book": "Advances in Neural Information Processing Systems", "page_first": 2856, "page_last": 2866, "abstract": "The development of a metric for structural data is a long-term problem in pattern recognition and machine learning. In this paper, we develop a general metric for comparing nonlinear dynamical systems that is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces. Our metric includes the existing fundamental metrics for dynamical systems, which are basically defined with principal angles between some appropriately-chosen subspaces, as its special cases. We also describe the estimation of our metric from finite data. We empirically illustrate our metric with an example of rotation dynamics in a unit disk in a complex plane, and evaluate the performance with real-world time-series data.", "full_text": "Metric on Nonlinear Dynamical Systems with Perron-Frobenius Operators\n\nIsao Ishikawa\u2020\u2021, Keisuke Fujii\u2020, Masahiro Ikeda\u2020\u2021, Yuka Hashimoto\u2020\u2021, Yoshinobu Kawahara\u2020\u00a7\n\n\u2020RIKEN Center for Advanced Intelligence Project\n\u2021School of Fundamental Science and Technology, Keio University\n\u00a7The Institute of Scientific and Industrial Research, Osaka University\n\n{isao.ishikawa, keisuke.fujii.zh, masahiro.ikeda}@riken.jp\nyukahashimoto@keio.jp, ykawahara@sanken.osaka-u.ac.jp\n\nAbstract\n\nThe development of a metric for structural data is a long-term problem in pattern recognition and machine learning. In this paper, we develop a general metric for comparing nonlinear dynamical systems that is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces. 
Our metric includes the existing fundamental metrics for dynamical systems, which are basically defined with principal angles between some appropriately chosen subspaces, as its special cases. We also describe the estimation of our metric from finite data. We empirically illustrate our metric with an example of rotation dynamics in a unit disk in a complex plane, and evaluate the performance with real-world time-series data.\n\n1 Introduction\n\nClassification and recognition have been among the main research focuses in machine learning for the past decades. When dealing with structural data other than vectors, developing an algorithm for this problem for a given type of structure essentially reduces to designing an appropriate metric or kernel. However, little of the existing literature has addressed the design of metrics in the context of dynamical systems. To the best of our knowledge, the metric for ARMA models based on comparing their cepstrum coefficients [12] is one of the first works to address this problem. De Cock and De Moor extended this to linear state-space models by considering the subspace angles between the observability subspaces [6]. Meanwhile, Vishwanathan et al. developed a family of kernels for dynamical systems based on the Binet-Cauchy theorem [25]. Chaudhry and Vidal extended this to incorporate invariance to initial conditions [4].\n\nAs mentioned in some of the above literature, the existing metrics for dynamical systems are defined with principal angles between appropriate subspaces, such as the column subspaces of observability matrices. However, they are essentially restricted to linear dynamical systems, although Vishwanathan et al. mentioned an extension with reproducing kernels for some specific metrics [25]. Recently, Fujii et al. 
discussed a more general extension of these metrics to nonlinear systems with the Koopman operator [8]. Mezic et al. proposed metrics on dynamical systems in the context of ergodic theory via Koopman operators on $L^2$-spaces [14, 15]. The Koopman operator, also known as the composition operator, is a linear operator acting on observables of a nonlinear dynamical system [10]. Thus, by analyzing this operator instead of the nonlinear dynamics directly, one can more easily extract properties of the dynamics. In particular, spectral analysis of the Koopman operator, together with its empirical procedure called dynamic mode decomposition (DMD), has attracted attention in a variety of fields of science and engineering [18, 2, 17, 3].\n\nIn this paper, we develop a general metric for nonlinear dynamical systems that includes the existing fundamental metrics for dynamical systems mentioned above as special cases. This metric is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces (RKHSs), which are shown to be essentially equivalent to Koopman operators, and allows us to compare a pair of datasets that are supposed to be generated from nonlinear systems. We also describe the estimation of our metric from finite data. We empirically illustrate our metric using an example of rotation dynamics in a unit disk in a complex plane, and evaluate the performance with real-world time-series data.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nThe remainder of this paper is organized as follows. In Section 2, we first briefly review the definition of the Koopman operator, especially the one defined in RKHSs. In Section 3, we give the definition of our metric for comparing nonlinear dynamical systems (NLDSs) with Perron-Frobenius operators and then describe the estimation of the metric from finite data. In Section 4, we describe the relation of our metric to the existing ones. 
In Section 5, we empirically illustrate our metric with synthetic data and evaluate the performance with real-world data. Finally, we conclude this paper in Section 6.\n\n2 Perron-Frobenius operator in RKHS\n\nConsider a discrete-time nonlinear dynamical system $x_{t+1} = f(x_t)$ with time index $t \\in \\mathbb{T} := \\{0\\} \\cup \\mathbb{N}$, defined on a state space $M$ (i.e., $x \\in M$), where $x$ is the state vector and $f: M \\to M$ is a (possibly nonlinear) state-transition function. Then, the Koopman operator (also known as the composition operator), denoted here by $K$, is a linear operator on a function space $X$ defined by the rule\n$$K g = g \\circ f, \\qquad (1)$$\nwhere $g$ is an element of $X$. The domain of $K$ is $\\mathcal{D}(K) := \\{g \\in X \\mid g \\circ f \\in X\\}$, where $\\circ$ denotes the composition of $g$ with $f$ [10]. The choice of $X$ depends on the problem considered; in this paper, we take $X$ to be an RKHS. The function $g$ is referred to as an observable. Note that $K$ acts linearly on $g$, even though the dynamics defined by $f$ may be nonlinear. In recent years, spectral decomposition methods for this operator have attracted attention in a variety of scientific and engineering fields because they can give a global modal description of a nonlinear dynamical system from data. In particular, a family of estimation algorithms called dynamic mode decomposition (DMD) has been successfully applied to many real-world problems, such as image processing [11], neuroscience [3], and system control [16]. In the machine learning community, several algorithmic improvements have been investigated through a formulation with reproducing kernels [9] and in a Bayesian framework [22].\n\nNow, let $H_k$ be the RKHS endowed with an inner product $\\langle \\cdot, \\cdot \\rangle$ and a positive definite kernel $k: \\mathcal{X} \\times \\mathcal{X} \\to \\mathbb{C}$ (or $\\mathbb{R}$), where $\\mathcal{X}$ is a set. Here, $H_k$ is a function space on $\\mathcal{X}$. The corresponding feature map is denoted by $\\phi: \\mathcal{X} \\to H_k$. 
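Eq. (1) can be illustrated with a toy example. The following is a minimal Python sketch, assuming an arbitrary nonlinear map f(x) = sin x + x²/2 on the real line and two hand-picked observables g1(x) = x and g2(x) = cos x (none of these come from the paper); the helper name koopman is hypothetical. It checks that composing with f is linear in the observable, K(a g1 + b g2) = a K g1 + b K g2, even though f itself is not additive.

```python
import math

def koopman(g, f):
    # Koopman (composition) operator of Eq. (1): (K g)(x) = g(f(x)).
    return lambda x: g(f(x))

def f(x):
    # A nonlinear state-transition map, chosen only for illustration.
    return math.sin(x) + 0.5 * x * x

def g1(x):
    return x

def g2(x):
    return math.cos(x)

a, b = 2.0, -3.0

def combo(x):
    # The observable a*g1 + b*g2.
    return a * g1(x) + b * g2(x)

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
lhs = [koopman(combo, f)(x) for x in xs]                           # K(a g1 + b g2)
rhs = [a * koopman(g1, f)(x) + b * koopman(g2, f)(x) for x in xs]  # a K g1 + b K g2
linear_in_g = all(math.isclose(u, v) for u, v in zip(lhs, rhs))
# The map f itself is not additive: f(0.5 + 0.5) differs from f(0.5) + f(0.5).
f_additive = math.isclose(f(1.0), f(0.5) + f(0.5))
print(linear_in_g, f_additive)  # True False
```

Linearity holds in the observable g, not in the state x; this is the property that lets linear, operator-theoretic tools be applied to nonlinear dynamics.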
Also, assume $M \\subset \\mathcal{X}$, and define the closed subspace $H_{k,M} \\subset H_k$ as the closure of the vector space generated by $\\phi(x)$ for all $x \\in M$, i.e., $H_{k,M} := \\overline{\\mathrm{span}}\\{\\phi(x) \\mid x \\in M\\}$. Then, the Perron-Frobenius operator in RKHS associated with $f$, $K_f: H_{k,M} \\to H_{k,M}$ (see [9]; note that $K_f$ is called the Koopman operator on the feature map $\\phi$ in that literature), is defined as a linear operator with dense domain $\\mathcal{D}(K_f) := \\mathrm{span}(\\phi(M))$ satisfying, for all $x \\in M$,\n$$K_f[\\phi(x)] = \\phi(f(x)). \\qquad (2)$$\nSince $K_f$ is densely defined, its adjoint operator $K_f^*$ exists. In the following proposition, we see that $K_f^*$ is essentially the same as the Koopman operator $K$.\n\nProposition 2.1. Let $X = H$ be the RKHS associated with the positive definite kernel $k|_{M \\times M}$ defined by the restriction of $k$ to $M \\times M$, which is a function space on $M$. Let $\\rho: H_{k,M} \\to H$ be the linear isomorphism defined via the restriction of functions from $\\mathcal{X}$ to $M$. Then, we have\n$$\\rho K_f^* \\rho^{-1} = K,$$\nwhere $(\\cdot)^*$ denotes the Hermitian transpose (adjoint).\n\nProof. Let $g \\in \\mathcal{D}(K)$. Since the feature map for $H$ is $\\rho \\circ \\phi$, the reproducing property gives $\\langle g, \\rho K_f(\\phi(x)) \\rangle_H = \\langle g, \\rho \\circ \\phi(f(x)) \\rangle_H = g \\circ f(x) = \\langle K g, \\rho \\circ \\phi(x) \\rangle_H$. Thus, the definitions (1) and (2), together with the fact $\\rho^* = \\rho^{-1}$, prove the statement.\n\n3 Metric on NLDSs with Perron-Frobenius Operators in RKHSs\n\nWe propose a general metric for the comparison of nonlinear dynamical systems, defined with Perron-Frobenius operators in RKHSs. Intuitively, the metric compares the behaviors of dynamical systems over infinite time. To ensure convergence, we consider a ratio of kernel values, namely an angle, instead of directly introducing exponential decay terms. We first give the definition in Subsection 3.1 and then derive an estimator of the metric from finite data in Subsection 3.2.\n\n3.1 Definition\n\nLet $H_{\\mathrm{ob}}$ be a Hilbert space and $M \\subset \\mathcal{X}$ a subset. 
Let $h: M \\to H_{\\mathrm{ob}}$ be a map, often called an observable. We define the observable operator for $h$ as a linear operator $L_h: H_{k,M} \\to H_{\\mathrm{ob}}$ such that $h = L_h \\circ \\phi$. We give two examples. First, if $H_{\\mathrm{ob}} = \\mathbb{C}^m$ and $h(x) = (g_1(x), \\ldots, g_m(x))$ for some $g_1, \\ldots, g_m \\in H_k$, the observable operator is $L_h(v) := (\\langle g_i, v \\rangle)_{i=1}^m$. This situation appears, for example, in the context of DMD, where the observed data are values of functions in an RKHS. Second, if $H_{\\mathrm{ob}} = H_{k,M}$ and $h = \\phi|_M$, the observable operator is $L_h(v) = v$. This situation appears when we can observe the state space $\\mathcal{X}$ and try to get more detailed information by mapping the observed data to the RKHS via the feature map.\n\nLet $H_{\\mathrm{in}}$ be a Hilbert space, which we refer to as an initial value space. We call a linear operator $I: H_{\\mathrm{in}} \\to H_{k,M}$ an initial value operator on $M$ if $I$ is bounded. Initial value operators express initial values in terms of linear operators. For example, let $H_{\\mathrm{in}} = \\mathbb{C}^N$, let $x_1, \\ldots, x_N \\in M$, and let $I := (\\phi(x_1), \\ldots, \\phi(x_N))$ be the initial value operator on $M$ defined by $I((a_i)_{i=1}^N) = \\sum_i a_i \\phi(x_i)$. Let $K_f$ be the Perron-Frobenius operator associated with a dynamical system $f: M \\to M$. Then, for any positive integer $n$, we have $K_f^n I((a_i)_{i=1}^N) = \\sum_i a_i \\phi(f^n(x_i))$, so $K_f^n I$ is a linear operator that encodes the states at time $n$ of the orbits of $f$ with initial values $x_1, \\ldots, x_N$.\n\nNow, we define triples of dynamical systems. 
A triple of a dynamical system with respect to an initial value space $H_{\\mathrm{in}}$ and an observable space $H_{\\mathrm{ob}}$ is a triple $(f, h, I)$, where the first component $f: M \\to M$ is a dynamical system on a subset $M \\subset \\mathcal{X}$ ($M$ depends on $f$) with Perron-Frobenius operator $K_f$, the second component $h: M \\to H_{\\mathrm{ob}}$ is an observable with observable operator $L_h$, and the third component $I: H_{\\mathrm{in}} \\to H_{k,M}$ is an initial value operator on $M$, such that for any $r \\geq 0$ the composition $L_h K_f^r I$ is well defined and is a Hilbert-Schmidt operator. We denote by $\\mathcal{T}(H_{\\mathrm{in}}, H_{\\mathrm{ob}})$ the set of triples of dynamical systems with respect to $H_{\\mathrm{in}}$ and $H_{\\mathrm{ob}}$.\n\nFor two triples $D_1 = (f_1, h_1, I_1), D_2 = (f_2, h_2, I_2) \\in \\mathcal{T}(H_{\\mathrm{in}}, H_{\\mathrm{ob}})$ and for $T, m \\in \\mathbb{N}$, we first define\n$$\\mathcal{K}_m^T(D_1, D_2) := \\mathrm{tr} \\bigwedge^m \\left( \\sum_{r=0}^{T-1} (L_{h_2} K_{f_2}^r I_2)^* \\, L_{h_1} K_{f_1}^r I_1 \\right) \\in \\mathbb{C},$$\nwhere $\\bigwedge^m$ is the $m$-th exterior product (see Appendix A). We note that since $K_{f_i}$ is bounded, we regard $K_{f_i}$ as its unique extension to a bounded linear operator with domain $H_{k,M}$.\n\nProposition 3.1. The function $\\mathcal{K}_m^T$ is a positive definite kernel on $\\mathcal{T}(H_{\\mathrm{in}}, H_{\\mathrm{ob}})$.\n\nProof. See Appendix B.\n\nNext, we define $\\mathcal{A}_m^T$ with $\\mathcal{K}_m^T$ by\n$$\\mathcal{A}_m^T(D_1, D_2) := \\lim_{\\epsilon \\to +0} \\frac{|\\mathcal{K}_m^T(D_1, D_2)|^2}{(\\epsilon + \\mathcal{K}_m^T(D_1, D_1))(\\epsilon + \\mathcal{K}_m^T(D_2, D_2))} \\in [0, 1].$$\nWe remark that for $D \\in \\mathcal{T}(H_{\\mathrm{in}}, H_{\\mathrm{ob}})$, $(\\mathcal{K}_m^T(D, D))_{T=1}^\\infty$ is a non-negative increasing sequence. Now, we denote by $\\ell^\\infty$ the Banach space of bounded sequences of complex numbers, and define $\\mathcal{A}_m: \\mathcal{T}(H_{\\mathrm{in}}, H_{\\mathrm{ob}})^2 \\to \\ell^\\infty$ by\n$$\\mathcal{A}_m := \\left( \\mathcal{A}_m^T \\right)_{T=1}^\\infty.$$\nMoreover, we introduce Banach limits for elements of $\\ell^\\infty$. A Banach limit is a bounded linear functional $B: \\ell^\\infty \\to$ 
$\\mathbb{C}$ satisfying $B((1)_{n=1}^\\infty) = 1$, $B((z_n)_{n=1}^\\infty) = B((z_{n+1})_{n=1}^\\infty)$ for any $(z_n)_n$, and $B((z_n)_{n=1}^\\infty) \\geq 0$ for any non-negative real sequence $(z_n)_{n=1}^\\infty$, namely $z_n \\geq 0$ for all $n \\geq 1$. We remark that if $(z_n)_n \\in \\ell^\\infty$ converges to a complex number $\\alpha$, then $B((z_n)_{n=1}^\\infty) = \\alpha$ for any Banach limit $B$. Banach limits were first introduced by Banach [1], and their existence is proved through the Hahn-Banach theorem. In general, a Banach limit is not unique.\n\nDefinition 3.1. For an integer $m > 0$ and a Banach limit $B$, a positive definite kernel $\\mathcal{A}_m^B$ is defined by\n$$\\mathcal{A}_m^B := B(\\mathcal{A}_m).$$\nThe positive definiteness of $\\mathcal{A}_m^B$ follows from Proposition 3.1 and the properties of the Banach limit. If $\\mathcal{A}_m(D_1, D_2)$ converges, we simply denote $\\mathcal{A}_m^B(D_1, D_2)$ by $\\mathcal{A}_m(D_1, D_2)$, since it is then independent of the choice of $B$.\n\nIn general, a Banach limit $B$ is hard to compute. However, under some assumptions and a suitable choice of $B$, we prove in Proposition 3.6 below that $\\mathcal{A}_m^B$ is computable. Thus, we obtain an estimation formula for $\\mathcal{A}_m^B$ (see [20], [21], [7] for other results on the estimation of Banach limits). In the following proposition, we show that we can construct a pseudo-metric from the positive definite kernel $\\mathcal{A}_m^B$.\n\nProposition 3.2. Let $B$ be a Banach limit. For $m > 0$, $\\sqrt{1 - \\mathcal{A}_m^B(\\cdot, \\cdot)}$ is a pseudo-metric on $\\mathcal{T}(H_{\\mathrm{in}}, H_{\\mathrm{ob}})$.\n\nProof. See Appendix C.\n\nRemark 3.3. Although we defined $\\mathcal{K}_m^T$ with RKHSs, it can be defined in a more general situation as follows. Let $H$, $H'$, and $H''$ be Hilbert spaces. For $i = 1, 2$, let $V_i \\subset H$ be a closed subspace, $K_i: V_i \\to V_i$ and $L_i: V_i \\to H''$ linear operators, and $I_i: H' \\to V_i$ a bounded operator. Then, we can define $\\mathcal{K}_m^T$ between the triples $(K_1, L_1, I_1)$ and $(K_2, L_2, I_2)$ in a similar manner.\n\n3.2 Estimation from finite data\n\n
Now we derive a formula to compute the above metric from finite data, which allows us to compare several time series generated from dynamical systems just by evaluating kernel functions. First, we discuss the computability of $\\mathcal{A}_m^B(D_1, D_2)$ and then state the formula for its computation. In this section, the initial value space is finite dimensional: $H_{\\mathrm{in}} = \\mathbb{C}^N$. For $v_1, \\ldots, v_N \\in H_{k,M}$, we define a linear operator $(v_1, \\ldots, v_N): \\mathbb{C}^N \\to H_{k,M}$ by $(a_i)_{i=1}^N \\mapsto \\sum_{i=1}^N a_i v_i$. We note that any linear operator $I: H_{\\mathrm{in}} = \\mathbb{C}^N \\to H_{k,M}$ is an initial value operator, and, by putting $v_i := I((0, \\ldots, 0, 1, 0, \\ldots, 0))$ (with $1$ in the $i$-th position), we have $I = (v_1, \\ldots, v_N)$.\n\nDefinition 3.4. Let $D = (f, h, I) \\in \\mathcal{T}(\\mathbb{C}^N, H_{\\mathrm{ob}})$. We call $D$ admissible if there exist eigenvectors $\\varphi_1, \\varphi_2, \\ldots \\in H_{k,M}$ of $K_f$ with $\\|\\varphi_n\\| = 1$ and $K_f \\varphi_n = \\lambda_n \\varphi_n$ for all $n$ such that $|\\lambda_1| \\geq |\\lambda_2| \\geq \\cdots$ and each $v_i$ is expressed as $v_i = \\sum_{n=1}^\\infty a_{i,n} \\varphi_n$ with $\\sum_{n=1}^\\infty |a_{i,n}| < \\infty$, where $I = (v_1, \\ldots, v_N)$.\n\nDefinition 3.5. The triple $D = (f, h, I) \\in \\mathcal{T}(\\mathbb{C}^N, H_{\\mathrm{ob}})$ is semi-stable if $D$ is admissible and $|\\lambda_1| \\leq 1$.\n\nThen, we have the following asymptotic properties of $\\mathcal{A}_m$.\n\nProposition 3.6. Let $D_1, D_2 \\in \\mathcal{T}(\\mathbb{C}^N, H_{\\mathrm{ob}})$. If $D_1$ and $D_2$ are semi-stable, then the sequence $\\mathcal{A}_m(D_1, D_2)$ converges and its limit is equal to $\\mathcal{A}_m^B(D_1, D_2)$ for any Banach limit $B$. Similarly, let $C$ be the Ces\u00e0ro operator, namely, $C((x_n)_{n=1}^\\infty) := \\left(n^{-1} \\sum_{k=1}^n x_k\\right)_{n=1}^\\infty$. If $D_1$ and $D_2$ are admissible, then $C\\mathcal{A}_m(D_1, D_2)$ converges and its limit is equal to $\\mathcal{A}_m^B(D_1, D_2)$ for any Banach limit $B$ with $BC = B$.\n\nProof. See Appendix D.\n\nThe existence of a Banach limit with $BC = B$ is proved in [19, Theorem 4]. The admissibility or semi-stability condition holds in many cases, for example, in our illustrative example (Section 5.1).\n\nNow, we derive an estimation formula for the above metric from finite time-series data. To this end, we first need the following lemma.\n\nLemma 3.7. 
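Let $D_1 = (f_1, h_1, (v_{1,l})_{l=1}^N)$ and $D_2 = (f_2, h_2, (v_{2,l})_{l=1}^N) \\in \\mathcal{T}(\\mathbb{C}^N, H_{\\mathrm{ob}})$. Then, for $i, j \\in \\{1, 2\\}$, we have the following formula:\n$$\\mathcal{K}_m^T(D_i, D_j) = \\sum_{t_1, \\ldots, t_m = 0}^{T-1} \\sum_{0 < s_1 < \\cdots < s_m \\leq N} \\left\\langle L_{h_i} K_{f_i}^{t_1} v_{i,s_1} \\wedge \\cdots \\wedge L_{h_i} K_{f_i}^{t_m} v_{i,s_m}, \\; L_{h_j} K_{f_j}^{t_1} v_{j,s_1} \\wedge \\cdots \\wedge L_{h_j} K_{f_j}^{t_m} v_{j,s_m} \\right\\rangle.$$\n\nProof. See Appendix E.\n\nFor $i = 1, 2$, we consider $N$ time-series sequences $\\{y_{i,0}^l, y_{i,1}^l, y_{i,2}^l, \\ldots\\} \\subset H_{\\mathrm{ob}}$ in an observable space ($l = 1, \\ldots, N$), which are supposed to be generated from a dynamical system $f_i$ on $M_i \\subset \\mathcal{X}$ and observed via $h_i$. That is, we consider, for $i = 1, 2$, $t \\in \\mathbb{T}$, and $l = 1, \\ldots, N$,\n$$x_{i,t+1}^l = f_i(x_{i,t}^l), \\quad y_{i,t}^l = h_i(x_{i,t}^l), \\quad x_{i,0}^l \\in M_i. \\qquad (3)$$\nAssume that, for $i = 1, 2$, the triple $D_i = (f_i, h_i, (\\phi(x_{i,0}^l))_{l=1}^N)$ is in $\\mathcal{T}(\\mathbb{C}^N, H_{\\mathrm{ob}})$. Then, from Lemma 3.7, we have\n$$\\begin{aligned} \\mathcal{K}_m^T(D_i, D_j) &= \\sum_{t_1, \\ldots, t_m = 0}^{T-1} \\sum_{0 < s_1 < \\cdots < s_m \\leq N} \\left\\langle L_{h_i} \\phi(x_{i,t_1}^{s_1}) \\wedge \\cdots \\wedge L_{h_i} \\phi(x_{i,t_m}^{s_m}), \\; L_{h_j} \\phi(x_{j,t_1}^{s_1}) \\wedge \\cdots \\wedge L_{h_j} \\phi(x_{j,t_m}^{s_m}) \\right\\rangle \\\\ &= \\sum_{t_1, \\ldots, t_m = 0}^{T-1} \\sum_{0 < s_1 < \\cdots < s_m \\leq N} \\det \\begin{pmatrix} \\langle y_{i,t_1}^{s_1}, y_{j,t_1}^{s_1} \\rangle_{H_{\\mathrm{ob}}} & \\cdots & \\langle y_{i,t_1}^{s_1}, y_{j,t_m}^{s_m} \\rangle_{H_{\\mathrm{ob}}} \\\\ \\vdots & \\ddots & \\vdots \\\\ \\langle y_{i,t_m}^{s_m}, y_{j,t_1}^{s_1} \\rangle_{H_{\\mathrm{ob}}} & \\cdots & \\langle y_{i,t_m}^{s_m}, y_{j,t_m}^{s_m} \\rangle_{H_{\\mathrm{ob}}} \\end{pmatrix}. \\end{aligned} \\qquad (4)$$\nIn the case of $H_{\\mathrm{ob}} = H_k$ and $h_i = \\phi|_{M_i}$, we see that $\\langle y_{i,t_b}^{s_a}, y_{j,t_d}^{s_c} \\rangle_{H_{\\mathrm{ob}}} = k(x_{i,t_b}^{s_a}, x_{j,t_d}^{s_c})$. Therefore, by Proposition 3.6, if the $D_i$ are semi-stable (or admissible, in which case the Ces\u00e0ro mean is used), we can compute a convergent estimator of $\\mathcal{A}_m^B$ through $\\mathcal{A}_m^T$ just by evaluating kernel functions.\n\n4 Relation to Existing Metrics on Dynamical Systems\n\nIn this section, we show that our metric covers the existing metrics defined in the previous works [12, 6, 25]. 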
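The linear case treated in Subsection 4.1 below can already be checked numerically. The following is a minimal Python/NumPy sketch, assuming two stable, observable AR(2) models in companion form with coefficients chosen only for illustration, and a truncation horizon T = 50; the helper names are hypothetical. With the linear kernel, K_f = A, L_h = C, and I = I_q, so K_q^T(D1, D2) is the determinant of the q x q matrix O2^H O1 built from the truncated observability matrices O_i = [C_i; C_i A_i; ...; C_i A_i^(T-1)] (for m = N = q, the trace of the q-th exterior power reduces to a determinant). Under these assumptions, A_q^T coincides with the product of squared cosines of the principal angles between the column spaces of O1 and O2, a finite-T counterpart of the product inside the logarithm in (5).

```python
import numpy as np

def observability(A, C, T):
    # Truncated extended observability matrix [C; C A; ...; C A^(T-1)].
    blocks, P = [], np.eye(A.shape[0])
    for _ in range(T):
        blocks.append(C @ P)
        P = A @ P
    return np.vstack(blocks)

def kernel_K(A1, C1, A2, C2, T):
    # K_q^T(D1, D2) for D_i = (A_i, C_i, I_q): the sum over r of
    # (C2 A2^r)^H (C1 A1^r) equals O2^H O1, and for m = N = q the trace of
    # the q-th exterior power of a q x q matrix is its determinant.
    O1, O2 = observability(A1, C1, T), observability(A2, C2, T)
    return np.linalg.det(O2.conj().T @ O1)

def ratio_A(A1, C1, A2, C2, T):
    # A_q^T = |K(D1, D2)|^2 / (K(D1, D1) K(D2, D2)), assuming full column rank.
    k12 = kernel_K(A1, C1, A2, C2, T)
    k11 = kernel_K(A1, C1, A1, C1, T).real
    k22 = kernel_K(A2, C2, A2, C2, T).real
    return abs(k12) ** 2 / (k11 * k22)

def prod_cos2(A1, C1, A2, C2, T):
    # Product of squared cosines of the principal angles between the column
    # spaces of the two truncated observability matrices (via QR + SVD).
    Q1, _ = np.linalg.qr(observability(A1, C1, T))
    Q2, _ = np.linalg.qr(observability(A2, C2, T))
    s = np.linalg.svd(Q1.conj().T @ Q2, compute_uv=False)
    return float(np.prod(np.clip(s, 0.0, 1.0) ** 2))

# Two stable, observable AR(2) models in companion form (illustrative coefficients).
A1 = np.array([[0.5, -0.2], [1.0, 0.0]])
A2 = np.array([[0.3, 0.4], [1.0, 0.0]])
C = np.array([[1.0, 0.0]])
a = ratio_A(A1, C, A2, C, T=50)
b = prod_cos2(A1, C, A2, C, T=50)
print(a, b, -np.log(a))
```

The agreement of the two numbers rests on the identity |det(O2^H O1)|^2 / (det(O1^H O1) det(O2^H O2)) = |det(Q2^H Q1)|^2, which holds for thin QR factorizations O_i = Q_i R_i with invertible R_i.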
That is, as special cases of our metric, we describe the relation to the metric via subspace angles and Martin\u2019s metric in Subsection 4.1, and the relation to the Binet-Cauchy metric for dynamical systems in Subsection 4.2.\n\n4.1 Relation to metric via principal angles and Martin\u2019s metric\n\nIn this subsection, we show that, in a certain situation, our metric reconstructs the metric for ARMA models (Definition 2 in [12]) introduced by Martin [12] and by De Cock and De Moor [6]. Moreover, our formula generalizes theirs to the non-stable case; that is, we do not need to assume that the eigenvalues are strictly smaller than 1 in modulus.\n\nWe consider two linear dynamical systems. That is, in Eqs. (3), let $f_i: \\mathbb{R}^q \\to \\mathbb{R}^q$ and $h_i: \\mathbb{R}^q \\to \\mathbb{R}^r$ be linear maps for $i = 1, 2$ with $l = 1$, which we denote by $A_i$ and $C_i$, respectively. De Cock and De Moor proposed to compare these two models by using the subspace angles as\n$$d((A_1, C_1), (A_2, C_2)) = -\\log \\prod_{i=1}^{m} \\cos^2 \\theta_i, \\qquad (5)$$\nwhere $\\theta_i$ is the $i$-th subspace angle between the column spaces of the extended observability matrices $\\mathcal{O}_i := [C_i^\\top \\; (C_i A_i)^\\top \\; (C_i A_i^2)^\\top \\; \\cdots]^\\top$ for $i = 1, 2$. Meanwhile, Martin defined a distance on AR models via cepstrum coefficients, which was later shown to be equivalent to the distance (5) [6].\n\nNow, we regard $\\mathcal{X} = \\mathbb{R}^q$. The positive definite kernel here is the usual inner product of $\\mathbb{R}^q$, and the associated RKHS is canonically isomorphic to $\\mathbb{C}^q$. Let $H_{\\mathrm{in}} = \\mathbb{C}^q$ and $H_{\\mathrm{ob}} = \\mathbb{C}^r$. Note that for $i = 1, 2$, $D_i = (A_i, C_i, I_q) \\in \\mathcal{T}(\\mathbb{C}^q, \\mathbb{C}^r)$, and for any linear maps $f: \\mathbb{R}^q \\to \\mathbb{R}^q$ and $h: \\mathbb{R}^q \\to \\mathbb{R}^r$, we have $K_f = f$ and $L_h = h$. Then we have the following proposition.\n\nProposition 4.1. The sequence $\\mathcal{A}_q(D_1, D_2)$ converges. When the systems are observable and stable, its limit $\\mathcal{A}_q(D_1, D_2)$ is essentially equal to (5).\n\nProof. 
See Appendix F.\n\nTherefore, we can define a metric between the linear dynamical systems $(A_1, C_1)$ and $(A_2, C_2)$ by $\\mathcal{A}_q(D_1, D_2)$.\n\nMoreover, the value $\\mathcal{A}_q(D_1, D_2)$ captures an important characteristic of the behavior of dynamical systems. We illustrate this in the situation where the state-space models come from AR models. We will see that $\\mathcal{A}_q(D_1, D_2)$ behaves sensitively on the unit circle and gives a reasonable generalization of Martin\u2019s metric [12] to the non-stable case. For $i = 1, 2$, we consider an observable AR model:\n$$(M_i) \\quad y_t = a_{i,1} y_{t-1} + \\cdots + a_{i,q} y_{t-q}, \\qquad (6)$$\nwhere $a_{i,k} \\in \\mathbb{R}$ for $k \\in \\{1, \\ldots, q\\}$. Let $C_i = (1, 0, \\ldots, 0) \\in \\mathbb{C}^{1 \\times q}$, and let $A_i$ be the companion matrix for $M_i$. Let $\\lambda_{i,1}, \\ldots, \\lambda_{i,q}$ be the roots of the equation $y^q - a_{i,1} y^{q-1} - \\cdots - a_{i,q} = 0$. For simplicity, we assume these roots are distinct complex numbers. Then, we define\n$$P_i := \\{\\lambda_{i,n} \\mid |\\lambda_{i,n}| > 1\\}, \\quad Q_i := \\{\\lambda_{i,n} \\mid |\\lambda_{i,n}| = 1\\}, \\quad R_i := \\{\\lambda_{i,n} \\mid |\\lambda_{i,n}| < 1\\}.$$\nAs a result, if $|P_1| = |P_2|$, $|R_1| = |R_2|$, and $Q_1 = Q_2$, we have\n$$\\mathcal{A}_q(D_1, D_2) = \\frac{\\prod_{\\alpha, \\beta \\in P_1} (1 - \\overline{\\beta}\\alpha) \\cdot \\prod_{\\alpha, \\beta \\in P_2} (1 - \\overline{\\beta}\\alpha)}{\\prod_{\\alpha \\in P_1, \\beta \\in P_2} |1 - \\overline{\\beta}\\alpha|^2} \\cdot \\frac{\\prod_{\\alpha, \\beta \\in R_1} (1 - \\overline{\\beta}\\alpha) \\cdot \\prod_{\\alpha, \\beta \\in R_2} (1 - \\overline{\\beta}\\alpha)}{\\prod_{\\alpha \\in R_1, \\beta \\in R_2} |1 - \\overline{\\beta}\\alpha|^2}, \\qquad (7)$$\nand, otherwise, $\\mathcal{A}_q(D_1, D_2) = 0$. The details of the derivation are given in Appendix G.\n\nThrough this metric, we can observe a kind of \u201cphase transition\u201d of linear dynamical systems on the unit circle: the metric behaves sensitively when eigenvalues lie on it. We note that in the case of $P_i = Q_i = \\emptyset$, the formula (7) is essentially equivalent to the distance (5) (see Theorem 4 in [6]).\n\n4.2 Relation to the Binet-Cauchy metric on dynamical systems\n\nHere, we discuss the relation between our metric and the Binet-Cauchy kernels on dynamical systems defined by Vishwanathan et al. [25, Section 5]. Let us consider two linear dynamical systems as in Subsection 4.1. 
In [25, Section 5], two kernels are given to measure the distance between two systems (for simplicity, we disregard the expectations over variables here): the trace kernel $k_{\\mathrm{tr}}$ and the determinant kernel $k_{\\det}$, which are respectively defined by\n$$k_{\\mathrm{tr}}((x_{1,0}, f_1, h_1), (x_{2,0}, f_2, h_2)) = \\sum_{t=1}^{\\infty} e^{-\\lambda t} y_{1,t}^\\top y_{2,t},$$\n$$k_{\\det}((x_{1,0}, f_1, h_1), (x_{2,0}, f_2, h_2)) = \\det \\left( \\sum_{t=1}^{\\infty} e^{-\\lambda t} y_{1,t} y_{2,t}^\\top \\right),$$\nwhere $\\lambda > 0$ is a positive number satisfying $e^{-\\lambda} \\|f_1\\| \\|f_2\\| < 1$ to make the limits convergent, and $x_{1,0}$ and $x_{2,0}$ are initial state vectors, which affect the kernel values through the evolutions of the observation sequences. Vishwanathan et al. discussed a way of removing the effect of the initial values by taking expectations over them under assumed distributions.\n\nThese kernels can be described in our notation as follows (see also Remark 3.3). Let us regard $H_k = \\mathbb{C}^q$. For $i = 1, 2$, we define $D_i := (e^{-\\lambda/2} f_i, h_i, x_{i,0}) \\in \\mathcal{T}(\\mathbb{C}, \\mathbb{C}^r)$ and $D_i^* := (e^{-\\lambda/2} f_i^*, x_{i,0}^*, h_i^*) \\in \\mathcal{T}(\\mathbb{C}^r, \\mathbb{C})$. Then these kernels are described as\n$$k_{\\mathrm{tr}}((x_{1,0}, f_1, h_1), (x_{2,0}, f_2, h_2)) = \\lim_{T \\to \\infty} \\mathcal{K}_1^T(D_1, D_2),$$\n$$k_{\\det}((x_{1,0}, f_1, h_1), (x_{2,0}, f_2, h_2)) = \\lim_{T \\to \\infty} \\mathcal{K}_r^T(D_1^*, D_2^*).$$\nNote that introducing the exponential discounting $e^{-\\lambda t}$ is one way to construct a mathematically valid kernel for comparing dynamical systems. However, in certain situations, this method does not work effectively. For example, consider three dynamical systems on $\\mathbb{R}$: fix a small positive number $\\epsilon > 0$ and let $f_1(x) = (1 + \\epsilon)x$, $f_2(x) = x$, and $f_3(x) = (1 - \\epsilon)x$ be linear dynamical systems. We choose $1 \\in \\mathbb{R}$ as the initial value. Here, it would be natural to regard these dynamical systems as \u201cdifferent\u201d from each other even with almost zero $\\epsilon$. However, if we compute the kernel defined via the exponential discounting, these dynamical systems are judged to be similar or almost the same. 
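The failure mode described above can be reproduced numerically. The following minimal Python sketch assumes scalar systems observed directly from the initial value 1 (so the orbit of f(x) = a x is a^t), together with an illustrative ε = 0.1, discount λ = 0.5 (which satisfies e^(−λ)(1 + ε)² < 1), and horizon T = 1000. These values, the cosine normalization of the discounted trace kernel, and all function names are assumptions made here, not taken from the paper. The sketch contrasts the normalized discounted trace kernel with A_1^T computed from its definition with N = m = 1 and the linear kernel.

```python
import math

def discounted_trace_kernel(a, b, lam):
    # Discounted trace kernel for scalar orbits a^t and b^t started at 1:
    # sum_{t>=1} e^{-lam t} a^t b^t = c / (1 - c) with c = e^{-lam} a b.
    c = math.exp(-lam) * a * b
    if abs(c) >= 1:
        raise ValueError('discount too weak: the series diverges')
    return c / (1 - c)

def discounted_similarity(a, b, lam):
    # Cosine-normalized discounted kernel (normalization added here for comparison).
    k_ab = discounted_trace_kernel(a, b, lam)
    return k_ab / math.sqrt(discounted_trace_kernel(a, a, lam) * discounted_trace_kernel(b, b, lam))

def finite_kernel(a, b, T):
    # K_1^T for N = 1, m = 1, linear kernel and identity observable:
    # sum_{r=0}^{T-1} a^r b^r.
    return sum((a * b) ** r for r in range(T))

def ratio_kernel(a, b, T):
    # A_1^T = |K(D1, D2)|^2 / (K(D1, D1) K(D2, D2)); a vanishing denominator
    # is mapped to 0, matching the epsilon -> +0 regularization.
    den = finite_kernel(a, a, T) * finite_kernel(b, b, T)
    return 0.0 if den == 0 else abs(finite_kernel(a, b, T)) ** 2 / den

eps, lam, T = 0.1, 0.5, 1000
systems = {'f1': 1 + eps, 'f2': 1.0, 'f3': 1 - eps}
for p, q in [('f1', 'f2'), ('f2', 'f3'), ('f1', 'f3')]:
    a, b = systems[p], systems[q]
    print(p, q, round(discounted_similarity(a, b, lam), 3), round(ratio_kernel(a, b, T), 4))
```

With these settings, the discounted similarity stays above 0.9 for every pair. In contrast, A_1^T is already about 0.02 or smaller at T = 1000 and keeps shrinking as T grows, in line with the limit A_1 = 0 stated below for this example.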
Instead of introducing such exponential discounting, our idea for constructing a mathematically valid kernel is to consider the limit of the ratio of kernels defined via finite segments of the orbits of dynamical systems, so no exponential discounting is needed. This enables us to deal with a wide range of dynamical systems and to capture the differences between systems effectively. Indeed, in the above example, our kernel judges these dynamical systems to be completely different, i.e., the value of $\\mathcal{A}_1$ for each pair among them is zero.\n\n5 Empirical Evaluations\n\nWe empirically illustrate how our metric works with synthetic data of the rotation dynamics on the unit disk in the complex plane in Subsection 5.1, and then evaluate the discriminative performance of our metric with real-world time-series data in Subsection 5.2.\n\n5.1 Illustrative example: Rotation on the unit disk\n\nWe use the rotation dynamics on the unit disk in the complex plane since we can compute the analytic solution of our metric for these dynamics. Here, we regard $\\mathcal{X} = \\mathbb{D} := \\{z \\in \\mathbb{C} \\mid |z| < 1\\}$ and let $k(z, w) := (1 - \\overline{z} w)^{-1}$ be the Szeg\u00f6 kernel for $z, w \\in \\mathbb{D}$. The corresponding RKHS $H_k$ is the space of holomorphic functions $f$ on $\\mathbb{D}$ with Taylor expansion $f(z) = \\sum_{n \\geq 0} a_n(f) z^n$ such that $\\sum_{n \\geq 0} |a_n(f)|^2 < \\infty$. For $f, g \\in H_k$, the inner product is defined by $\\langle f, g \\rangle := \\sum_{n \\geq 0} a_n(f) \\overline{a_n(g)}$. Let $H_{\\mathrm{in}} = \\mathbb{C}$ and $H_{\\mathrm{ob}} = H_k$.\n\nFor $\\alpha \\in \\mathbb{C}$ with $|\\alpha| \\leq 1$, let $R_\\alpha: \\mathbb{D} \\to \\mathbb{D}; z \\mapsto \\alpha z$. We denote by $K_\\alpha$ the Koopman operator for RKHS (i.e., the Perron-Frobenius operator in RKHS of Section 2) defined by $R_\\alpha$. Since $K_\\alpha$ is the adjoint of the composition operator defined by $R_\\alpha$, $K_\\alpha$ is bounded by Littlewood\u2019s subordination theorem. Now, we define $\\delta_z: H_k \\to \\mathbb{C}; f \\mapsto f(z)$ and $\\delta_{z,w}: H_k \\to \\mathbb{C}^2; f \\mapsto (f(z), f(w))$. 
Then we define $D_{\\alpha,z}^1 := (R_\\alpha, \\phi, \\delta_z^*) \\in \\mathcal{T}(\\mathbb{C}, H_k)$ and $D_{\\alpha,z}^2 := (R_\\alpha, \\phi, \\delta_{z,\\alpha z}^*) \\in \\mathcal{T}(\\mathbb{C}^2, H_k)$. By direct computation, we have the following formulas (see Appendices H and I for the derivation). For $\\mathcal{A}_1$, we have\n$$\\mathcal{A}_1(D_{\\alpha,z}^1, D_{\\beta,w}^1) = \\begin{cases} \\frac{(1-|z|^2)(1-|w|^2)}{|1-(\\overline{z}w)^q|^2} & |\\alpha| = |\\beta| = 1 \\text{ and } \\overline{\\alpha}\\beta = e^{2\\pi i p/q} \\text{ with } (p, q) = 1, \\\\ (1-|z|^2)(1-|w|^2) & |\\alpha| = |\\beta| = 1 \\text{ and } \\overline{\\alpha}\\beta = e^{2\\pi i \\theta} \\text{ with } \\theta \\notin \\mathbb{Q}, \\\\ 1 - |z|^2 & |\\alpha| = 1, |\\beta| < 1, \\\\ 1 - |w|^2 & |\\alpha| < 1, |\\beta| = 1, \\\\ 1 & |\\alpha|, |\\beta| < 1. \\end{cases} \\qquad (8)$$\nFor $\\mathcal{A}_2$, we have\n$$\\mathcal{A}_2(D_{\\alpha,z}^2, D_{\\beta,w}^2) = \\begin{cases} O(|zw|^{2\\mu(\\alpha,\\beta)}) & |\\alpha| = |\\beta| = 1, \\\\ 0 & |\\alpha| = 1, |\\beta| < 1, \\\\ 0 & |\\alpha| < 1, |\\beta| = 1, \\\\ \\frac{(1-|\\alpha|^2)(1-|\\beta|^2)}{|1-\\overline{\\alpha}\\beta|^2} \\cdot \\frac{|1+\\overline{\\alpha}\\beta|^2}{(1+|\\alpha|^2)(1+|\\beta|^2)} + O(|zw|^2) & |\\alpha|, |\\beta| < 1, \\end{cases} \\qquad (9)$$\nwhere $\\mu(\\alpha, \\beta)$ is a positive scalar value described in Appendix I. From the above, we see that $\\mathcal{A}_1$ depends on the initial values $z$ and $w$, whereas $\\mathcal{A}_2$ can discriminate the dynamics largely independently of them.\n\nFigure 1: Orbits of rotation dynamics obtained by multiplying by $\\alpha = |\\alpha| e^{2\\pi i \\theta}$ on the unit disk with the same initial values (panels 1-9: $\\theta \\in \\{1/3, 1/4, \\pi/3\\}$ combined with $|\\alpha| \\in \\{1, 0.9, 0.3\\}$).\n\nFigure 2: Comparison of empirical values (4) and theoretical values (8) of the kernels $\\mathcal{A}_1^T$ and $\\mathcal{A}_1$ of rotation dynamics with initial values $z_0$ (panels include the analytic $\\mathcal{A}_1$ (a) and the empirical $\\mathcal{A}_1^{10}$ (b) and $\\mathcal{A}_1^{100}$ (c)).\n\n
Figure 3: Discrimination results of various metrics for rotation dynamics with initial values $z_0$. Vertical and horizontal axes correspond to the dynamics in Figure 1. Columns: Szeg\u00f6 kernel ($\\mathcal{A}_1^{100}$, $\\mathcal{A}_2^{100}$), Gaussian kernel ($\\mathcal{A}_1^{100}$, $\\mathcal{A}_2^{100}$), and KDMD [8] ($A_{\\mathrm{kkp}}$); panels (a)-(e) use $|z_0| = 0.9$ and panels (f)-(j) use $|z_0| = 0.3$.\n\nNext, we show empirical results with Eq. (4) from finite data for this example.$^1$ For $\\mathcal{A}_1$, we consider $x_{\\alpha,t}^1 = \\alpha^t z_0$, where $\\alpha = |\\alpha| e^{2\\pi i \\theta}$. For $\\mathcal{A}_2$, we consider $x_{\\alpha,t}^1 = \\alpha^t z_0$ and $x_{\\alpha,t}^2 = \\alpha^{t+1} z_0 = \\alpha^t z_1$. The graphs in Figure 1 show the dynamics on the unit disk with $\\theta \\in \\{1/3, 1/4, \\pi/3\\}$ and $|\\alpha| \\in \\{1, 0.9, 0.3\\}$. For simplicity, all initial values were set so that $|z_0| = 0.9$.\n\nFigure 3 shows the confusion matrices for the above dynamics to assess the discriminative performance of the proposed metric using the Szeg\u00f6 kernel (Figures 3a, 3b, 3f, and 3g) and the radial basis function (Gaussian) kernel (Figures 3c, 3d, 3h, and 3i), and of a comparable previous metric (Figures 3e and 3j) [8]. For the Gaussian kernel, the kernel width was set to the median of the distances between data points. The last metric, called the Koopman spectral kernel [8], generalized the kernel defined by Vishwanathan et al. [25] to nonlinear dynamical systems and outperformed it. Among those kernels, we used the Koopman kernel of principal angles ($A_{\\mathrm{kkp}}$) between the subspaces of the estimated Koopman modes, which showed the best discriminative performance [8].\n\nThe empirical kernel $\\mathcal{A}_1^{100}$ (Figure 2c) was closer to the analytic solution $\\mathcal{A}_1$ for $T \\to \\infty$ (Figure 2a) than $\\mathcal{A}_1^{10}$ (Figure 2b). 
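The convergence of the empirical kernel toward (8) can be checked directly. The following minimal Python sketch assumes the kernel convention k(z, w) = (1 − conj(z) w)^(−1), α = 1, β = e^(2πi/3) (so conj(α)β is a primitive third root of unity, p/q = 1/3), and z = w = 0.9. The function names are hypothetical, and this is not the authors' Matlab code. The sketch evaluates Eq. (4) with m = N = 1 on the orbits α^t z and β^t w and compares the result with the first case of (8).

```python
import cmath
import math

def szego(z, w):
    # Szego kernel on the unit disk with the convention k(z, w) = 1 / (1 - conj(z) w).
    return 1.0 / (1.0 - z.conjugate() * w)

def empirical_A1(alpha, z, beta, w, T):
    # Eq. (4) with m = N = 1: the orbits x_t = alpha^t z and y_t = beta^t w are
    # compared through kernel evaluations only, then normalized as in A_1^T.
    xs = [alpha ** t * z for t in range(T)]
    ys = [beta ** t * w for t in range(T)]
    k12 = sum(szego(x, y) for x, y in zip(xs, ys))
    k11 = sum(szego(x, x) for x in xs).real
    k22 = sum(szego(y, y) for y in ys).real
    return abs(k12) ** 2 / (k11 * k22)

def analytic_A1_rational(z, w, q):
    # First case of (8): |alpha| = |beta| = 1 and conj(alpha) beta = exp(2 pi i p / q).
    return (1 - abs(z) ** 2) * (1 - abs(w) ** 2) / abs(1 - (z.conjugate() * w) ** q) ** 2

alpha, beta = 1.0 + 0.0j, cmath.exp(2j * math.pi / 3)  # relative rotation p/q = 1/3
z = w = 0.9 + 0.0j
exact = analytic_A1_rational(z, w, 3)
for T in (10, 100, 300):
    print(T, empirical_A1(alpha, z, beta, w, T), exact)
```

Summing 1/(1 − (conj(α)β)^t conj(z) w) over one full period gives q/(1 − (conj(z) w)^q). The finite-T value therefore matches (8) exactly whenever T is a multiple of q (here T = 300), while T = 10 and T = 100 only approximate it.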
As expected from the theoretical results, $\\mathcal{A}_1$ did not discriminate between the dynamics converging to the origin while rotating and those converging linearly, but $\\mathcal{A}_2$ did (Figure 3b). $\\mathcal{A}_2$ using the Gaussian kernel ($\\mathcal{A}_2^g$; Figure 3d) achieved almost perfect discrimination, whereas $\\mathcal{A}_1$ using the Gaussian kernel ($\\mathcal{A}_1^g$; Figure 3c) and $A_{\\mathrm{kkp}}$ (Figure 3e) did not. We also examined the case of small initial values, with $|z_0| = 0.3$ for all the dynamics (Figures 3f-3j). $\\mathcal{A}_2$ (Figures 3g and 3i) discriminated the two dynamics, whereas the remaining metrics did not (Figures 3f, 3h, and 3j).\n\n$^1$The Matlab code is available at https://github.com/keisuke198619/metricNLDS\n\nFigure 4: Embeddings of four time-series datasets using t-SNE for $\\mathcal{A}_1^g$ (a-d), $\\mathcal{A}_2^g$ (e-h), and $A_{\\mathrm{kkp}}$ (i-l). (a,e,i) Sony AIBO robot surface I and (b,f,j) II datasets. (c,g,k) Star light curve dataset. (d,h,l) Computers dataset. The markers x, o, and triangle represent classes 1, 2, and 3 in the datasets.\n\n5.2 Real-world time-series data\n\nIn this section, we evaluate our algorithm for discrimination using dynamical properties in time-series datasets from various real-world domains. We used the UCR time series classification archive as open-source real-world data [5]. Note that our algorithm primarily targets deterministic dynamics; therefore, we selected examples that apparently have small noise and are derived from some dynamics (for random dynamical systems, see, e.g., [13, 26, 23]). From these viewpoints, we selected the two Sony AIBO robot surface datasets (sensor data), the star light curve dataset (sensor data), and the computers dataset (device data). 
We used $\\mathcal{A}_m$ by Proposition 3.6 because we confirmed, using the approximation of $K_f$ defined in [9], that the data satisfy the semi-stable condition in Definition 3.5. We compared the discriminative performances by embedding the distance matrices computed by the proposed metric and by the conventional Koopman spectral kernel used above. For clear visualization, we randomly selected 20 sequences for each label from the validation data, because our algorithms do not learn any hyper-parameters from training data. All of these data are one-dimensional time series, but for comparison, we used time-delay coordinates to create two-dimensional augmented time-series matrices. Note that it would be difficult to apply basic estimation methods for Koopman modes that assume high-dimensional data, such as DMD and its variants. In addition, for simplicity, we evaluated the classification error using a $k$-nearest neighbor classifier ($k = 3$). We used 40 sequences for each label and computed the 10-fold cross-validation error averaged over 10 random trials.\n\nFigure 4 shows examples of the t-SNE [24] embeddings of $\\mathcal{A}_1^g$, $\\mathcal{A}_2^g$, and $A_{\\mathrm{kkp}}$ for four time-series datasets. In the Sony AIBO robot surface datasets, our metrics in Figures 4a, 4b, 4e, and 4f (classification errors: 0.025, 0.038, 0.213, and 0.150) had better discriminative performance than $A_{\\mathrm{kkp}}$ in Figures 4i and 4j (0.100 and 0.275). This tendency was also observed in the star light curve dataset (Figures 4c, 4g, and 4k; 0.150, 0.150, and 0.217), where one class (circle) was perfectly discriminated using $\\mathcal{A}_1^g$ and $\\mathcal{A}_2^g$, but the distinction between the remaining two classes was less obvious. In the computers dataset, $\\mathcal{A}_2^g$ and $A_{\\mathrm{kkp}}$ (Figures 4h and 4l; 0.450 and 0.450) show slightly better discrimination than $\\mathcal{A}_1^g$ (Figure 4d; 0.500).\n\n6 Conclusions\n\nIn this paper, we developed a general metric for comparing nonlinear dynamical systems that is defined with Perron-Frobenius operators in RKHSs. 
We showed that our metric includes Martin\u2019s metric and the Binet-Cauchy kernels for dynamical systems as special cases. We also described the estimation of our metric from finite data. Finally, we empirically demonstrated the effectiveness of our metric using an example of rotation dynamics in the unit disk of the complex plane and real-world time-series data. Several directions related to this work remain to be investigated. For example, it would be interesting to examine the discriminative properties of the metric in more detail with specific algorithms. It would also be important to develop models for prediction or dimensionality reduction of nonlinear time-series data based on the mathematical framework developed in this paper.\n\nReferences\n[1] S. Banach. Th\u00e9orie des op\u00e9rations lin\u00e9aires. Chelsea Publishing Co., 1995.\n[2] E. Berger, M. Sastuba, D. Vogt, B. Jung, and H.B. Amor. Estimation of perturbations in robotic behavior using dynamic mode decomposition. Advanced Robotics, 29(5):331\u2013343, 2015.\n[3] B.W. Brunton, J.A. Johnson, J.G. Ojemann, and J.N. Kutz. Extracting spatial-temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of Neuroscience Methods, 258:1\u201315, 2016.\n[4] R. Chaudhry and R. Vidal. Initial-state invariant Binet-Cauchy kernels for the comparison of linear dynamical systems. In Proc. of the 52nd IEEE Conf. on Decision and Control (CDC\u201913), pages 5377\u20135384, 2014.\n[5] Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista. The UCR Time Series Classification Archive, 2015. URL: www.cs.ucr.edu/~eamonn/time_series_data/.\n[6] K. De Cock and B. De Moor. Subspace angles between ARMA models. Systems & Control Letters, 46:265\u2013270, 2002.\n[7] B. Q. Feng and J. L. Li. Some estimations of Banach limits. J. Math. Anal. Appl., 323:481\u2013496, 2006.\n[8] K. Fujii, Y. Inaba, and Y. Kawahara.
Koopman spectral kernels for comparing complex dynamics: Application to multiagent sport plays. In Proc. of the 2017 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD\u201917), pages 127\u2013139, 2017.\n[9] Y. Kawahara. Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. In Advances in Neural Information Processing Systems 29, pages 911\u2013919, 2016.\n[10] B.O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5):315\u2013318, 1931.\n[11] J.N. Kutz, X. Fu, and S.L. Brunton. Multiresolution dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 15(2):713\u2013735, 2016.\n[12] R.J. Martin. A metric for ARMA processes. IEEE Trans. Signal Process., 48:1164\u20131170, 2000.\n[13] I. Mezi\u0107. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41(1):309\u2013325, 2005.\n[14] I. Mezi\u0107. Comparison of dynamics of dissipative finite-time systems using Koopman operator methods. IFAC-PapersOnLine, 49(18):454\u2013461, 2016.\n[15] I. Mezi\u0107 and A. Banaszuk. Comparison of systems with complex behavior. Physica D, 197:101\u2013133, 2004.\n[16] J.L. Proctor, S.L. Brunton, and J.N. Kutz. Dynamic mode decomposition with control. SIAM Journal on Applied Dynamical Systems, 15(1):142\u2013161, 2016.\n[17] J.L. Proctor and P.A. Eckhoff. Discovering dynamic patterns from infectious disease data using dynamic mode decomposition. International Health, 7(2):139\u2013145, 2015.\n[18] C.W. Rowley, I. Mezi\u0107, S. Bagheri, P. Schlatter, and D.S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641:115\u2013127, 2009.\n[19] E. M. Semenov and F. A. Sukochev. Invariant Banach limits and applications. Journal of Functional Analysis, 259:1517\u20131541, 2010.\n[20] L.
Sucheston. On existence of finite invariant measures. Math. Z., 86:327\u2013336, 1964.\n[21] L. Sucheston. Banach limits. Amer. Math. Monthly, 74:308\u2013311, 1967.\n[22] N. Takeishi, Y. Kawahara, Y. Tabei, and T. Yairi. Bayesian dynamic mode decomposition. In Proc. of the 26th Int\u2019l Joint Conf. on Artificial Intelligence (IJCAI\u201917), pages 2814\u20132821, 2017.\n[23] N. Takeishi, Y. Kawahara, and T. Yairi. Subspace dynamic mode decomposition for stochastic Koopman analysis. Physical Review E, 96:033310, 2017.\n[24] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579\u20132605, 2008.\n[25] S.V.N. Vishwanathan, A.J. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. Int\u2019l J. of Computer Vision, 73(1):95\u2013119, 2007.\n[26] M.O. Williams, I.G. Kevrekidis, and C.W. Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307\u20131346, 2015.", "award": [], "sourceid": 1495, "authors": [{"given_name": "Isao", "family_name": "Ishikawa", "institution": "RIKEN AIP"}, {"given_name": "Keisuke", "family_name": "Fujii", "institution": "RIKEN AIP Center"}, {"given_name": "Masahiro", "family_name": "Ikeda", "institution": "RIKEN AIP"}, {"given_name": "Yuka", "family_name": "Hashimoto", "institution": "NTT Network Technology Laboratories"}, {"given_name": "Yoshinobu", "family_name": "Kawahara", "institution": "Osaka University / RIKEN"}]}