{"title": "Switched Latent Force Models for Movement Segmentation", "book": "Advances in Neural Information Processing Systems", "page_first": 55, "page_last": 63, "abstract": "Latent force models encode the interaction between multiple related dynamical systems in the form of a kernel or covariance function. Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model, that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology.", "full_text": "Switched Latent Force Models\nfor Movement Segmentation\n\nMauricio A. \u00b4Alvarez 1, Jan Peters 2, Bernhard Sch\u00a8olkopf 2, Neil D. Lawrence 3,4\n1 School of Computer Science, University of Manchester, Manchester, UK M13 9PL\n\n2 Max Planck Institute for Biological Cybernetics, T\u00a8ubingen, Germany 72076\n3 School of Computer Science, University of Shef\ufb01eld, Shef\ufb01eld, UK S1 4DP\n4 The Shef\ufb01eld Institute for Translational Neuroscience, Shef\ufb01eld, UK S10 2HQ\n\nAbstract\n\nLatent force models encode the interaction between multiple related dynamical\nsystems in the form of a kernel or covariance function. 
Each variable to be modeled is represented as the output of a differential equation and each differential equation is driven by a weighted sum of latent functions with uncertainty given by a Gaussian process prior. In this paper we consider employing the latent force model framework for the problem of determining robot motor primitives. To deal with discontinuities in the dynamical systems or the latent driving force we introduce an extension of the basic latent force model that switches between different latent functions and potentially different dynamical systems. This creates a versatile representation for robot movements that can capture discrete changes and non-linearities in the dynamics. We give illustrative examples on both synthetic data and for striking movements recorded using a Barrett WAM robot as haptic input device. Our inspiration is robot motor primitives, but we expect our model to have wide application for dynamical systems including models for human motion capture data and systems biology.

1 Introduction
Latent force models [1] are a new approach for modeling data that allows combining dimensionality reduction with systems of differential equations. The basic idea is to assume an observed set of D correlated functions to arise from an unobserved set of R forcing functions. The assumption is that the R forcing functions drive the D observed functions through a set of differential equation models. Each differential equation is driven by a weighted mix of latent forcing functions. Sets of coupled differential equations arise in many physics and engineering problems, particularly when the temporal evolution of a system needs to be described. Learning such differential equations has important applications, e.g., in the study of human motor control and in robotics [6]. 
A latent force model differs from classical approaches as it places a probabilistic process prior over the latent functions and hence can make statements about the uncertainty in the system. A joint Gaussian process model over the latent forcing functions and the observed data functions can be recovered using a Gaussian process prior in conjunction with linear differential equations [1]. The resulting latent force modeling framework allows the combination of knowledge of the system's dynamics with a data driven model. Such generative models can be used to good effect, for example in ranked target prediction for transcription factors [5].
If a single Gaussian process prior is used to represent each latent function then the models we consider are limited to smooth driving functions. However, discontinuities and segmented latent forces are omnipresent in real-world data. For example, impact forces due to contacts in a mechanical dynamical system (when grasping an object or when the feet touch the ground) or a switch in an electrical circuit result in discontinuous latent forces. Similarly, most non-rhythmic natural motor skills consist of a sequence of segmented, discrete movements. If these segments are separate time-series, they should be treated as such and not be modeled by the same Gaussian process model.
In this paper, we extract a sequence of dynamical systems motor primitives, modeled by second order linear differential equations in conjunction with forcing functions (as in [1, 6]), from human movement to be used as demonstrations of elementary movements for an anthropomorphic robot. As human trajectories have a large variability, both due to planned uncertainty of the human's movement policy and due to motor execution errors [7], a probabilistic model is needed to capture the underlying motor primitives. 
A set of second order differential equations is employed as mechanical systems are of the same type, and a temporal Gaussian process prior is used to allow probabilistic modeling [1]. To be able to obtain a sequence of dynamical systems, we augment the latent force model to include discontinuities in the latent function and changes of dynamics. We introduce discontinuities by switching between different Gaussian process models (superficially similar to a mixture of Gaussian processes; however, the switching times are modeled as parameters so that at any instant a single Gaussian process is driving the system). Continuity of the observed functions is then ensured by constraining the relevant state variables (for example, in a second order differential equation, velocity and displacement) to be continuous across the switching points. This allows us to model highly nonstationary multivariate time series. We demonstrate our approach on synthetic data and real-world movement data.

2 Review of Latent force models (LFM)
Latent force models [1] are hybrid models that combine mechanistic principles and Gaussian processes as a flexible way to introduce prior knowledge for data modeling. A set of D functions $\{y_d(t)\}_{d=1}^D$ is modeled as the set of output functions of a series of coupled differential equations, whose common input is a linear combination of R latent functions, $\{u_r(t)\}_{r=1}^R$. Here we focus on a second order ordinary differential equation (ODE). We assume the output $y_d(t)$ is described by

$$A_d \frac{d^2 y_d(t)}{dt^2} + C_d \frac{d y_d(t)}{dt} + \kappa_d y_d(t) = \sum_{r=1}^{R} S_{d,r} u_r(t),$$

where, for a mass-spring-damper system, $A_d$ would represent the mass, $C_d$ the damper and $\kappa_d$ the spring constant associated with output $d$. We refer to the variables $S_{d,r}$ as the sensitivity parameters. They are used to represent the relative strength that the latent force $r$ exerts over the output $d$. 
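As a concrete, purely numerical illustration of this second order system (not part of the paper's GP machinery), the following sketch integrates a single mass-spring-damper output driven by one forcing function; all parameter values and the constant force are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def simulate_lfm_output(u, A=1.0, C=0.5, kappa=2.0, S=1.0,
                        t_max=20.0, dt=1e-3):
    """Integrate A*y'' + C*y' + kappa*y = S*u(t) with zero initial
    conditions using semi-implicit Euler. u is a callable force."""
    n = int(t_max / dt)
    y, v = 0.0, 0.0  # position and velocity
    for i in range(n):
        t = i * dt
        a = (S * u(t) - C * v - kappa * y) / A  # acceleration from the ODE
        v += dt * a
        y += dt * v
    return y

# For a constant unit force the output settles at S*u/kappa = 0.5,
# after a damped oscillation governed by alpha and omega.
y_final = simulate_lfm_output(lambda t: 1.0)
```

The sensitivity S simply rescales the forcing, which is why changing S across segments (as the switched model below does) changes the amplitude of the response without altering its dynamics.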
For simplicity we now focus on the case where R = 1, although our derivations apply more generally. Note that models that learn a forcing function to drive a linear system have proven to be well-suited for imitation learning for robot systems [6]. The solution of the second order ODE follows

$$y_d(t) = y_d(0) c_d(t) + \dot{y}_d(0) e_d(t) + f_d(t, u), \qquad (1)$$

where $y_d(0)$ and $\dot{y}_d(0)$ are the output and the velocity at time $t = 0$, respectively, known as the initial conditions (IC). The angular frequency is given by $\omega_d = \sqrt{(4 A_d \kappa_d - C_d^2)/(4 A_d^2)}$ and the remaining variables are given by

$$c_d(t) = e^{-\alpha_d t}\left[\cos(\omega_d t) + \frac{\alpha_d}{\omega_d}\sin(\omega_d t)\right], \qquad e_d(t) = e^{-\alpha_d t}\frac{\sin(\omega_d t)}{\omega_d},$$

$$f_d(t, u) = \frac{S_d}{A_d \omega_d}\int_0^t G_d(t - \tau) u(\tau)\, d\tau = \frac{S_d}{A_d \omega_d}\int_0^t e^{-\alpha_d (t - \tau)} \sin[(t - \tau)\omega_d]\, u(\tau)\, d\tau,$$

with $\alpha_d = C_d/(2 A_d)$. Note that $f_d(t, u)$ has an implicit dependence on the latent function $u(t)$. The uncertainty in the model of Eq. (1) is due to the fact that the latent force $u(t)$ and the initial conditions $y_d(0)$ and $\dot{y}_d(0)$ are not known. We will assume that the latent function $u(t)$ is sampled from a zero mean Gaussian process prior, $u(t) \sim \mathcal{GP}(0, k_{u,u}(t, t'))$, with covariance function $k_{u,u}(t, t')$.
If the initial conditions, $\mathbf{y}_{IC} = [y_1(0), y_2(0), \ldots, y_D(0), v_1(0), v_2(0), \ldots, v_D(0)]^\top$, are independent of $u(t)$ and distributed as a zero mean Gaussian with covariance $\mathbf{K}_{IC}$, the covariance function between any two output functions, $d$ and $d'$, at any two times, $t$ and $t'$, $k_{y_d,y_{d'}}(t, t')$, is given by

$$c_d(t) c_{d'}(t') \sigma_{y_d, y_{d'}} + c_d(t) e_{d'}(t') \sigma_{y_d, v_{d'}} + e_d(t) c_{d'}(t') \sigma_{v_d, y_{d'}} + e_d(t) e_{d'}(t') \sigma_{v_d, v_{d'}} + k_{f_d, f_{d'}}(t, t'),$$

where $\sigma_{y_d,y_{d'}}$, $\sigma_{y_d,v_{d'}}$, $\sigma_{v_d,y_{d'}}$ and $\sigma_{v_d,v_{d'}}$ are entries of the covariance matrix $\mathbf{K}_{IC}$ and

$$k_{f_d,f_{d'}}(t, t') = K_0 \int_0^t G_d(t - \tau) \int_0^{t'} G_{d'}(t' - \tau')\, k_{u,u}(\tau, \tau')\, d\tau'\, d\tau, \qquad (2)$$

where $K_0 = S_d S_{d'}/(A_d A_{d'} \omega_d \omega_{d'})$. So the covariance function $k_{f_d,f_{d'}}(t, t')$ depends on the covariance function of the latent force $u(t)$. If we assume the latent function has a radial basis function (RBF) covariance, $k_{u,u}(t, t') = \exp[-(t - t')^2/\ell^2]$, then $k_{f_d,f_{d'}}(t, t')$ can be computed analytically [1] (see also supplementary material). The latent force model induces a joint Gaussian process model across all the outputs. The parameters of the covariance function are given by the parameters of the differential equations and the length scale of the latent force. Given a multivariate time series data set these parameters may be determined by maximum likelihood.
The model can be thought of as a set of mass-spring-dampers being driven by a function sampled from a Gaussian process. In this paper we look to extend the framework to the case where there can be discontinuities in the latent functions. 
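When a closed form is not at hand, the double integral in equation (2) can be approximated by simple quadrature. A rough numerical sketch for R = 1 and a single output (d = d') with the RBF kernel; parameter values are illustrative:

```python
import numpy as np

def k_ff(t, t_prime, A=1.0, C=0.5, kappa=2.0, S=1.0, ell=1.0, n=400):
    """Trapezoidal approximation of k_{f,f}(t, t') in equation (2) for
    d = d', with Green's function G(s) = exp(-alpha*s) * sin(omega*s)
    and an RBF latent-force covariance."""
    alpha = C / (2.0 * A)
    omega = np.sqrt(4.0 * A * kappa - C**2) / (2.0 * A)
    K0 = S**2 / (A**2 * omega**2)
    tau = np.linspace(0.0, t, n)
    tau_p = np.linspace(0.0, t_prime, n)
    G = np.exp(-alpha * (t - tau)) * np.sin(omega * (t - tau))
    G_p = np.exp(-alpha * (t_prime - tau_p)) * np.sin(omega * (t_prime - tau_p))
    K_u = np.exp(-(tau[:, None] - tau_p[None, :]) ** 2 / ell**2)  # RBF kernel
    w = np.ones(n); w[0] = w[-1] = 0.5  # trapezoid weights
    inner = (K_u * (w * G_p)[None, :]).sum(axis=1) * (t_prime / (n - 1))
    return K0 * (w * G * inner).sum() * (t / (n - 1))
```

By construction the approximation inherits the symmetry k_{f,f}(t, t') = k_{f,f}(t', t) of the exact covariance; the analytic expression in [1] is what the paper actually uses.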
We do this through switching between different Gaussian process models to drive the system.

3 Switching dynamical latent force models (SDLFM)
We now consider switching the system between different latent forces. This allows us to change the dynamical system and the driving force for each segment. By constraining the displacement and velocity at each switching time to be the same, the output functions remain continuous.

3.1 Definition of the model
We assume that the input space is divided into a series of non-overlapping intervals $[t_{q-1}, t_q]_{q=1}^Q$. During each interval, only one force $u_{q-1}(t)$ out of Q forces is active, that is, there are $\{u_{q-1}(t)\}_{q=1}^Q$ forces. The force $u_{q-1}(t)$ is activated after time $t_{q-1}$ (switched on) and deactivated (switched off) after time $t_q$. We can use the basic model in equation (1) to describe the contribution to the output due to the sequential activation of these forces. A particular output $z_d(t)$ at a particular time instant $t$, in the interval $(t_{q-1}, t_q)$, is expressed as

$$z_d(t) = c_d^q(t - t_{q-1})\, y_d^q(t_{q-1}) + e_d^q(t - t_{q-1})\, \dot{y}_d^q(t_{q-1}) + f_d^q(t - t_{q-1}, u_{q-1}).$$

This equation is assumed to be valid for describing the output only inside the interval $(t_{q-1}, t_q)$. Here we highlighted this idea by including the superscript $q$ in $y_d^q(t - t_{q-1})$ to represent the interval $q$ for which the equation holds, although later we will omit it to keep the notation uncluttered. Note that for $Q = 1$ and $t_0 = 0$, we recover the original latent force model given in equation (1). We also define the velocity $\dot{z}_d(t)$ at each time interval $(t_{q-1}, t_q)$ as

$$\dot{z}_d(t) = g_d^q(t - t_{q-1})\, y_d^q(t_{q-1}) + h_d^q(t - t_{q-1})\, \dot{y}_d^q(t_{q-1}) + m_d^q(t - t_{q-1}, u_{q-1}),$$

where $g_d(t) = -e^{-\alpha_d t} \sin(\omega_d t)\,(\alpha_d^2 \omega_d^{-1} + \omega_d)$, $h_d(t) = -e^{-\alpha_d t}\left[\frac{\alpha_d}{\omega_d}\sin(\omega_d t) - \cos(\omega_d t)\right]$ and $m_d(t) = \frac{S_d}{A_d \omega_d}\frac{d}{dt}\left(\int_0^t G_d(t - \tau) u(\tau)\, d\tau\right)$.

Given the parameters $\theta = \{\{A_d, C_d, \kappa_d, S_d\}_{d=1}^D, \{\ell_{q-1}\}_{q=1}^Q\}$, the uncertainty in the outputs is induced by the prior over the initial conditions $y_d^q(t_{q-1})$, $\dot{y}_d^q(t_{q-1})$ for all values of $t_{q-1}$ and the prior over the latent force $u_{q-1}(t)$ that is active during $(t_{q-1}, t_q)$. We place independent Gaussian process priors over each of these latent forces $u_{q-1}(t)$, assuming independence between them. For the initial conditions $y_d^q(t_{q-1})$, $\dot{y}_d^q(t_{q-1})$, we could assume that they are either parameters to be estimated or random variables with uncertainty governed by independent Gaussian distributions with covariance matrices $\mathbf{K}_{IC}^q$ as described in the last section. However, for the class of applications we will consider, mechanical systems, the outputs should be continuous across the switching points. We therefore assume that the uncertainty about the initial conditions for the interval $q$, $y_d^q(t_{q-1})$, $\dot{y}_d^q(t_{q-1})$, is prescribed by the Gaussian process that describes the outputs $z_d(t)$ and velocities $\dot{z}_d(t)$ in the previous interval $q - 1$. In particular, we assume that $y_d^q(t_{q-1})$, $\dot{y}_d^q(t_{q-1})$ are Gaussian-distributed with mean values given by $y_d^{q-1}(t_{q-1} - t_{q-2})$ and $\dot{y}_d^{q-1}(t_{q-1} - t_{q-2})$, and covariances $k_{z_d,z_{d'}}(t_{q-1}, t_{q'-1}) = \mathrm{cov}[y_d^{q-1}(t_{q-1} - t_{q-2}), y_{d'}^{q-1}(t_{q-1} - t_{q-2})]$ and $k_{\dot{z}_d,\dot{z}_{d'}}(t_{q-1}, t_{q'-1}) = \mathrm{cov}[\dot{y}_d^{q-1}(t_{q-1} - t_{q-2}), \dot{y}_{d'}^{q-1}(t_{q-1} - t_{q-2})]$. We also consider covariances between $z_d(t_{q-1})$ and $\dot{z}_{d'}(t_{q'-1})$, that is, between positions and velocities for different values of $q$ and $d$.

Example 1. Let us assume we have one output (D = 1) and three switching intervals (Q = 3) with switching points $t_0$, $t_1$ and $t_2$. At $t_0$, we assume that $\mathbf{y}_{IC}$ follows a Gaussian distribution with mean zero and covariance $\mathbf{K}_{IC}$. From $t_0$ to $t_1$, the output $z(t)$ is described by

$$z(t) = y^1(t - t_0) = c^1(t - t_0)\, y^1(t_0) + e^1(t - t_0)\, \dot{y}^1(t_0) + f^1(t - t_0, u_0).$$

The initial condition for the position in the interval $(t_1, t_2)$ is given by the last equation evaluated at $t_1$, that is, $z(t_1) = y^2(t_1) = y^1(t_1 - t_0)$. A similar analysis is used to obtain the initial condition associated with the velocity, $\dot{z}(t_1) = \dot{y}^2(t_1) = \dot{y}^1(t_1 - t_0)$. Then, from $t_1$ to $t_2$, the output $z(t)$ is

$$z(t) = y^2(t - t_1) = c^2(t - t_1)\, y^2(t_1) + e^2(t - t_1)\, \dot{y}^2(t_1) + f^2(t - t_1, u_1)$$
$$= c^2(t - t_1)\, y^1(t_1 - t_0) + e^2(t - t_1)\, \dot{y}^1(t_1 - t_0) + f^2(t - t_1, u_1).$$

Following the same train of thought, the output $z(t)$ from $t_2$ is given as

$$z(t) = y^3(t - t_2) = c^3(t - t_2)\, y^3(t_2) + e^3(t - t_2)\, \dot{y}^3(t_2) + f^3(t - t_2, u_2),$$

where $y^3(t_2) = y^2(t_2 - t_1)$ and $\dot{y}^3(t_2) = \dot{y}^2(t_2 - t_1)$. Figure 1 shows an example of the switching dynamical latent force model scenario. To ensure the continuity of the outputs, the initial condition is forced to be equal to the output of the last interval evaluated at the switching point.

3.2 The covariance function

The derivation of the covariance function for the switching model is rather involved. For continuous output signals, we must take into account constraints at each switching time. 
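To make the recursion of Example 1 concrete, the sketch below composes the analytic per-interval solution across Q = 3 intervals for a single output driven by piecewise-constant forces (a deterministic simplification for illustration; the paper uses GP-distributed forces). For a constant force u the forced part has the closed form f(s) = (S u / kappa)(1 - c(s)). All parameter values are illustrative assumptions:

```python
import numpy as np

A, C, KAPPA, S = 1.0, 0.5, 2.0, 1.0  # illustrative system parameters
ALPHA = C / (2 * A)
OMEGA = np.sqrt(4 * A * KAPPA - C**2) / (2 * A)

def c(t):   # response to a unit initial position
    return np.exp(-ALPHA * t) * (np.cos(OMEGA * t) + (ALPHA / OMEGA) * np.sin(OMEGA * t))

def e(t):   # response to a unit initial velocity
    return np.exp(-ALPHA * t) * np.sin(OMEGA * t) / OMEGA

def g(t):   # d/dt c(t)
    return -np.exp(-ALPHA * t) * np.sin(OMEGA * t) * (ALPHA**2 + OMEGA**2) / OMEGA

def h(t):   # d/dt e(t)
    return np.exp(-ALPHA * t) * (np.cos(OMEGA * t) - (ALPHA / OMEGA) * np.sin(OMEGA * t))

def switched_output(t, switches, forces):
    """Position and velocity at time t, composing the per-interval solution
    as in Example 1: the initial conditions of interval q are the final
    values of interval q-1, so the output stays continuous."""
    z, zdot = 0.0, 0.0
    for q, t_q in enumerate(switches):
        t_end = switches[q + 1] if q + 1 < len(switches) else t
        delta = min(t, t_end) - t_q
        if delta <= 0:
            break
        p = S * forces[q] / KAPPA  # steady-state response to constant u_q
        z, zdot = (c(delta) * z + e(delta) * zdot + p * (1.0 - c(delta)),
                   g(delta) * z + h(delta) * zdot - p * g(delta))
        if t <= t_end:
            break
    return z, zdot
```

Evaluating just before and just after a switching point returns (nearly) identical position and velocity, which is exactly the continuity constraint the covariance derivation below has to respect.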
This causes initial conditions for each interval to be dependent on final conditions for the previous interval and induces correlations across the intervals. This effort is worthwhile though, as the resulting model is very flexible and can take advantage of the switching dynamics to represent a range of signals.

Figure 1: Representation of an output constructed through a switching dynamical latent force model with Q = 3. The initial conditions $y^q(t_{q-1})$ for each interval are matched to the value of the output in the last interval, evaluated at the switching point $t_{q-1}$, that is, $y^q(t_{q-1}) = y^{q-1}(t_{q-1} - t_{q-2})$.

As a taster, Figure 2 shows samples from a covariance function of a switching dynamical latent force model with D = 1 and Q = 3. Note that while the latent forces (a and c) are discrete, the outputs (b and d) are continuous and have matching gradients at the switching points. The outputs are highly nonstationary. The switching times turn out to be parameters of the covariance function. They can be optimized along with the dynamical system parameters to match the location of the nonstationarities. We now give an overview of the covariance function derivation. Details are provided in the supplementary material.

Figure 2: Joint samples of a switching dynamical LFM model with one output, D = 1, and three intervals, Q = 3, for two different systems. Panels (a) and (c) show samples from the latent force for systems 1 and 2; panels (b) and (d) show the corresponding samples from the output. Dashed lines indicate the presence of switching points. 
While system 2 responds instantaneously to the input force, system 1 delays its reaction due to larger inertia.

In general, we need to compute the covariance $k_{z_d,z_{d'}}(t, t') = \mathrm{cov}[z_d(t), z_{d'}(t')]$ for $z_d(t)$ in time interval $(t_{q-1}, t_q)$ and $z_{d'}(t')$ in time interval $(t_{q'-1}, t_{q'})$. By definition, this covariance follows

$$\mathrm{cov}[z_d(t), z_{d'}(t')] = \mathrm{cov}\!\left[y_d^q(t - t_{q-1}),\, y_{d'}^{q'}(t' - t_{q'-1})\right].$$

We assume independence between the latent forces $u_q(t)$ and independence between the initial conditions $\mathbf{y}_{IC}$ and the latent forces $u_q(t)$ (the derivations are rather involved; section 2 of the supplementary material includes a detailed description of how to obtain equations (3) and (4)). With these conditions, it can be shown (supplementary material, section 2.2.1) that the covariance function for $q = q'$ is given as

$$c_d^q(t - t_{q-1})\, c_{d'}^q(t' - t_{q-1})\, k_{z_d,z_{d'}}(t_{q-1}, t_{q-1}) + c_d^q(t - t_{q-1})\, e_{d'}^q(t' - t_{q-1})\, k_{z_d,\dot{z}_{d'}}(t_{q-1}, t_{q-1})$$
$$+\, e_d^q(t - t_{q-1})\, c_{d'}^q(t' - t_{q-1})\, k_{\dot{z}_d,z_{d'}}(t_{q-1}, t_{q-1}) + e_d^q(t - t_{q-1})\, e_{d'}^q(t' - t_{q-1})\, k_{\dot{z}_d,\dot{z}_{d'}}(t_{q-1}, t_{q-1})$$
$$+\, k_{f_d,f_{d'}}^q(t, t'), \qquad (3)$$

where

$$k_{z_d,z_{d'}}(t_{q-1}, t_{q-1}) = \mathrm{cov}[y_d^q(t_{q-1}), y_{d'}^q(t_{q-1})], \qquad k_{z_d,\dot{z}_{d'}}(t_{q-1}, t_{q-1}) = \mathrm{cov}[y_d^q(t_{q-1}), \dot{y}_{d'}^q(t_{q-1})],$$
$$k_{\dot{z}_d,z_{d'}}(t_{q-1}, t_{q-1}) = \mathrm{cov}[\dot{y}_d^q(t_{q-1}), y_{d'}^q(t_{q-1})], \qquad k_{\dot{z}_d,\dot{z}_{d'}}(t_{q-1}, t_{q-1}) = \mathrm{cov}[\dot{y}_d^q(t_{q-1}), \dot{y}_{d'}^q(t_{q-1})],$$
$$k_{f_d,f_{d'}}^q(t, t') = \mathrm{cov}[f_d^q(t - t_{q-1})\, f_{d'}^q(t' - t_{q-1})],$$

writing $f_d^q(t - t_{q-1}, u_{q-1})$ as $f_d^q(t - t_{q-1})$ for notational simplicity. In expression (3), $k_{z_d,z_{d'}}(t_{q-1}, t_{q-1}) = \mathrm{cov}[y_d^{q-1}(t_{q-1} - t_{q-2}), y_{d'}^{q-1}(t_{q-1} - t_{q-2})]$, and values for $k_{z_d,\dot{z}_{d'}}(t_{q-1}, t_{q-1})$, $k_{\dot{z}_d,z_{d'}}(t_{q-1}, t_{q-1})$ and $k_{\dot{z}_d,\dot{z}_{d'}}(t_{q-1}, t_{q-1})$ can be obtained by similar expressions. The covariance $k_{f_d,f_{d'}}^q(t, t')$ follows a similar expression to the one for $k_{f_d,f_{d'}}(t, t')$ in equation (2), now depending on the covariance $k_{u_{q-1},u_{q-1}}(t, t')$. We will assume that the covariances for the latent forces follow the RBF form, with length-scale $\ell_q$.
When $q > q'$, we have to take into account the correlation between the initial conditions $y_d^q(t_{q-1})$, $\dot{y}_d^q(t_{q-1})$ and the latent force $u_{q'-1}(t')$. This correlation appears because of the contribution of $u_{q'-1}(t')$ to the generation of the initial conditions, $y_d^q(t_{q-1})$, $\dot{y}_d^q(t_{q-1})$. It can be shown (supplementary material, section 2.2.2) that the covariance function $\mathrm{cov}[z_d(t), z_{d'}(t')]$ for $q > q'$ follows

$$c_d^q(t - t_{q-1})\, c_{d'}^{q'}(t' - t_{q'-1})\, k_{z_d,z_{d'}}(t_{q-1}, t_{q'-1}) + c_d^q(t - t_{q-1})\, e_{d'}^{q'}(t' - t_{q'-1})\, k_{z_d,\dot{z}_{d'}}(t_{q-1}, t_{q'-1})$$
$$+\, e_d^q(t - t_{q-1})\, c_{d'}^{q'}(t' - t_{q'-1})\, k_{\dot{z}_d,z_{d'}}(t_{q-1}, t_{q'-1}) + e_d^q(t - t_{q-1})\, e_{d'}^{q'}(t' - t_{q'-1})\, k_{\dot{z}_d,\dot{z}_{d'}}(t_{q-1}, t_{q'-1})$$
$$+\, c_d^q(t - t_{q-1})\, X_d^1\, k_{f_d,f_{d'}}^{q'}(t_{q'-1}, t') + c_d^q(t - t_{q-1})\, X_d^2\, k_{m_d,f_{d'}}^{q'}(t_{q'-1}, t')$$
$$+\, e_d^q(t - t_{q-1})\, X_d^3\, k_{f_d,f_{d'}}^{q'}(t_{q'-1}, t') + e_d^q(t - t_{q-1})\, X_d^4\, k_{m_d,f_{d'}}^{q'}(t_{q'-1}, t'), \qquad (4)$$

where $k_{z_d,z_{d'}}(t_{q-1}, t_{q'-1}) = \mathrm{cov}[y_d^q(t_{q-1}), y_{d'}^{q'}(t_{q'-1})]$, $k_{z_d,\dot{z}_{d'}}(t_{q-1}, t_{q'-1}) = \mathrm{cov}[y_d^q(t_{q-1}), \dot{y}_{d'}^{q'}(t_{q'-1})]$, $k_{\dot{z}_d,z_{d'}}(t_{q-1}, t_{q'-1}) = \mathrm{cov}[\dot{y}_d^q(t_{q-1}), y_{d'}^{q'}(t_{q'-1})]$, $k_{\dot{z}_d,\dot{z}_{d'}}(t_{q-1}, t_{q'-1}) = \mathrm{cov}[\dot{y}_d^q(t_{q-1}), \dot{y}_{d'}^{q'}(t_{q'-1})]$ and $k_{m_d,f_{d'}}^q(t, t') = \mathrm{cov}[m_d^q(t - t_{q-1})\, f_{d'}^q(t' - t_{q-1})]$. The terms $X_d^1$, $X_d^2$, $X_d^3$ and $X_d^4$ are functions of the form $\sum_{n=2}^{q-q'} \prod_{i=2}^{q-q'} x_d^{q-i+1}(t_{q-i+1} - t_{q-i})$, with $x_d^{q-i+1}$ being equal to $c_d^{q-i+1}$, $e_d^{q-i+1}$, $g_d^{q-i+1}$ or $h_d^{q-i+1}$, depending on the values of $q$ and $q'$. A similar expression to (4) can be obtained for $q' > q$. Examples of these functions for specific values of $q$ and $q'$, and more details, are given in the supplementary material.

4 Related work
There has been a recent interest in employing Gaussian processes for detection of change points in time series analysis, an area of study that relates to some extent to our model. Some machine learning related papers include [3, 4, 9]. [3, 4] deal specifically with how to construct covariance functions in the presence of change points (see [3], section 4). The authors propose different alternatives according to the type of change point. From these alternatives, the closest ones to our work appear in subsections 4.2, 4.3 and 4.4. In subsection 4.2, a mechanism to keep continuity in a covariance function when there are two regimes described by different GPs is proposed. The authors call this covariance a continuous conditionally independent covariance function. In our switched latent force model, a more natural option is to use the initial conditions as the way to transit smoothly between different regimes. In subsections 4.3 and 4.4, the authors propose covariances that account for a sudden change in the input scale and a sudden change in the output scale. 
Both types of changes are automatically included in our model due to the latent force model construction: the changes in the input scale are accounted for by the different length-scales of the latent force GP and the changes in the output scale are accounted for by the different sensitivity parameters. Importantly, we are also concerned with multiple output systems.
On the other hand, [9] proposes an efficient inference procedure for Bayesian Online Change Point Detection (BOCPD) in which the underlying predictive model (UPM) is a GP. This reference is less concerned about the particular type of change that is represented by the model: in our application scenario, the continuity of the covariance function between two regimes must be assured beforehand.

5 Implementation
In this section, we describe additional details of the implementation, i.e., covariance functions, hyperparameters and sparse approximations.
Additional covariance functions. The covariance functions $k_{\dot{z}_d,z_{d'}}(t, t')$, $k_{z_d,\dot{z}_{d'}}(t, t')$ and $k_{\dot{z}_d,\dot{z}_{d'}}(t, t')$ are obtained by taking derivatives of $k_{z_d,z_{d'}}(t, t')$ with respect to $t$ and $t'$ [10].
Estimation of hyperparameters. Given the number of outputs D and the number of intervals Q, we estimate the parameters $\theta$ by maximizing the marginal likelihood of the joint Gaussian process $\{z_d(t)\}_{d=1}^D$ using gradient-descent methods. With a set of input points, $\mathbf{t} = \{t_n\}_{n=1}^N$, the marginal likelihood is given as $p(\mathbf{z}|\theta) = \mathcal{N}(\mathbf{z}|\mathbf{0}, \mathbf{K}_{z,z} + \Sigma)$, where $\mathbf{z} = [\mathbf{z}_1^\top, \ldots, \mathbf{z}_D^\top]^\top$, with $\mathbf{z}_d = [z_d(t_1), \ldots, z_d(t_N)]^\top$, and $\mathbf{K}_{z,z}$ is a $D \times D$ block-partitioned matrix with blocks $\mathbf{K}_{z_d,z_{d'}}$. The entries in each of these blocks are evaluated using $k_{z_d,z_{d'}}(t, t')$. 
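This marginal likelihood is the standard GP expression and its log can be evaluated stably with a Cholesky factorization. A generic sketch, applicable to any positive-definite Gram matrix built from the covariances above; the RBF kernel here is only a runnable placeholder for $\mathbf{K}_{z,z}$:

```python
import numpy as np

def gp_log_marginal_likelihood(K, z, noise_var=0.1):
    """log N(z | 0, K + noise_var * I), evaluated via Cholesky for stability."""
    n = len(z)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    # log det(Sigma) = 2 * sum(log(diag(L))), hence the single log-sum below
    return -0.5 * z @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Placeholder RBF Gram matrix standing in for K_{z,z}:
t_pts = np.linspace(0.0, 1.0, 20)
K = np.exp(-(t_pts[:, None] - t_pts[None, :]) ** 2 / 0.1)
z = np.sin(2 * np.pi * t_pts)
ll = gp_log_marginal_likelihood(K, z)
```

Gradient-based optimization of the hyperparameters (and of the switching times) then differentiates this quantity with respect to $\theta$.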
Furthermore, $k_{z_d,z_{d'}}(t, t')$ is computed using the expressions (3) and (4), according to the relative values of $q$ and $q'$.
Efficient approximations. Optimizing the marginal likelihood involves the inversion of the matrix $\mathbf{K}_{z,z}$, an inversion whose complexity grows as $O(D^3 N^3)$. We use a sparse approximation based on variational methods presented in [2] as a generalization of [11] for multiple output Gaussian processes. The approximations establish a lower bound on the marginal likelihood and reduce the computational complexity to $O(D N K^2)$, where $K$ is a reduced number of points used to represent $u(t)$.

6 Experimental results

We now show results with artificial data and data recorded from a robot performing a basic set of actions appearing in table tennis.

6.1 Toy example

Using the model, we generate samples from the GP with the covariance function explained before. In the first experiment, we sample from a model with D = 2, R = 1 and Q = 3, with switching points $t_0 = -1$, $t_1 = 5$ and $t_2 = 12$. For the outputs, we have $A_1 = A_2 = 0.1$, $C_1 = 0.4$, $C_2 = 1$, $\kappa_1 = 2$, $\kappa_2 = 3$. We restrict the latent forces to have the same length-scale value $\ell_0 = \ell_1 = \ell_2 = 10^{-3}$, but change the values of the sensitivity parameters as $S_{1,1} = 10$, $S_{2,1} = 1$, $S_{1,2} = 10$, $S_{2,2} = 5$, $S_{1,3} = -10$ and $S_{2,3} = 1$, where the first subindex refers to the output $d$ and the second subindex refers to the force in the interval $q$. In this first experiment, we wanted to show the ability of the model to detect changes in the sensitivities of the forces, while keeping the length scales equal along the intervals. We sampled 5 times from the model, with each output having 500 data points, and added noise with variance equal to ten percent of the variance of each sampled output. 
In each of the five repetitions, we took N = 200 data points for training and the remaining 300 for testing.

Table 1: Standardized mean square error (SMSE) and mean standardized log loss (MSLL) using different values of Q for both toy examples. The figures for the SMSE must be multiplied by 10^-2. See the text for details.

              Q = 1          Q = 2          Q = 3          Q = 4          Q = 5
Toy 1  SMSE   76.27±35.63    14.66±11.74    0.30±0.02      0.31±0.03      0.72±0.56
       MSLL   −0.98±0.46     −1.79±0.26     −2.90±0.03     −2.87±0.04     −2.55±0.41
Toy 2  SMSE   7.27±6.88      1.08±0.05      1.10±0.09      1.06±0.05      1.10±0.05
       MSLL   −1.79±0.28     −2.26±0.02     −2.25±0.02     −2.27±0.03     −2.26±0.06

Figure 4: Mean and two standard deviations for the predictions over the latent force and the outputs in the test set. Panels: (a) latent force, (b) output 1 and (c) output 2 for toy example 1; (d) latent force, (e) output 1 and (f) output 3 for toy example 2. Dashed lines indicate the final value of the switching points after optimization. Dots indicate training data.

Optimization of the hyperparameters (including $t_1$ and $t_2$) is done by maximization of the marginal likelihood through scaled conjugate gradient. We train models for Q = 1, 2, 3, 4 and 5 and measure the mean standardized log loss (MSLL) and the standardized mean square error (SMSE) [8] over the test set for each value of Q. Table 1, first two rows, shows the corresponding average results over the 5 repetitions, together with one standard deviation. Notice that the model first reaches its best performance at Q = 3, a performance that is repeated for Q = 4. The SMSE performance remains approximately equal for values of Q greater than 3. 
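The two metrics reported in Table 1 can be computed as follows; this is a sketch following the usual definitions in [8] (SMSE: squared error normalized by the variance of the test targets; MSLL: mean negative log predictive density under the model, standardized by a trivial Gaussian fitted to the training targets):

```python
import numpy as np

def smse(y_true, mu):
    """Standardized mean square error: MSE divided by the target variance."""
    return np.mean((y_true - mu) ** 2) / np.var(y_true)

def msll(y_true, mu, pred_var, y_train):
    """Mean standardized log loss: negative log predictive density under the
    model, minus that of a trivial Gaussian fitted to the training targets.
    Negative values mean the model beats the trivial predictor."""
    nlpd = 0.5 * np.log(2 * np.pi * pred_var) + (y_true - mu) ** 2 / (2 * pred_var)
    m0, v0 = np.mean(y_train), np.var(y_train)
    nlpd0 = 0.5 * np.log(2 * np.pi * v0) + (y_true - m0) ** 2 / (2 * v0)
    return np.mean(nlpd - nlpd0)
```

A perfect point prediction gives SMSE of zero, and a confident, accurate predictive distribution drives the MSLL strongly negative, which is the pattern seen for Q = 3 in Table 1.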
Figures 4(a), 4(b) and 4(c) show the kind of predictions made by the model for Q = 3.
We also generated a second toy example, in which the length-scales of the intervals differ. For this second toy experiment, we assume D = 3, Q = 2 and switching points t0 = −2 and t1 = 8. The parameters of the outputs are A1 = A2 = A3 = 0.1, C1 = 2, C2 = 3, C3 = 0.5, κ1 = 0.4, κ2 = 1, κ3 = 1, with length-scales ℓ0 = 1e−3 and ℓ1 = 1. The sensitivities in this case are S1,1 = 1, S2,1 = 5, S3,1 = 1, S1,2 = 5, S2,2 = 1 and S3,2 = 1. We follow the same evaluation setup as in toy example 1. The last two rows of Table 1 show the performance, again in terms of MLSS and SMSE. We see that for values of Q > 2, the MLSS and SMSE remain similar. Figures 4(d), 4(e) and 4(f) show the inferred latent force and the predictions made for two of the three outputs.

6.2 Segmentation of human movement data for robot imitation learning
In this section, we evaluate the feasibility of the model for motion segmentation, with possible applications in the analysis of human movement data and imitation learning. To do so, we had a human teacher take the robot by the hand and demonstrate striking movements in a cooperative game of table tennis with another human being, as shown in Figure 3. We recorded the joint positions, angular velocities, and angular accelerations of the robot for two independent trials of the same table tennis exercise.

Figure 3: Data collection was performed using a Barrett WAM robot as haptic input device.

(a) Log-likelihood, trial 1. (b) Latent force, trial 1. (c) HR output, trial 1.
(d) Log-likelihood, trial 2. (e) Latent force, trial 2. (f) SFE output, trial 2.

Figure 5: Employing the switching dynamical LFM model on the human movement data collected as in Figure 3 leads to plausible segmentations of the demonstrated trajectories. The first row corresponds to the log-likelihood, the latent force, and one of the four outputs for trial one. The second row shows the same quantities for trial two. Crosses at the bottom of the figures indicate the points used for the approximation of the Gaussian process, in this case K = 50.

For each trial, we selected four output positions and trained several models for different values of Q, including the latent force model without switches (Q = 1). We evaluate the quality of the segmentation in terms of the log-likelihood. Figure 5 shows the log-likelihood, the inferred latent force and one output for trial one (first row) and the corresponding quantities for trial two (second row). Figures 5(a) and 5(d) show peaks in the log-likelihood at Q = 9 for trial one and Q = 10 for trial two. As the movement has few gaps and the data has several output dimensions, it is hard even for a human being to detect the transitions between movements (unless they are visualized as in a movie). Nevertheless, the model found a maximum of the log-likelihood at the correct instances in time where the human transitions between two movements. At these instances the human usually reacts to an external stimulus with a large jerk, causing a jump in the forces. As a result, we obtained not only a segmentation of the movement but also a generative model for table tennis striking movements.

7 Conclusion
We have introduced a new probabilistic model that extends the latent force modeling framework with switched Gaussian processes. This allows for discontinuities in the latent space of forces. We have shown the application of the model on toy examples and on a real-world robot problem, in which we were interested in finding and representing striking movements.
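The model-order search used in Section 6.2, where the number of intervals Q is chosen at the peak of the log-likelihood, can be sketched generically as follows. This is a minimal illustration, not the code used in our experiments; the `log_likelihood` callable is a stand-in for training a switched LFM with Q intervals and returning its log-marginal likelihood.

```python
def select_num_intervals(log_likelihood, candidate_Q):
    """Return the Q maximizing the log-likelihood, plus the full score curve.

    `log_likelihood` is any callable mapping a candidate number of
    intervals Q to the trained model's log-marginal likelihood; Q = 1
    corresponds to the latent force model without switches.
    """
    scores = {Q: log_likelihood(Q) for Q in candidate_Q}
    best_Q = max(scores, key=scores.get)
    return best_Q, scores


# Toy stand-in for a trained model's score curve: it peaks at Q = 9,
# mimicking the behavior observed for trial one in Figure 5(a).
toy_curve = lambda Q: -50.0 * (Q - 9) ** 2
best_Q, scores = select_num_intervals(toy_curve, range(1, 13))
# best_Q → 9
```

In practice each call to `log_likelihood` is expensive, since it involves fitting a full model; this motivates the search for a cheaper model selection criterion mentioned in the conclusion.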
Other applications of the switching latent force model that we envisage include modeling human motion capture data using the second-order ODE, and modeling complex circuits in biological networks using a first-order ODE. To find the order of the model, that is, the number of intervals, we have used cross-validation. Future work includes proposing a less expensive model selection criterion.

Acknowledgments

MA and NL are very grateful for support from a Google Research Award, "Mechanistically Inspired Convolution Processes for Learning", and EPSRC Grant No. EP/F005687/1, "Gaussian Processes for Systems Identification with Applications in Systems Biology". MA also thanks the PASCAL2 Internal Visiting Programme. We also thank the three anonymous reviewers for their helpful comments.

References
[1] Mauricio Álvarez, David Luengo, and Neil D. Lawrence. Latent force models. In David van Dyk and Max Welling, editors, Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 9–16, Clearwater Beach, Florida, 16–18 April 2009. JMLR W&CP 5.
[2] Mauricio A. Álvarez, David Luengo, Michalis K. Titsias, and Neil D. Lawrence. Efficient multioutput Gaussian processes through variational inducing kernels. In JMLR: W&CP 9, pages 25–32, 2010.
[3] Roman Garnett, Michael A. Osborne, Steven Reece, Alex Rogers, and Stephen J. Roberts. Sequential Bayesian prediction in the presence of changepoints and faults. The Computer Journal, 2010. Advance Access published February 1, 2010.
[4] Roman Garnett, Michael A. Osborne, and Stephen J. Roberts. Sequential Bayesian prediction in the presence of changepoints. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 345–352, 2009.
[5] Antti Honkela, Charles Girardot, E. Hilary Gustafson, Ya-Hsin Liu, Eileen E. M. Furlong, Neil D. Lawrence, and Magnus Rattray. Model-based method for transcription factor target identification with limited data. PNAS, 107(17):7793–7798, 2010.
[6] A. Ijspeert, J. Nakanishi, and S. Schaal. Learning attractor landscapes for learning motor primitives. In Advances in Neural Information Processing Systems 15, 2003.
[7] T. Oyama, Y. Uno, and S. Hosoe. Analysis of variability of human reaching movements based on the similarity preservation of arm trajectories. In International Conference on Neural Information Processing (ICONIP), pages 923–932, 2007.
[8] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
[9] Yunus Saatçi, Ryan Turner, and Carl Edward Rasmussen. Gaussian process change point models. In Proceedings of the 27th Annual International Conference on Machine Learning, pages 927–934, 2010.
[10] E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, and C. E. Rasmussen. Derivative observations in Gaussian process models of dynamic systems. In Sue Becker, Sebastian Thrun, and Klaus Obermayer, editors, NIPS, volume 15, pages 1033–1040, Cambridge, MA, 2003. MIT Press.
[11] Michalis K. Titsias. Variational learning of inducing variables in sparse Gaussian processes. In JMLR: W&CP 5, pages 567–574, 2009.