{"title": "Local Gaussian Process Regression for Real Time Online Model Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1193, "page_last": 1200, "abstract": "Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian Process regression (GPR) with local GP models (LGP). The training data is partitioned in local regions, for each an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other nonparametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and nu-SVR.", "full_text": "Local Gaussian Process Regression\n\nfor Real Time Online Model Learning and Control\n\nDuy Nguyen-Tuong\n\nJan Peters Matthias Seeger\n\nMax Planck Institute for Biological Cybernetics\nSpemannstra\u00dfe 38, 72076 T\u00fcbingen, Germany\n\n{duy,jan.peters,matthias.seeger}@tuebingen.mpg.de\n\nAbstract\n\nLearning in real-time applications, e.g., online approximation of the inverse dynamics\nmodel for model-based robot control, requires fast online regression techniques.\nInspired by local learning, we propose a method to speed up standard\nGaussian process regression (GPR) with local GP models (LGP). The training\ndata is partitioned in local regions, for each of which an individual GP model is trained.\nThe prediction for a query point is performed by weighted estimation using nearby\nlocal models. 
Unlike other GP approximations, such as mixtures of experts, we\nuse a distance based measure for partitioning of the data and weighted prediction.\nThe proposed method achieves online learning and prediction in real-time. Comparisons\nwith other non-parametric regression methods show that LGP has higher\naccuracy than LWPR and comes close to the performance of standard GPR and \u03bd-SVR.\n\n1 Introduction\n\nPrecise models of technical systems can be crucial in technical applications. Especially in robot\ntracking control, only a well-estimated inverse dynamics model can allow both high accuracy and\ncompliant control. For complex robots such as humanoids or light-weight arms, it is often hard to\nmodel the system suf\ufb01ciently well and, thus, modern regression methods offer a viable alternative\n[7,8]. For most real-time applications, online model learning poses a dif\ufb01cult regression problem due\nto three constraints. Firstly, the learning and prediction process should be very fast (e.g., learning\nneeds to take place at a speed of 20-200Hz and prediction at 200Hz to 1000Hz). Secondly, the\nlearning system needs to be capable of dealing with large amounts of data (i.e., with data arriving\nat 200Hz, ten minutes of runtime already result in 120,000 data points). Thirdly,\nthe data arrives as a continuous stream, thus, the model has to be continuously adapted to\nnew training examples over time.\nThese problems have been addressed by real-time learning methods such as locally weighted projection\nregression (LWPR) [7,8]. Here, the true function is approximated with local linear functions\ncovering the relevant state-space and online learning became computationally feasible due to low\ncomputational demands of the local projection regression which can be performed in real-time. The\nmajor drawback of LWPR is the required manual tuning of many highly data-dependent metaparameters\n[15]. 
Furthermore, for complex data, large numbers of linear models are necessary in order to\nachieve a competitive approximation.\nA powerful alternative for accurate function approximation in high-dimensional space is Gaussian\nprocess regression (GPR) [1]. Since the hyperparameters of a GP model can be adjusted by maximizing\nthe marginal likelihood, GPR requires little effort and is easy and \ufb02exible to use. However,\nthe main limitation of GPR is that the computational complexity scales cubically with the number of training\nexamples n. This drawback prevents GPR from being used in applications which need large amounts of training\ndata and require fast computation, e.g., online learning of inverse dynamics models for model-based\nrobot control. Many attempts have been made to alleviate this problem, for example, (i) sparse\nGaussian process (SGP) [2], and (ii) mixture of experts (ME) [3, 4]. In SGP, the training data is\napproximated by a smaller set of so-called inducing inputs [2,5]. Here, the dif\ufb01culty is to choose an\nappropriate set of inducing inputs, essentially replacing the full data set [2]. In contrast to SGP, ME\ndivide the input space in smaller subspaces by a gating network, within which a Gaussian process\nexpert, i.e., Gaussian local model, is trained [4, 6]. The computational cost is then signi\ufb01cantly reduced\ndue to the much smaller number of training examples within a local model. The ME performance\ndepends largely on the way of partitioning the training data and the choice of an optimal number of\nlocal models for a particular data set [4].\nIn this paper, we combine the basic idea behind both approaches, i.e., LWPR and GPR, attempting\nto get as close as possible to the speed of local learning while having a comparable accuracy to\nGaussian process regression. 
This results in an approach inspired by [6, 8] using many local GPs in\norder to obtain a signi\ufb01cant reduction of the computational cost during both the prediction and learning\nsteps, allowing the application to online learning. For partitioning the training data, we use a distance\nbased measure, where the corresponding hyperparameters are optimized by maximizing the\nmarginal likelihood.\nThe remainder of the paper is organized as follows: \ufb01rst, we give a short review of standard GPR in\nSection 2. Subsequently, we describe our local Gaussian process models (LGP) approach in Section\n3 and discuss how it inherits the advantages of both GPR and LWPR. Furthermore, the learning\naccuracy and performance of our LGP approach will be compared with other important standard\nmethods in Section 4, e.g., LWPR [8], standard GPR [1], sparse online Gaussian process regression\n(OGP) [5] and \u03bd-support vector regression (\u03bd-SVR) [11], respectively. Finally, our LGP method is\nevaluated for online learning of the inverse dynamics models of real robots for accurate tracking\ncontrol in Section 5. Here, the online learning is demonstrated by rank-one update of the local GP\nmodels [9]. The tracking task is performed in real-time using model-based control [10]. To the best of our\nknowledge, it is the \ufb01rst time that GPR is successfully used for high-speed online model learning\nin real time control on a physical robot. We present the results on a version of the Barrett WAM\nshowing that with the online learned model using LGP the tracking accuracy is superior compared\nto state-of-the-art model-based methods [10] while remaining fully compliant.\n\n2 Regression with standard GPR\nGiven a set of n training data points $\\{x_i, y_i\\}_{i=1}^n$, we would like to learn a function $f(x_i)$ transforming\nthe input vector $x_i$ into the target value $y_i$ given by $y_i = f(x_i) + \\epsilon_i$, where $\\epsilon_i$ is Gaussian\nnoise with zero mean and variance $\\sigma_n^2$ [1]. 
As a result, the observed targets can also be described\nby $y \\sim \\mathcal{N}(0, K(X, X) + \\sigma_n^2 I)$, where K(X, X) denotes the covariance matrix. As covariance\nfunction, a Gaussian kernel is frequently used [1]\n\n$$k(x_p, x_q) = \\sigma_s^2 \\exp\\left(-\\frac{1}{2}(x_p - x_q)^T W (x_p - x_q)\\right), \\tag{1}$$\n\nwhere $\\sigma_s^2$ denotes the signal variance and W are the widths of the Gaussian kernel. The joint\ndistribution of the observed target values and predicted value for a query point $x_*$ is given by\n\n$$\\begin{bmatrix} y \\\\ f(x_*) \\end{bmatrix} \\sim \\mathcal{N}\\left(0, \\begin{bmatrix} K(X, X) + \\sigma_n^2 I & k(X, x_*) \\\\ k(x_*, X) & k(x_*, x_*) \\end{bmatrix}\\right). \\tag{2}$$\n\nThe conditional distribution yields the predicted mean value $f(x_*)$ with the corresponding variance\n$V(x_*)$ [1]\n\n$$f(x_*) = k_*^T (K + \\sigma_n^2 I)^{-1} y = k_*^T \\alpha, \\qquad V(x_*) = k(x_*, x_*) - k_*^T (K + \\sigma_n^2 I)^{-1} k_*, \\tag{3}$$\n\nwith $k_* = k(X, x_*)$, $K = K(X, X)$ and $\\alpha$ denotes the so-called prediction vector. 
The hyperparameters\nof a Gaussian process with Gaussian kernel are $\\theta = [\\sigma_n^2, \\sigma_s^2, W]$ and their optimal value\nfor a particular data set can be derived by maximizing the log marginal likelihood using common\noptimization procedures, e.g., Quasi-Newton methods [1].\n\nInput: new data point {x, y}.\nfor k = 1 to number of local models do\n    Compute distance to the k-th local model:\n        $w_k = \\exp(-0.5 (x - c_k)^T W (x - c_k))$\nend for\nTake the nearest local model:\n    $v = \\max(w_k)$\nif $v > w_{gen}$ then\n    Insert {x, y} to nearest local model:\n        $X_{new} = [X, x]$\n        $\\mathbf{y}_{new} = [\\mathbf{y}, y]$\n    Update corresponding center:\n        $c_{new} = \\mathrm{mean}(X_{new})$\n    Compute inverse covariance matrix and prediction vector of local model:\n        $K_{new} = K(X_{new}, X_{new})$\n        $\\alpha_{new} = (K_{new} + \\sigma_n^2 I)^{-1} \\mathbf{y}_{new}$\nelse\n    Create new model:\n        $c_{k+1} = x$, $X_{k+1} = [x]$, $\\mathbf{y}_{k+1} = [y]$\n    Initialize new inverse covariance matrix and new prediction vector.\nend if\nAlgorithm 1: Partitioning of training data and model learning.\n\nInput: query data point x, M.\nDetermine M local models next to x.\nfor k = 1 to M do\n    Compute distance to the k-th local model:\n        $w_k = \\exp(-0.5 (x - c_k)^T W (x - c_k))$\n    Compute local mean using the k-th local model:\n        $\\bar{y}_k = k_k^T \\alpha_k$\nend for\nCompute weighted prediction using M local models:\n    $\\hat{y} = \\sum_{k=1}^M w_k \\bar{y}_k / \\sum_{k=1}^M w_k$\nAlgorithm 2: Prediction for a query point.\n\nFigure 1: Robot arms used for data generation and evaluation: (a) SARCOS arm, (b) Barrett WAM.\n\n3 Approximation using Local GP Models\n\nThe major limitation of GPR is the expensive computation of the inverse matrix $(K + \\sigma_n^2 I)^{-1}$ which\nyields a cost of $O(n^3)$. To reduce this computational cost, we cluster the training data in local\nregions and, subsequently, train the corresponding GP models on these local clusters. 
The mean\nprediction for a query point is then made by weighted prediction using the nearby local models.\nThus, the algorithm consists of two stages: (i) localization of data, i.e.,\nallocation of new input points and learning of corresponding local models, (ii) prediction for a query\npoint.\n\n3.1 Partitioning and Training of Local Models\nClustering input data is ef\ufb01ciently performed by considering a distance measure of the input point x\nto the centers of all local models. The distance measure $w_k$ is given by the kernel used to learn the\nlocal GP models, e.g., Gaussian kernel\n\n$$w_k = \\exp\\left(-\\frac{1}{2}(x - c_k)^T W (x - c_k)\\right), \\tag{4}$$\n\nwhere $c_k$ denotes the center of the k-th local model and W a diagonal matrix representing the kernel\nwidth. It should be noted that we use the same kernel width for computing $w_k$ as well as for training\nof all local GP models as given in Section 2. The kernel width W is obtained by maximizing the\nlog likelihood on a subset of the whole training data points. For doing so, we subsample the training\ndata and, subsequently, perform an optimization procedure.\nDuring the localization process, a new model with center $c_{k+1}$ is created, if all distance measures $w_k$\nfall below a limit value $w_{gen}$. The new data point x is then set as new center $c_{k+1}$. Thus, the number\nof local models is allowed to increase as the trajectories become more complex. Otherwise, if a new\npoint is assigned to a particular k-th model, the center $c_k$ is updated as mean of corresponding local\ndata points. With the new assigned input point, the inverse covariance matrix of the corresponding\nlocal model can be updated. The localization procedure is summarized in Algorithm 1.\nThe main computational cost of this algorithm is $O(N^3)$ for inverting the local covariance matrix,\nwhere N denotes the number of data points in a local model. 
Furthermore, we can control the\ncomplexity by limiting the number of data points in a local model. Since the number of local data\npoints increases continuously over time, we can comply with this limit by deleting old data\npoints as new ones are included. Insertion and deletion of data points can be decided by evaluating\nthe information gain of the operation. The cost for inverting the local covariance matrix can be\nfurther reduced, as we need only to update the full inverse matrix once it is computed. The update\ncan be ef\ufb01ciently performed in a stable manner using rank-one update [9] which has a complexity\nof $O(N^2)$.\n\n3.2 Prediction using Local Models\nThe prediction for a mean value $\\hat{y}$ is performed using weighted averaging over M local predictions\n$\\bar{y}_k$ for a query point x [8]. The weighted prediction $\\hat{y}$ is then given by $\\hat{y} = E\\{\\bar{y}_k | x\\} =\n\\sum_{k=1}^M \\bar{y}_k \\, p(k|x)$. According to Bayes' theorem, the probability of the model k given x can be\nexpressed as $p(k|x) = p(k, x) / \\sum_{j=1}^M p(j, x) = w_k / \\sum_{j=1}^M w_j$. Hence, we have\n\n$$\\hat{y} = \\frac{\\sum_{k=1}^M w_k \\bar{y}_k}{\\sum_{k=1}^M w_k}. \\tag{5}$$\n\nThe probability p(k|x) can be interpreted as a normalized distance of the query point x to the\nlocal model k where the measure metric $w_k$ is used as given in Equation (4). Thus, each local\nprediction $\\bar{y}_k$, determined using Equation (3), is additionally weighted by the distance $w_k$ between\nthe corresponding center $c_k$ and the query point x. The search for M local models can be quickly\ndone by evaluating the distances between the query point x and all model centers $c_k$. 
The prediction\nprocedure is summarized in Algorithm 2.\n\n4 Learning Inverse Dynamics\n\nWe have evaluated our algorithm using high-dimensional robot data taken from real robots, e.g.,\nthe 7 degree-of-freedom (DoF) anthropomorphic SARCOS master arm and 7-DoF Barrett whole\narm manipulator shown in Figure 1, as well as a physically realistic SL simulation [12]. We compare\nthe learning performance of LGP with the state-of-the-art in non-parametric regression, e.g.,\nLWPR, \u03bd-SVR, OGP and standard GPR in the context of approximating inverse robot dynamics.\nFor evaluating \u03bd-SVR and GPR, we have employed the libraries [13] and [14].\n\n4.1 Dynamics Learning Accuracy Comparison\n\nFor the comparison of the accuracy of our method in the setting of learning inverse dynamics, we\nuse three data sets, (i) SL simulation data (SARCOS model) as described in [15] (14094 training\npoints, 5560 test points), (ii) data from the SARCOS master arm (13622 training points, 5500 test\npoints) [8], (iii) a data set generated from our Barrett arm (13572 training points, 5000 test points).\nGiven samples $x = [q, \\dot{q}, \\ddot{q}]$ as input, where $q, \\dot{q}, \\ddot{q}$ denote the joint angles, velocities and accelerations,\nand using the corresponding joint torques y = [u] as targets, we have a proper regression problem.\nFor the considered 7 degrees of freedom robot arms, we, thus, have data with 21 input dimensions\n(for each joint, we have an angle, a velocity and an acceleration) and 7 targets (a torque for each\njoint). We learn the robot dynamics model in this 21-dim space for each DoF separately employing\nLWPR, \u03bd-SVR, GPR, OGP and LGP, respectively.\nPartitioning of the training examples for LGP can be performed either in the same input space (where\nthe model is learned) or in another space which has to be physically consistent with the approximated\nfunction. In the following, we localize the data depending on the position of the robot. 
Thus, the\npartitioning of training data is performed in a 7-dim space (7 joint angles). After determining $w_k$\nfor all k local models in the partitioning space, the input point will be assigned to the nearest local\nmodel, i.e., the local model with the maximal value of distance measure $w_k$.\n\nFigure 2: Approximation error as nMSE for each DoF using (a) SL data (SARCOS model), (b) SARCOS data and (c) Barrett WAM data, for LWPR, OGP, \u03bd-SVR, GPR and LGP. The error is computed after prediction on\nthe test sets with simulated data from SL Sarcos-model, real robot data from Barrett and SARCOS\nmaster arm, respectively. In most cases, LGP outperforms LWPR and OGP in learning accuracy\nwhile being competitive to \u03bd-SVR and standard GPR. It should be noted that the nMSE depends on\nthe target variances. Due to smaller variances in the Barrett data, the corresponding nMSE has also\na larger scale compared to SARCOS.\n\nFigure 2 shows the normalized mean squared error (nMSE) of the evaluation on the test set for each\nof the three evaluated scenarios, i.e., the simulated SARCOS arm in (a), the real SARCOS arm in\n(b) and the Barrett arm in (c). Here, the normalized mean squared error is de\ufb01ned as nMSE =\nMean squared error/Variance of target. During the prediction on the test set using LGP, we take the\nmost activated local models, i.e., the ones which are next to the query point.\n\nIt should be noted that the choice of the limit\nvalue $w_{gen}$ during the partitioning step is crucial\nfor the performance of LGP and, unfortunately,\nis an open parameter.\nIf $w_{gen}$ is too\nlarge, a lot of local models will be generated\nwith a small number of training points, since a new\nmodel is created whenever all $w_k$ fall below $w_{gen}$. It turns\nout that these small local models do not perform\nwell in generalization for unknown data.\nIf $w_{gen}$ is small, the local models also become\nlarge, which increases the computational complexity. 
Here, the training data are clustered\nin about 30 local regions ensuring that each local\nmodel has a suf\ufb01cient amount of data points\nfor high accuracy (in practice, roughly a hundred\ndata points for each local model suf\ufb01ce)\nwhile having suf\ufb01ciently few that the solution\nremains feasible in real-time (on our current\nhardware, a Core Duo at 2GHz, that means fewer\nthan 1000 data points). On average, each local\nmodel has approximately 500 training examples.\nThis small number of training inputs\nenables a fast training for each local model, i.e.,\nthe matrix inversion. For estimating the hyperparameters\nusing likelihood optimization, we\nsubsample the training data which results in a\nsubset of about 1000 data points.\n\nFigure 3: Average time in milliseconds needed for\nprediction of 1 query point. The prediction time [ms]\nis plotted on a logarithmic scale with respect to the number of\ntraining points for LWPR, \u03bd-SVR, GPR and LGP. The time as stated above is the\nrequired time for prediction of all 7 DoF. Here,\nLWPR is the fastest method due to simple\nregression models. Compared to global regression\nmethods such as standard GPR and \u03bd-SVR, local\nGP makes a signi\ufb01cant improvement in terms of prediction\ntime.\n\nConsidering the approximation error on the test set shown in Figure 2(a-c), it can be seen that\nLGP generalizes well using only few local models for prediction. In all cases, LGP outperforms\nLWPR and OGP while being close in learning accuracy to global methods such as GPR and \u03bd-SVR.\nThe mean-prediction for GPR is determined according to Equation (3) where we precomputed\nthe prediction vector \u03b1 from training data. 
When a query point appears, the kernel vector $k_*^T$ is\nevaluated for this particular point. The operation of mean-prediction has then the order of $O(n)$ for\nstandard GPR (similarly, for \u03bd-SVR) and $O(NM)$ for LGP, where n denotes the total number of\ntraining points, M the number of local models and N the number of data points in a local model.\n\n4.2 Comparison of Computation Speed for Prediction\n\nBesides the reduction of training time (i.e., matrix inversion), the prediction time is also reduced\nsigni\ufb01cantly compared to GPR and \u03bd-SVR due to the fact that only a small number of local models\nin the vicinity of the query point are needed during prediction for LGP. Thus, the prediction time\ncan be controlled by the number of local models. A large number of local models may provide a\nsmooth prediction but on the other hand increases the time complexity.\nThe comparison of prediction speed is shown in Figure 3. Here, we train LWPR, \u03bd-SVR, GPR\nand LGP on 5 different data sets with increasing training examples (1065, 3726, 7452, 10646 and\n14904 data points, respectively). Subsequently, using the trained models we compute the average\ntime needed to make a prediction for a query point for all 7 DoF. For LGP, we take a limited number\nof local models in the vicinity for prediction, e.g., M = 3. Since our control system requires a\nminimal prediction rate at 100 Hz (10 ms) in order to ensure system stability, data sets with more\nthan 15000 points are not applicable for standard GPR or \u03bd-SVR due to high computation demands\nfor prediction.\nThe results show that the computation time requirements of \u03bd-SVR and GPR rise very fast with\nthe size of the training data set as expected. LWPR remains the best method in terms of computational\ncomplexity only increasing at a very low speed. 
However, as shown in Figure 3, the cost for LGP is\nsigni\ufb01cantly lower than that of \u03bd-SVR and GPR and increases at a much lower rate. In practice, we\ncan also curb the computation demands of single models by deleting old data points when new ones are\nassigned to the model. As an approach to deleting and inserting data points, we can use the information\ngain of the corresponding local model as a principled measure. It can be seen from the results that\nLGP represents a compromise between learning accuracy and computational complexity. For large\ndata sets (e.g., more than 5000 training examples), LGP reduces the prediction cost considerably\nwhile keeping a good learning performance.\n\n5 Application in Model-based Robot Control\n\nIn this section, \ufb01rst, we use the inverse dynamics\nmodels learned in Section 4.1 for a model-based\ntracking control task [10] in the setting\nshown in Figure 4. Here, the learned model\nof the robot is applied for an online prediction\nof the feedforward torques $u_{FF}$ given the\ndesired trajectory $[q_d, \\dot{q}_d, \\ddot{q}_d]$. Subsequently,\nthe model approximated by LGP is used to\nevaluate online learning. To demonstrate\nthe online learning, the local GP models are\nadapted in real-time using rank-one update.\nAs shown in Figure 4, the controller command\nu consists of the feedforward part $u_{FF}$ and the\nfeedback part $u_{FB} = K_p e + K_v \\dot{e}$, where $e =\nq_d - q$ denotes the tracking error and $K_p$, $K_v$ the\nposition-gain and velocity-gain, respectively.\nDuring the control experiment we set the gains\nto very low values taking the aim of compliant\ncontrol into account. As a result, the learned\nmodel has a stronger effect on computing the predicted torque $u_{FF}$ and, hence, a better learning\nperformance of each method results in a lower tracking error.\nFor comparison with the learned models, we also compute the feedforward torque using the rigid-body\n(RB) formulation which is a common approach in robot control [10]. 
The control task is performed\nin real-time on the Barrett WAM, as shown in Figure 1. As desired trajectory, we generate a test\ntrajectory which is similar to the one used for learning the inverse dynamics models in Section 4.1.\nFigure 5 (a) shows the tracking errors on the test trajectory for 7 DoFs, where the error is computed as\nroot mean squared error (RMSE). Here, LGP provides a competitive control performance compared\nto GPR while being superior to LWPR and the state-of-the-art rigid-body model. It can be seen that\nfor several DoFs the tracking errors are large, for example the 5th, 6th and 7th DoF. The reason is that for\nthese DoFs the unknown nonlinearities are time-dependent, e.g., the gear drive for the 7th DoF, which cannot\nbe approximated well using just one of\ufb02ine learned model. Since it is not possible to learn the\ncomplete state space using a single data set, online learning is necessary.\n\nFigure 4: Schematic showing model-based robot\ncontrol. The learned dynamics model can be updated\nonline using LGP.\n\nFigure 5: (a) Tracking error as RMSE on the test trajectory for each DoF with the Barrett WAM, without online learning. (b) Tracking\nerror after online learning with LGP on the Barrett WAM. The model uncertainty is reduced with online learning\nusing LGP. With online learning, LGP is able to outperform of\ufb02ine learned models using standard\nGPR for test trajectories.\n\n5.1 Online Learning of Inverse Dynamics Models with LGP\n\nThe ability of online adaptation of the learned inverse dynamics models with LGP is shown by the\nrank-one update of the local models which has a complexity of $O(N^2)$ [9], where N is the number of data points in a local model. Since the number of\ntraining examples in each local model is limited (500 points on average), the update procedure is fast\nenough for real-time application. 
For online learning the models are updated as shown in Figure 4.\nFor doing so, we regularly sample the joint torques u and the corresponding robot trajectories\n$[q, \\dot{q}, \\ddot{q}]$ online. For the time being, as a new point is inserted we randomly delete another data\npoint from the local model if the maximal number of data points is reached. The process of insertion\nand deletion of data points can be further improved by considering the information gain (and information\nloss) of the operation. Figure 5 (b) shows the tracking error after online learning with LGP.\nIt can be seen that the errors for each DoF are signi\ufb01cantly reduced with online LGP compared to\nthe ones with of\ufb02ine learned models. With online learning, LGP is also able to outperform standard\nGPR.\n\n6 Conclusion\n\nWith LGP, we combine the fast computation of local regression with more accurate regression methods\nwhile requiring little tuning effort. LGP achieves higher learning accuracy compared to locally\nlinear methods such as LWPR while having less computational cost compared to GPR and \u03bd-SVR.\nThe reduced cost allows LGP to be used for online model learning, which is necessary in order to generalize\nthe model for all trajectories. Model-based tracking control using the online learned model achieves superior\ncontrol performance compared to the state-of-the-art method as well as of\ufb02ine learned models\nfor unknown trajectories.\n\nReferences\n[1] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine Learning. Massachusetts\nInstitute of Technology: MIT-Press, 2006.\n[2] J. Q. Candela and C. E. Rasmussen, \u201cA unifying view of sparse approximate Gaussian process\nregression,\u201d Journal of Machine Learning Research, 2005.\n[3] V. 
Tresp, \u201cMixtures of Gaussian processes,\u201d Advances in Neural Information Processing Systems,\n2001.\n[4] C. E. Rasmussen and Z. Ghahramani, \u201cIn\ufb01nite mixtures of Gaussian process experts,\u201d Advances\nin Neural Information Processing Systems, 2002.\n[5] L. Csato and M. Opper, \u201cSparse online Gaussian processes,\u201d Neural Computation, 2002.\n[6] E. Snelson and Z. Ghahramani, \u201cLocal and global sparse Gaussian process approximations,\u201d\nArti\ufb01cial Intelligence and Statistics, 2007.\n[7] S. Schaal, C. G. Atkeson, and S. Vijayakumar, \u201cScalable techniques from nonparametric\nstatistics for real-time robot learning,\u201d Applied Intelligence, pp. 49\u201360, 2002.\n[8] S. Vijayakumar, A. D\u2019Souza, and S. Schaal, \u201cIncremental online learning in high dimensions,\u201d\nNeural Computation, 2005.\n[9] M. Seeger, \u201cLow rank update for the Cholesky decomposition,\u201d Tech. Rep., 2007. [Online].\nAvailable: http://www.kyb.tuebingen.mpg.de/bs/people/seeger/\n[10] J. J. Craig, Introduction to Robotics: Mechanics and Control, 3rd ed. Prentice Hall, 2004.\n[11] B. Sch\u00f6lkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization,\nOptimization and Beyond. Cambridge, MA: MIT-Press, 2002.\n[12] S. Schaal, \u201cThe SL simulation and real-time control software package,\u201d Tech. Rep., 2006.\n[Online]. Available: http://www-clmc.usc.edu/publications/S/schaal-TRSL.pdf\n[13] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001,\nhttp://www.csie.ntu.edu.tw/~cjlin/libsvm.\n[14] M. Seeger, LHOTSE: Toolbox for Adaptive Statistical Model, 2007,\nhttp://www.kyb.tuebingen.mpg.de/bs/people/seeger/lhotse/.\n[15] D. Nguyen-Tuong, J. Peters, and M. 
Seeger, \u201cComputed torque control with nonparametric\nregression models,\u201d Proceedings of the 2008 American Control Conference (ACC 2008), 2008.\n", "award": [], "sourceid": 236, "authors": [{"given_name": "Duy", "family_name": "Nguyen-tuong", "institution": null}, {"given_name": "Jan", "family_name": "Peters", "institution": null}, {"given_name": "Matthias", "family_name": "Seeger", "institution": null}]}