{"title": "Linking Motor Learning to Function Approximation: Learning in an Unlearnable Force Field", "book": "Advances in Neural Information Processing Systems", "page_first": 197, "page_last": 203, "abstract": null, "full_text": "Linking motor learning to function\napproximation: Learning in an\nunlearnable force field\nOpher Donchin and Reza Shadmehr\n\nDept. of Biomedical Engineering\nJohns Hopkins University, Baltimore, MD 21205\nEmail: opher@bme.jhu.edu, reza@bme.jhu.edu\nAbstract\n\nReaching movements require the brain to generate motor com-\nmands that rely on an internal model of the task's dynamics. Here\nwe consider the errors that subjects make early in their reaching\ntrajectories to various targets as they learn an internal model. Us-\ning a framework from function approximation, we argue that the\nsequence of errors should reflect the process of gradient descent. If\nso, then the sequence of errors should obey hidden state transitions\nof a simple dynamical system. Fitting the system to human data,\nwe find a surprisingly good fit accounting for 98% of the variance.\nThis allows us to draw tentative conclusions about the basis ele-\nments used by the brain in transforming sensory space to motor\ncommands. To test the robustness of the results, we estimate the\nshape of the basis elements under two conditions: in a traditional\nlearning paradigm with a consistent force field, and in a random\nsequence of force fields where learning is not possible. Remarkably,\nwe find that the basis remains invariant.\n1 Introduction\n\nIt appears that in constructing the motor commands to guide the arm toward a\ntarget, the brain relies on an internal model (IM) of the dynamics of the task that\nit learns through practice [1]. The IM is presumably a system that transforms\na desired limb trajectory in sensory coordinates to motor commands. The motor\ncommands in turn create the complex activation of muscles necessary to cause\naction. 
A major issue in motor control is to infer characteristics of the IM from the actions of subjects.

Recently, we took a first step toward mathematically characterizing the IM's representation in the brain [2]. We analyzed the sequence of errors made by subjects on successive movements as they reached to targets while holding a robotic arm. The robot produced a force field and subjects learned to compensate for the field (presumably by constructing an IM) and eventually produced straight movements within the field. Our analysis sought to draw conclusions about the structure of the IM from the sequence of errors generated by the subjects. For instance, in a velocity-dependent force field (such as the fields we use), the IM must be able to encode velocity in order to anticipate the upcoming force. We hoped that the effect of errors in one direction on subsequent movements in other directions would give information about the width of the elements which the IM used in encoding velocity. For example, if the basis elements were narrow, then movements in a given direction would result in little or no change in performance in neighboring directions. Wide basis elements would mean correspondingly larger effects.

We hypothesized that an estimate of the width of the basis elements could be calculated by fitting the time sequence of errors to a set of equations representing a dynamical system. The dynamical system assumed that error in a movement resulted from a difference between the IM's approximation and the actual environment, an assumption that has recently been corroborated [3]. 
The error in turn changed the IM, affecting subsequent movements:

$$y^{(n)} = D_{k^{(n)}}\, F^{(n)} - z^{(n)}_{k^{(n)}}, \qquad z^{(n+1)}_l = z^{(n)}_l + B_{l,k^{(n)}}\, y^{(n)}, \quad l = 1, \ldots, 8 \tag{1}$$

Here y^(n) is the error on the nth movement, made in direction k^(n) (8 possible directions); F^(n) is the actual force experienced in the movement, and it is scaled by an arm compliance D which is direction dependent; and z^(n)_k is the current output of the IM in the direction k. The difference between this output and reality results in movement errors. B is a matrix characterizing the effect of errors in one direction on other directions. That is, B can provide the generalization function we sought. By comparing the B produced by a fit to human data to the Bs produced from simulated data (generated using a dynamical simulation of arm movements), we found that the time sequence of the subjects' errors was similar to that generated by a simulation that represented the IM with gaussian basis elements that encoded velocity with a σ = 0.08 m/sec.

But why might this dynamical system be a good model of trial-to-trial behavior in a learning paradigm? Here we demonstrate that, under reasonable assumptions, behavior in accordance with Eq. 1 can be derived within the framework of function approximation, and that B is closely related to the basis functions in the approximation process. We find that this model gives accurate fits to human data, even when the number of parameters in the model is drastically reduced. Finally, we test the prediction of Eq. 1 that learning involves simple movement-by-movement corrections to the IM, and that these variations depend only on the shape of the basis which the IM uses for representation. 
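The trial-to-trial system of Eq. 1 is simple to simulate. The sketch below (Python) uses a gaussian-shaped generalization matrix, uniform compliance, and a constant field strength; all of these parameter values are illustrative assumptions, not the values fitted to data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dirs = 8
angles = np.arange(n_dirs) * 2 * np.pi / n_dirs

# Illustrative generalization matrix: each entry depends only on the
# angular difference between two directions, with a gaussian falloff.
diff = angles[:, None] - angles[None, :]
wrapped = np.angle(np.exp(1j * diff))          # wrap differences to (-pi, pi]
B = 0.2 * np.exp(-(wrapped ** 2) / (2 * 0.8 ** 2))

D = np.full(n_dirs, 1.0)   # illustrative compliance, identical per direction
z = np.zeros(n_dirs)       # IM output expressed as expected displacement
F = 1.0                    # constant field strength (arbitrary units)

errors = []
for _ in range(200):
    k = rng.integers(n_dirs)   # random target direction
    y = D[k] * F - z[k]        # Eq. 1: error on this movement
    z += B[:, k] * y           # Eq. 1: error updates the IM in all directions
    errors.append(y)

# With a consistent field, errors shrink as the IM state converges.
print(abs(errors[0]), abs(np.mean(errors[-20:])))
```

Because the update couples all eight directions through B, a movement in one direction also reduces subsequent errors in neighboring directions, which is exactly the generalization the fits below probe.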
Remarkably, when subjects perform movements in a force field that changes randomly from one movement to the next, the pattern of errors predicts a generalization function, and therefore a set of basis elements, indistinguishable from the condition where the force field does not change. That is, ``an unlearnable task is learned in exactly the same way as a learnable task.''

2 Approach

2.1 The Learning Process

In the current task, subjects grip the handle of a robot and make 10 cm reaching movements to targets presented visually. The robot produces a force field F(ẋ) proportional and perpendicular to the velocity of the hand, such as F = [0 13; -13 0] ẋ (with F in Newtons and ẋ in m/s). To simulate the process of learning an IM, we assume that the IM uses scalar-valued basis functions that encode velocity, g = [g_1(ẋ), ..., g_n(ẋ)]^T, so that the IM's expectation of force at a desired velocity is F̂(ẋ) = W g(ẋ), where W is a 2 × n matrix [4]. To move the hand to a target at direction k, a desired trajectory ẋ_k(t) is given as input to the IM, which in turn produces as output F̂(ẋ_k) [5, 6]. As a result, forces F(t) are experienced, so that a force error can be calculated as ΔF(t) = F(t) − F̂(ẋ_k(t)). 
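A minimal sketch of this forward model is below, with gaussian basis functions tiling a patch of 2-D velocity space. The grid layout is an assumption for illustration; only the width σ = 0.08 m/s corresponds to a value discussed in the paper:

```python
import numpy as np

sigma = 0.08  # basis width in m/s (the value later estimated from data)

# Gaussian basis centers on an assumed grid over 2-D hand-velocity space.
centers = np.array([(cx, cy)
                    for cx in np.linspace(-0.5, 0.5, 5)
                    for cy in np.linspace(-0.5, 0.5, 5)])

def g(v):
    """Basis activations g(xdot) at a 2-D hand velocity v (m/s)."""
    d2 = np.sum((centers - v) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

n = len(centers)
W = np.zeros((2, n))        # 2 x n weight matrix; F_hat = W g(xdot)
v = np.array([0.2, 0.1])    # a sample desired velocity
F_hat = W @ g(v)            # IM's force prediction at this velocity
print(F_hat)                # zero force expected before any learning
```

Learning then consists entirely of adjusting W, which is what the gradient-descent derivation that follows makes precise.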
We adjust W in the direction that minimizes a cost function e, which is simply the magnitude of the force error integrated over the entire movement:

$$e = \frac{1}{2} \int_0^T \Delta F(t)^T\, \Delta F(t)\, dt = \frac{1}{2} \int_0^T \bigl(F(t) - Wg(t)\bigr)^T \bigl(F(t) - Wg(t)\bigr)\, dt$$

Changing W to minimize this value requires that we calculate the gradient of e with respect to the weights and move W in the direction opposite to the gradient:

$$(\nabla e)_{W_{ij}} = \frac{\partial e}{\partial W_{ij}} = -\int_0^T g_j(t)\, \Delta F_i(t)\, dt$$

$$W^{(n+1)} = W^{(n)} + \eta \int_0^T \Delta F^{(n)}(t)\, g(\dot{x}_{k^{(n)}}(t))^T\, dt \tag{2}$$

where W^(n) means the W matrix on the nth movement.

2.2 Deriving the Dynamical System

Our next step is to represent learning not in terms of weight changes, but in terms of changes in IM output, F̂. We do this for an arbitrary point in velocity space ẋ_0 by multiplying both sides of Eq. 2 by g(ẋ_0), with the result that:

$$\hat{F}^{(n+1)}(\dot{x}_0) = \hat{F}^{(n)}(\dot{x}_0) + \eta \int_0^T \bigl[ g(\dot{x}_{k^{(n)}})^T g(\dot{x}_0) \bigr]\, \Delta F^{(n)}\, dt \tag{3}$$

Further simplification will require approximation. Because we are considering a case where the actual force, F(ẋ), is directly proportional to velocity, it is reasonable to make the approximation that, along a reasonably straight desired trajectory, the force error is simply proportional to the velocity: ΔF(t) = ΔF^(n) ẋ_{k^(n)}(t), where ΔF^(n) is constant over movement n. This means that the integral of Eq. 3 is actually of the form

$$\Delta F^{(n)} \int_0^T \dot{x}_{k^{(n)}}(t)\, g(\dot{x}_{k^{(n)}})^T g(\dot{x}_0)\, dt \tag{4}$$

One more assumption is required to make this tractable. If we approximate the desired trajectory with a triangular function of time, and integrate only over the rising phase of the velocity curve (because the values are the same going up and going down), we can simplify the integral to an integral over speed, drawing out a constant, $2K \int_{\dot{x}=0}^{\dot{x}_k(250\,\mathrm{ms})} G(\dot{x}, \dot{x}_0)\, d\dot{x}$. 
The integral has become a function of the values of ẋ_{k^(n)}(250 ms) and ẋ_0. Calling this function B, Eq. 4 becomes

$$\hat{F}^{(n+1)}(\dot{x}_0) = \hat{F}^{(n)}(\dot{x}_0) + B(\dot{x}_{k^{(n)}}, \dot{x}_0)\, \Delta F^{(n)} \tag{5}$$

ẋ_0 is arbitrary. We restrict our attention to only those ẋ_0 that equal the peak velocity of the desired trajectory associated with a movement direction l. Since we have only eight different points in velocity space to consider, F̂ can be considered an eight-valued vector F̂_l rather than a function F̂(ẋ). Similarly, B(ẋ_l, ẋ_k) will become an 8 × 8 matrix, B_{l,k}. The simpler notation allows us to write Eq. 5 as

$$\hat{F}^{(n+1)}_l = \hat{F}^{(n)}_l + B_{l,k^{(n)}}\, \Delta F^{(n)}, \quad l = 1, \ldots, 8 \tag{6}$$

Figure 1: We performed simulations to test the approximation that displacement in arm motion at 250 msec toward a target at 10 cm is proportional to error in the force estimate made by the IM. A system of equations describing a controller, dynamics of a typical human arm, and robot dynamics [7] was simulated for a 500 msec minimum-jerk motion to 8 targets. The simulated robot produced one of 8 force fields scaled to 3 different magnitudes (6, 9, and 12 N), while the controller remained naive to the field. The errors in hand motion at 250 msec were fitted to the robot forces using a single compliance matrix. Lighter dashed lines are the displacement predictions of the model; darker solid lines are the actual displacement in the simulated movements. Scale bar: 3 cm.

One more approximation is to assume that force error ΔF in a given movement will be proportional to position error in that movement when both are evaluated at 250 ms. This approximation is justified by the data presented in Fig. 1, which shows that the linear relationship holds for a wide range of movements and force errors. Finally, because the forces are perpendicular to the movement, we will disregard the error parallel to the direction of movement, reducing Eq. 
6 to a scalar equation. We are now in a position to write our system of equations in its final form:

$$y^{(n)} = D_{k^{(n)}} \bigl( F^{(n)} - \hat{F}^{(n)}_{k^{(n)}} \bigr), \qquad \hat{F}^{(n+1)}_l = \hat{F}^{(n)}_l + B_{l,k^{(n)}}\, \Delta F^{(n)}, \quad l = 1, \ldots, 8 \tag{7}$$

Note that this is a system of nine equations: a single movement causes a change in all 8 directions for which the IM has an expectation. Let us now introduce a new variable z^(n)_{k^(n)} ≡ D_{k^(n)} F̂^(n)_{k^(n)}, which represents the error (perpendicular displacement) that would have been experienced during this movement if we had not compensated for the expected field. With this substitution, Eq. 7 reduces to Eq. 1.

2.3 The shape of the generalization function B

Our task now is to give subjects a sequence of targets, observe the errors in their movements, and ask whether there are parameters for which the system of Eq. 7 gives a good fit. Given a sequence of N movement directions, forces imposed on each movement, and the resulting errors ({k, F, y}^(n), n = 1, ..., N), we search for values of B_{l,k}, D_k, and initial conditions (F̂^(0)_m, m = 1, ..., 8) that minimize the squared difference, summed over the movements, between the y calculated in Eq. 7 and the measured errors. One concern is that, in fitting a model with 80 parameters (64 from the B matrix, 8 from D, and 8 from F̂^(0)), we are likely to be overfitting our data. We address this concern by making the assumption that the B matrix has a special shape: B_{l,k} = b(∠ẋ_l ẋ_k). That is, each entry in the B matrix is determined according to the difference in angle between the two directions represented. This assumption implies that g(ẋ_k)^T g(ẋ_l) depends only on ∠ẋ_k ẋ_l. 
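Under that assumption the full B matrix is generated by a single profile b over angular differences, as this short sketch shows (the numerical values of b here are made up for illustration, not the fitted profile):

```python
import numpy as np

# B[l, k] = b(angle difference between directions l and k): 8 values of b,
# one per angular step of 45 degrees, generate the whole 8 x 8 matrix.
n_dirs = 8
b = np.array([1.0, 0.5, 0.1, 0.02, 0.0, 0.02, 0.1, 0.5])  # illustrative

idx = (np.arange(n_dirs)[:, None] - np.arange(n_dirs)[None, :]) % n_dirs
B = b[idx]          # circulant: row l, column k holds b[(l - k) mod 8]

# Entries with the same angular difference are equal by construction:
print(B[3, 1], B[6, 4])   # both come from b[2]
```

The 64 entries of B thus collapse to 8 free parameters, which is what keeps the fitted model at 24 parameters overall.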
This reduces the B matrix to 8 parameters, and reduces the number of parameters in the model to 24.

Figure 2: We simulated a system of equations representing dynamics of robot, human arm, and adaptive controller for movements to a total of 192 targets spanning 8 directions of movement. The adaptive controller applied gradient descent (η = 0.002) to learn a gaussian basis encoding arm velocity with a σ of 0.04, 0.08, 0.12, or 0.20 m/s. Errors, computed as displacement perpendicular to the direction of the target, were measured at 250 msec and are plotted for one direction of movement (45 deg) (a-d). Simulated data is the solid line and the fit is shown as a dashed line. Circles indicate error on no-field trials and triangles indicate error on fielded trials. The data for all 192 targets were then fit to Eq. 7 and the generalization matrix B was estimated (f). Data were also collected from 76 subjects and fit with the model (e), giving a generalization function that is nearly identical to the generalization function of a controller using gaussians with a width of 0.08 m/s (g).

3 Results

We first tested the validity of our approach in an artificial learning system that used a simulation of human arm and robot dynamics to learn an IM of the imposed force field with gaussian basis elements. The result was a sequence of errors to a series of targets. We fit Eq. 7 to the sequence of errors and found an estimate for the generalization function (Fig. 2). As expected, when narrow basis elements are used, the generalization function is narrow. 
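The kind of artificial learner used here can be sketched in miniature: gradient descent per Eq. 2 on a gaussian velocity basis, driven by the force error of a curl field. The basis grid, learning rate, and speed profile below are illustrative assumptions, not the arm-and-robot simulation of [7]:

```python
import numpy as np

sigma = 0.08                                   # assumed basis width, m/s
grid = np.linspace(-0.6, 0.6, 7)
centers = np.array([(cx, cy) for cx in grid for cy in grid])

def g(v):
    """Gaussian basis activations at a 2-D hand velocity v."""
    return np.exp(-np.sum((centers - v) ** 2, axis=1) / (2 * sigma ** 2))

A = np.array([[0.0, 13.0], [-13.0, 0.0]])      # curl field: F = A xdot
dt, T = 0.01, 0.5
t = np.arange(0.0, T, dt)
speed = 0.4 * np.sin(np.pi * t / T)            # bell-shaped speed profile, m/s
direction = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])

W = np.zeros((2, len(centers)))
eta = 0.2                                       # illustrative learning rate
errs = []
for movement in range(50):
    dW = np.zeros_like(W)
    e = 0.0
    for s in speed:
        v = s * direction                       # desired velocity at this instant
        dF = A @ v - W @ g(v)                   # force error Delta F(t)
        dW += eta * np.outer(dF, g(v)) * dt     # integrand of Eq. 2
        e += 0.5 * np.sum(dF ** 2) * dt         # cost e for this movement
    W += dW                                      # apply Eq. 2 once per movement
    errs.append(e)

# Repeated exposure to the same field drives the cost down.
print(errs[0], errs[-1])
```

Running this for movements to all 8 directions, and fitting the resulting error sequence with Eq. 7, is the procedure that produced the simulated generalization functions of Fig. 2.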
We next fit the same model to data that had been collected from 76 subjects and again found an excellent fit.

Plots f and g in Fig. 2 show the generalization function, B, as a function of the angle between ẋ_k and ẋ_l. They demonstrate that errors in one direction affect movements in other directions, both in the simulated errors and in the subjects' errors. The greatest effect of error is in the direction in which the movement was made. The immediately neighboring directions are also significantly affected, but the effect drops off with increasing distance. The generalization function which matched the human data was nearly identical to the one matching data produced by the simulation whose gaussians had σ = 0.08 m/sec.

Figure 3: Fitting the model in Eq. 7 to a learning situation (a and c, 76 subjects) or a situation where subjects are presented with a random sequence of fields (b and d, 6 subjects) produces nearly identical models. a and b show errors (binned to 5 movements per data point), measured as perpendicular distance from a straight-line trajectory at 250 ms into the movement. Triangles are field A movements (F = [0 13; -13 0] ẋ), wedges are field B (F = [0 -13; 13 0] ẋ), and filled circles are no field. The data is split into three sets of 192 movements. It can be seen that subjects in the learning paradigm learn to counteract the field, and show aftereffects. Subjects in the random field do not improve on either field, and do not show aftereffects. c and d show that the model fit both the learning paradigm and the random field paradigm. The fit is plotted for movements made to 90° during the first 192 movements following first exposure to the field (movements 193 through 384 in a and b). r² for the fits is 0.96 and 0.97, respectively. Fits to the last 192 movements in each paradigm gave r² of 0.96 and 0.98. Finally, in the bottom plot, we compare the generalization function, B, given by each fit. The normalized generalization function is nearly identical for all four sets. The size of the central peak is 0.21 for both sets of the consistent field and 0.19 and 0.14, respectively, for the two sets of the random field.

The most interesting aspect of the success we had using the simple system of Eq. 7 to explain human behavior is that the global learning process is being characterized as the accretion of small changes in the state of the controller accumulated over a large number of movements. In order to challenge this surprising aspect of the model, we decided to apply it to data in which human subjects performed movements in fields that varied randomly from trial to trial. In this case, no cumulative learning is possible. The important question is whether the model will still be able to fit the data. If it does fit the data, then the question is whether the parameters of the fit are similar to those derived from the learning paradigm.

Fig. 3 is a comparison of fitting a model to a consistent field and a random field. As seen in a and b of the figure, subjects are able to improve their performance through learning in a consistent field, but they do not improve in the random field. However, as shown in c and d, the model is able to fit the performance in both fields. 
Although the fits of each type of field were performed independently, we can see in the bottom plot of Fig. 3 that the B matrices are nearly identical, which indicates that trial-by-trial learning was the same for both types of fields. In the second set of the random paradigm, it seems as though the adjustment of state may be slower. This raises the possibility that the process of movement-by-movement adjustment of state is gradually abandoned when it consistently fails to produce improvement. It is likely that in this case subjects come to rely on a feedback-driven controller, which would be unable to compensate for the errors generated early in the movement but would allow them to more quickly adjust to those errors as information about the field they are moving through is processed.

4 Conclusions

We hypothesized that the process of learning an internal model of the arm's dynamics may be similar to mechanisms of gradient descent in the framework of approximation theory. If so, then errors experienced in a given movement should affect subsequent movements in a meaningful way, and perhaps as simply as predicted by the dynamical system in Eq. 7. These equations appear to fit both simulations and actual human data exceedingly well, making strong predictions about the shape of the basis with which the IM is apparently learned. Here we find that the shape of the basis remains invariant despite radical changes in the pattern of errors, as exhibited when subjects were exposed to a random field as compared to a stationary field. We conclude that even when the task is unlearnable and errors approximate a flat line, the brain is attempting to learn with the same characteristic basis which is used when the task is simple and errors exponentially approach zero.

References

[1] R. Shadmehr and F. A. Mussa-Ivaldi. Adaptive representation of dynamics during learning of a motor task. J. Neurosci., 14(5 Pt 2):3208--3224, 1994.
[2] K. Thoroughman and R. Shadmehr. 
Learning of action through adaptive combination of motor primitives. Nature, 407(6805):742--747, 2000.
[3] R. A. Scheidt, J. B. Dingwell, and F. A. Mussa-Ivaldi. Learning to move amid uncertainty. J. Neurophysiol., 86(2):971--985, 2001.
[4] R. M. Sanner and M. Kosha. A mathematical model of the adaptive control of human arm motions. Biol. Cybern., 80(5):369--382, 1999.
[5] C. G. Atkeson. Learning arm kinematics and dynamics. Annu. Rev. Neurosci., 12:157--183, 1989.
[6] Y. Uno, M. Kawato, and R. Suzuki. Formation and control of optimal trajectory in human multijoint arm movement: minimum torque-change model. Biol. Cybern., 61(2):89--101, 1989.
[7] R. Shadmehr and H. H. Holcomb. Neural correlates of motor memory consolidation. Science, 277(5327):821--825, 1997.
", "award": [], "sourceid": 1966, "authors": [{"given_name": "O.", "family_name": "Donchin", "institution": null}, {"given_name": "Reza", "family_name": "Shadmehr", "institution": null}]}