{"title": "A forward model at Purkinje cell synapses facilitates cerebellar anticipatory control", "book": "Advances in Neural Information Processing Systems", "page_first": 3828, "page_last": 3836, "abstract": "How does our motor system solve the problem of anticipatory control in spite of a wide spectrum of response dynamics from different musculo-skeletal systems, transport delays as well as response latencies throughout the central nervous system? To a great extent, our highly-skilled motor responses are a result of a reactive feedback system, originating in the brain-stem and spinal cord, combined with a feed-forward anticipatory system, that is adaptively fine-tuned by sensory experience and originates in the cerebellum. Based on that interaction we design the counterfactual predictive control (CFPC) architecture, an anticipatory adaptive motor control scheme in which a feed-forward module, based on the cerebellum, steers an error feedback controller with counterfactual error signals. Those are signals that trigger reactions as actual errors would, but that do not code for any current of forthcoming errors. In order to determine the optimal learning strategy, we derive a novel learning rule for the feed-forward module that involves an eligibility trace and operates at the synaptic level. In particular, our eligibility trace provides a mechanism beyond co-incidence detection in that it convolves a history of prior synaptic inputs with error signals. In the context of cerebellar physiology, this solution implies that Purkinje cell synapses should generate eligibility traces using a forward model of the system being controlled. From an engineering perspective, CFPC provides a general-purpose anticipatory control architecture equipped with a learning rule that exploits the full dynamics of the closed-loop system.", "full_text": "A Forward Model at Purkinje Cell Synapses\nFacilitates Cerebellar Anticipatory Control\n\nIvan Herreros-Alonso\n\nSPECS lab\n\nXerxes D. 
Arsiwalla\n\nSPECS lab\n\nUniversitat Pompeu Fabra\n\nUniversitat Pompeu Fabra\n\nBarcelona, Spain\n\nivan.herreros@upf.edu\n\nBarcelona, Spain\n\nPaul F.M.J. Verschure\n\nSPECS, UPF\n\nCatalan Institution of Research\nand Advanced Studies (ICREA)\n\nBarcelona, Spain\n\nAbstract\n\nHow does our motor system solve the problem of anticipatory control in spite\nof a wide spectrum of response dynamics from different musculo-skeletal sys-\ntems, transport delays as well as response latencies throughout the central nervous\nsystem? To a great extent, our highly-skilled motor responses are a result of a\nreactive feedback system, originating in the brain-stem and spinal cord, combined\nwith a feed-forward anticipatory system, that is adaptively \ufb01ne-tuned by sensory\nexperience and originates in the cerebellum. Based on that interaction we design\nthe counterfactual predictive control (CFPC) architecture, an anticipatory adaptive\nmotor control scheme in which a feed-forward module, based on the cerebellum,\nsteers an error feedback controller with counterfactual error signals. Those are\nsignals that trigger reactions as actual errors would, but that do not code for any cur-\nrent or forthcoming errors. In order to determine the optimal learning strategy, we\nderive a novel learning rule for the feed-forward module that involves an eligibility\ntrace and operates at the synaptic level. In particular, our eligibility trace provides\na mechanism beyond co-incidence detection in that it convolves a history of prior\nsynaptic inputs with error signals. In the context of cerebellar physiology, this\nsolution implies that Purkinje cell synapses should generate eligibility traces using\na forward model of the system being controlled. 
From an engineering perspective,\nCFPC provides a general-purpose anticipatory control architecture equipped with a\nlearning rule that exploits the full dynamics of the closed-loop system.\n\n1\n\nIntroduction\n\nLearning and anticipation are central features of cerebellar computation and function (Bastian, 2006):\nthe cerebellum learns from experience and is able to anticipate events, thereby complementing a\nreactive feedback control by an anticipatory feed-forward one (Hofstoetter et al., 2002; Herreros\nand Verschure, 2013). This interpretation is based on a series of anticipatory motor behaviors that\noriginate in the cerebellum. For instance, anticipation is a crucial component of acquired behavior in\neye-blink conditioning (Gormezano et al., 1983), a trial by trial learning protocol where an initially\nneutral stimulus such as a tone or a light (the conditioning stimulus, CS) is followed, after a \ufb01xed\ndelay, by a noxious one, such as an air puff to the eye (the unconditioned stimulus, US). During early\ntrials, a protective unconditioned response (UR), a blink, occurs re\ufb02exively in a feedback manner\nfollowing the US. After training though, a well-timed anticipatory blink (the conditioned response,\nCR) precedes the US. Thus, learning results in the (partial) transference from an initial feedback\naction to an anticipatory (or predictive) feed-forward one. Similar responses occur during anticipatory\npostural adjustments, which are postural changes that precede voluntary motor movements, such\nas raising an arm while standing (Massion, 1992). The goal of these anticipatory adjustments is to\ncounteract the postural and equilibrium disturbances that voluntary movements introduce. 
These\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fbehaviors can be seen as feedback reactions to events that after learning have been transferred to\nfeed-forward actions anticipating the predicted events.\nAnticipatory feed-forward control can yield high performance gains over feedback control whenever\nthe feedback loop exhibits transmission (or transport) delays (Jordan, 1996). However, even if a\nplant has negligible transmission delays, it may still have sizable inertial latencies. For example,\nif we apply a force to a visco-elastic plant, its peak velocity will be achieved after a certain delay;\ni.e. the velocity itself will lag the force. An ef\ufb01cient way to counteract this lag will be to apply\nforces anticipating changes in the desired velocity. That is, anticipation can be bene\ufb01cial even when\none can act instantaneously on the plant. Given that, here we address two questions: what is the\noptimal strategy to learn anticipatory actions in a cerebellar-based architecture? and how could it be\nimplemented in the cerebellum?\nTo answer that we design the counterfactual predictive control (CFPC) scheme, a cerebellar-based\nadaptive-anticipatory control architecture that learns to anticipate performance errors from experience.\nThe CFPC scheme is motivated from neuro-anatomy and physiology of eye-blink conditioning.\nIt includes a reactive controller, which is an output-error feedback controller that models brain\nstem re\ufb02exes actuating on eyelid muscles, and a feed-forward adaptive component that models the\ncerebellum and learns to associate its inputs with the error signals driving the reactive controller.\nWith CFPC we propose a generic scheme in which a feed-forward module enhances the performance\nof a reactive error feedback controller steering it with signals that facilitate anticipation, namely,\nwith counterfactual errors. 
However, within CFPC, even though these counterfactual errors that enable predictive control are learned based on past errors in behavior, they do not reflect any current or forthcoming error in the ongoing behavior.\n\nIn addition to eye-blink conditioning and postural adjustments, the interaction between reactive and cerebellar-dependent acquired anticipatory behavior has also been studied in paradigms such as visually-guided smooth pursuit eye movements (Lisberger, 1987). All these paradigms can be abstracted as tasks in which the same predictive stimuli and disturbance or reference signal are repeatedly experienced. In accordance with that, we operate our control scheme in trial-by-trial (batch) mode. With that, we derive a learning rule for anticipatory control that modifies the well-known least-mean-squares/Widrow-Hoff rule with an eligibility trace. More specifically, our model predicts that, to facilitate learning, parallel fiber to Purkinje cell synapses implement a forward model that generates an eligibility trace. Finally, to stress that CFPC is not specific to eye-blink conditioning, we demonstrate its application with a smooth pursuit task.\n\n2 Methods\n\n2.1 Cerebellar Model\n\nFigure 1: Anatomical scheme of a cerebellar Purkinje cell. The xj denote parallel fiber inputs to Purkinje synapses (in red) with weights wj. o denotes the output of the Purkinje cell. The error signal e, through the climbing fibers (in green), modulates synaptic weights.\n\nWe follow the simplifying approach of modeling the cerebellum as a linear adaptive filter, while focusing on computations at the level of the Purkinje cells, which are the main output cells of the cerebellar cortex (Fujita, 1982; Dean et al., 2010). Over the mossy fibers, the cerebellum receives a wide range of inputs. Those inputs reach Purkinje cells via parallel fibers (Fig. 
1), that cross dendritic trees of Purkinje cells in a ratio of up to 1.5 × 10^6 parallel fiber synapses per cell (Eccles et al., 1967). We denote the signal carried by a particular fiber as xj, j ∈ [1, G], with G equal to the total number of input fibers. These inputs from the mossy/parallel fiber pathway carry contextual information (interoceptive or exteroceptive) that allows the Purkinje cell to generate a functional output. We refer to these inputs as cortical bases, indicating that they are localized at the cerebellar cortex and that they provide a repertoire of states and inputs that the cerebellum combines to generate its output o. As we will develop a discrete-time analysis of the system, we use n to indicate time (or time-step). The output of the cerebellum at any time point n results from a weighted sum of those cortical bases. wj indicates the weight or synaptic efficacy associated with the fiber j. Thus, we have x[n] = [x1[n], . . . , xG[n]]^T and w[n] = [w1[n], . . . , wG[n]]^T (where the transpose, ^T, indicates that x[n] and w[n] are column vectors) containing the set of inputs and synaptic weights at time n, respectively, which determine the output of the cerebellum according to\n\no[n] = x[n]^T w[n]    (1)\n\nThe adaptive feed-forward control of the cerebellum stems from updating the weights according to a rule of the form\n\nΔwj[n + 1] = f(xj[n], . . . , xj[1], e[n], Θ)    (2)\n\nwhere Θ denotes global parameters of the learning rule; xj[n], . . . , xj[1], the history of the pre-synaptic inputs of synapse j; and e[n], an error signal that is the same for all synapses, corresponding to the difference between the desired, r, and the actual output, y, of the controlled plant. 
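As a minimal numerical sketch of the adaptive-filter output of Eq. 1 (assuming numpy; the numbers are arbitrary stand-ins chosen for illustration, not values from the paper):

```python
import numpy as np

# Sketch of Eq. 1: the Purkinje cell output o[n] is the weighted sum of its
# G parallel-fiber basis inputs. All numeric values are made up.
x_n = np.array([0.2, 0.5, 0.1])    # basis inputs x[n] = [x1[n], ..., xG[n]]
w_n = np.array([1.0, -0.5, 2.0])   # synaptic weights w[n]

o_n = x_n @ w_n                    # Eq. 1: o[n] = x[n]^T w[n]
```

The learning problem of Eq. 2 then amounts to choosing how the history of each `x_j` and the shared error `e[n]` drive changes in `w_n`.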
Note that in\ndrawing an analogy with the eye-blink conditioning paradigm, we use the simplifying convention\nof considering the noxious stimulus (the air-puff) as a reference, r, that indicates that the eyelids\nshould close; the closure of the eyelid as the output of the plant, y; and the sensory response to the\nnoxious stimulus as an error, e, that encodes the difference between the desired, r, and the actual\neyelid closures, y. Given this, we advance a new learning rule, f, that achieves optimal performance\nin the context of eye-blink conditioning and other cerebellar learning paradigms.\n\n2.2 Cerebellar Control Architecture\n\nFigure 2: Neuroanatomy of eye-blink conditioning and the CFPC architecture. Left: Mapping of\nsignals to anatomical structures in eye-blink conditioning (De Zeeuw and Yeo, 2005); regular arrows\nindicate external inputs and outputs, arrows with inverted heads indicate neural pathways. Right:\nCFPC architecture. Note that the feedback controller, C, and the feed-forward module, F F , belong\nto the control architecture, while the plant, P , denotes an object controlled. Other abbreviations: r,\nreference signal; y, plant\u2019s output; e, output error; x, basis signals; o, feed-forward signal; and u,\nmotor command.\n\nWe embed the adaptive \ufb01lter cerebellar module in a layered control architecture, namely the CFPC\narchitecture, based on the interaction between brain stem motor nuclei driving motor re\ufb02exes and\nthe cerebellum, such as the one established between the cerebellar microcircuit responsible for\nconditioned responses and the brain stem re\ufb02ex circuitry that produces unconditioned eye-blinks\n(Hesslow and Yeo, 2002) (Fig. 2 left). Note that in our interpretation of this anatomy we assume\nthat cerebellar output, o, feeds the lower re\ufb02ex controller (Fig. 2 right). 
Put in control-theory terms, within the CFPC scheme an adaptive feed-forward layer supplements a negative feedback controller, steering it with feed-forward signals.\n\nOur architecture uses a single-input single-output negative-feedback controller. The controller receives as input the output error e = r − y. For the derivation of the learning algorithm, we assume that both plant and controller are linear and time-invariant (LTI) systems. Importantly, the feedback controller and the plant form a reactive closed-loop system that, mathematically, can be seen as a system mapping the reference, r, into the plant's output, y. A feed-forward layer that contains the above-mentioned cerebellar model provides the negative feedback controller with an additional input signal, o. We refer to o as a counterfactual error signal, since, although it mechanistically drives the negative feedback controller analogously to an error signal, it is not an actual error. The counterfactual error is generated by the feed-forward module, which receives an output error, e, as its teaching signal. Notably, from the point of view of the reactive layer closed-loop system, o can also be interpreted as a signal that offsets r. 
In other words, even if r remains the reference that sets the target of behavior, r + o functions as the effective reference that drives the closed-loop system.\n\n3 Results\n\n3.1 Derivation of the gradient descent update rule for the cerebellar control architecture\n\nWe apply the CFPC architecture defined in the previous section to a task that consists in following a finite reference signal r ∈ R^N that is repeated trial-by-trial. To analyze this system, we use the discrete time formalism and assume that all components are linear time-invariant (LTI). Given this, both reactive controller and plant can be lumped together into a closed-loop dynamical system, that can be described with the dynamics A, input B, measurement C and feed-through D matrices. In general, these matrices describe how the state of a dynamical system autonomously evolves with time, A; how inputs affect system states, B; how states are mapped into outputs, C; and how inputs instantaneously affect the system's output, D (Astrom and Murray, 2012). As we consider a reference of a finite length N, we can construct the N-by-N transfer matrix T as follows (Boyd, 2008):\n\nT =\n[ D            0            0            . . .  0 ]\n[ CB           D            0            . . .  0 ]\n[ CAB          CB           D            . . .  0 ]\n[ . . .        . . .        . . .        . . .  . . . ]\n[ CA^(N-2)B    CA^(N-3)B    CA^(N-4)B    . . .  D ]\n\nWith this transfer matrix we can map any given reference r into an output yr using yr = T r, obtaining what would have been the complete output trajectory of the plant on an entirely feedback-driven trial. Note that the first column of T contains the impulse response curve of the closed-loop system, while the rest of the columns are obtained shifting that impulse response down. 
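The construction of T from the closed-loop matrices can be sketched as follows (a toy scalar state-space system is assumed here for illustration, not the paper's plant):

```python
import numpy as np

# Build the N-by-N transfer matrix T of a closed-loop LTI system from its
# state-space matrices (A, B, C, D): the first column is the impulse
# response D, CB, CAB, CA^2 B, ...; each later column is that response
# shifted down. The scalar system below is an arbitrary stable example.
def transfer_matrix(A, B, C, D, N):
    h = np.zeros(N)
    h[0] = D                        # feed-through term
    Ak = np.eye(A.shape[0])         # A^(k-1), starting at identity
    for k in range(1, N):
        h[k] = float(C @ Ak @ B)    # h[k] = C A^(k-1) B
        Ak = A @ Ak
    T = np.zeros((N, N))
    for j in range(N):
        T[j:, j] = h[:N - j]        # column j: impulse response shifted down
    return T

A = np.array([[0.5]]); B = np.array([[1.0]])
C = np.array([[1.0]]); D = 0.0
T = transfer_matrix(A, B, C, D, 4)
# First column holds the impulse response [D, CB, CAB, CA^2 B]
```

The same matrix could instead be filled measurement-based, by recording the impulse response directly, as the text notes next.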
Therefore, we can build the transfer matrix T either in a model-based manner, deriving the state-space characterization of the closed-loop system, or in a measurement-based manner, measuring the impulse response curve. Additionally, note that (I − T)r yields the error of the feedback control in following the reference, a signal which we denote with e0.\n\nLet o ∈ R^N be the entire feed-forward signal for a given trial. Given commutativity, we can consider that, from the point of view of the closed-loop system, o is added directly to the reference r (Fig. 2 right). In that case, we can use y = T(r + o) to obtain the output of the closed-loop system when it is driven by both the reference and the feed-forward signal. The feed-forward module only outputs linear combinations of a set of bases. Let X ∈ R^(N×G) be a matrix with the content of the G bases during all the N time steps of a trial. The feed-forward signal becomes o = Xw, where w ∈ R^G contains the mixing weights. Hence, the output of the plant given a particular w becomes y = T(r + Xw).\n\nWe implement learning as the process of adjusting the weights w of the feed-forward module in a trial-by-trial manner. At each trial the same reference signal, r, and bases, X, are repeated. 
Through learning we want to converge to the optimal weight vector w* defined as\n\nw* = arg min_w c(w) = arg min_w (1/2) e^T e = arg min_w (1/2) (r − T(r + Xw))^T (r − T(r + Xw))    (3)\n\nwhere c indicates the objective function to minimize, namely the L2 norm or sum of squared errors. With the substitution X̃ = T X and using e0 = (I − T)r, the minimization problem can be cast as a canonical linear least-squares problem:\n\nw* = arg min_w (1/2) (e0 − X̃w)^T (e0 − X̃w)    (4)\n\nOn the one hand, this allows us to directly find the least-squares solution for w*, that is, w* = X̃†e0, where † denotes the Moore-Penrose pseudo-inverse. On the other hand, and more interestingly, with w[k] being the weights at trial k and having e[k] = e0 − X̃w[k], we can obtain the gradient of the error function at trial k with relation to w as follows:\n\n∇_w c = −X̃^T e[k] = −X^T T^T e[k]\n\nThus, setting η as a properly scaled learning rate (the only global parameter Θ of the rule), we can derive the following gradient descent strategy for the update of the weights between trials:\n\nw[k + 1] = w[k] + ηX^T T^T e[k]    (5)\n\nThis solves for the learning rule f in Eq. 2. Note that f is consistent with both the cerebellar anatomy (Fig. 2 left) and the control architecture (Fig. 2 right) in that the feed-forward module/cerebellum only requires two signals to update its weights/synaptic efficacies: the basis inputs, X, and the error signal, e.\n\n3.2 T^T facilitates a synaptic eligibility trace\n\nThe standard least mean squares (LMS) rule (also known as the Widrow-Hoff or decorrelation learning rule) can be represented in its batch version as w[k + 1] = w[k] + ηX^T e[k]. 
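The trial-by-trial update of Eq. 5 and the closed-form optimum of Eq. 4 can be sketched numerically (assuming numpy; T, X and r are small arbitrary stand-ins, not the paper's smooth pursuit simulation):

```python
import numpy as np

# Trial-by-trial gradient descent, w[k+1] = w[k] + eta * X^T T^T e[k] (Eq. 5),
# compared against the least-squares optimum w* = (T X)^dagger e0 (Eq. 4).
rng = np.random.default_rng(0)
N, G = 20, 5
h = 0.5 ** np.arange(N)                    # toy closed-loop impulse response
T = np.zeros((N, N))
for j in range(N):
    T[j:, j] = h[:N - j]                   # lower-triangular Toeplitz T
X = rng.random((N, G))                     # basis signals over one trial
r = np.sin(np.linspace(0.0, np.pi, N))     # toy reference

e0 = (np.eye(N) - T) @ r                   # error of feedback control alone
eta = 0.002
w = np.zeros(G)
for k in range(5000):                      # repeated trials
    e = e0 - T @ X @ w                     # error at trial k
    w = w + eta * X.T @ T.T @ e            # Eq. 5

w_star = np.linalg.pinv(T @ X) @ e0        # closed-form optimum
```

With a suitably small `eta`, the iterated update drives the cost monotonically toward that of `w_star`.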
Hence, the only difference between the batch LMS rule and the one we have derived is the insertion of the matrix factor T^T. Now we will show how this factor acts as a filter that computes an eligibility trace at each weight/synapse. Note that the update of a single weight, according to Eq. 5, becomes\n\nwj[k + 1] = wj[k] + ηxj^T T^T e[k]    (6)\n\nwhere xj contains the sequence of values of the cortical basis j during the entire trial. This can be rewritten as\n\nwj[k + 1] = wj[k] + ηhj^T e[k]    (7)\n\nwith hj ≡ T xj. The above inner product can be expressed as a sum of scalar products\n\nwj[k + 1] = wj[k] + η Σ_{n=1}^{N} hj[n] e[k, n]    (8)\n\nwhere n indexes the within-trial time-step. Note that e[k] in Eq. 7 refers to the whole error signal at trial k whereas e[k, n] in Eq. 8 refers to the error value in the n-th time-step of the trial k. It is now clear that each hj[n] weighs how much an error arriving at time n should modify the weight wj, which is precisely the role of an eligibility trace. Note that since T contains in its columns/rows shifted repetitions of the impulse response curve of the closed-loop system, the eligibility trace codes, at any time n, the convolution of the sequence of previous inputs with the impulse-response curve of the reactive layer closed-loop system. Indeed, in each synapse, the eligibility trace is generated by a forward model of the closed-loop system that is exclusively driven by the basis signal.\nConsequently, our main result is that by deriving a gradient descent algorithm for the CFPC cerebellar control architecture we have obtained an exact definition of the suitable eligibility trace. 
That definition guarantees that the set of weights/synaptic efficacies is updated in a locally optimal manner in weight space.\n\n3.3 On-line gradient descent algorithm\n\nThe trial-by-trial formulation above allowed for a straightforward derivation of the (batch) gradient descent algorithm. As it lumped together all computations occurring in a same trial, it accounted for time within the trial implicitly rather than explicitly: one-dimensional time-signals were mapped onto points in a high-dimensional space. However, after having established the gradient descent algorithm, we can implement the same rule in an on-line manner, dropping the repetitiveness assumption inherent to trial-by-trial learning and performing all computations locally in time. Each weight/synapse must have a process associated with it that outputs the eligibility trace. That process passes the incoming (unweighted) basis signal through a (forward) model of the closed-loop system as follows:\n\nsj[n + 1] = Asj[n] + Bxj[n]\nhj[n] = Csj[n] + Dxj[n]\n\nwhere the matrices A, B, C and D refer to the closed-loop system (they are the same matrices that we used to define the transfer matrix T), and sj[n] is the state vector of the forward model of synapse j at time-step n. In practice, each “synaptic” forward model computes what would have been the effect of having driven the closed-loop system with each basis signal alone. Given the superposition principle, the outcome of that computation can also be interpreted as saying that hj[n] indicates what would have been the displacement over the current output of the plant, y[n], achieved by feeding the closed-loop system with the basis signal xj. The weight update (Eq. 9) then proceeds as follows: at each time step n, the error signal e[n] is multiplied by the current value of the eligibility trace hj[n], scaled by the learning rate η, and added to the current weight wj[n]. 
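A minimal sketch of this on-line rule (assuming numpy; a scalar toy closed-loop model and toy signals are used here, not the paper's smooth pursuit system):

```python
import numpy as np

# On-line rule sketch: every synapse j runs its own copy of the closed-loop
# forward model, driven by its (unweighted) basis signal x_j; the model
# output h_j[n] is the eligibility trace gating the error e[n] (Eq. 9).
a, b, c, d = 0.9, 1.0, 0.1, 0.0            # toy scalar closed-loop model
rng = np.random.default_rng(1)
n_steps, G = 200, 3
x = rng.random((n_steps, G))               # basis signals
e = np.concatenate([np.zeros(100), 0.1 * np.ones(100)])  # toy error signal
eta = 0.05

w = np.zeros(G)                            # synaptic weights
s = np.zeros(G)                            # one forward-model state per synapse
for n in range(n_steps):
    h = c * s + d * x[n]                   # eligibility traces h_j[n]
    w = w + eta * h * e[n]                 # weight update (Eq. 9)
    s = a * s + b * x[n]                   # forward-model state update
```

Note that the synapse never stores its raw input history: the forward-model state `s` summarizes it, keeping the computation local in time.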
Therefore whereas\nthe contribution of each basis to the output of the adaptive \ufb01lter depends only on its current value and\nweight, the change in weight depends on the current and past values passed through a forward model\nof the closed-loop dynamics.\n\nwj[n + 1] = wj[n] + \u03b7hj[n]e[n]\n\n3.4 Simulation of a visually-guided smooth pursuit task\n\nWe demonstrate the CFPC approach in an example of a visual smooth pursuit task in which the\neyes have to track a target moving on a screen. Even though the simulation does not capture all the\ncomplexity of a smooth pursuit task, it illustrates our anticipatory control strategy. We model the\nplant (eye and ocular muscles) with a two-dimensional linear \ufb01lter that maps motor commands into\nangular positions. Our model is an extension of the model in (Porrill and Dean, 2007), even though\nin that work the plant was considered in the context of the vestibulo-ocular re\ufb02ex. In particular, we\nuse a chain of two leaky integrators: a slow integrator with a relaxation constant of 100 ms drives the\neyes back to the rest position; the second integrator, with a fast time constant of 3 ms ensures that\nthe change in position does not occur instantaneously. To this basic plant, we add a reactive control\nlayer modeled as a proportional-integral (PI) error-feedback controller, with proportional gain kp and\nintegral gain ki. The control loop includes a 50 ms delay in the error feedback, to account for both\nthe actuation and the sensing latency. We choose gains such that reactive tracking lags the target by\napproximately 100 ms. This gives kp = 20 and ki = 100. To complete the anticipatory and adaptive\ncontrol architecture, the closed-loop system is supplemented by the feed-forward module.\n\nFigure 3: Behavior of the system. Left: Reference (r) and output of the system before (y[1]) and\nafter learning (y[50]). 
Right: Error before (e[1]) and after learning (e[50]), and the output acquired by the cerebellar/feed-forward component (o[50]).\n\nThe architecture implementing the forward model-based gradient descent algorithm is applied to a task structured in trials of 2.5 sec duration. Within each trial, a target remains still at the center of the visual scene for a duration of 0.5 sec, next it moves rightwards for 0.5 sec with constant velocity, remains still for 0.5 sec and repeats the sequence of movements in reverse, returning to the center. The cerebellar component receives 20 Gaussian basis signals (X) whose receptive fields are defined in the temporal domain, relative to trial onset, with a width (standard deviation) of 50 ms and spaced by 100 ms. The whole system is simulated using a 1 ms time-step. To construct the matrix T we computed the closed-loop system's impulse response.\n\nAt the first trial, before any learning, the output of the plant lags the reference signal by approximately 100 ms, converging to the position only when the target remains still for about 300 ms (Fig. 3 left). As a result of learning, the plant's behavior shifts from a reactive to an anticipatory mode, being able to track the reference without any delay. Indeed, the error that is sizable during the target displacement before learning almost completely disappears by the 50th trial (Fig. 3 right). That cancellation results from learning the weights that generate a feed-forward predictive signal that leads the changes in the reference signal (onsets and offsets of target movements) by approximately 100 ms (Fig. 3 right). Indeed, convergence of the algorithm is remarkably fast and by trial 7 it has almost converged to the optimal solution (Fig. 4).\n\nFigure 4: Performance achieved with different learning rules. 
Representative learning curves of the forward model-based eligibility trace gradient descent (FM-ET), the simple Widrow-Hoff (WH) and the Widrow-Hoff algorithm with a delta-eligibility trace matched to the error feedback delay (WH+50 ms) or with an eligibility trace exceeding that delay by 20 ms (WH+70 ms). Error is quantified as the relative root mean-squared error (rRMSE), scaled proportionally to the error in the first trial. Error of the optimal solution, obtained with w* = (TX)†e0, is indicated with a dashed line.\n\nTo assess how much our forward-model-based eligibility trace contributes to performance, we test three alternative algorithms. In all cases we employ the same control architecture, changing the plasticity rule such that we either use no eligibility trace, thus implementing the basic Widrow-Hoff learning rule, or use the Widrow-Hoff rule extended with a delta-function eligibility trace that matches the latency of the error feedback (50 ms) or slightly exceeds it (70 ms). Performance with the basic WH model worsens rapidly, whereas performance with the WH learning rule using a “pure delay” eligibility trace matched to the transport delay improves, but not as fast as with the forward-model-based eligibility trace (Fig. 4). Indeed, in this case, the best strategy for implementing a delayed delta eligibility trace is setting a delay exceeding the transport delay by around 20 ms, thus matching the peak of the impulse response. In that case, the system performs almost as well as with the forward-model eligibility trace (70 ms). 
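The three kinds of eligibility traces compared above can be contrasted on a single basis signal (a sketch assuming numpy; the scalar closed-loop model and the 5-step delay are toy stand-ins, not the paper's 50/70 ms values):

```python
import numpy as np

# Three eligibility-trace choices for one impulse-like basis signal x_j:
# plain Widrow-Hoff (the input itself), a "pure delay" delta trace, and the
# forward-model trace (the input filtered by the closed-loop dynamics).
n_steps = 60
x = np.zeros(n_steps); x[10] = 1.0         # impulse-like basis signal

h_wh = x.copy()                            # WH: no trace, input as-is
delay = 5
h_delta = np.roll(x, delay)                # delta trace: delayed input
h_delta[:delay] = 0.0

a, b, c, d = 0.8, 1.0, 0.2, 0.0            # toy closed-loop forward model
h_fm = np.zeros(n_steps); s = 0.0
for n in range(n_steps):
    h_fm[n] = c * s + d * x[n]
    s = a * s + b * x[n]
```

Here `h_wh` is nonzero only at the input time, `h_delta` only at a single later time, while `h_fm` rises after the input and decays gradually, spreading eligibility over the whole response of the plant.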
This last result implies that, even though the literature usually emphasizes the role of transport delays, eligibility traces also account for response lags due to intrinsic dynamics of the plant.\nTo summarize our results, we have shown with a basic simulation of a visual smooth pursuit task that generating the eligibility trace by means of a forward model ensures convergence to the optimal solution and accelerates learning by guaranteeing that it follows a gradient descent.\n\n4 Discussion\n\nIn this paper we have introduced a novel formulation of cerebellar anticipatory control, consistent with experimental evidence, in which a forward model has emerged naturally at the level of Purkinje cell synapses. From a machine learning perspective, we have also provided an optimality argument for the derivation of an eligibility trace, a construct that was often thought of in more heuristic terms as a mechanism to bridge time-delays (Barto et al., 1983; Shibata and Schaal, 2001; McKinstry et al., 2006).\nThe first seminal works on cerebellar computational models emphasized its role as an associative memory (Marr, 1969; Albus, 1971). Later, the cerebellum was investigated as a device processing correlated time signals (Fujita, 1982; Kawato et al., 1987; Dean et al., 2010). 
In this latter framework, the use of the computational concept of an eligibility trace emerged as a heuristic construct that allowed compensating for transmission delays in the circuit (Kettner et al., 1997; Shibata and Schaal, 2001; Porrill and Dean, 2007), which introduced lags in the cross-correlation between signals. Concretely, that was referred to as the problem of delayed error feedback, due to which, by the time an error signal reaches a cell, the synapses accountable for that error are no longer the ones currently active, but those that were active at the time when the motor signals that caused the actual error were generated. This view has however neglected the fact that, beyond transport delays, response dynamics of physical plants also influence how past pre-synaptic signals could have related to the current output of the plant. Indeed, for a linear plant, the impulse-response function of the plant provides the complete description of how inputs will drive the system and, as such, integrates transmission delays as well as the dynamics of the plant.\nEven though cerebellar microcircuits have been used as models for building control architectures, e.g., the feedback-error learning model (Kawato et al., 1987), our CFPC is novel in that it links the cerebellum to the input of the feedback controller, ensuring that the computational features of the feedback controller are exploited at all times. Within the domain of adaptive control, there are remarkable similarities at the functional level between CFPC and iterative learning control (ILC) (Amann et al., 1996), which is an input design technique for learning optimal control signals in repetitive tasks. The difference between our CFPC and ILC lies in the fact that ILC controllers directly learn a control signal, whereas the CFPC learns a counterfactual error signal that steers a feedback controller. 
However, the similarity between the two approaches can help in extending CFPC to more complex control tasks.
With our CFPC framework, we have modeled the cerebellar system at a very high level of abstraction: we have not included the bio-physical constraints underlying neural computations, we have omitted known anatomical connections such as the cerebellar nucleo-olivary inhibition (Bengtsson and Hesslow, 2006; Herreros and Verschure, 2013), and we have made simplifications such as collapsing the cerebellar cortex and nuclei into a single computational unit. On the one hand, such a high level of abstraction may indeed be beneficial for deriving general-purpose machine learning or adaptive control algorithms. On the other hand, it is remarkable that, in spite of this abstraction, our framework makes fine-grained predictions at the micro-level of biological processes: in a cerebellar microcircuit (Apps and Garwicz, 2005), the response dynamics of the secondary messengers (Wang et al., 2000) regulating the plasticity of Purkinje cell synapses to parallel fibers must mimic the dynamics of the motor system being controlled by that cerebellar microcircuit. Notably, the logical consequence of this prediction, that different Purkinje cells should display different plasticity rules according to the system that they control, has been validated by recording from single Purkinje cells in vivo (Suvrathan et al., 2016).
In conclusion, we find that a normative interpretation of the plasticity rules in Purkinje cell synapses emerges from our systems-level CFPC computational architecture. That is, in order to generate optimal eligibility traces, synapses must include a forward model of the controlled subsystem.
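As a minimal sketch of this claim (with our own illustrative signals and constants, not the paper's exact simulation), an eligibility trace can be formed by convolving the pre-synaptic input with the plant's impulse response, i.e., the forward model, and the synaptic weight can then be updated by correlating that trace with the error signal:

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 2.0, dt)
tau = 0.2                                  # assumed plant time constant (s)
h = (1.0 / tau) * np.exp(-t / tau) * dt    # forward model: plant impulse response

def eligibility_trace(x, h):
    """Convolve the history of pre-synaptic input x with the plant's
    impulse response h (the forward model at the synapse)."""
    return np.convolve(x, h)[: len(x)]

# One pre-synaptic (parallel-fiber-like) signal and a realizable target
# generated with a "true" weight of 0.3.
x = np.exp(-((t - 0.5) ** 2) / (2 * 0.05 ** 2))
target = 0.3 * eligibility_trace(x, h)

w, eta = 0.0, 0.5
trace = eligibility_trace(x, h)
for _ in range(400):                       # repeated trials
    y = w * trace                          # plant output driven through this synapse
    e = target - y                         # error signal (cf. climbing-fiber input)
    # Gradient step: correlate the error with the eligibility trace.
    w += eta * dt * np.dot(e, trace)

print("learned weight:", w)
```

Because the trace matches the plant's own dynamics, the update follows the gradient of the squared error and the weight converges to the value that generated the target; a trace built from a pure delay would, in general, bias this correlation.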
This conclusion suggests, in the broader picture, that synapses are not merely multiplicative gains, but rather the loci of complex dynamic computations that are relevant from a functional perspective, both in terms of optimizing storage capacity (Benna and Fusi, 2016; Lahiri and Ganguli, 2013) and in terms of fine-tuning learning rules to behavioral requirements.

Acknowledgments

The research leading to these results has received funding from the European Commission's Horizon 2020 socSMC project (socSMC-641321H2020-FETPROACT-2014) and from the European Research Council's CDAC project (ERC-2013-ADG 341196).

References

Albus, J. S. (1971). A theory of cerebellar function. Mathematical Biosciences, 10(1):25–61.

Amann, N., Owens, D. H., and Rogers, E. (1996). Iterative learning control for discrete-time systems with exponential rate of convergence. IEE Proceedings - Control Theory and Applications, 143(2):217–224.

Apps, R. and Garwicz, M. (2005). Anatomical and physiological foundations of cerebellar information processing. Nature Reviews Neuroscience, 6(4):297–311.

Astrom, K. J. and Murray, R. M. (2012). Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press.

Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834–846.

Bastian, A. J. (2006). Learning to predict the future: the cerebellum adapts feedforward movement control. Current Opinion in Neurobiology, 16(6):645–649.

Bengtsson, F. and Hesslow, G. (2006). Cerebellar control of the inferior olive. Cerebellum, 5(1):7–14.

Benna, M. K. and Fusi, S. (2016). Computational principles of synaptic memory consolidation. Nature Neuroscience.

Boyd, S. (2008). Introduction to linear dynamical systems.
Online Lecture Notes.

De Zeeuw, C. I. and Yeo, C. H. (2005). Time and tide in cerebellar memory formation. Current Opinion in Neurobiology, 15(6):667–674.

Dean, P., Porrill, J., Ekerot, C.-F., and Jörntell, H. (2010). The cerebellar microcircuit as an adaptive filter: experimental and computational evidence. Nature Reviews Neuroscience, 11(1):30–43.

Eccles, J., Ito, M., and Szentágothai, J. (1967). The Cerebellum as a Neuronal Machine. Springer, Berlin.

Fujita, M. (1982). Adaptive filter model of the cerebellum. Biological Cybernetics, 45(3):195–206.

Gormezano, I., Kehoe, E. J., and Marshall, B. S. (1983). Twenty years of classical conditioning with the rabbit.

Herreros, I. and Verschure, P. F. M. J. (2013). Nucleo-olivary inhibition balances the interaction between the reactive and adaptive layers in motor control. Neural Networks, 47:64–71.

Hesslow, G. and Yeo, C. H. (2002). The functional anatomy of skeletal conditioning. In A Neuroscientist's Guide to Classical Conditioning, pages 86–146. Springer.

Hofstoetter, C., Mintz, M., and Verschure, P. F. (2002). The cerebellum in action: a simulation and robotics study. European Journal of Neuroscience, 16(7):1361–1376.

Jordan, M. I. (1996). Computational aspects of motor control and motor learning. In Handbook of Perception and Action, volume 2, pages 71–120. Academic Press.

Kawato, M., Furukawa, K., and Suzuki, R. (1987). A hierarchical neural-network model for control and learning of voluntary movement. Biological Cybernetics, 57(3):169–185.

Kettner, R. E., Mahamud, S., Leung, H. C., Sitkoff, N., Houk, J. C., Peterson, B. W., and Barto, A. G. (1997). Prediction of complex two-dimensional trajectories by a cerebellar model of smooth pursuit eye movement. Journal of Neurophysiology, 77:2115–2130.

Lahiri, S. and Ganguli, S. (2013). A memory frontier for complex synapses.
In Advances in Neural Information Processing Systems, pages 1034–1042.

Lisberger, S. (1987). Visual motion processing and sensory-motor integration for smooth pursuit eye movements. Annual Review of Neuroscience, 10(1):97–129.

Marr, D. (1969). A theory of cerebellar cortex. The Journal of Physiology, 202(2):437–470.

Massion, J. (1992). Movement, posture and equilibrium: interaction and coordination. Progress in Neurobiology, 38(1):35–56.

McKinstry, J. L., Edelman, G. M., and Krichmar, J. L. (2006). A cerebellar model for predictive motor control tested in a brain-based device. Proceedings of the National Academy of Sciences of the United States of America, 103(9):3387–3392.

Porrill, J. and Dean, P. (2007). Recurrent cerebellar loops simplify adaptive control of redundant and nonlinear motor systems. Neural Computation, 19(1):170–193.

Shibata, T. and Schaal, S. (2001). Biomimetic smooth pursuit based on fast learning of the target dynamics. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 278–285. IEEE.

Suvrathan, A., Payne, H. L., and Raymond, J. L. (2016). Timing rules for synaptic plasticity matched to behavioral function. Neuron, 92(5):959–967.

Wang, S. S.-H., Denk, W., and Häusser, M. (2000). Coincidence detection in single dendritic spines mediated by calcium release. Nature Neuroscience, 3(12):1266–1273.