{"title": "Adaptive Synchronization of Neural and Physical Oscillators", "book": "Advances in Neural Information Processing Systems", "page_first": 109, "page_last": 116, "abstract": null, "full_text": "Adaptive Synchronization of Neural and Physical Oscillators\n\nKenji Doya\nUniversity of California, San Diego\nLa Jolla, CA 92093-0322, USA\n\nShuji Yoshizawa\nUniversity of Tokyo\nBunkyo-ku, Tokyo 113, Japan\n\nAbstract\n\nAnimal locomotion patterns are controlled by recurrent neural networks called central pattern generators (CPGs). Although a CPG can oscillate autonomously, its rhythm and phase must be well coordinated with the state of the physical system using sensory inputs. In this paper we propose a learning algorithm for synchronizing neural and physical oscillators with specific phase relationships. Sensory input connections are modified by the correlation between cellular activities and input signals. Simulations show that the learning rule can be used for setting sensory feedback connections to a CPG as well as coupling connections between CPGs.\n\n1 CENTRAL AND SENSORY MECHANISMS IN LOCOMOTION CONTROL\n\nPatterns of animal locomotion, such as walking, swimming, and flying, are generated by recurrent neural networks that are located in segmental ganglia of invertebrates and spinal cords of vertebrates (Barnes and Gladden, 1985). These networks can produce basic rhythms of locomotion without sensory inputs and are called central pattern generators (CPGs). The physical systems of locomotion, such as legs, fins, and wings combined with physical environments, have their own oscillatory characteristics. Therefore, in order to realize efficient locomotion, the frequency and the phase of oscillation of a CPG must be well coordinated with the state of the physical system. 
For example, the bursting patterns of motoneurons that drive a leg muscle must be coordinated with the configuration of the leg, its contact with the ground, and the state of other legs.\n\nThe oscillation pattern of a CPG is largely affected by proprioceptive inputs. It has been shown in crayfish (Siller et al., 1986) and lamprey (Grillner et al., 1990) that the oscillation of a CPG is entrained by cyclic stimuli to stretch sensory neurons over a wide range of frequencies. Both negative and positive feedback pathways are found in those systems. Elucidation of the function of the sensory inputs to CPGs requires computational studies of neural and physical dynamical systems. Algorithms for the learning of rhythmic patterns in recurrent neural networks have been derived by Doya and Yoshizawa (1989), Pearlmutter (1989), and Williams and Zipser (1989). In this paper we propose a learning algorithm for synchronizing a neural oscillator to rhythmic input signals with a specific phase relationship.\n\nIt is well known that a coupling between nonlinear oscillators can entrain their frequencies. The relative phase between oscillators is determined by the parameters of coupling and the difference of their intrinsic frequencies. For example, either in-phase or anti-phase oscillation results from symmetric coupling between neural oscillators with similar intrinsic frequencies (Kawato and Suzuki, 1980). Efficient locomotion involves subtle phase relationships between physical variables and motor commands. Accordingly, our goal is to derive a learning algorithm that can finely tune the sensory input connections by which the relative phase between physical and neural oscillators is kept at a specific value required by the task.\n\n2 LEARNING OF SYNCHRONIZATION\n\nWe will deal with the following continuous-time model of a CPG network. 
\n\n$$\\tau_i \\frac{d}{dt} x_i(t) = -x_i(t) + \\sum_{j=1}^{C} w_{ij} g_j(x_j(t)) + \\sum_{k=1}^{S} v_{ik} y_k(t), \\qquad (1)$$\n\nwhere $x_i(t)$ and $g_i(x_i(t))$ $(i = 1, \\ldots, C)$ represent the states and the outputs of CPG neurons and $y_k(t)$ $(k = 1, \\ldots, S)$ represents sensory inputs. We assume that the connection weights $W = \\{w_{ij}\\}$ are already established so that the network oscillates without sensory inputs. The goal of learning is to find the input connection weights $V = \\{v_{ik}\\}$ that make the network state $x(t) = (x_1(t), \\ldots, x_C(t))^t$ entrained to the input signal $y(t) = (y_1(t), \\ldots, y_S(t))^t$ with a specific relative phase.\n\n2.1 AN OBJECTIVE FUNCTION FOR PHASE-LOCKING\n\nThe standard way to derive a learning algorithm is to find an objective function to be minimized. If we can approximate the waveforms of $x_i(t)$ and $y_k(t)$ by sine waves, a linear relationship\n\n$$x(t) = P y(t)$$\n\nspecifies a phase-locked oscillation of $x(t)$ and $y(t)$. For example, if we have $y_1 = \\sin \\omega t$ and $y_2 = \\cos \\omega t$, then a matrix $P = \\begin{pmatrix} 1 & 1 \\\\ 1 & \\sqrt{3} \\end{pmatrix}$ specifies $x_1 = \\sqrt{2} \\sin(\\omega t + \\pi/4)$ and $x_2 = 2 \\sin(\\omega t + \\pi/3)$. Even when the waveforms are not sinusoidal, minimization of an objective function\n\n$$E(t) = \\frac{1}{2} \\| x(t) - P y(t) \\|^2 = \\frac{1}{2} \\sum_{i=1}^{C} \\Big\\{ x_i(t) - \\sum_{k=1}^{S} p_{ik} y_k(t) \\Big\\}^2 \\qquad (2)$$\n\ndetermines a specific relative phase between $x(t)$ and $y(t)$. Thus we call $P = \\{p_{ik}\\}$ a phase-lock matrix.\n\n2.2 LEARNING PROCEDURE\n\nUsing the above objective function, we will derive a learning procedure for phase-locked oscillation of $x(t)$ and $y(t)$. First, an appropriate phase-lock matrix $P$ is identified while the relative phase between $x(t)$ and $y(t)$ changes gradually in time. Then, a feedback mechanism can be applied so that the network state $x(t)$ is kept close to the target waveform $P y(t)$. 
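As a quick numerical check of the example in Section 2.1, the following Python sketch evaluates x = P y for y = (sin wt, cos wt) and compares the result with the stated closed forms, and it evaluates the objective function (2). The function names and the value of the angular frequency are illustrative assumptions, not part of the original formulation.

```python
import math

# Example phase-lock matrix from Section 2.1; each row maps y to one x_i.
P = [[1.0, 1.0],
     [1.0, math.sqrt(3.0)]]

def target_state(p, y):
    # Linear phase-lock relationship x = P y.
    return [sum(p_ik * y_k for p_ik, y_k in zip(row, y)) for row in p]

def objective(x, p, y):
    # E = 0.5 * ||x - P y||^2, as in equation (2).
    target = target_state(p, y)
    return 0.5 * sum((x_i - t_i) ** 2 for x_i, t_i in zip(x, target))

omega = 2.0 * math.pi  # hypothetical angular frequency
max_err = 0.0
for step in range(100):
    t = 0.01 * step
    y = [math.sin(omega * t), math.cos(omega * t)]
    x = target_state(P, y)
    # Closed forms stated in the text for this P.
    expected = [math.sqrt(2.0) * math.sin(omega * t + math.pi / 4.0),
                2.0 * math.sin(omega * t + math.pi / 3.0)]
    max_err = max(max_err, max(abs(a - b) for a, b in zip(x, expected)))

# The objective vanishes when x is exactly phase-locked to y.
residual = objective(target_state(P, [0.6, 0.8]), P, [0.6, 0.8])
```

Here `max_err` stays at rounding level, which confirms that the two rows of P encode phase leads of pi/4 and pi/3 relative to sin wt.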
\n\nSuppose we actually have an appropriate phase relationship between $x(t)$ and $y(t)$; then the phase-lock matrix $P$ can be obtained by gradient descent of $E(t)$ with respect to $p_{ik}$ as follows (Widrow and Stearns, 1985).\n\n$$\\frac{d}{dt} p_{ik} = -\\eta \\frac{\\partial E(t)}{\\partial p_{ik}} = \\eta \\Big\\{ x_i(t) - \\sum_{j=1}^{S} p_{ij} y_j(t) \\Big\\} y_k(t). \\qquad (3)$$\n\nIf the coupling between $x(t)$ and $y(t)$ is weak enough, their relative phase changes in time unless their intrinsic frequencies are exactly equal and the systems are completely noiseless. By modulating the learning coefficient $\\eta$ by some performance index of the total system, for example, the speed of locomotion, it is possible to obtain a matrix $P$ that satisfies the requirement of the task.\n\nOnce a phase-lock matrix is derived, we can control $x(t)$ close to $P y(t)$ using the gradient of $E(t)$ with respect to the network state\n\n$$\\frac{\\partial E(t)}{\\partial x_i(t)} = x_i(t) - \\sum_{k=1}^{S} p_{ik} y_k(t).$$\n\nThe simplest feedback algorithm is to add this term to the CPG dynamics as follows.\n\n$$\\tau_i \\frac{d}{dt} x_i(t) = -x_i(t) + \\sum_{j=1}^{C} w_{ij} g_j(x_j(t)) - \\alpha \\Big\\{ x_i(t) - \\sum_{k=1}^{S} p_{ik} y_k(t) \\Big\\}.$$\n\nThe feedback gain $\\alpha$ $(> 0)$ must be set small enough so that the feedback term does not destroy the intrinsic oscillation of the CPG. In that case, by neglecting the small additional decay term $\\alpha x_i(t)$, we have\n\n$$\\tau_i \\frac{d}{dt} x_i(t) = -x_i(t) + \\sum_{j=1}^{C} w_{ij} g_j(x_j(t)) + \\sum_{k=1}^{S} \\alpha p_{ik} y_k(t), \\qquad (4)$$\n\nwhich is equivalent to equation (1) with input weights $v_{ik} = \\alpha p_{ik}$.\n\n3 DELAYED SYNCHRONIZATION\n\nWe tested the above learning scheme on a delayed synchronization task: to find coupling weights between neural oscillators so that they synchronize with a specific time delay. We used the following coupled CPG model.\n\n$$\\tau \\frac{d}{dt} x_i^n(t) = -x_i^n(t) + \\sum_{j=1}^{C} w_{ij}^n y_j^n(t) + \\alpha \\sum_{k=1}^{C} p_{ik}^n y_k^{3-n}(t), \\qquad (5)$$\n\n$$y_i^n(t) = g(x_i^n(t)), \\quad (i = 1, \\ldots, C),$$\n\nwhere superscripts denote the indices of two CPGs $(n = 1, 2)$. 
The goal of learning was to synchronize the waveforms $y_1^1(t)$ and $y_1^2(t)$ with a time delay $\\Delta T$. We used\n\n$$z(t) = -| y_1^1(t - \\Delta T) - y_1^2(t) |$$\n\nas the performance index. The learning coefficient $\\eta$ of equation (3) was modulated by the deviation of $z(t)$ from its running average $\\bar{z}(t)$ using the following equations.\n\n$$\\eta(t) = \\eta_0 \\{ z(t) - \\bar{z}(t) \\}, \\qquad \\tau_a \\frac{d}{dt} \\bar{z}(t) = -\\bar{z}(t) + z(t). \\qquad (6)$$\n\nFigure 1: Learning of delayed synchronization of neural oscillators. The dotted and solid curves represent $y_1^1(t)$ and $y_1^2(t)$ respectively. a: without coupling. b: $\\Delta T = 0.0$. c: $\\Delta T = 1.0$. d: $\\Delta T = 2.0$. e: $\\Delta T = 3.0$.\n\nFirst, two CPGs were trained independently to oscillate with sinusoidal waveforms of period $T_1 = 4.0$ and $T_2 = 5.0$ using continuous-time back-propagation learning (Doya and Yoshizawa, 1989). Each CPG was composed of two neurons $(C = 2)$ with time constants $\\tau = 1.0$ and output functions $g(\\cdot) = \\tanh(\\cdot)$. Instead of following the two-step procedure described in the previous section, the network dynamics (5) and the learning equations (3) and (6) were simulated concurrently with parameters $\\alpha = 0.1$, $\\eta_0 = 0.2$, and $\\tau_a = 20.0$.\n\nFigure 1 a shows the oscillation of two CPGs without coupling. Figures 1 b through e show the phase-locked waveforms after learning for 200 time units with different desired delay times. 
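The gradient rule of equation (3) can be sketched in isolation as follows. This is a minimal illustration under simplifying assumptions, not the simulation reported in this section: the CPG state is replaced by a signal that is already phase-locked to a sinusoidal input through a known matrix, the learning coefficient is held constant instead of being modulated by equation (6), and simple Euler integration is used. The function name and all parameter values are hypothetical.

```python
import math

def learn_phase_lock(p_true, eta=2.0, period=4.0, dt=0.01, t_end=40.0):
    # Euler integration of equation (3):
    #   dp_ik/dt = eta * (x_i - sum_j p_ij * y_j) * y_k
    omega = 2.0 * math.pi / period
    n_x = len(p_true)
    p = [[0.0, 0.0] for _ in range(n_x)]
    for step in range(int(round(t_end / dt))):
        t = step * dt
        y = [math.sin(omega * t), math.cos(omega * t)]
        # Stand-in for the CPG state: phase-locked to y through p_true.
        x = [row[0] * y[0] + row[1] * y[1] for row in p_true]
        for i in range(n_x):
            err = x[i] - (p[i][0] * y[0] + p[i][1] * y[1])
            for k in range(2):
                p[i][k] += dt * eta * err * y[k]
    return p

p_true = [[1.0, 1.0], [1.0, math.sqrt(3.0)]]
p_learned = learn_phase_lock(p_true)
max_dev = max(abs(p_learned[i][k] - p_true[i][k])
              for i in range(2) for k in range(2))
```

With these settings the estimate converges to the matrix that generated the signals. In the procedure described in the text the relative phase drifts, and the modulated learning coefficient is what selects which phase relationship is retained.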
\n\n4 ZERO-LEGGED LOCOMOTION\n\nNext we applied the learning rule to the simplest locomotion system that involves a critical phase-lock between the state of the physical system and the motor command: a zero-legged locomotion system as shown in Figure 2 a.\n\nThe physical system is composed of a wheel and a weight that moves back and forth on a track fixed radially in the wheel. It rolls on the ground by changing its balance with the displacement of the weight. In order to move the wheel in a given direction, the weight must be moved at a specific phase with the rotation angle of the wheel. The motion equations are shown in the Appendix.\n\nFirst, a CPG network was trained to oscillate with a sinusoidal waveform of period $T = 1.0$ (Doya and Yoshizawa, 1989). The network consisted of one output and two hidden units $(C = 3)$ with time constants $\\tau_i = 0.2$ and output functions $g_i(\\cdot) = \\tanh(\\cdot)$. Next, the output of the CPG was used to drive the weight with a force $f = f_{max} g_1(x_1(t))$. The position $r$ and the velocity $\\dot{r}$ of the weight and the rotation angle $(\\cos\\theta, \\sin\\theta)$ and the angular velocity $\\dot{\\theta}$ of the wheel were used as sensory feedback inputs $y_k(t)$ $(k = 1, \\ldots, 5)$ after scaling to $[-1, 1]$.\n\nIn order to eliminate the effect of biases in $x(t)$ and $y(t)$, we used the following learning equations.\n\n$$\\frac{d}{dt} p_{ik} = \\eta \\Big\\{ (x_i(t) - \\bar{x}_i(t)) - \\sum_{j=1}^{S} p_{ij} (y_j(t) - \\bar{y}_j(t)) \\Big\\} (y_k(t) - \\bar{y}_k(t)),$$\n\n$$\\tau_x \\frac{d}{dt} \\bar{x}_i(t) = -\\bar{x}_i(t) + x_i(t), \\qquad (7)$$\n\n$$\\tau_y \\frac{d}{dt} \\bar{y}_k(t) = -\\bar{y}_k(t) + y_k(t).$$\n\nThe rotation speed of the wheel was employed as the performance index $z(t)$ after smoothing by the following equation.\n\n$$\\tau_z \\frac{d}{dt} z(t) = -z(t) + \\dot{\\theta}(t).$$\n\nThe learning coefficient $\\eta$ was modulated by equations (6). The time constants were $\\tau_x = 4.0$, $\\tau_y = 1.0$, $\\tau_z = 1.0$, and $\\tau_a = 4.0$. Each training run was started from a random configuration of the wheel and was finished after ten seconds. 
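The bias-removing rule of equation (7) can be sketched as below. This is an illustration under stated assumptions rather than the simulation of the wheel: the state and input are synthetic sinusoids with constant offsets, the learning coefficient is constant, Euler integration is used, and both running averages share one time constant so that the generating matrix is recovered exactly. With unequal time constants, as in the text, the two filters shift the signals by different amounts and the estimate may differ somewhat. The function name, offsets, and parameter values are hypothetical.

```python
import math

def learn_with_running_averages(p_true, bias_x, bias_y, eta=2.0, tau_x=1.0,
                                tau_y=1.0, period=1.0, dt=0.005, t_end=60.0):
    omega = 2.0 * math.pi / period
    n_x = len(p_true)
    p = [[0.0, 0.0] for _ in range(n_x)]
    x_bar = [0.0] * n_x  # running averages of x, equation (7)
    y_bar = [0.0, 0.0]   # running averages of y, equation (7)
    for step in range(int(round(t_end / dt))):
        t = step * dt
        osc = [math.sin(omega * t), math.cos(omega * t)]
        # Synthetic signals: oscillation plus constant offsets.
        y = [osc[k] + bias_y[k] for k in range(2)]
        x = [p_true[i][0] * osc[0] + p_true[i][1] * osc[1] + bias_x[i]
             for i in range(n_x)]
        dx = [x[i] - x_bar[i] for i in range(n_x)]
        dy = [y[k] - y_bar[k] for k in range(2)]
        # Gradient step on the deviations from the running averages.
        for i in range(n_x):
            err = dx[i] - (p[i][0] * dy[0] + p[i][1] * dy[1])
            for k in range(2):
                p[i][k] += dt * eta * err * dy[k]
        # First-order low-pass filters for the running averages.
        for i in range(n_x):
            x_bar[i] += dt / tau_x * (x[i] - x_bar[i])
        for k in range(2):
            y_bar[k] += dt / tau_y * (y[k] - y_bar[k])
    return p

p_true = [[1.0, 1.0], [1.0, math.sqrt(3.0)]]
p_est = learn_with_running_averages(p_true, [0.5, -0.3], [0.2, -0.4])
max_dev = max(abs(p_est[i][k] - p_true[i][k])
              for i in range(2) for k in range(2))
```

Because the same linear filter is applied to both signals in this sketch, the constant offsets contribute only a decaying transient and do not bias the learned matrix.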
\n\nFigure 2: Learning of zero-legged locomotion. Panels a, b, and c; the traces are labeled pos, vel, cos, sin, and rot.\n\nFigure 2 b is an example of the motion of the wheel without sensory feedback. The rhythms of the CPG and the physical system were not entrained to each other and the wheel wandered left and right. Figure 2 c shows an example of the wheel motion after 40 runs of training with parameters $\\eta_0 = 0.1$ and $\\alpha = 0.2$. At first, the oscillation of the CPG was slowed down by the sensory inputs and then accelerated with the rotation of the wheel in the right direction.\n\nWe compared the patterns of sensory input connections made after learning with wheels of different sizes. Table 1 shows the connection weights to the output unit. The positive connection from $\\sin\\theta$ forces the weight to the right-hand side of the wheel and stabilizes clockwise rotation. The negative connection from $\\cos\\theta$ with smaller radius speeds up the rhythm of the CPG when the wheel rotates too fast and the weight is lifted up. The positive input from $r$ with larger radius makes the weight stickier to both ends of the track and slows down the rhythm of the CPG.\n\nTable 1: Sensory input weights to the output unit ($p_{1k}$; $k = 1, \\ldots, 5$). 
\n\nradius    $r$     $\\dot{r}$    $\\cos\\theta$    $\\sin\\theta$    $\\dot{\\theta}$\n2 cm      0.15    -0.53    -1.35    1.32    0.07\n4 cm      0.28    -0.55    -1.09    1.22    0.01\n6 cm      0.67    -0.21    -0.41    0.98    0.00\n8 cm      0.70    -0.33    -0.40    0.92    0.03\n10 cm     0.90    -0.12    -0.30    0.93    -0.02\n\n5 DISCUSSION\n\nThe architectures of CPGs in lower vertebrates and invertebrates are supposed to be determined by genetic information. Nevertheless, the way an animal utilizes the sensory inputs must be adaptive to the characteristics of the physical environments and the changing dimensions of its body parts.\n\nBack-propagation through forward models of physical systems can also be applied to the learning of sensory feedback (Jordan and Jacobs, 1990). However, learning of nonlinear dynamics of locomotion systems is a difficult task; moreover, multi-layer back-propagation is not appropriate as a biological model of learning. The learning rule (7) is similar to the covariance learning rule (Sejnowski and Stanton, 1990), which is a biological model of long term potentiation of synapses.\n\nAcknowledgements\n\nThe authors thank Allen Selverston, Peter Rowat, and those who gave comments to our poster at the NIPS Conference. This work was partly supported by grants from the Ministry of Education, Culture, and Science of Japan.\n\nReferences\n\nBarnes, W. J. P. & Gladden, M. H. (1985) Feedback and Motor Control in Invertebrates and Vertebrates. Beckenham, Britain: Croom Helm.\n\nDoya, K. & Yoshizawa, S. (1989) Adaptive neural oscillator using continuous-time back-propagation learning. Neural Networks, 2, 375-386.\n\nGrillner, S. & Matsushima, T. (1991) The neural network underlying locomotion in Lamprey: Synaptic and cellular mechanisms. Neuron, 7(July), 1-15.\n\nJordan, M. I. & Jacobs, R. A. (1990) Learning to control an unstable system with forward modeling. In Touretzky, D. S. 
(ed.), Advances in Neural Information Processing Systems 2. San Mateo, CA: Morgan Kaufmann.\n\nKawato, M. & Suzuki, R. (1980) Two coupled neural oscillators as a model of the circadian pacemaker. Journal of Theoretical Biology, 86, 547-575.\n\nPearlmutter, B. A. (1989) Learning state space trajectories in recurrent neural networks. Neural Computation, 1, 263-269.\n\nSejnowski, T. J. & Stanton, P. K. (1990) Covariance storage in the Hippocampus. In Zornetzer, S. F. et al. (eds.), An Introduction to Neural and Electronic Networks, 365-377. San Diego, CA: Academic Press.\n\nSiller, K. T., Skorupski, P., Elson, R. C., & Bush, M. H. (1986) Two identified afferent neurones entrain a central locomotor rhythm generator. Nature, 323, 440-443.\n\nWidrow, B. & Stearns, S. D. (1985) Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice Hall.\n\nWilliams, R. J. & Zipser, D. (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270-280.\n\nAppendix\n\nThe dynamics of the zero-legged locomotion system:\n\n$$m \\ddot{r} = f_0 \\Big( 1 + \\frac{m R^2 \\sin^2\\theta}{I_0} \\Big) - m g_c \\Big( \\cos\\theta + \\frac{m R \\sin^2\\theta \\, (r + R\\cos\\theta)}{I_0} \\Big) + m r \\dot{\\theta}^2 + m R \\sin\\theta \\, \\frac{\\nu + 2 m \\dot{r} (r + R\\cos\\theta)}{I_0} \\, \\dot{\\theta},$$\n\n$$I_0 \\ddot{\\theta} = -f_0 R \\sin\\theta + m g_c \\sin\\theta \\, (r + R\\cos\\theta) - (\\nu + 2 m \\dot{r} (r + R\\cos\\theta)) \\dot{\\theta},$$\n\n$$f_0 = f_{max} g(x_1(t)) - \\sigma r^3 - \\mu \\dot{r},$$\n\n$$I_0 = I + M R^2 + m (r + R\\cos\\theta)^2.$$\n\nParameters: the masses of the weight $m = 0.2$ [kg] and the wheel $M = 0.8$ [kg]; the radius of the wheel $R = 0.02$ through $0.1$ [m]; the inertial moment of the wheel $I = \\frac{1}{2} M R^2$; the maximum force to the weight $f_{max} = 5$ [N]; the stiffness of the limiter of the weight $\\sigma = 20/R^3$ [N/m$^3$]; the damping coefficients of the weight motion $\\mu = 0.2/R$ [N/(m/s)] and the wheel rotation $\\nu = 0.05(M + m)R$ [N/(rad/s)]. 
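The appendix dynamics can be sketched numerically as follows, with an energy check as a consistency test. Several assumptions are made here and are not taken from the text: the equations are read as I_0 * theta_ddot = -f_0 R sin(theta) + m g_c sin(theta) (r + R cos(theta)) - (nu + 2 m r_dot (r + R cos(theta))) theta_dot together with the radial equation m r_ddot + m R sin(theta) theta_ddot = f_0 + m r theta_dot^2 - m g_c cos(theta); the wheel inertia is I = M R^2 / 2; g_c = 9.8; the energy expression is derived for a weight at (R theta + r sin(theta), R + r cos(theta)); and a fourth-order Runge-Kutta integrator is used. All function names are hypothetical.

```python
import math

G_C = 9.8  # assumed gravitational acceleration g_c [m/s^2]; not given in the text

def accelerations(state, f0, m=0.2, M=0.8, R=0.1, nu=0.0):
    # state = (r, r_dot, theta, theta_dot); returns (r_ddot, theta_ddot).
    r, r_dot, th, th_dot = state
    inertia = 0.5 * M * R * R                 # assumed I = M R^2 / 2
    arm = r + R * math.cos(th)
    i0 = inertia + M * R * R + m * arm * arm  # I_0
    th_ddot = (-f0 * R * math.sin(th) + m * G_C * math.sin(th) * arm
               - (nu + 2.0 * m * r_dot * arm) * th_dot) / i0
    # Unexpanded radial equation; substituting th_ddot gives the expanded form.
    r_ddot = (f0 / m + r * th_dot ** 2 - G_C * math.cos(th)
              - R * math.sin(th) * th_ddot)
    return r_ddot, th_ddot

def energy(state, m=0.2, M=0.8, R=0.1):
    # Kinetic plus gravitational energy for the assumed geometry.
    r, r_dot, th, th_dot = state
    inertia = 0.5 * M * R * R
    wheel = 0.5 * (inertia + M * R * R) * th_dot ** 2
    speed_sq = (r_dot ** 2
                + (r * r + R * R + 2.0 * R * r * math.cos(th)) * th_dot ** 2
                + 2.0 * R * th_dot * r_dot * math.sin(th))
    return wheel + 0.5 * m * speed_sq + m * G_C * r * math.cos(th)

def derivative(state, f0):
    r_ddot, th_ddot = accelerations(state, f0)
    return [state[1], r_ddot, state[3], th_ddot]

def rk4_step(state, f0, dt):
    k1 = derivative(state, f0)
    k2 = derivative([s + 0.5 * dt * k for s, k in zip(state, k1)], f0)
    k3 = derivative([s + 0.5 * dt * k for s, k in zip(state, k2)], f0)
    k4 = derivative([s + dt * k for s, k in zip(state, k3)], f0)
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

# With no drive (f0 = 0) and no damping (nu = 0) the energy should be conserved.
state = [0.02, 0.0, 0.3, 1.0]
e_start = energy(state)
for _ in range(2000):
    state = rk4_step(state, 0.0, 1e-4)
energy_drift = abs(energy(state) - e_start)
```

A negligible `energy_drift` indicates that the two equations of motion are mutually consistent under the assumed geometry; it does not validate the parameter values or the drive and limiter terms, which are switched off in this check.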
\n\n\f", "award": [], "sourceid": 537, "authors": [{"given_name": "Kenji", "family_name": "Doya", "institution": null}, {"given_name": "Shuji", "family_name": "Yoshizawa", "institution": null}]}