{"title": "Synchronization and Grammatical Inference in an Oscillating Elman Net", "book": "Advances in Neural Information Processing Systems", "page_first": 236, "page_last": 243, "abstract": null, "full_text": "Synchronization and Grammatical Inference in an Oscillating Elman Net

Bill Baird, Dept. Mathematics, U.C. Berkeley, Berkeley, CA 94720, baird@math.berkeley.edu
Todd Troyer, Dept. Mathematics, U.C. Berkeley, Berkeley, CA 94720
Frank Eeckman, Lawrence Livermore National Laboratory, P.O. Box 808 (L-426), Livermore, CA 94551

Abstract

We have designed an architecture to span the gap between biophysics and cognitive science, to address and explore issues of how a discrete symbol processing system can arise from the continuum, and how complex dynamics like oscillation and synchronization can then be employed in its operation and affect its learning. We show how a discrete-time recurrent \"Elman\" network architecture can be constructed from recurrently connected oscillatory associative memory modules described by continuous nonlinear ordinary differential equations. The modules can learn connection weights between themselves which will cause the system to evolve under a clocked \"machine cycle\" by a sequence of transitions of attractors within the modules, much as a digital computer evolves by transitions of its binary flip-flop attractors. The architecture thus employs the principle of \"computing with attractors\" used by macroscopic systems for reliable computation in the presence of noise. We have specifically constructed a system which functions as a finite state automaton that recognizes or generates the infinite set of six symbol strings that are defined by a Reber grammar. It is a symbol processing system, but with analog input and oscillatory subsymbolic representations.
The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. This holds input and \"context\" modules clamped at their attractors while hidden and output modules change state, then clamps hidden and output states while context modules are released to load those states as the new context for the next cycle of input. Superior noise immunity has been demonstrated for systems with dynamic attractors over systems with static attractors, and synchronization (\"binding\") between coupled oscillatory attractors in different modules has been shown to be important for effecting reliable transitions.

1 Introduction

Patterns of 40 to 80 Hz oscillation have been observed in the large scale activity (local field potentials) of olfactory cortex [Freeman and Baird, 1987] and visual neocortex [Gray and Singer, 1987], and shown to predict the olfactory [Freeman and Baird, 1987] and visual pattern recognition responses of a trained animal. Similar observations of 40 Hz oscillation in auditory and motor cortex (in primates), and in the retina and EMG, have been reported. It thus appears that cortical computation in general may occur by dynamical interaction of resonant modes, as has been thought to be the case in the olfactory system.

The oscillation can serve a macroscopic clocking function and entrain or \"bind\" the relevant microscopic activity of disparate cortical regions into a well defined phase coherent collective state or \"gestalt\". This can override irrelevant microscopic activity and produce coordinated motor output. There is further evidence that although the oscillatory activity appears to be roughly periodic, it is actually chaotic when examined in detail.
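The clocked two-phase machine cycle described above can be abstracted as a short sketch. This is a simplification under stated assumptions: `next_hidden` and `output_map` are hypothetical stand-ins for the attractor transitions that the oscillatory modules actually compute via their learned weights, and the clamping of modules by the bifurcation parameter is reduced to the order of the two update phases.

```python
# Minimal sketch (not the authors' ODE implementation) of the clocked
# two-phase "machine cycle", abstracted to a discrete-time Elman update.
# Phase A: context and input are clamped; hidden and output settle.
# Phase B: hidden and output are clamped; context loads the new hidden state.

def machine_cycle(state, inp, next_hidden, output_map):
    """One clocked cycle. `next_hidden` maps (context, input) -> hidden;
    `output_map` maps hidden -> output symbol. Both are hypothetical
    stand-ins for the attractor transitions of the oscillatory modules."""
    # Phase A: context clamped, hidden/output released
    hidden = next_hidden(state["context"], inp)
    output = output_map(hidden)
    # Phase B: hidden/output clamped, context released to copy hidden
    state["context"] = hidden
    return output

# Toy usage: a 2-state automaton that flips state on every cycle.
state = {"context": 0}
flip = lambda ctx, inp: 1 - ctx
out = lambda h: "AB"[h]
symbols = [machine_cycle(state, None, flip, out) for _ in range(4)]
print(symbols)  # alternating output symbols, starting from context 0
```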
If this view is correct, then oscillatory/chaotic network modules form the actual cortical substrate of the diverse sensory, motor, and cognitive operations now studied in static networks. It must then be shown how those functions can be accomplished with oscillatory and chaotic dynamics, and what advantages are gained thereby. It is our expectation that nature makes good use of this dynamical complexity, and our intent is to search here for novel design principles that may underlie the superior computational performance of biological systems over man-made devices in many task domains. These principles may then be applied in artificial systems to engineering problems to advance the art of computation. We have therefore constructed a parallel distributed processing architecture that is inspired by the structure and dynamics of cerebral cortex, and applied it to the problem of grammatical inference.

The construction assumes that cortex is a set of coupled oscillatory associative memories, and is also guided by the principle that attractors must be used by macroscopic systems for reliable computation in the presence of noise. Present day digital computers are built of flip-flops which, at the level of their transistors, are continuous dissipative dynamical systems with different attractors underlying the symbols we call \"0\" and \"1\".

2 Oscillatory Network Modules

The network modules of this architecture were developed previously as models of olfactory cortex, or caricatures of \"patches\" of neocortex [Baird, 1990a]. A particular subnetwork is formed by a set of neural populations whose interconnections also contain higher order synapses. These synapses determine attractors for that subnetwork independent of other subnetworks. Each subnetwork module assumes only minimal coupling justified by known olfactory anatomy.
An N node module can be shown to function as an associative memory for up to N/2 oscillatory and N/3 chaotic memory attractors [Baird, 1990b, Baird and Eeckman, 1992b]. Single modules with static, oscillatory, and three types of chaotic attractors - Lorenz, Roessler, Ruelle-Takens - have been successfully used for recognition of handwritten characters [Baird and Eeckman, 1992b].

We have shown in these modules a superior stability of oscillatory attractors over static attractors in the presence of additive Gaussian noise perturbations with the 1/f spectral character of the noise found experimentally by Freeman in the brain [Baird and Eeckman, 1992a]. This may be one reason why the brain uses dynamic attractors. An oscillatory attractor acts like a bandpass filter and is effectively immune to the many slower macroscopic bias perturbations in the theta-alpha-beta range (3-25 Hz) below its 40-80 Hz passband, and the more microscopic perturbations of single neuron spikes in the 100-1000 Hz range.

The mathematical foundation for the construction of network modules is contained in the normal form projection algorithm [Baird, 1990b]. This is a learning algorithm for recurrent analog neural networks which allows associative memory storage of analog patterns, continuous periodic sequences, and chaotic attractors in the same network. A key feature of a net constructed by this algorithm is that the underlying dynamics is explicitly isomorphic to any of a class of standard, well understood nonlinear dynamical systems - a \"normal form\" [Guckenheimer and Holmes, 1983]. This system is chosen in advance, independent of both the patterns to be stored and the learning algorithm to be used. This control over the dynamics permits the design of important aspects of the network dynamics independent of the particular patterns to be stored.
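The winner-take-all competition among stored modes in such a normal form can be illustrated with a small numerical sketch. The equations below are an assumption: a generic competitive amplitude normal form (dr_i/dt = r_i(u - r_i^2 - c * sum of the other squared amplitudes), with competition c > 1), standing in for the authors' projection-algorithm networks, not their exact equations.

```python
# Sketch of competitive normal-form amplitude dynamics: with cross-coupling
# c > 1, only one mode survives (winner-take-all), settling at sqrt(u).
def simulate(r, u=1.0, c=2.0, dt=0.01, steps=20000):
    r = list(r)
    for _ in range(steps):
        sq = [x * x for x in r]
        total = sum(sq)
        # Euler step of dr_i/dt = r_i * (u - r_i^2 - c * sum_{j != i} r_j^2)
        r = [x + dt * x * (u - sq[i] - c * (total - sq[i]))
             for i, x in enumerate(r)]
    return r

# The mode with the larger initial amplitude wins and captures the full
# amplitude sqrt(u) = 1; the loser is suppressed toward zero.
r = simulate([0.6, 0.4])
print(r)
```

This is the sense in which a two-attractor module behaves like a binary flip-flop: whichever oscillatory mode starts (or is driven) ahead ends at full amplitude while the other is extinguished.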
Stability, basin geometry, and rates of convergence to attractors can be programmed in the standard dynamical system.

By analyzing the network in the polar form of these \"normal form coordinates\", the amplitude and phase dynamics have a particularly simple interaction. When the input to a module is synchronized with its intrinsic oscillation, the amplitudes of the periodic activity may be considered separately from the phase rotation, and the network of the module may be viewed as a static network with these amplitudes as its activity. We can further show analytically that the network modules we have constructed have a strong tendency to synchronize as required.

3 Oscillatory Elman Architecture

Because we work with this class of mathematically well-understood associative memory networks, we can take a constructive approach to building a cortical computer architecture, using these networks as modules in the same way that digital computers are designed from well behaved continuous analog flip-flop circuits. The architecture is such that the larger system is itself a special case of the type of network of the submodules, and can be analysed with the same tools used to design the subnetwork modules.

Each module is described in normal form or \"mode\" coordinates as a k-winner-take-all network where the winning set of units may have static, periodic or chaotic dynamics. By choosing modules to have only two attractors, networks can be built which are similar to networks using binary units. There can be fully recurrent connections between modules. The entire super-network of connected modules, however, is itself a polynomial network that can be projected into standard network coordinates.
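The amplitude/phase decoupling that makes the polar form useful can be checked on a single oscillator. The code below uses a generic Hopf normal form as an assumed stand-in for one module's mode: in Cartesian coordinates the two equations look coupled, but the amplitude r = sqrt(x^2 + y^2) obeys the closed equation dr/dt = r(u - r^2), independent of the phase, which simply rotates at omega.

```python
import math

# One Euler step of a generic Hopf normal form oscillator (an assumption
# standing in for a module's mode equations, not the paper's exact system).
def hopf_step(x, y, u=1.0, omega=2.0 * math.pi, dt=1e-4):
    r2 = x * x + y * y
    dx = (u - r2) * x - omega * y
    dy = (u - r2) * y + omega * x
    return x + dt * dx, y + dt * dy

x, y = 0.1, 0.0
for _ in range(200_000):        # integrate 20 time units
    x, y = hopf_step(x, y)
print(math.hypot(x, y))         # amplitude settles near sqrt(u) = 1
```

Whatever the phase does, the amplitude converges to sqrt(u), which is why the synchronized module can be treated as a static network over amplitudes.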
The attractors within the modules may then be distributed patterns like those described for the biological model [Baird, 1990a], and observed experimentally in the olfactory system [Freeman and Baird, 1987]. The system is still equivalent to the architecture of modules in normal form, however, and may easily be designed, simulated, and theoretically evaluated in these coordinates. In this paper all networks are discussed in normal form coordinates.

As a benchmark for the capabilities of the system, and to create a point of contact with standard network architectures, we have constructed a discrete-time recurrent \"Elman\" network [Elman, 1991] from oscillatory modules defined by ordinary differential equations. We have at present a system which functions as a finite state automaton that perfectly recognizes or generates the infinite set of strings defined by the Reber grammar described in Cleeremans et al. [Cleeremans et al., 1989]. The connections for this network were found by pseudo-inversion: we solved for the connection matrices between a set of pre-chosen automaton states for the hidden layer modules and the proper possible output symbols of the Reber grammar, and between each legal combination of a new input symbol and the present state contained in the context modules and the proper next hidden state.

We use two types of modules in implementing the Elman network architecture. The input and output layers each consist of a single associative memory module with six oscillatory attractors (six competing oscillatory modes), one for each of the six possible symbols in the grammar. An attractor in these winner-take-all normal form coordinates is one oscillator at its maximum amplitude, with the others near zero amplitude. The hidden and context layers consist of binary \"units\", each composed of a two-competing-oscillator module.
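For reference, the finite state automaton that the oscillatory network implements can be written down directly. The transition table below is the standard Reber grammar of Cleeremans et al. (1989), with 'B' and 'E' marking the fixed begin and end of every string; the network's hidden/context states play the role of the automaton states, and the six-attractor input and output modules carry the symbols.

```python
# Plain finite-state recognizer for the standard Reber grammar
# (Cleeremans et al., 1989): state 7 is the accepting state.
REBER = {
    (0, 'B'): 1,
    (1, 'T'): 2, (1, 'P'): 3,
    (2, 'S'): 2, (2, 'X'): 4,
    (3, 'T'): 3, (3, 'V'): 5,
    (4, 'X'): 3, (4, 'S'): 6,
    (5, 'P'): 4, (5, 'V'): 6,
    (6, 'E'): 7,
}

def accepts(string):
    """Return True iff `string` is a legal Reber-grammar string."""
    state = 0
    for sym in string:
        state = REBER.get((state, sym))
        if state is None:
            return False
    return state == 7

print(accepts("BTSSXXTVVE"))  # True: a legal Reber string
print(accepts("BPTSVE"))      # False: no S transition from state 3
```

A generator is the same table run in reverse: from the current state, pick one of the legal outgoing symbols at random, which is exactly the "recognize or generate" behavior claimed for the network.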
We think of one mode within the unit as representing \"1\" and the other as representing \"0\" (see fig. 1). A \"weight\" for this unit is simply defined to be the weight of a driving unit to the input of the 1 attractor. The weights for the 0 side of the unit are then given as the complement of these, w0 = A - w1. This forces the input to the 0 side of the unit to be the complement of the input to the 1 side, I0 = A - I1, where A is a bias constant chosen to divide input equally between the oscillators at the midpoint of activation.

Figure 1. [Architecture diagram of the oscillatory Elman network: output, hidden, context, and input modules.]

There is an attractor at zero phase difference, phi = theta_0 - theta_I = theta_0 - omega*t = 0, and a repellor at 180 degrees in the phase difference equation d(phi)/dt for either side of a unit driven by an input of the same frequency, omega_I - omega_0 = 0:

d(phi)/dt = omega_0 - omega_I + (r_I / r_0) sin(-phi)
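A quick numerical check of this phase dynamics (under the assumption that the driven-unit equation is d(phi)/dt = omega_0 - omega_I + (r_I/r_0) sin(-phi), as reconstructed above): with equal frequencies the equation reduces to d(phi)/dt = -(r_I/r_0) sin(phi), which has the stated attractor at phi = 0 and repellor at 180 degrees.

```python
import math

# Euler integration of d(phi)/dt = (r_I/r_0) * sin(-phi), the equal-frequency
# case (omega_I = omega_0) of the assumed phase-difference equation.
def settle(phi0, ratio=0.5, dt=0.01, steps=5000):
    phi = phi0
    for _ in range(steps):
        phi += dt * ratio * math.sin(-phi)
    return phi

# Any initial phase difference short of the repellor at pi is pulled into
# synchrony at phi = 0; a start exactly at pi stays balanced on the repellor.
print(settle(2.5))   # driven to ~0
print(settle(-2.5))  # driven to ~0
```

This pull toward zero phase difference is the synchronization ("binding") mechanism that makes transitions between module attractors reliable.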