{"title": "Grammatical Inference by Attentional Control of Synchronization in an Oscillating Elman Network", "book": "Advances in Neural Information Processing Systems", "page_first": 67, "page_last": 74, "abstract": null, "full_text": "Grammatical Inference by \n\nAttentional Control of Synchronization \n\nin an Oscillating Elman Network \n\nBill Baird \n\nDept Mathematics, \n\nU.C.Berkeley, \n\nBerkeley, Ca. 94720, \n\nbaird@math.berkeley.edu \n\nTodd Troyer \nDept of Phys., \n\nU.C.San Francisco, \n513 Parnassus Ave. \n\nSan Francisco, Ca. 94143, \n\ntodd@phy.ucsf.edu \n\nFrank Eeckman \nLawrence Livermore \nNational Laboratory, \nP.O. Box 808 (L-270), \nLivermore, Ca. 94550, \n\neeckman@.llnl.gov \n\nAbstract \n\nWe show how an \"Elman\" network architecture, constructed from \nrecurrently connected oscillatory associative memory network mod(cid:173)\nules, can employ selective \"attentional\" control of synchronization \nto direct the flow of communication and computation within the \narchitecture to solve a grammatical inference problem. \nPreviously we have shown how the discrete time \"Elman\" network \nalgorithm can be implemented in a network completely described \nby continuous ordinary differential equations. The time steps (ma(cid:173)\nchine cycles) of the system are implemented by rhythmic variation \n(clocking) of a bifurcation parameter. In this architecture, oscilla(cid:173)\ntion amplitude codes the information content or activity of a mod(cid:173)\nule (unit), whereas phase and frequency are used to \"softwire\" the \nnetwork. Only synchronized modules communicate by exchang(cid:173)\ning amplitude information; the activity of non-resonating modules \ncontributes incoherent crosstalk noise. \nAttentional control is modeled as a special subset of the hidden \nmodules with ouputs which affect the resonant frequencies of other \nhidden modules. 
They control synchrony among the other modules and direct the flow of computation (attention) to effect transitions between two subgraphs of a thirteen state automaton which the system emulates to generate a Reber grammar. The internal crosstalk noise is used to drive the required random transitions of the automaton.\n\n1 Introduction\n\nRecordings of local field potentials have revealed 40 to 80 Hz oscillation in vertebrate cortex [Freeman and Baird, 1987, Gray and Singer, 1987]. The amplitude patterns of such oscillations have been shown to predict the olfactory and visual pattern recognition responses of a trained animal. There is further evidence that although the oscillatory activity appears to be roughly periodic, it is actually chaotic when examined in detail. This preliminary evidence suggests that oscillatory or chaotic network modules may form the cortical substrate for many of the sensory, motor, and cognitive functions now studied in static networks.\n\nIt remains to be shown how networks with more complex dynamics can perform these operations and what possible advantages are to be gained by such complexity. We have therefore constructed a parallel distributed processing architecture that is inspired by the structure and dynamics of cerebral cortex, and applied it to the problem of grammatical inference. The construction views cortex as a set of coupled oscillatory associative memories, and is guided by the principle that attractors must be used by macroscopic systems for reliable computation in the presence of noise. This system must function reliably in the midst of noise generated by crosstalk from its own activity. 
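At the symbol level, the Reber grammar task mentioned above is generated by an ordinary finite automaton. As a point of reference for the dynamical implementation described in later sections, here is a minimal sketch of the smaller six state Reber machine of Cleeremans et al. as conventional code; the state numbering is one common textbook rendering, not taken from the paper's figures, and Python's random generator stands in for the internal crosstalk noise that drives the random transitions.

```python
import random

# Standard six-state Reber grammar transition table (after Cleeremans et al.,
# 1989).  Each state maps to its two (symbol, next-state) choices; state 5 is
# the exit state, at which "E" is emitted.  Numbering is an assumption.
REBER = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
}

def generate_string(rng=random):
    """Emit one grammatical string; the random choice at each state plays
    the role the paper assigns to internal crosstalk noise."""
    out, state = ["B"], 0
    while state != 5:
        symbol, state = rng.choice(REBER[state])
        out.append(symbol)
    out.append("E")
    return "".join(out)

def is_grammatical(s):
    """Check a candidate string against the same automaton."""
    if not (s.startswith("B") and s.endswith("E")):
        return False
    state = 0
    for ch in s[1:-1]:
        moves = dict(REBER.get(state, []))
        if ch not in moves:
            return False
        state = moves[ch]
    return state == 5
```

Note that at every state exactly two continuations are legal and equiprobable, which is why the architecture below needs a genuine noise source to break the tie between the two equally activated output symbols.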
Present day digital computers are built of flip-flops which, at the level of their transistors, are continuous dissipative dynamical systems with different attractors underlying the symbols we call \"0\" and \"1\". In a similar manner, the network we have constructed is a symbol processing system, but with analog input and oscillatory subsymbolic representations.\n\nThe architecture operates as a thirteen state finite automaton that generates the symbol strings of a Reber grammar. It is designed to demonstrate and study the following issues and principles of neural computation: (1) sequential computation with coupled associative memories; (2) computation with attractors for reliable operation in the presence of noise; (3) discrete time and state symbol processing arising from continuum dynamics by bifurcations of attractors; (4) attention as selective synchronization controlling communication and temporal program flow; (5) chaotic dynamics in some network modules driving random choice of attractors in other network modules. The first three issues have been fully addressed in a previous paper [Baird et al., 1993], and are only briefly reviewed. We focus here on the last two.\n\n1.1 Attentional Processing\n\nAn important element of intra-cortical communication in the brain, and between modules in this architecture, is the ability of a module to detect and respond to the proper input signal from a particular module, when inputs from other modules irrelevant to the present computation are contributing crosstalk noise. This is similar to the problem of coding messages in a computer architecture like the Connection Machine so that they can be picked up from the common communication bus line by the proper receiving module.\n\nPeriodic or nearly periodic (chaotic) variation of a signal introduces additional degrees of freedom that can be exploited in a computational architecture. 
We investigate the principle that selective control of synchronization, which we hypothesize to be a model of \"attention\", can be used to solve this coding problem and control communication and program flow in an architecture with dynamic attractors.\n\nThe architecture illustrates the notion that synchronization not only \"binds\" sensory inputs into \"objects\" [Gray and Singer, 1987], but binds the activity of selected cortical areas into a functional whole that directs behavior. It is a model of \"attended activity\" as that subset which has been included in the processing of the moment by synchronization. This is both a spatial and temporal binding. Only the inputs which are synchronized to the internal oscillatory activity of a module can effect previously learned transitions of attractors within it. For example, consider two objects in the visual field separately bound in primary visual cortex by synchronization of their components at different phases or frequencies. One object may be selectively attended to by its entrainment to oscillatory processing at higher levels such as V4 or IT. These in turn are in synchrony with oscillatory activity in motor areas to select the attractors there which are directing motor output.\n\nIn the architecture presented here, we have constrained the network dynamics so that there exist well defined notions of amplitude, phase, and frequency. The network has been designed so that amplitude codes the information content or activity of a module, whereas phase and frequency are used to \"softwire\" the network. An oscillatory network module has a passband outside of which it will not synchronize with an oscillatory input. Modules can therefore easily be desynchronized by perturbing their resonant frequencies. 
Furthermore, only synchronized modules communicate by exchanging amplitude information; the activity of non-resonating modules contributes incoherent crosstalk or noise. The flow of communication between modules can thus be controlled by controlling synchrony. By changing the intrinsic frequency of modules in a patterned way, the effective connectivity of the network is changed. The same hardware and connection matrix can thus subserve many different computations and patterns of interaction between modules without crosstalk problems.\n\nThe crosstalk noise is actually essential to the function of the system. It serves as the noise source for making random choices of output symbols and automaton state transitions in this architecture, as we discuss later. In cortex there is an issue as to what may constitute a source of randomness of sufficient magnitude to perturb the large ensemble behavior of neural activity at the cortical network level. It does not seem likely that the well known molecular fluctuations which are easily averaged within one or a few neurons can do the job. The architecture here models the hypothesis that deterministic chaos in the macroscopic dynamics of a network of neurons, which is of the same order of magnitude as the coherent activity, can serve this purpose.\n\nIn a set of modules which is desynchronized by perturbing the resonant frequencies of the group, coherence is lost and \"random\" phase relations result. The character of the model time traces is irregular, as seen in real neural ensemble activity. The behavior of the time traces in different modules of the architecture is similar to the temporary appearance and switching of synchronization between cortical areas seen in observations of cortical processing during sensory/motor tasks in monkeys and humans [Bressler and Nakamura, 1993]. 
The structure of this apparently chaotic signal and its use in network learning and operation are currently under investigation.\n\n2 Normal Form Associative Memory Modules\n\nThe mathematical foundation for the construction of network modules is contained in the normal form projection algorithm [Baird and Eeckman, 1993]. This is a learning algorithm for recurrent analog neural networks which allows associative memory storage of analog patterns, continuous periodic sequences, and chaotic attractors in the same network. An N node module can be shown to function as an associative memory for up to N/2 oscillatory, or N/3 chaotic memory attractors [Baird and Eeckman, 1993]. A key feature of a net constructed by this algorithm is that the underlying dynamics is explicitly isomorphic to any of a class of standard, well understood nonlinear dynamical systems - a normal form [Guckenheimer and Holmes, 1983].\n\nThe network modules of this architecture were developed previously as models of olfactory cortex with distributed patterns of activity like those observed experimentally [Baird, 1990, Freeman and Baird, 1987]. Such a biological network is dynamically equivalent to a network in normal form and may easily be designed, simulated, and theoretically evaluated in these coordinates. When the intramodule competition is high, they are \"memory\" or winner-take-all coordinates where attractors have one oscillator at maximum amplitude, with the other amplitudes near zero. In figure two, the input and output modules are demonstrating a distributed amplitude pattern (the symbol \"T\"), and the hidden and context modules are two-attractor modules in normal form coordinates showing either a right or left side active. In this paper all networks are discussed in normal form coordinates. 
By analyzing the network in these coordinates, the amplitude and phase dynamics have a particularly simple interaction. When the input to a module is synchronized with its intrinsic oscillation, the amplitude of the periodic activity may be considered separately from the phase rotation. The module may then be viewed as a static network with these amplitudes as its activity.\n\nTo illustrate the behavior of individual network modules, we examine a binary (two-attractor) module; the behavior of modules with more than two attractors is similar. Such a unit is defined in polar normal form coordinates by the following equations of the Hopf normal form:\n\n\dot{r}_{1i} = u_i r_{1i} - c r_{1i}^3 + (d - b \sin(\omega_{clock} t)) r_{1i} r_{0i}^2 + \sum_j w_{ij}^1 I_j \cos(\theta_j - \theta_{1i})\n\dot{r}_{0i} = u_i r_{0i} - c r_{0i}^3 + (d - b \sin(\omega_{clock} t)) r_{0i} r_{1i}^2 + \sum_j w_{ij}^0 I_j \cos(\theta_j - \theta_{0i})\n\dot{\theta}_{1i} = \omega_i + \sum_j w_{ij}^1 (I_j / r_{1i}) \sin(\theta_j - \theta_{1i})\n\dot{\theta}_{0i} = \omega_i + \sum_j w_{ij}^0 (I_j / r_{0i}) \sin(\theta_j - \theta_{0i})\n\nThe clocked parameter b \sin(\omega_{clock} t) is used to implement the discrete time machine cycle of the Elman architecture as discussed later. It has lower frequency (1/10) than the intrinsic frequency \omega_i of the unit.\n\nExamination of the phase equations shows that a unit has a strong tendency to synchronize with an input of similar frequency. Define the phase difference \phi = \theta_0 - \theta_J = \theta_0 - \omega_J t between a unit \theta_0 and its input \theta_J. For either side of a unit driven by an input of the same frequency, \omega_J = \omega_0, there is an attractor at zero phase difference \phi = \theta_0 - \theta_J = 0 and a repellor at \phi = 180 degrees. In simulations, the interconnected network of these units described below synchronizes robustly within a few cycles following a perturbation. If the frequencies of some modules of the architecture are randomly dispersed by a significant amount, \omega_J - \omega_0 \neq 0, phase-lags appear first, then synchronization is lost in those units. 
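A one-dimensional reduction of the phase dynamics makes this locking behavior easy to check numerically. For a single unit driven by one oscillatory input with frequency detuning delta_omega, the phase difference phi obeys dphi/dt = delta_omega - k*sin(phi), where k is an assumed lumped constant standing in for the coupling gain w*(I/r). A minimal sketch:

```python
import math

def phase_difference(delta_omega, k=1.0, dt=0.001, steps=20000):
    """Euler-integrate d(phi)/dt = delta_omega - k*sin(phi), the reduced
    phase-difference equation for one unit driven by a single oscillatory
    input; k lumps the coupling w*(I/r) into one assumed constant."""
    phi = 2.0  # start well away from the attractor at phi = 0
    for _ in range(steps):
        phi += dt * (delta_omega - k * math.sin(phi))
    return phi

locked = phase_difference(0.0)    # same frequency: phi settles near 0
drifting = phase_difference(3.0)  # detuning > k: no fixed point, phi drifts
```

For |delta_omega| < k the equation has a stable fixed point (the attractor near phi = 0, with the repellor near 180 degrees); once the detuning exceeds k the fixed points vanish and phi grows without bound, which is the passband behavior exploited throughout the architecture.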
An oscillating module therefore acts as a band pass filter for oscillatory inputs.\n\nWhen the oscillators are synchronized with the input, \theta_j - \theta_{1i} = 0, the phase terms \cos(\theta_j - \theta_{1i}) = \cos(0) = 1 disappear. This leaves the amplitude equations \dot{r}_{1i} and \dot{r}_{0i} with static inputs \sum_j w_{ij}^1 I_j and \sum_j w_{ij}^0 I_j. Thus we have network modules which emulate static network units in their amplitude activity when fully phase-locked to their input. Amplitude information is transmitted between modules, with an oscillatory carrier.\n\nFor fixed values of the competition, in a completely synchronized system, the internal amplitude dynamics define a gradient dynamical system for a fourth order energy function. External inputs that are phase-locked to the module's intrinsic oscillation simply add a linear tilt to the landscape.\n\nFor low levels of competition, there is a broad circular valley. When tilted by external input, there is a unique equilibrium that is determined by the bias in tilt along one axis over the other. Thinking of r_{1i} as the \"activity\" of the unit, this activity becomes a monotonically increasing function of input. The module behaves as an analog connectionist unit whose transfer function can be approximated by a sigmoid. We refer to this as the \"analog\" mode of operation of the module.\n\nWith high levels of competition, the unit will behave as a binary (bistable) digital flip-flop element. There are two deep potential wells, one on each axis. Hence the module performs a winner-take-all choice on the coordinates of its initial state and maintains that choice \"clamped\" and independent of external input. This is the \"digital\" or \"quantized\" mode of operation of a module. We think of one attractor within the unit as representing \"1\" (the right side in figure two) and the other as representing \"0\". 
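The two modes of operation can be checked directly on the amplitude equations alone. The sketch below integrates a phase-locked binary module with illustrative constants (unit growth and self-limitation, a competition coefficient g standing in for the clocked term, and small static inputs; none of these particular values come from the paper):

```python
def settle(g, i1=0.06, i0=0.05, r1=0.4, r0=0.5, dt=0.01, steps=20000):
    """Euler-integrate the amplitude equations of a phase-locked binary
    module: dr/dt = r - r**3 + g*r*(other)**2 + input.  A competition
    coefficient g < -1 gives the 'digital' winner-take-all regime;
    -1 < g < 0 gives the 'analog' graded regime."""
    for _ in range(steps):
        d1 = r1 - r1 ** 3 + g * r1 * r0 ** 2 + i1
        d0 = r0 - r0 ** 3 + g * r0 * r1 ** 2 + i0
        r1, r0 = r1 + dt * d1, r0 + dt * d0
    return r1, r0

analog = settle(g=-0.5)   # low competition: both amplitudes graded
digital = settle(g=-3.0)  # high competition: one side wins, other quenched
```

In the analog regime both amplitudes settle at graded values tilted by their inputs; in the digital regime the initial advantage of one side is amplified into a clamped near-binary choice, matching the flip-flop description above.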
\n\n3 Elman Network of Oscillating Associative Memories\n\nAs a benchmark for the capabilities of the system, and to create a point of contact to standard network architectures, we have constructed a discrete-time recurrent \"Elman\" network [Elman, 1991] from oscillatory modules defined by ordinary differential equations. Previously we constructed a system which functions as the six state finite automaton that perfectly recognizes or generates the set of strings defined by the Reber grammar described in Cleeremans et al. [Cleeremans et al., 1989]. We found the connections for this network by using the backpropagation algorithm in a static network that approximates the behavior of the amplitudes of oscillation in a fully synchronized dynamic network [Baird et al., 1993].\n\nHere we construct a system that emulates the larger 13 state automaton similar (less one state) to the one studied by Cleeremans et al. in the second part of their paper. The graph of this automaton consists of two subgraph branches, each of which has the graph structure of the automaton learned as above, but with different assignments of transition output symbols (see fig. 1).\n\nFigure 1: Graph of the thirteen state automaton.\n\nWe use two types of modules in implementing the Elman network architecture shown in figure two below. The input and output layer each consist of a single associative memory module with six oscillatory attractors (six competing oscillatory modes), one for each of the six symbols in the grammar. The hidden and context layers consist of the binary \"units\" above, composed of two oscillatory attractors. The architecture consists of 14 binary modules in the hidden and context layers - three of which are special frequency control modules. 
The hidden and context layers are divided into four groups: the first three correspond to each of the two subgraphs plus the start state, and the fourth group consists of three special control modules, each of which has only a special control output that perturbs the resonant frequencies of the modules (by changing their values in the program) of a particular state coding group when it is at the zero attractor, as illustrated by the dotted control lines in figure two. This figure shows control unit two at the one attractor (right side of the square active) and the hidden units coding for states of subgraph two in synchrony with the input and output modules. Activity levels oscillate up and down through the plane of the paper. Here in midcycle, competition is high in all modules.\n\nFigure 2: Oscillating Elman network.\n\nThe discrete machine cycle of the Elman algorithm is implemented by the sinusoidal variation (clocking) of the bifurcation parameter in the normal form equations that determines the level of intramodule competition [Baird et al., 1993]. At the beginning of a machine cycle, when a network is generating strings, the input and context layers are at high competition and their activity is clamped at the bottom of deep basins of attraction. The hidden and output modules are at low competition and therefore behave as a traditional feedforward network free to take on analog values. In this analog mode, a real valued error can be defined for the hidden and output units and standard learning algorithms like backpropagation can be used to train the connections. Then the situation reverses. 
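The two phases of the clocked machine cycle, including the reversal, can be summarized in the static (fully synchronized) approximation. The callables, toy networks, and noise level below are placeholders for the learned amplitude maps, not the paper's trained weights:

```python
import random

def quantize(v):
    """High competition in a layer of binary modules: each module falls
    independently to its nearest attractor (0 or 1)."""
    return [1.0 if x > 0.5 else 0.0 for x in v]

def winner_take_all(v):
    """High competition in a single multi-attractor module: the largest
    amplitude wins (ties broken by whatever noise is already in v)."""
    k = max(range(len(v)), key=lambda i: v[i])
    return [1.0 if i == k else 0.0 for i in range(len(v))]

def machine_cycle(inp, ctx, hidden_net, output_net, noise=1e-3, rng=random):
    # Phase 1: input and context clamped; hidden and output relax as an
    # analog feedforward net, with crosstalk noise added at the output.
    hid = hidden_net(inp + ctx)
    out = [x + rng.uniform(0.0, noise) for x in output_net(hid)]
    # Phase 2: competition reverses; hidden and output quantize, and the
    # identity mappings load hidden -> context and output -> input.
    return winner_take_all(out), quantize(hid)

# Toy stand-ins: two equally activated next symbols, as at a Reber step.
hidden_net = lambda v: v[:2]
output_net = lambda h: [0.5, 0.5]
next_input, next_context = machine_cycle([1.0, 0.0], [0.0, 1.0],
                                         hidden_net, output_net)
```

Note the division of labor: the hidden layer quantizes elementwise because it is a bank of binary modules, while the output layer is one six-attractor module, so its high-competition phase is a single winner-take-all choice.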
For a Reber grammar there are always two equally possible next symbols being activated in the output layer, and we let the crosstalk noise break this symmetry so that the winner-take-all dynamics of the output module can choose one. High competition has now also \"quantized\" and clamped the activity in the hidden layer to a fixed binary vector. Meanwhile, competition is lowered in the input and context layers, freeing these modules from their attractors. An identity mapping from hidden to context loads the binarized activity of the hidden layer into the context layer for the next cycle, and an additional identity mapping from the output to input module places the chosen output symbol into the input layer to begin the next cycle.\n\n4 Attentional Control of Synchrony\n\nWe introduce a model of attention as control of program flow by selective synchronization. The attentional controller itself is modeled in this architecture as a special set of three hidden modules with outputs that affect the resonant frequencies of the three corresponding subsets of hidden modules. Varying levels of intramodule competition control the large scale direction of information flow between layers of the architecture. To direct information flow on a finer scale, the attention mechanism selects a subset of modules within each layer whose output is effective in driving the state transition behavior of the system.\n\nBy controlling the patterns of synchronization within the network we are able to generate the grammar obtained from an automaton consisting of two subgraphs connected by a single transition state (figure 1). 
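The gating role of the control modules reduces to a few lines: detune every state-coding group except the attended one beyond the synchronization passband, and only the attended group still exchanges amplitude information with the input and output layers. The group names, the passband half-width k, and the detuning amount below are all illustrative assumptions:

```python
W_INPUT = 1.0  # input layer carrier frequency (arbitrary units)
K = 0.3        # synchronization passband half-width (assumed)

def attend(groups, target):
    """Control-module action: the attended group's intrinsic frequency is
    set equal to the input frequency; the rest are detuned past the band."""
    return {g: W_INPUT if g == target else W_INPUT + 2.0 * K for g in groups}

def communicating(freqs):
    """Only groups inside the passband exchange amplitude information;
    the rest contribute incoherent crosstalk."""
    return {g for g, w in freqs.items() if abs(w - W_INPUT) < K}

freqs = attend(["subgraph1", "subgraph2", "start"], "subgraph2")
```

With one channel attended at a time, the same connection matrix serves both subgraphs and the start state without crosstalk interference.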
During training we enforce a segregation of the hidden layer code for the states of the separate subgraph branches of the automaton, so that different sets of synchronized modules learn to code for each subgraph of the automaton. Then the entire automaton is hand constructed with an additional hidden module for the start state between the branches. Transitions in the system from states in one subgraph of the automaton to the other are made by \"attending\" to the corresponding set of nodes in the hidden and context layers. This switching of the focus of attention is accomplished by changing the patterns of synchronization within the network, which changes the flow of communication between modules.\n\nEach control module modulates the intrinsic frequency of the units coding for the states of a single subgraph or the unit representing the start state. The control modules respond to a particular input symbol and context to set the intrinsic frequency of the proper subset of hidden units to be equal to the input layer frequency. As described earlier, modules can easily be desynchronized by perturbing their resonant frequencies. By perturbing the frequencies of the remaining modules away from the input frequency, these modules are no longer communicating with the rest of the network. Thus coherent information flows from input to output only through one of three channels. Viewing the automaton as a behavioral program, the control of synchrony constitutes a control of the program flow into its subprograms (the subgraphs of the automaton).\n\nWhen either exit state of a subgraph is reached, the \"B\" (begin) symbol is then emitted and fed back to the input, where it is connected through the first to second layer weight matrix to the attention control modules. It turns off the synchrony of the hidden states of the subgraph and allows entrainment of the start state to begin a new string of symbols. 
This state in turn activates both a \"T\" and a \"P\" in the output module. The symbol selected by the crosstalk noise and fed back to the input module is now connected to the control modules through the weight matrix. It desynchronizes the start state module, synchronizes the subset of hidden units coding for the states of the appropriate subgraph, and establishes there the start state pattern for that subgraph.\n\nFuture work will investigate the possibilities for self-organization of the patterns of synchrony and spatially segregated coding in the hidden layer during learning. The weights for entire automata, including the special attention control hidden units, should be learned at once.\n\n4.1 Acknowledgments\n\nSupported by AFOSR-91-0325, and a grant from LLNL. It is a pleasure to acknowledge the invaluable assistance of Morris Hirsch and Walter Freeman.\n\nReferences\n\n[Baird, 1990] Baird, B. (1990). Bifurcation and learning in network models of oscillating cortex. In Forrest, S., editor, Emergent Computation, pages 365-384. North Holland. Also in Physica D, 42.\n\n[Baird and Eeckman, 1993] Baird, B. and Eeckman, F. H. (1993). A normal form projection algorithm for associative memory. In Hassoun, M. H., editor, Associative Neural Memories: Theory and Implementation, New York, NY. Oxford University Press.\n\n[Baird et al., 1993] Baird, B., Troyer, T., and Eeckman, F. H. (1993). Synchronization and grammatical inference in an oscillating Elman network. In Hanson, S., Cowan, J., and Giles, C., editors, Advances in Neural Information Processing Systems 5, pages 236-244. Morgan Kaufmann.\n\n[Bressler and Nakamura, 1993] Bressler, S. and Nakamura (1993). Interarea synchronization in Macaque neocortex during a visual discrimination task. In Eeckman, F. H. and Bower, J., editors, Computation and Neural Systems, page 515. Kluwer.\n\n[Cleeremans et al., 1989] Cleeremans, A., Servan-Schreiber, D., and McClelland, J. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1(3):372-381.\n\n[Elman, 1991] Elman, J. (1991). Distributed representations, simple recurrent networks and grammatical structure. Machine Learning, 7(2/3):91.\n\n[Freeman and Baird, 1987] Freeman, W. and Baird, B. (1987). Relation of olfactory EEG to behavior: Spatial analysis. Behavioral Neuroscience, 101:393-408.\n\n[Gray and Singer, 1987] Gray, C. M. and Singer, W. (1987). Stimulus dependent neuronal oscillations in the cat visual cortex area 17. Neuroscience [Suppl], 22:1301P.\n\n[Guckenheimer and Holmes, 1983] Guckenheimer, J. and Holmes, P. (1983). Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Springer, New York.\n", "award": [], "sourceid": 817, "authors": [{"given_name": "Bill", "family_name": "Baird", "institution": null}, {"given_name": "Todd", "family_name": "Troyer", "institution": null}, {"given_name": "Frank", "family_name": "Eeckman", "institution": null}]}