{"title": "Associative Memory in a Simple Model of Oscillating Cortex", "book": "Advances in Neural Information Processing Systems", "page_first": 68, "page_last": 75, "abstract": null, "full_text": "68 \n\nBaird \n\nAssociative Memory in a Simple Model of Oscillating Cortex \n\nBill Baird \n\nDept Molecular and Cell Biology, \nU.C. Berkeley, Berkeley, Ca. 94720 \n\nABSTRACT \n\nA generic model of oscillating cortex, which assumes \"minimal\" coupling justified by known anatomy, is shown to function as an associative memory, using previously developed theory. The network has explicit excitatory neurons with local inhibitory interneuron feedback that forms a set of nonlinear oscillators coupled only by long range excitatory connections. Using a local Hebb-like learning rule for primary and higher order synapses at the ends of the long range connections, the system learns to store the kinds of oscillation amplitude patterns observed in olfactory and visual cortex. This rule is derived from a more general \"projection algorithm\" for recurrent analog networks that analytically guarantees content addressable memory storage of continuous periodic sequences, with a capacity of $N/2$ Fourier components for an $N$ node network and no \"spurious\" attractors. \n\n1 Introduction \n\nThis is a sketch of recent results stemming from work which is discussed completely in [1, 2, 3]. Patterns of 40 to 80 Hz oscillation have been observed in the large scale activity of olfactory cortex [4] and visual neocortex [5], and shown to predict the olfactory and visual pattern recognition responses of a trained animal. It thus appears that cortical computation in general may occur by dynamical interaction of resonant modes, as has been thought to be the case in the olfactory system. 
Given the sensitivity of neurons to the location and arrival times of dendritic input, the successive volleys of pulses that are generated by the collective oscillation of a neural net may be ideal for the formation and reliable long range transmission of the collective activity of one cortical area to another. The oscillation can serve a macroscopic clocking function and entrain the relevant microscopic activity of disparate cortical regions into well defined phase coherent macroscopic collective states which override uncorrelated microscopic activity. If this view is correct, then oscillatory network modules form the actual cortical substrate of the diverse sensory, motor, and cognitive operations now studied in static networks, and it must ultimately be shown how those functions can be accomplished with these dynamic networks. \n\nIn particular, we are interested here in modeling category learning and object recognition, after feature preprocessing. Equivalence classes of ratios of feature outputs in feature space must be established as prototype \"objects\" or categories that are invariant over endless sensory instances. Without categories, the world never repeats. This is the kind of function generally hypothesized for prepyriform cortex in the olfactory system [6], or inferotemporal cortex in the visual system. It is a different oscillatory network function from the feature \"binding\", or clustering role that is hypothesized for \"phase labels\" in primary visual cortex [5], or from the \"decision states\" hypothesized for the olfactory bulb by Li and Hopfield. In these preprocessing systems, there is no modification of connections, and no learning of particular perceptual objects. 
For category learning, full adaptive cross coupling is required so that all possible input feature vectors may be potential attractors. This is the kind of anatomical structure that characterizes prepyriform and inferotemporal cortex. The columns there are less structured, and the associational fiber system is more prominent than in primary cortex. Man shares this same high level \"association\" cortex structure with cats and rats. Phylogenetically, it is the preprocessing structures of primary cortex that have grown and evolved to give us our expanded capabilities. While the bulk of our pattern recognition power may be contributed by the clever feature preprocessing that has developed, the object classification system seems the most likely locus of the learning changes that underlie our daily conceptual evolution. That is the phenomenon of ultimate interest in this work. \n\n2 Minimal Model of Oscillating Cortex \n\nAnalog state variables, recurrence, oscillation, and bifurcation are hypothesized to be essential features of cortical networks which we explore in this approach. Explicit modeling of known excitatory and inhibitory neurons, and use of only known long range connections, is also a basic requirement for a biologically feasible network architecture. We analyse a \"minimal\" model that is intended to assume the least coupling that is justified by known anatomy, and use simulations and analytic results proved in [1, 2] to argue that an oscillatory associative memory function can be realized in such a system. The network is meant only as a cartoon of the real biology, designed to reveal the general mathematical principles and mechanisms by which the actual system might function. Such principles can then be observed or applied in other contexts as well. 
\n\nLong range excitatory to excitatory connections are well known as \"associational\" connections in olfactory cortex [6], and cortico-cortical connections in neocortex. Since our units are neural populations, we know that some density of full cross-coupling exists in the system [6], and our weights are the average synaptic strengths of these connections. There is little problem at the population level with coupling symmetry in these average connection strengths emerging from the operation of an outer product learning rule on initially random connections. When the network units are neuron pools, analog state variables arise naturally as continuous local pulse densities and cell voltage averages. Smooth sigmoidal population input-output functions, whose slope increases with arousal of the animal, have been measured in the olfactory system [4]. Local inhibitory \"interneurons\" are a ubiquitous feature of the anatomy of cortex throughout the brain [5]. It is unlikely that they make long range connections (> 1 mm) by themselves. These connections, and even the debated interconnections between them, are therefore left out of a minimal model. The resulting network is actually a fair caricature of the well studied circuitry of olfactory (prepyriform) cortex. This is thought to be one of the clearest cases of a real biological network with associative memory function [6]. Although neocortex is far more complicated, it may roughly be viewed as two olfactory cortices stacked on top of each other. We expect that analysis of this system will lend insight into mechanisms of associative memory there as well. In [3] we show that this model is capable of storing complicated multifrequency spatio-temporal trajectories, and argue that it may serve as a model of memory for sequences of actions in motor cortex. 
\n\nFor an $N$ dimensional system, the \"minimal\" coupling structure is described mathematically by the matrix \n\n$$T = \\begin{bmatrix} W & -hI \\\\ gI & 0 \\end{bmatrix},$$ \n\nwhere $W$ is the $N/2 \\times N/2$ matrix of excitatory interconnections, and $gI$ and $hI$ are $N/2 \\times N/2$ identity matrices multiplied by the positive scalars $g$ and $h$. These give the strength of coupling around local inhibitory feedback loops. A state vector is composed of local average cell voltages for $N/2$ excitatory neuron populations $x$ and $N/2$ inhibitory neuron populations $y$ (hereafter notated as $x, y \\in R^{N/2}$). Standard network equations with this coupling might be, in component form, \n\n$$\\dot{x}_i = -\\tau x_i - h\\,\\sigma(y_i) + \\sum_{j=1}^{N/2} W_{ij}\\,\\sigma(x_j) + b_i, \\qquad (1)$$ \n\n$$\\dot{y}_i = -\\tau y_i + g\\,\\sigma(x_i), \\qquad (2)$$ \n\nwhere $\\sigma(x) = \\tanh(x)$ or some other sigmoidal function symmetric about 0. Intuitively, since the inhibitory units $y_i$ receive no direct input and give no direct output, they act as hidden units that create oscillation for the amplitude patterns stored in the excitatory cross-connections $W$. This may be viewed as a simple generalization of the analog \"Hopfield\" network architecture to store periodic instead of static attractors. \n\nIf we expand this network to third order in a Taylor series about the origin, we get a network that looks something like \n\n$$\\dot{x}_i = -\\tau x_i - h y_i + \\sum_{j=1}^{N/2} W_{ij} x_j - \\sum_{j,k,l=1}^{N/2} W_{ijkl}\\, x_j x_k x_l + b_i, \\qquad (3)$$ \n\n$$\\dot{y}_i = -\\tau y_i + g x_i, \\qquad (4)$$ \n\nwhere $\\sigma'(0) = 1$, and $\\tfrac{1}{6}\\sigma'''(0)$ $(< 0)$ is absorbed into $W_{ijkl}$. A sigmoid symmetric about zero has odd symmetry, and the even order terms of the expansion vanish, leaving the cubic terms as the only nonlinearity. 
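As a rough illustration of how the cubic net (3,4) behaves, the following minimal sketch integrates a single excitatory/inhibitory pair (N = 2), so that W reduces to one scalar weight and the cubic weights to one scalar c. The function name simulate, the Runge-Kutta integrator, and every parameter value are assumptions made for this example and do not come from the simulations reported here. With tau = 0 and h = g = 1 the pair reduces to a van der Pol type oscillator, so a positive linear weight should give a bounded limit cycle and a negative one should decay to rest.

```python
# Sketch: one excitatory/inhibitory pair of the cubic net (3,4), N = 2.
# All parameter values are illustrative assumptions, not values from the paper.

def simulate(alpha=0.2, h=1.0, g=1.0, tau=0.0, c=1.0, b=0.0,
             x0=0.01, y0=0.0, dt=0.01, steps=40000):
    '''Integrate the pair with classical Runge-Kutta and return the x trace.'''
    def f(x, y):
        # Equation (3) with a single excitatory unit: W_11 = alpha, W_1111 = c.
        dx = -tau * x - h * y + alpha * x - c * x ** 3 + b
        # Equation (4): linear inhibitory feedback unit.
        dy = -tau * y + g * x
        return dx, dy

    x, y = x0, y0
    xs = []
    for _ in range(steps):
        k1x, k1y = f(x, y)
        k2x, k2y = f(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
        k3x, k3y = f(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
        k4x, k4y = f(x + dt * k3x, y + dt * k3y)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
        xs.append(x)
    return xs

trace = simulate()
# Peak excitatory amplitude over the final quarter of the run.
amplitude = max(abs(v) for v in trace[-10000:])
print(round(amplitude, 2))
```

In this reduced case the competitive cubic term is the only thing that stops the growth of the unstable linear mode, which is the saturation role the higher order weights play in the full network.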
The actual expansion of the excitatory sigmoids in (1,2) (in this coordinate system) will only give cubic terms of the form $\\sum_{j=1}^{N/2} W_{ij} x_j^3$. The competitive (negative) cubic terms of (3) therefore constitute a more general and directly programmable nonlinearity that is independent of the linear terms. They serve to create multiple periodic attractors by causing the oscillatory modes of the linear term to compete, much as the sigmoidal nonlinearity does for static modes in a Hopfield network. Intuitively, these terms may be thought of as sculpting the maxima of a \"saturation\" landscape into which the stored linear modes with positive eigenvalues expand, and positioning them to lie in the directions specified by the eigenvectors of these modes to make them stable. A precise definition of this landscape is given by a strict Liapunov function in a special polar coordinate system [1, 3]. Since we have had no success storing multiple oscillatory attractors in the sigmoid net (1,2) by any learning rule, we are driven to take this very effective higher order net seriously as a biological model. From a physiological point of view, (3,4) may be considered a model of a biological network which is operating in the linear region of the known axonal sigmoid nonlinearities [4], and contains instead sigma-pi units or higher order synaptic nonlinearities. \n\n2.1 Biological justification of the higher order synapses \n\nUsing the long range excitatory connections available, the higher order synaptic weights $W_{ijkl}$ can conceivably be realized locally in the axo-dendritic interconnection plexus known as \"neuropil\". This is a feltwork of tiny fibers so dense that its exact circuitry is impossible to investigate with present experimental techniques. Single axons are known to bifurcate into multiple branches that contribute separate synapses to the dendrites of target cells. 
It is also well known that neighboring synapses on a dendrite can interact in a nonlinear fashion that has been modeled as higher order synaptic terms by some researchers. It has been suggested that the neuropil may be dense enough to allow the crossing of every possible combination of $jkl$ axons in the vicinity of some dendritic branch of at least one neuron in neuron pool $i$ (B. Mel). Trophic factors stimulated by the coactivation of the axons and the dendrite could cause these axons to form a \"cluster\" of nearby synapses on the dendrite to realize a $jkl$ product synapse. The required higher order terms could thus be created by a Hebb-like process. The use of competitive cubic cross terms may therefore be viewed physiologically as the use of this complicated nonlinear synaptic/dendritic processing as the decision making nonlinearity in the system, as opposed to the usual sigmoidal axonal nonlinearity. There are more weights in the cubic synaptic terms, and the network nonlinearity can be programmed in detail. \n\n3 Analysis \n\nThe real eigenvectors of $W$ give the magnitudes of the complex eigenvectors of $T$. \n\nTheorem 3.1 If $\\alpha$ is a real eigenvalue of the $N/2 \\times N/2$ matrix $W$, with corresponding eigenvector $x$, then the $N \\times N$ matrix \n\n$$T = \\begin{bmatrix} W & -hI \\\\ gI & 0 \\end{bmatrix}$$ \n\nhas a pair of complex conjugate eigenvalues $\\lambda_{1,2} = \\frac{1}{2}(\\alpha \\pm \\sqrt{\\alpha^2 - 4hg}) = \\frac{1}{2}(\\alpha \\pm i\\omega)$, for $\\alpha^2 < 4hg$, where $\\omega = \\sqrt{4hg - \\alpha^2}$. The corresponding complex conjugate pair of eigenvectors are \n\n$$\\begin{bmatrix} x \\\\ \\frac{\\alpha + \\omega}{2h}\\, x \\end{bmatrix} \\pm i \\begin{bmatrix} x \\\\ \\frac{\\alpha - \\omega}{2h}\\, x \\end{bmatrix}.$$ \n\nThe proof of this theorem is given in [2]. To more clearly see the amplitude and phase patterns, we can convert to a magnitude and phase representation, $z = |z| e^{i\\theta}$, where $|z_j| = \\sqrt{(\\mathrm{Re}\\, z_j)^2 + (\\mathrm{Im}\\, z_j)^2}$ and $\\theta_j = \\arctan(\\mathrm{Im}\\, z_j / \\mathrm{Re}\\, z_j)$. We get $|z_{x_i}| = \\sqrt{x_i^2 + x_i^2} = \\sqrt{2}\\,|x_i|$, and \n\n$$|z_{y_i}| = \\sqrt{\\frac{2(\\alpha^2 + \\omega^2)}{4h^2}}\\,|x_i| = \\sqrt{\\frac{4hg}{2h^2}}\\,|x_i| = \\sqrt{\\frac{2g}{h}}\\,|x_i|.$$ \n\nNow $\\theta_x = \\arctan 1 = \\pi/4$, and $\\theta_y = \\arctan\\frac{\\alpha - \\omega}{\\alpha + \\omega}$. 
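The eigenvalue, amplitude, and phase expressions of Theorem 3.1 can be checked numerically. The following is a minimal sketch for two oscillators (N = 4) in plain Python. The matrix W, its eigenpair, and the values of h and g are assumptions chosen only so that the squared eigenvalue of W stays below 4hg.

```python
# Sketch: numerical check of Theorem 3.1 for N = 4 (two oscillators).
# W, its eigenpair, and the values of h and g are illustrative assumptions.
import cmath
import math

h, g = 1.0, 2.0
W = [[0.5, 0.3],
     [0.3, 0.5]]
alpha = 0.8          # eigenvalue of W
x = [1.0, 1.0]       # corresponding eigenvector of W

# Build T = [[W, -h I], [g I, 0]] as a nested list.
n = len(W)
T = [[0.0] * (2 * n) for _ in range(2 * n)]
for i in range(n):
    for j in range(n):
        T[i][j] = W[i][j]
    T[i][n + i] = -h
    T[n + i][i] = g

omega = math.sqrt(4 * h * g - alpha ** 2)
lam = 0.5 * (alpha + 1j * omega)

# Claimed eigenvector: [x; (alpha+omega)/(2h) x] + i [x; (alpha-omega)/(2h) x].
v = [xi * (1 + 1j) for xi in x]
v += [xi * ((alpha + omega) + 1j * (alpha - omega)) / (2 * h) for xi in x]

# Residual of the eigenvalue equation T v = lam v.
Tv = [sum(T[i][j] * v[j] for j in range(2 * n)) for i in range(2 * n)]
residual = max(abs(Tv[i] - lam * v[i]) for i in range(2 * n))

# Inhibitory to excitatory amplitude ratio, and phase lag in degrees.
ratio = abs(v[n]) / abs(v[0])
lag = math.degrees(cmath.phase(v[0]) - cmath.phase(v[n]))
print(residual, ratio, lag)
```

For these values the inhibitory phase lags the excitatory phase by less than 90 degrees, and the lag approaches 90 degrees as the eigenvalue of W becomes small compared with the oscillation frequency.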
Dividing out the common $\\sqrt{2}$ factor in the magnitudes, we get eigenvectors that clearly display the amplitude patterns of interest. \n\nBecause of the restricted coupling, the oscillations possible in this network are standing waves, since the phase $\\theta_x$, $\\theta_y$ is constant for each kind of neuron $x$ and $y$, and differs only between them. This is basically what is observed in the olfactory bulb (primary olfactory cortex) and prepyriform cortex. The phase of inhibitory components $\\theta_y$ in the bulb lags the phase of the excitatory components $\\theta_x$ by approximately 90 degrees. It is easy to choose $\\alpha$ and $\\omega$ in this model to get phase lags of nearly 90 degrees. \n\n3.1 Learning by the projection algorithm \n\nFrom the theory detailed in [1], we can program any linearly independent set of eigenvalues and eigenvectors into $W$ by the \"projection\" operation $W = B D B^{-1}$, where $B$ has the desired eigenvectors as columns, and $D$ is a diagonal matrix of the desired eigenvalues. Because the complex eigenvectors of $T$ follow from those learned for $W$, we can form a projection matrix $P$ with those eigenvectors of $T$ as columns. Forming also a matrix $J$ of the complex eigenvalues of $T$ in blocks along the diagonal, we can project directly to get $T$. If general cubic terms $T_{ijkl}\\, x_j x_k x_l$, also given by a specific projection operation, are added to network equations with linear terms $T_{ij} x_j$, the complex modes (eigenvectors) of the linearization are analytically guaranteed by the projection theorem [1] to characterize the periodic attractors of the network vector field. Chosen \"normal form\" coefficients $A_{mn}$ [1] are projected to get the higher order synaptic weights $T_{ijkl}$ for these general cubic terms. Together, these operations constitute the \"normal form projection algorithm\": \n\n$$T = P J P^{-1}, \\qquad T_{ijkl} = \\sum_{m,n=1}^{N} P_{im} A_{mn} P^{-1}_{mj} P^{-1}_{nk} P^{-1}_{nl}.$$ \n\nEither member of the pair of complex eigenvectors shown above will suffice as the eigenvector that is entered in the $P$ matrix for the projection operation. For real and imaginary component columns in $P$, \n\n$$P = \\begin{bmatrix} |x^s| \\cos\\theta_x^s & |x^s| \\sin\\theta_x^s & \\cdots \\\\ \\sqrt{g/h}\\,|x^s| \\cos\\theta_y^s & \\sqrt{g/h}\\,|x^s| \\sin\\theta_y^s & \\cdots \\end{bmatrix} \\;\\Rightarrow\\; x^s(t) = \\begin{bmatrix} |x^s|\\, e^{i\\theta_x^s + i\\omega^s t} \\\\ \\sqrt{g/h}\\,|x^s|\\, e^{i\\theta_y^s + i\\omega^s t} \\end{bmatrix},$$ \n\nwhere $x^s(t)$ is an expression for the periodic attractor established for pattern $s$ when this $P$ matrix is used in the projection algorithm. \n\nThe general cubic terms $T_{ijkl}\\, x_j x_k x_l$, however, require use of unlikely long range inhibitory connections. Simulations of two and four oscillator networks thus far ($N = 4$ and $N = 8$) reveal that use of the higher order terms for only the anatomically justified long range excitatory connections $W_{ijkl}$, as in the cubic net (3,4), is effective in storing randomly chosen sets of desired patterns. The behavior of this network is very close to the theoretical ideal guaranteed above for a network with general higher order terms. There is no alteration of stored oscillatory patterns when the reduced coupling is used. \n\nWe have at least general analytic justification for this. \"Normal form\" theory [1, 3] guarantees that many other choices of weights will do the same job as those found by the projection operation, but does not in general say how to find them. Latest work shows that a perturbation theory calculation of the normal form coefficients for general high dimensional cubic nets is tractable and in principle permits the removal of all but $N^2$ of the $N^4$ higher order weights normally produced by the projection algorithm. 
We have already incorporated this in an improved learning rule (non-Hebbian thus far) which requires even fewer of the excitatory higher order weights ($N^2$ instead of the $(N/2)^4$ used in (3)), and are exploring the size of the \"neighborhood\" of state space about the origin in which the rule is effective. This should lead as well to a rigorous proof of the performance of these networks. \n\n3.2 Learning by local Hebb rules \n\nWe show further in [2, 1] that for orthonormal static patterns $x^s$, the projection operation for the $W$ matrix reduces to an outer product, or \"Hebb\" rule, and the projection for the higher order weights becomes a multiple outer product rule: \n\n$$W_{ij} = \\sum_{s=1}^{N/2} \\alpha^s x_i^s x_j^s, \\qquad W_{ijkl} = c\\,\\delta_{ij}\\delta_{kl} - d \\sum_{s=1}^{N/2} x_i^s x_j^s x_k^s x_l^s. \\qquad (5)$$ \n\nThe first rule is guaranteed to establish desired patterns $x^s$ as eigenvectors of the matrix $W$ with corresponding eigenvalues $\\alpha^s$. The second rule, with $c > d$, gives higher order weights for the cubic terms in (3) that ensure the patterns defined by these eigenvectors will appear as attractors in the network vector field. The outer product is a local synapse rule for synapse $ij$ that allows additive and incremental learning. The system can be truly self-organizing because the net can modify itself based on its own activity. The rank of the coupling matrices $W$ and $T$ grows as more memories are learned by the Hebb rule, and the unused capacity appears as a degenerate subspace with all zero eigenvalues. The flow is thus directed toward regions of the state space where patterns are stored. \n\nIn the minimal net, real eigenvectors learned for $W$ are converted by the network structure to standing wave oscillations (constant phase) with the absolute value of those eigenvectors as amplitudes. From the mathematical perspective, there are $(N/2)!$ eigenvectors with different permutations of the signs of the same components, which lead to the same positive amplitude vector. This means that nonorthogonal amplitude patterns may be stored by the Hebb rule on the excitatory connections, since there may be many ways to find a perfectly orthonormal set of eigenvectors for $W$ that stores a given set of nonorthogonal amplitude vectors. Given the complexity of dendritic processing discussed previously, it is not impossible that there is some distribution of the signs of the final effect of synapses from excitatory neurons that would allow a biological system to make use of this mathematical degree of freedom. For different input objects, feature preprocessing in primary and secondary sensory cortex may be expected to orthogonalize outputs to the object recognition systems modeled here. When the rules above are used for nonorthogonal patterns, the eigenvectors of $W$ and $T$ are no longer given directly by the Hebb rule, and we expect that the kind of performance found in Hopfield networks for nonorthogonal memories will obtain, with reduced capacity and automatic clustering of similar exemplars. Investigation of this unsupervised induction of categories from training examples will be the subject of future work [3]. \n\n3.3 Architectural Variations - Olfactory Bulb Model \n\nAnother biologically interesting architecture which can store these kinds of patterns is one with associational excitatory to inhibitory cross-coupling. This may be a more plausible model of the olfactory bulb (primary olfactory cortex) than the one above. Experimental work of Freeman suggests an associative memory function for this cortex as well [4]. The evidence for long range excitatory to excitatory coupling in the olfactory bulb is much weaker than that for the prepyriform cortex. 
Long range excitatory tracts connecting even the two halves of the bulb are known, but anatomical data thus far show these axons entering only the inhibitory granule cell layers. The coupling matrix and its eigenvalues are then \n\n$$T = \\begin{bmatrix} gI & -hI \\\\ W & 0 \\end{bmatrix}, \\qquad \\lambda_{1,2} = \\frac{1}{2}(g \\pm \\sqrt{g^2 - 4\\alpha h}) = \\frac{1}{2}(g \\pm i\\omega),$$ \n\nfor $g^2 < 4\\alpha h$, where $\\omega = \\sqrt{4\\alpha h - g^2}$. The eigenvectors are \n\n$$\\begin{bmatrix} x \\\\ \\frac{g + \\omega}{2h}\\, x \\end{bmatrix} \\pm i \\begin{bmatrix} x \\\\ \\frac{g - \\omega}{2h}\\, x \\end{bmatrix} \\;\\Rightarrow\\; P = \\begin{bmatrix} |x^s| \\cos\\theta_x^s & |x^s| \\sin\\theta_x^s & \\cdots \\\\ \\sqrt{\\alpha^s/h}\\,|x^s| \\cos\\theta_y^s & \\sqrt{\\alpha^s/h}\\,|x^s| \\sin\\theta_y^s & \\cdots \\end{bmatrix}$$ \n\nin polar form, where $\\theta_x^s = \\pi/4$, and $\\theta_y^s = \\arctan\\frac{g - \\omega}{g + \\omega}$. \n\nIf we add inhibitory population self-feedback $-f$ to either model, this additional term appears subtracted from $\\alpha$ or $g$ in the real part of the complex eigenvalues, and added to them in all other expressions [2]. Further extensions of this line of analysis will consider lateral inhibitory fan out of the inhibitory to excitatory feedback connections. The $-hI$ block of the coupling matrix $T$ then becomes a banded matrix. Similarly, the $gI$ and $-fI$ blocks may be banded, or both full excitatory to excitatory $W$ and full excitatory to inhibitory $V$ coupling blocks may be considered. We conjecture that the phase restrictions of the minimal model will be relaxed with these further degrees of freedom available, so that traveling waves may exist. \n\n3.3.1 Acknowledgements \n\nSupported by AFOSR-87-0317. It is a pleasure to acknowledge the support of Walter Freeman and invaluable assistance of Morris Hirsch. \n\nReferences \n\n[1] B. Baird. A bifurcation theory approach to vector field programming for periodic attractors. In Proc. Int. Joint Conf. on Neural Networks, Wash. D.C., page 1381, June 18 1989. \n\n[2] B. Baird. Bifurcation and learning in network models of oscillating cortex. In S. Forrest, editor, Proc. Conf. on Emergent Computation, Los Alamos, May 1989, 1990. To appear in Physica D. \n\n[3] B. Baird. 
Bifurcation Theory Approach to the Analysis and Synthesis of Neural Networks for Engineering and Biological Modelling. Research Notes in Neural Computing. Springer, 1990. \n\n[4] W. J. Freeman. Mass Action in the Nervous System. Academic Press, New York, 1975. \n\n[5] C. M. Gray and W. Singer. Stimulus dependent neuronal oscillations in the cat visual cortex area 17. Neuroscience (Suppl), 22:1301P, 1987. \n\n[6] Lewis B. Haberly and James M. Bower. Olfactory cortex: model circuit for study of associative memory? Trends in Neuroscience, 12(7):258, 1989.", "award": [], "sourceid": 200, "authors": [{"given_name": "Bill", "family_name": "Baird", "institution": null}]}