{"title": "Rate- and Phase-coded Autoassociative Memory", "book": "Advances in Neural Information Processing Systems", "page_first": 769, "page_last": 776, "abstract": null, "full_text": "Rate- and Phase-coded Autoassociative Memory\n\n\n Mt Lengyel Peter Dayan\n Gatsby Computational Neuroscience Unit, University College London\n 17 Queen Square, London WC1N 3AR, United Kingdom\n {lmate,dayan}@gatsby.ucl.ac.uk\n\n\n Abstract\n\n Areas of the brain involved in various forms of memory exhibit patterns\n of neural activity quite unlike those in canonical computational models.\n We show how to use well-founded Bayesian probabilistic autoassociative\n recall to derive biologically reasonable neuronal dynamics in recurrently\n coupled models, together with appropriate values for parameters such as\n the membrane time constant and inhibition. We explicitly treat two cases.\n One arises from a standard Hebbian learning rule, and involves activity\n patterns that are coded by graded firing rates. The other arises from a\n spike timing dependent learning rule, and involves patterns coded by the\n phase of spike times relative to a coherent local field potential oscillation.\n Our model offers a new and more complete understanding of how neural\n dynamics may support autoassociation.\n\n\n1 Introduction\n\nAutoassociative memory in recurrently coupled networks seems fondly regarded as hav-\ning been long since solved, at least from a computational perspective. Its neurobiological\nimportance, as a model of episodic (event) memory storage and retrieval (from noisy and\npartial inputs) in structures such as area CA3 in the hippocampus, is of course clear [1].\nThis perhaps suggests that it is only the exact mapping of the models to the neural substrate\nthat holds any remaining theoretical interest.\n\nHowever, the characteristic patterns of activity in areas such as CA3 that are involved in\nmemory are quite unlike those specified in the bulk of models. 
In particular, neurons (for instance hippocampal place cells) show graded activity during recall [2], prominent theta frequency oscillations [3], and an apparent variety of rules governing synaptic plasticity [4, 5]. The wealth of studies of the memory capacity of attractor networks of binary units does not give many clues to the specification, analysis or optimization of networks acting in these biologically relevant regimes. In fact, even theoretical approaches to autoassociative memories with graded activities are computationally brittle.

We thank Boris Gutkin for helpful discussions on the phase resetting characteristics of different neuron types. This work was supported by the Gatsby Charitable Foundation.

Here, we generalize previous analyses [6, 7] to address these issues. Formally, these models interpret recall as Bayesian inference based on information given by the noisy input, the synaptic weight matrix, and prior knowledge about the distribution of possible activity patterns coding for memories. More concretely (see section 2), the assumed activity patterns and synaptic plasticity rules determine the term in the neuronal update dynamics that describes interactions between interconnected cells. Different aspects of biologically reasonable autoassociative memories arise from different assumptions. We show (section 3) that for neurons characterized by their graded firing rates, the regular rate-based characterization of neurons effectively approximates optimal Bayesian inference. Optimal values for parameters of the update dynamics, such as the level of inhibition or the leakage conductance, are inherently provided by our formalism. We then extend the model (section 4) to a setting involving spiking neurons in the context of a coherent local field potential oscillation (LFPO). Memories are coded by the phase of the LFPO at which each neuron fires, and are stored by spike timing dependent plasticity.
In this case, the biophysically plausible neuronal interaction function takes the form of a phase reset curve: presynaptic firing accelerates or decelerates the postsynaptic cell, depending on the relative timing of the two spikes, to a degree that is proportional to the synaptic weight between the two cells.


2 MAP autoassociative recall

The first requirement is to specify the task for autoassociative recall in a probabilistically sound manner. This specification leads to a natural account of the dynamics of the neurons during recall, whose form is largely determined by the learning rule. Unfortunately, the full dynamics includes terms that are not purely local to the information a post-synaptic neuron has about pre-synaptic activity, and we therefore consider approximations that restore the essential characteristics necessary to satisfy the most basic biological constraints. We validate the quality of the approximations in later sections.

The construction of the objective function: Consider an autoassociative network which has stored information about M memories x^1 ... x^M in a synaptic weight matrix, W, between a set of N neurons. We specify these quantities rather generally at first to allow for different ways of construing the memories later. The most complete probabilistic description of its task is to report the conditional distribution P[x | \tilde{x}, W] over the activities x, given noisy inputs \tilde{x} and the weights. The uncertainty in this posterior distribution has two roots. First, the activity pattern referred to by the input is unclear unless there is no input noise. Second, biological synaptic plasticity rules are data-lossy `compression algorithms', and so W specifies only imprecise information about the stored memories.

In an ideal case, P[x | \tilde{x}, W] would have support only on the M stored patterns x^1 ... x^M. However, biological storage methods lead to weights W that permit a much greater range of possibilities.
We therefore consider methods that work in the full space of activities x. In order to optimize the probability of extracting just the correct memory, decision theory encourages us to maximize the posterior probability [8]:

    \hat{x} := \arg\max_x P[x | \tilde{x}, W],  where  P[x | \tilde{x}, W] \propto P[x] \, P[\tilde{x} | x] \, P[W | x]    (1)

The first term in Eq. 1 imports prior knowledge of the statistical characteristics of the memories, and is assumed to factorize: P[x] := \prod_i P_x[x_i]. The second term describes the noise process corrupting the inputs. For unbiased noise it will be a term in x that is effectively centered on \tilde{x}. We assume that the noise corrupting each element of the patterns is independent, and independent of the original pattern, so P[\tilde{x} | x] := \prod_i P[\tilde{x}_i | x] := \prod_i P[\tilde{x}_i | x_i].

The third term assesses the likelihood that the weight matrix came from a training set of size M including pattern x.^1 Biological constraints encourage consideration of learning updates for the synapse from neuron j to neuron i that are local to the pre-synaptic (x^m_j) and post-synaptic (x^m_i) activities of connected neurons when pattern x^m is stored:

    \Delta w^m_{i,j} := \Omega(x^m_i, x^m_j)    (2)

We assume the contributions of individual training patterns are additive, W_{i,j} := \sum_m \Delta w^m_{i,j}, and that there are no autapses in the network, W_{i,i} := 0.

 ^1 Uncertainty about M could also be incorporated into the model, but is neglected here.

Storing a single random pattern drawn from the prior distribution will result in a synaptic weight change with a distribution determined by the prior and the learning rule, having mean \mu_w = \langle \Omega(x_1, x_2) \rangle_{P_x[x_1] P_x[x_2]} and variance \sigma^2_w = \langle \Omega^2(x_1, x_2) \rangle_{P_x[x_1] P_x[x_2]} - \mu^2_w. Storing M - 1 random patterns means adding M - 1 iid random variables and thus, for moderately large M, results in a synaptic weight with an approximately Gaussian distribution, P[W_{i,j}] \approx G(W_{i,j}; \mu_W, \sigma_W), with mean \mu_W = (M - 1)\mu_w and variance \sigma^2_W = (M - 1)\sigma^2_w.
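As a quick numerical check of this Gaussian approximation, the following sketch accumulates a single weight over M - 1 random patterns and compares its empirical moments with (M - 1)\mu_w and (M - 1)\sigma^2_w. The standard-normal prior, the product-form rule \Omega(x_1, x_2) = x_1 x_2, and all parameter values are illustrative assumptions of ours, not taken from the paper's simulations.

```python
import numpy as np

# Monte Carlo check that W_ij = sum_m Omega(x_i^m, x_j^m) is approximately
# Gaussian with mean (M-1)*mu_w and variance (M-1)*sigma_w^2.
rng = np.random.default_rng(0)
M, trials = 50, 20000            # patterns per weight; repeats of the experiment

def omega(xi, xj):
    """Illustrative Hebbian rule: product of pre- and postsynaptic activity."""
    return xi * xj

# Activities drawn from a standard normal prior (mu_x = 0, sigma_x = 1)
x_i = rng.standard_normal((trials, M - 1))
x_j = rng.standard_normal((trials, M - 1))
W = omega(x_i, x_j).sum(axis=1)  # one synaptic weight per trial

mu_w, var_w = 0.0, 1.0           # E[x1*x2] = 0, Var[x1*x2] = 1 for this prior
mu_W = (M - 1) * mu_w            # predicted mean of the stored weight
var_W = (M - 1) * var_w          # predicted variance of the stored weight
print(W.mean(), W.var(), mu_W, var_W)
```

The empirical mean and variance of the accumulated weight should closely match the predicted moments for these moderately large values of M.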
Adding a further particular pattern x is equivalent to adding a random variable with a mean determined by the learning rule, and zero variance, thus:

    P[W_{i,j} | x_i, x_j] \approx G(W_{i,j}; \mu_W + \Omega(x_i, x_j), \sigma_W)    (3)

We also make the approximation that elements of the synaptic weight matrix are independent, and thus write: P[W | x] := \prod_{i, j \neq i} P[W_{i,j} | x_i, x_j].

Having restricted our horizons to maximum a posteriori (MAP) inference, we can consider as an objective function the log of the posterior distribution. In the light of our factorizability assumptions, this is

    O(x) = \log P[x] + \log P[\tilde{x} | x] + \log P[W | x]
         = \sum_i \log P_x[x_i] + \sum_i \log P[\tilde{x}_i | x_i] + \sum_{i, j \neq i} \log P[W_{i,j} | x_i, x_j]    (4)

Neuronal update dynamics: Finding the global maximum of the objective function, as stated in equation 1, is computationally extravagant, and biologically questionable. We therefore specify neuronal dynamics arising from gradient ascent on the objective function:

    \tau_x \dot{x} = \nabla_x O(x)    (5)

Combining equations 4 and 5 we get

    \tau_x \frac{dx_i}{dt} = \frac{\partial}{\partial x_i} \log P[x] + \frac{\partial}{\partial x_i} \log P[\tilde{x} | x] + \frac{\partial}{\partial x_i} \log P[W | x],  where    (6)

    \frac{\partial}{\partial x_i} \log P[W | x] = \sum_{j \neq i} \left[ \frac{\partial}{\partial x_i} \log P[W_{i,j} | x_i, x_j] + \frac{\partial}{\partial x_i} \log P[W_{j,i} | x_j, x_i] \right]    (7)

The first two terms in equation 6 only depend on the activity of the neuron itself and its input. For example, for a Gaussian prior P_x[x_i] = G(x_i; \mu_x, \sigma_x) and unbiased Gaussian noise on the input, P[\tilde{x}_i | x_i] = G(\tilde{x}_i; x_i, \sigma_{\tilde{x}}), these would be:

    \frac{d}{dx_i} \log P_x[x_i] + \frac{d}{dx_i} \log P[\tilde{x}_i | x_i] = \frac{\mu_x - x_i}{\sigma^2_x} + \frac{\tilde{x}_i - x_i}{\sigma^2_{\tilde{x}}} = \frac{\mu_x}{\sigma^2_x} - \left( \frac{1}{\sigma^2_x} + \frac{1}{\sigma^2_{\tilde{x}}} \right) x_i + \frac{\tilde{x}_i}{\sigma^2_{\tilde{x}}}    (8)

The first term on the right-hand side of the last equality expresses a constant bias; the second involves self-decay; and the third describes the effect of the input.

The terms in equation 7 indicate how a neuron should take into account the activity of other neurons based on the synaptic weights.
From equation 3, the terms are

    \frac{\partial}{\partial x_i} \log P[W_{i,j} | x_i, x_j] = \frac{1}{\sigma^2_W} \left[ (W_{i,j} - \mu_W) \frac{\partial \Omega(x_i, x_j)}{\partial x_i} - \Omega(x_i, x_j) \frac{\partial \Omega(x_i, x_j)}{\partial x_i} \right]    (9)

    \frac{\partial}{\partial x_i} \log P[W_{j,i} | x_j, x_i] = \frac{1}{\sigma^2_W} \left[ (W_{j,i} - \mu_W) \frac{\partial \Omega(x_j, x_i)}{\partial x_i} - \Omega(x_j, x_i) \frac{\partial \Omega(x_j, x_i)}{\partial x_i} \right]    (10)

Two aspects of the above formulae are biologically troubling. The last terms in each express the effects of other cells, but without there being corresponding synaptic weights. We approximate these terms using their mean values over the prior distribution. In this case, \Phi^+_i = \langle \Omega(x_i, x_j) \, \partial_{x_i} \Omega(x_i, x_j) \rangle_{P_x[x_j]} and \Phi^-_i = \langle \Omega(x_j, x_i) \, \partial_{x_i} \Omega(x_j, x_i) \rangle_{P_x[x_j]} contribute terms that only depend on the activity of the updated cell, and so can be lumped with the prior- and input-dependent terms of Eq. 8.

Further, equation 10 includes synaptic weights, W_{j,i}, that are postsynaptic with respect to the updated neuron. This would require the neuron to change its activity depending on the weights of its postsynaptic synapses. One simple work-around is to approximate a postsynaptic weight by the mean of its conditional distribution given the corresponding presynaptic weight: W_{j,i} \approx \langle W_{j,i} \rangle_{P[W_{j,i} | W_{i,j}]}. In the simplest case of perfectly symmetric or anti-symmetric learning, with \Omega(x_i, x_j) = \pm\Omega(x_j, x_i), we have W_{j,i} = \pm W_{i,j} and \Phi^+_i = \Phi^-_i = \Phi_i. In the anti-symmetric case, \mu_w = 0.

Making these assumptions, the neuronal interaction function simplifies to

    H(x_i, x_j) = (W_{i,j} - \mu_W) \frac{\partial \Omega(x_i, x_j)}{\partial x_i}    (11)

and \frac{2}{\sigma^2_W} \left[ \sum_{j \neq i} H(x_i, x_j) - (N - 1) \Phi_i \right] is the weight-dependent term of equation 7. Equation 11 shows that there is a simple relationship between the synaptic plasticity rule, \Omega(x_i, x_j), and the neuronal interaction function, H(x_i, x_j), that is approximately optimal for reading out the information that is encoded in the synaptic weight matrix by that synaptic plasticity rule.
It also shows that the magnitude of this interaction should be proportional to the synaptic weight connecting the two cells, W_{i,j}.

We specialize this analysis to two important cases with (a) graded, rate-based, or (b) spiking, oscillatory phase-based, activities. We derive appropriate dynamics from learning rules, and show that, despite the approximations, the networks have good recall performance.


3 Rate-based memories

The most natural assumption about pattern encoding is that the activity of each unit is interpreted directly as its firing rate. Note, however, that most approaches to autoassociative memory assume binary patterns [9], sitting ill with the lack of saturation in cortical or hippocampal neurons in the appropriate regime. Experiments [10] suggest that regulating activity levels in such networks is very tricky, requiring exquisitely carefully tuned neuronal dynamics. There has been work on graded activities in the special case of line or surface attractor networks [11, 12], but these also pose dynamical complexities. By contrast, graded activities are straightforward in our framework.

Consider Hebbian covariance learning: \Omega_{cov}(x_i, x_j) := A_{cov} (x_i - \mu_x)(x_j - \mu_x), where A_{cov} > 0 is a normalizing constant and \mu_x is the mean of the prior distribution of the patterns to be stored. The learning rule is symmetric, and so, based on Eq. 11, the optimal neuronal interaction function is H_{cov}(x_i, x_j) = A_{cov} (W_{i,j} - \mu_W)(x_j - \mu_x). This leads to a term in the dynamics which is the conventional weighted sum of pre-synaptic firing rates. The other key term in the dynamics is -(N - 1)\Phi_i, with \Phi_i = A^2_{cov} \sigma^2_x (x_i - \mu_x), where \sigma^2_x is the variance of the prior distribution; this expresses self-decay to a baseline activity level determined by \mu_x. The prior- and input-dependent terms also contribute to self-decay, as shown in Eq. 8.
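As a concrete sketch of the resulting recall process, the following toy implementation performs gradient ascent on the objective of Eq. 4 for the covariance rule, using the exact gradient of Eqs. 6, 7, 9 and 10 rather than the lumped approximations. The network size, noise level, step size and iteration count are illustrative assumptions of ours, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 2                        # neurons, stored patterns (toy sizes)
mu_x, var_x, var_n = 0.0, 1.0, 1.0  # prior mean/variance, input-noise variance
A = 1.0                             # A_cov of the covariance rule

# Store M patterns with the covariance rule; no autapses (W_ii = 0)
X = rng.normal(mu_x, np.sqrt(var_x), size=(M, N))
W = A * (X - mu_x).T @ (X - mu_x)
np.fill_diagonal(W, 0.0)
mu_W, var_W = 0.0, (M - 1) * A**2 * var_x**2  # weight statistics (mu_w = 0 here)
off = ~np.eye(N, dtype=bool)        # mask selecting the off-diagonal entries

x_target = X[0]
x_tilde = x_target + rng.normal(0.0, np.sqrt(var_n), size=N)  # noisy recall cue

def omega(x):
    """Covariance rule applied to a whole pattern at once."""
    return A * np.outer(x - mu_x, x - mu_x)

def objective(x):
    """Log posterior of Eq. 4, up to x-independent constants."""
    lp = -0.5 * np.sum((x - mu_x) ** 2) / var_x
    ln = -0.5 * np.sum((x_tilde - x) ** 2) / var_n
    lw = -0.5 * np.sum((W - mu_W - omega(x))[off] ** 2) / var_W
    return lp + ln + lw

x = x_tilde.copy()
o_start = objective(x)
for _ in range(500):                # gradient ascent, symmetric learning rule
    D = (W - mu_W - omega(x)) * off
    grad = (mu_x - x) / var_x + (x_tilde - x) / var_n \
           + 2.0 * A * (D @ (x - mu_x)) / var_W
    x += 1e-4 * grad
o_end = objective(x)
print(o_start, o_end)
print(np.mean((x - x_target) ** 2), np.mean((x_tilde - x_target) ** 2))
```

With a sufficiently small step size the ascent increases the log posterior at every iteration, so the objective at the end of recall exceeds its value at the noisy cue.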
Integration of the weighted sum of inputs plus decay to baseline constitutes the widely used leaky integrator reduction of a single neuron [10].

Thus, canonical models of synaptic plasticity (the Hebbian covariance rule) and single neuron firing rate dynamics are exactly matched for autoassociative recall. Optimal values for all parameters of single neuron dynamics (except the membrane time constant determining the speed of gradient ascent) are directly implied. This is important, since it indicates how to solve the problem, for graded autoassociative memories (as opposed to saturating ones [14, 15]), that neuronal dynamics have to be finely tuned. As examples, the leak conductance is given by the sum of the coefficients of all terms linear in x_i, the optimal bias current is the sum of all terms independent of x_i, and the level of inhibition can be determined from the negative terms in the interaction function, -\mu_W and -\mu_x.

Since our derivation embodies a number of approximations, we performed numerical simulations. To gauge the performance of the Bayes-optimal network we compared it to networks of increasing complexity (Fig. 1A,B).
[Figure 1 appears here (panels A-D).]

Figure 1: Performance of the rate-coded Bayesian inference network, compared to: a Bayesian network that only takes into account evidence from the prior and the input but not from the synaptic weight matrix; a network that randomly generates patterns from the prior distribution; a network that transmits its input to its output; and the `ideal observer' having access to the list of stored patterns. A. Firing rates of single units at the end of the recall process (y-axis) against firing rates in the original pattern (x-axis). B. Frequency histograms of errors (difference between recalled and stored firing rates). The ideal observer is not plotted because its error distribution was a Dirac-delta at 0. C, D. Benchmarking the Bayesian network against the network of Treves [13] on patterns of non-negative firing rates. Average error is the square root of the mean squared error (C); average normalized error measures only the angle difference between true and recalled activities (D). (These measures are not exactly the same as that used to derive the dynamics (equation 1), but are reasonably appropriate.) The prior distribution was Gaussian with mean \mu_x = 0 and variance \sigma^2_x = 1 (A,B); a Gaussian with mean \mu_x = 0.5 and variance \sigma^2_x = 0.25 truncated below 0 (C) (yielding approximately a = 0.5 density); or ternary with mean and density a = 0.5 (D). The input was corrupted by unbiased Gaussian noise of variance \sigma^2_{\tilde{x}} = 1 (A,B) or \sigma^2_{\tilde{x}} = 1.5 (C,D), and cut at 0 (C,D). The learning rule was the covariance rule with A_{cov} = 1 (A,B), or with A_{cov} = 1/(N a^2) (C,D). The number of cells in the network was N = 50 (A,B) or N = 100 (C,D), and the number of memories stored was M = 2 (A,B) or varied between M = 2 ... 100 (C,D; note logarithmic scale). For each data point, 10 different networks were simulated with a different set of stored patterns, and for each network, 10 attempts at recall were made, with a noisy version of a randomly chosen pattern as the input and with activities initialized at this input.

A trivial lower bound of performance is given by a network that generates random patterns from the same prior distribution from which the patterns to be stored were drawn (P[x]). Another simple alternative is a network that simply transmits its input (\tilde{x}) to its output. (Note that the `input only' network is not necessarily superior to the `prior only' network: their relative effectiveness depends on the relative variances of the prior and noise distributions; a narrow prior with a wide noise distribution would make the latter perform better, as in Fig. 1D.) The Bayesian inference network performs considerably better than any of these simple networks. Crucially, this improvement depends on the information encoded in the synaptic weights: the network practically falls back to the level of the `input only' network (or the `prior only' network, whichever is the better; data not shown) if this information is ignored in the construction of the recall dynamics (by taking the third term in Eq.
6 to be 0).

An upper bound on the performance of any network using some biological form of synaptic plasticity comes from an `ideal observer' which knows the complete list of stored patterns (rather than its distant reflection in the synaptic weight matrix) and computes and compares the probability that each was corrupted to form the input \tilde{x} to find the best match (rather than using neural dynamics). Such an ideal observer only makes errors when both the number of patterns stored and the noise in the input are sufficiently large, so that corrupting a stored pattern is likely to make it more similar to another stored pattern. In the case shown in Fig. 1A,B this is not so, since only two patterns were stored, and the ideal observer performs perfectly, as expected. Nevertheless, there may be situations in which perfect performance is out of reach even for an ideal observer (Fig. 1C,D), which makes it a meaningful touchstone. In summary, the performance of any network can be assessed by measuring where it lies between the better one of the `prior only' and `input only' networks and the ideal observer.

As a further challenge, we also benchmarked our model against the model of Treves [13] (Fig. 1C,D), which we chose because it is a rare example of a network that was designed to have near optimal recall performance in the face of non-binary patterns. In this work, Treves considered ternary patterns, drawn from the distribution

    P[x_i] := \left(1 - \frac{4}{3} a\right) \delta(x_i) + a \, \delta\!\left(x_i - \frac{1}{2}\right) + \frac{a}{3} \, \delta\!\left(x_i - \frac{3}{2}\right)

where \delta(x) is the Dirac-delta function. Here, a = \langle x \rangle quantifies the density of the patterns (i.e. how non-sparse they are). The patterns are stored using the covariance rule as stated above (with A_{cov} := 1/(N a^2)). Neuronal update in the model is discrete, asynchronous, and involves two steps.
First, the `local field' is calculated as

    h_i := \sum_{j \neq i} W_{i,j} x_j - k \left( \sum_i x_i - N a \right)^3 + \mathrm{Input}

then the output of the neuron is calculated as a threshold linear function of the local field: x_i := g (h_i - h_{Thr}) if h_i > h_{Thr}, and x_i := 0 otherwise, where g := 0.53 a/(1 - a) is the gain parameter, h_{Thr} := 0 is the threshold, and the value of k is set by iterative search to optimize performance.

The comparison between Treves' network as we implemented it and our network is imperfect, since the former is optimized for recalling ternary patterns while, in the absence of neural evidence for ternary patterns, we used the simpler and more reasonable neural dynamics for our network that emerge from an assumption that the distribution over the stored patterns is Gaussian. Further, we corrupted the inputs by unbiased additive Gaussian noise (with variance \sigma^2_{\tilde{x}} = 1.5), but truncated the activities at 0, though we did not adjust the dynamics of our network in the light of the truncation. Of course, these can only render our network less effective. Still, the Bayesian network clearly outperformed the Treves network when the patterns were drawn from a truncated Gaussian (Fig. 1C). The performance of the Bayesian network stayed close to that of an ideal observer assuming non-truncated Gaussian input, showing that most of the errors were caused by this assumption and not by suboptimality of the neural interactions decoding the information in the synaptic weights. Despite extensive efforts to find the optimal parameters for the Treves network, its performance did not even reach that of the `input only' network.

Finally, again for ternary patterns, we also considered only penalizing errors in the direction of the vectors of recalled activities, ignoring errors in their magnitudes (Fig.
1D). The Treves network did better in this case, but still not as well as the Bayesian network. Importantly, in both cases, in the regime where the synaptic weights were saturated in the M \gg N limit, and thus it was no longer possible to extract any useful information from the synaptic weights, the Bayesian network still only fell back to the level of the `prior only' network, but the Treves network did not seem to have any such upper bound on its errors.


4 Phase-based memories

Brain areas known to be involved in memory processing demonstrate prominent oscillations (LFPOs) under a variety of conditions, including both wake and sleep states [16]. Under these conditions, the phases of the spikes of a neuron relative to the LFPO have been shown to be carefully controlled [17], and even to convey meaningful stimulus information, e.g. about the position of an animal in its environment [3] or retrieved odor identity [18]. The discovery of spike timing dependent plasticity (STDP), in which the relative timing of pre- and postsynaptic firings determines the sign and extent of synaptic weight change, offered new insights into how the information represented by spike times may be stored in neural networks [19]. However, bar some interesting suggestions about neuronal resonance [20], it is less clear how one might correctly recall information thereby stored in the synaptic weights.

The theory laid out in Section 2 allows us to treat this problem systematically. First, neuronal activities, x_i, will be interpreted as firing times relative to a reference phase of the ongoing LFPO, such as the peak of the theta oscillation in the hippocampus, and will thus be circular variables drawn from a circular Gaussian. Next, our learning rule is an exponentially decaying Gabor-function of the phase difference between pre- and postsynaptic firing:

    \Omega_{STDP}(x_i, x_j) := A_{STDP} \exp[\kappa_{STDP} \cos(\phi_{i,j})] \sin(\phi_{i,j} - \phi_{STDP}),  with  \phi_{i,j} = 2\pi (x_i - x_j) / T_{STDP}
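In code, this learning rule can be sketched as follows. This is a minimal illustration of our own; it uses \phi_{STDP} = 0 (the antisymmetric case) together with the values A_{STDP} = 0.03, \kappa_{STDP} = 4 and T_{STDP} = 125 ms quoted for the simulations of Figure 2.

```python
import numpy as np

# Circular STDP rule Omega_STDP(x_i, x_j); x_i and x_j are firing times (ms)
# relative to the oscillation. phi0 = 0 gives the antisymmetric case.
A_stdp, kappa, phi0, T = 0.03, 4.0, 0.0, 125.0

def omega_stdp(x_i, x_j):
    """Weight change for postsynaptic time x_i and presynaptic time x_j."""
    phi = 2.0 * np.pi * (x_i - x_j) / T
    return A_stdp * np.exp(kappa * np.cos(phi)) * np.sin(phi - phi0)

# Pre-before-post (x_i > x_j) potentiates; post-before-pre depresses; and
# with phi0 = 0 the rule is antisymmetric in the two spike times.
print(omega_stdp(10.0, 0.0), omega_stdp(-10.0, 0.0))
```

The exponential factor concentrates plasticity around small phase differences, while the sine factor sets its sign from the order of the two spikes.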
STDP characteristics in different brain regions are well captured by this general formula, but the parameters determining their exact shapes differ greatly among regions. We constrain our analysis to the antisymmetric case, so that \phi_{STDP} = 0, and set the other parameters to match experimental data on hippocampal STDP [5]. The neuronal interaction function that satisfies Eq. 11 is

    H_{STDP}(x_i, x_j) = \frac{2\pi A_{STDP}}{T_{STDP}} W_{i,j} \exp[\kappa_{STDP} \cos(\phi_{i,j})] \left[ \cos(\phi_{i,j}) - \kappa_{STDP} \sin^2(\phi_{i,j}) \right]

This interaction function decreases firing phase, and thus accelerates the postsynaptic cell, if the presynaptic spike precedes postsynaptic firing, and delays the postsynaptic cell if the presynaptic spike arrives just after the postsynaptic cell has fired. This characteristic is the essence of the biphasic phase reset curve of type II cells [21], and has been observed in various types of neurons, including neocortical cells [22]. Thus again, our derivation directly couples STDP, a canonical model of synaptic plasticity, and phase reset curves in a canonical model of neural dynamics.

Numerical simulations again tested the various approximations. The performance of the network is shown in Fig. 2, and is comparable to that of the rate-coded network (Fig. 1). Further simulations will be necessary to map out the performance of our network over a wider range of parameters, such as the signal-to-noise ratio.


5 Discussion

We have described a Bayesian approach to recall in autoassociative memories. This permits the derivation of neuronal dynamics appropriate to a synaptic plasticity rule, and we used this to show a coupling between canonical Hebbian and STDP plasticity rules and canonical rate-based and phase-based neuronal dynamics, respectively. This provides an unexpectedly close link between optimal computations and actual implementations. Our method also leads to networks that are highly competent at recall.

There are a number of important directions for future work.
First, even in phase-based networks, not all neurons fire in each period of the oscillation. This suggests that neurons may employ a dual code: the more rate-based probability of being active in a cycle, and the phase-based timing of the spike relative to the cycle [24]. The advantages of such a scheme have yet to be fully characterized.

[Figure 2 appears here (panels A and B).]

Figure 2: Performance of the phase-coded network. The error distribution for the ideal observer was a Dirac-delta at 0 and was thus omitted from A. The average error of the `prior only' network was too large to be plotted in B. The prior was a von Mises distribution with mean \mu_x = 0 and concentration \kappa_x = 0.5 on a T = 125 ms long cycle, matching data on theta frequency modulation of pyramidal cell population activity in the hippocampus [23]. Input was corrupted by unbiased circular Gaussian (von Mises) noise with concentration \kappa_{\tilde{x}} = 10. The learning rule was the circular STDP rule with parameters A_{STDP} = 0.03, \kappa_{STDP} = 4 and T_{STDP} = T, matching experimental data on hippocampal STDP [5] and theta periodicity. The network consisted of N = 100 cells, and the number of memories stored was M = 10 (A) or varied between M = 2 ... 100 (B; note logarithmic scale). For further explanation of symbols and axes, see Figure 1.

Second, in the present framework the choice of the learning rule is arbitrary, as long as the recall dynamics is optimally matched to it. Our formalism also suggests that there may be a way to optimally choose the learning rule itself in the first place, by matching it to the prior distribution of patterns.
This approach would thus be fundamentally different from those seeking `globally' optimal learning rules [25], and may be more similar to those used to find optimal tuning curves appropriately matching stimulus statistics [26].


References

 [1] Marr D. Philos Trans R Soc Lond B Biol Sci 262:23, 1971.
 [2] O'Keefe J. Exp Neurol 51:78, 1976.
 [3] O'Keefe J, Recce ML. Hippocampus 3:317, 1993.
 [4] Bliss TVP, Lømo T. J Physiol (Lond) 232:331, 1973.
 [5] Bi GQ, Poo MM. J Neurosci 18:10464, 1998.
 [6] MacKay DJC. In Maximum entropy and Bayesian methods, 237, 1990.
 [7] Sommer FT, Dayan P. IEEE Trans Neural Netw 9:705, 1998.
 [8] Jaynes ET. Probability theory: the logic of science. Cambridge University Press, 2003.
 [9] Amit DJ. Modeling brain function. Cambridge University Press, 1989.
[10] Dayan P, Abbott LF. Theoretical neuroscience. MIT Press, 2001.
[11] Zhang K. J Neurosci 16:2112, 1996.
[12] Seung HS. Proc Natl Acad Sci USA 93:13339, 1996.
[13] Treves A. Phys Rev A 42:2418, 1990.
[14] Hopfield JJ. Proc Natl Acad Sci USA 79:2554, 1982.
[15] Hopfield JJ. Proc Natl Acad Sci USA 81:3088, 1984.
[16] Buzsáki G. Neuron 33:325, 2002.
[17] Harris KD, et al. Nature 424:552, 2003.
[18] Li Z, Hopfield JJ. Biol Cybern 61:379, 1989.
[19] Abbott LF, Nelson SB. Nat Neurosci 3:1178, 2000.
[20] Scarpetta S, et al. Neural Comput 14:2371, 2002.
[21] Ermentrout B, et al. Neural Comput 13:1285, 2001.
[22] Reyes AD, Fetz EE. J Neurophysiol 69:1673, 1993.
[23] Klausberger T, et al. Nature 421:844, 2003.
[24] Mueller R, et al. In BioNet'96, 70, 1996.
[25] Gardner E, Derrida B. J Phys A 21:271, 1988.
[26] Laughlin S. Z Naturforsch 36:901, 1981.
", "award": [], "sourceid": 2683, "authors": [{"given_name": "M\u00e1t\u00e9", "family_name": "Lengyel", "institution": null}, {"given_name": "Peter", "family_name": "Dayan", "institution": null}]}