t'.

In the context of input coming from visually presented objects, WB suggest using white noise N(ω) = N, ∀ω, and consider two possibilities for S(ω), based on the assumptions that objects persist for either fixed, or randomly variable, lengths of time. We summarize their main result in the first three rows of figure 1. Figure 1(A3) shows the assumed, scale-free, magnitude spectrum |S(ω)| = 1/ω for the signal. Figure 1(A1) shows the (truly optimal) purely causal version of the filter that results - it can be shown to involve exactly an exponential decay, with a rate constant which depends on the level of the noise N. In WB's self-supervised setting, it is rather unclear a priori whether the assumption of white noise is valid; WB's experiments bore it out to a rough approximation, and showed that the filter of figure 1(A1) worked well on a task involving digit representation and recognition.

Figure 1(B1;B3) repeats the analysis, with the same signal spectrum, but for the optimal purely acausal filter as used in reinforcement learning's synaptic eligibility traces. Of course, the true TDP kernels (shown in figure 1(D1-G1)) are neither purely causal nor acausal; figure 1(C1) shows the normal low-pass filter that results from assuming phase 0 for all frequency components.

Although the WB filter of figure 1(C1) somewhat resembles a Hebbian version of the anti-Hebbian rule for layer IV spiny stellate cells shown in figure 1(G1), it is clearly not a good match for the standard forms of TDP. One might also question the relationship between the time constants of the kernels and the signal spectrum that comes from object persistence. The next section considers two alternative possibilities for interpreting TDP kernels.
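The zero-phase filter of figure 1(C1) is easy to reproduce numerically: for the scale-free signal |S(ω)| = 1/ω and white noise, the Wiener magnitude is |Φ(ω)| = (1/ω²)/(1/ω² + N²) = 1/(1 + N²ω²), and assigning phase 0 to every component yields a symmetric low-pass kernel. The following is a minimal sketch; the noise level, grid size, and time step are illustrative values, not taken from the paper.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's values)
N_noise = 0.5            # white-noise amplitude N
n = 4096                 # number of frequency samples
dt = 0.01                # time step implied by the grid

# Angular-frequency grid in FFT ordering
freqs = np.fft.fftfreq(n, d=dt) * 2 * np.pi

# Zero-phase Wiener magnitude for |S(w)| = 1/w with white noise:
# |Phi(w)| = |S|^2 / (|S|^2 + N^2) = 1 / (1 + N^2 w^2)
phi = 1.0 / (1.0 + (N_noise * freqs) ** 2)

# Inverse FFT of a real, even, zero-phase spectrum is a real,
# symmetric kernel: the two-sided low-pass of figure 1(C1).
kernel = np.fft.fftshift(np.fft.ifft(phi).real)
t0 = n // 2              # index of t = 0 after fftshift

assert kernel[t0] == kernel.max()                       # peaked at t = 0
assert np.allclose(kernel[t0 - 250], kernel[t0 + 250])  # symmetric in time
```

The symmetry assertion is the point: with all phases set to 0 the filter loses any causal/acausal asymmetry, which is why it cannot match the temporally asymmetric TDP kernels.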

3 Signalling and Whitening

The main intent of this paper is to combine WB's idea about the role of filtering in synaptic plasticity with the actual forms of the kernels that have been revealed in the experiments. Under two different models for the computational goal of filtering, we work back from the experimental kernels to the implied forms of the statistics of the signals. The first method employs WB's Wiener filtering idea. The second method can be seen as using a more stringent definition of statistical significance.

Figure 2: Kernel manipulation. A) The phase spectrum (ie kernel phase as a function of frequency) for the kernel (shown in figure 1(E1)) with asymmetric LTP and LTD.16 B) The kernel that results from the power spectrum of figure 1(E2) but constant phase −π/2. This kernel has symmetric LTP and LTD, with an intermediate time constant. C) Plasticity kernel that is exactly a difference of two Gaussians (DoG; compare figure 1(F1)). D) White (solid; from equation 4) and Wiener (dashed; from equation 3) signal spectra derived from the DoG kernel in (C). Here, the signal spectrum in the case of whitening has been vertically displaced so it is clearer. Both signal spectra show clear periodicities.

3.1 Reverse engineering signals from Wiener filtering

Accepting equation 2 as the form of the filter (note that this implies that |Φ(ω)| ≤ 1), and, with WB, making the assumption that the noise is white, so |N(ω)| = N, ∀ω, the assumed amplitude spectrum of the signal process s(t) is

    |S(ω)| = N √( |Φ(ω)| / (1 − |Φ(ω)|) ).    (3)

Importantly, the assumed power of the noise does not affect the form of the signal power, it only scales it.
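Equation 3 can be applied directly to a kernel's magnitude spectrum. The sketch below uses an antisymmetric exponential kernel as a stand-in for an experimental one (cf. figure 1(D1)); the time constant, grid, and the overall scale of 0.5 are illustrative assumptions.

```python
import numpy as np

def implied_signal_spectrum(phi_mag, noise=1.0):
    """Equation 3: |S(w)| = N * sqrt(|Phi(w)| / (1 - |Phi(w)|))."""
    phi_mag = np.asarray(phi_mag, dtype=float)
    assert np.all(phi_mag < 1.0), "equation 3 requires |Phi(w)| < 1"
    return noise * np.sqrt(phi_mag / (1.0 - phi_mag))

# Antisymmetric exponential kernel, tau = 20 ms (illustrative)
tau, dt, n = 0.02, 0.001, 2048
t = (np.arange(n) - n // 2) * dt
kernel = np.sign(t) * np.exp(-np.abs(t) / tau)

# Magnitude spectrum, rescaled so max |Phi| = 0.5 (arbitrary scale)
phi_mag = np.abs(np.fft.fft(np.fft.ifftshift(kernel)))
phi_mag *= 0.5 / phi_mag.max()

S = implied_signal_spectrum(phi_mag, noise=1.0)

# sqrt(x / (1 - x)) is monotonic in x, so the implied signal spectrum
# peaks exactly where the kernel's band-pass spectrum peaks (near 1/tau).
peak = np.argmax(S[: n // 2])
assert peak == np.argmax(phi_mag[: n // 2])
```

Note how the noise amplitude N enters only as an overall factor, consistent with the remark that it scales, but does not reshape, the implied signal power.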

Figure 1(D2-G2) shows the magnitude of the Fourier transform of the experimental kernels (which are shown in figure 1(D1-G1)), and figure 1(D3-G3) shows the implied signal spectra. Since there are no natural data that specify the absolute scale of the kernels (ie the maximum value of |Φ(ω)|), we set it arbitrarily to 0.5. Any value less than ≈ 0.9 leads to similar predictions for the signal spectra. We can relate figure 1(D3-G3) to the heuristic criteria mentioned above for the signal power spectrum. In two cases (D3;F3), the clear peaks in the signal power spectra imply strong periodicities. For layer V pyramids (D3), the time constant for the kernel is ≈ 20ms, implying a peak frequency of ω ≈ 50Hz in the γ band. In the hippocampal case, the frequency may be a little lower. Certainly, the signal power spectra underlying the different kernels have quite different forms.

3.2 Reverse engineering signals from whitening

WB's suggestion that the underlying signal s(t) should be extracted from the output y(t) far from exhausts the possibilities for filtering. In particular, there have been various suggestions36 that learning should be licensed by statistical surprise, ie according to how components of the output differ from expectations. A simple form of this that has gained recent currency is the Z-score transformation,8,15,36 which implies considering components of the signal in units of (ie normalized by) their standard deviations. Mechanistically, this is closely related to whitening in the face of input noise, but with a rather different computational rationale.

A simple formulation of a noise-sensitive Z-score is Dong & Atick's12 whitening filter. Under the same formulation as WB (equation 2), this suggests multiplying the Wiener filter by 1/|S(ω)|, giving

    |Φ(ω)| = |S(ω)| / (|S(ω)|² + N(ω)²).    (4)
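Inverting equation 4 for |S(ω)| means solving the quadratic |Φ|S² − S + |Φ|N² = 0, whose two roots multiply to N²; taking the larger root, S = (1 + √(1 − 4|Φ|²N²))/(2|Φ|), corresponds to the assumption that the signal sits above the noise floor (S ≥ N). The round-trip sketch below checks this inversion for the scale-free spectrum S(ω) = 1/ω; the noise level and frequency range are illustrative, chosen so that S ≥ N holds throughout.

```python
import numpy as np

def whitening_filter(S, noise):
    """Equation 4: |Phi(w)| = |S(w)| / (|S(w)|^2 + N^2)."""
    return S / (S ** 2 + noise ** 2)

def implied_signal(phi, noise):
    """Invert equation 4, taking the root with S >= N (an assumption)."""
    disc = 1.0 - 4.0 * (phi * noise) ** 2
    return (1.0 + np.sqrt(disc)) / (2.0 * phi)

noise = 0.1
w = np.linspace(0.5, 9.0, 100)   # region where S = 1/w stays above N
S_true = 1.0 / w
phi = whitening_filter(S_true, noise)
S_rec = implied_signal(phi, noise)

# The inversion recovers the assumed 1/w spectrum exactly.
assert np.allclose(S_rec, S_true)
```

Unlike equation 3, the noise amplitude here appears inside the discriminant, which is the algebraic reason why, under whitening, the form (and not just the scale) of the implied signal statistics depends on the assumed noise.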

As in equation 3, it is possible to solve for the signal power spectra implied by the various kernels. The 4th column of figure 1 shows the result of doing this for the experimental kernels. In particular, it shows that the clear spectral peaks suggested by the Wiener filter (in the 3rd column) may be artefactual - they can arise from a form of whitening. Unlike the case of Wiener filtering, the signal statistics derived from the assumption of whitening have the common characteristic of monotonically decreasing signal powers as a function of frequency ω, which is a common finding for natural scene statistics, for instance.

The case of the layer V pyramids25 (row D in figure 1) is particularly clear. If the time constants of potentiation (LTP) and depression (LTD) are τ, and LTP and LTD are matched, then the Fourier transform of the plasticity kernel is

    Φ(ω) = (1/√(2π)) [ 1/(iω + 1/τ) + 1/(iω − 1/τ) ]
         = −i √(2/π) ω/(ω² + 1/τ²)
         = −i √(2/π) τ² ω/(1 + ω²τ²),    (5)

which is exactly the form of equation 4 for S(ω) = 1/ω (which is duly shown in figure 1(D4)). Note the factor of −i in Φ(ω). This is determined by the phases of the frequency components, and comes from the anti-symmetry of the kernel. The phase of the components (∠Φ(ω) = −π/2, by one convention) implies the predictive nature of the kernel: x_i(t) is being correlated with later (ie future) values of noise-filtered, significance-normalized, outputs.

The other cases in figure 1 follow in a similar vein. Row E, from cortical layer II/III, with its asymmetry between LTP and LTD, has similar signal statistics, but with an extra falloff constant ω₀, making S(ω) = 1/(ω + ω₀). Also, it has a phase spectrum ∠Φ(ω) which is not constant with ω (see figure 2A).
Row F, from hippocampal GABAergic cells in culture, has a form that can arise from an exponentially decreasing signal power and little assumed noise (small N(ω)). Conversely, row G, in cortical layer IV spiny-stellate cells, arises from the same signal statistics, but with a large noise term N(ω). Unlike the case of the Wiener filter (equation 3), the form of the signal statistics, and not just their magnitude, depends on the amount of assumed noise.

Figures 2B-C show various aspects of how these results change with the parameters or forms of the kernels. Figure 2B shows that coupling the power spectrum (of figure 1(E2)) for the rule with asymmetric LTP and LTD with a constant phase spectrum (−π/2) leads to a rule with the same filtering characteristic, but with symmetric LTP and LTD. The phase spectrum concerns the predictive relationship between pre- and post-synaptic frequency components; it will be interesting to consider the kernels that result from other temporal relationships between pre- and post-synaptic activities. Figure 2C shows the kernel generated as a difference of two Gaussians (DoG). Although this kernel resembles that of figure 1(F1), the signal spectra (figure 2D) calculated on the basis of whitening (solid; vertically displaced) or Wiener filtering (dashed) are similar to each other, and both involve strong periodicity near the spectral peak of the kernel.

4 Discussion

Temporal asymmetries in synaptic plasticity have been irresistibly alluring to theoretical treatments. We followed the suggestion that the kernels indicate that learning is not based on simple correlation between pre- and post-synaptic activity, but rather involves filtering in the light of prior information, either to remove noise from the signals (Wiener filtering), or to remove noise and boost components of the signals according to their statistical significance.

Adopting this view leads to new conclusions about the kernels, for instance revealing how the phase spectrum differentiates rules with symmetric and asymmetric potentiation and depression components (compare figures 1(E1); 2B). Making some further assumptions about the characteristics of the assumed noise, it permits us to reverse engineer the assumed statistics of the signals, ie to give a window onto the priors at synapses or cells (columns 3;4 of figure 1). Structural features in these signal statistics, such as strong periodicities, may be related to experimentally observable characteristics such as oscillatory activity in relevant brain regions. Most importantly, on this view, the detailed characteristics of the filtering might be expected to adapt in the light of patterns of activity. This suggests the straightforward experimental test of manipulating the input and/or output statistics and recording the consequences.

Various characteristics of the rules bear comment. Since we wanted to focus on structural features of the rules, the graphs in the figures all lack precise time or frequency scales. In some cases we know the time constants of the kernels, and they are usually quite fast (on the order of tens of milliseconds). This can suggest high frequency spectral peaks in assumed signal statistics. However, it also hints at the potential inadequacy of the rate-based treatment that we have given, and suggests the importance of a spike-based treatment.22,30 Recent evidence that successive pairs of pre- and post-synaptic spikes do not interact additively in determining the magnitude and direction of plasticity18 makes the averaging inherent in the rate-based approximation less appealing. Further, we commented at the outset that pre- and post-synaptic filtering have similar effects, provided that all the filters on one post-synaptic cell are the same.
If they are different, then synapses might well be treated as individual filters, ascertaining important signals for learning. In our framework, it is interesting to speculate about the role of (pre-)synaptic depression itself as a form of noise filter (since noise should be filtered before it can affect the activity of the post-synaptic cell, rather than just its plasticity), leaving the kernel as a significance filter, as in the whitening treatment. Finally, largely because of the separate roles of signal and noise, we have been unable to think of a simple experiment that would test between Wiener and whitening filtering. However, it is a quite critical issue in further exploring computational accounts of plasticity.

Acknowledgements

We are very grateful to Odelia Schwartz for helpful discussions. Funding was from the Gatsby Charitable Foundation, the Wellcome Trust (MH) and an HFSP Long Term Fellowship (ML).

References

[1] Abbott, LF & Blum, KI (1996) Functional significance of long-term potentiation for sequence learning and prediction. Cerebral Cortex 6:406-416.
[2] Abbott, LF & Nelson, SB (2000) Synaptic plasticity: taming the beast. Nature Neuroscience 3:1178-1183.
[3] Atick, JJ, Li, Z & Redlich, AN (1992) Understanding retinal color coding from first principles. Neural Computation 4:559-572.
[4] Bell, CC, Han, VZ, Sugawara, Y & Grant, K (1997) Synaptic plasticity in a cerebellum-like structure depends on temporal order. Nature 387:278-281.
[5] Bi, GQ & Poo, MM (1998) Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. Journal of Neuroscience 18:10464-10472.
[6] Bienenstock, EL, Cooper, LN & Munro, PW (1982) Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience 2:32-48.
[7] Blum, KI & Abbott, LF (1996) A model of spatial map formation in the hippocampus of the rat. Neural Computation 8:85-93.
[8] Buiatti, M & van Vreeswijk, C (2003) Variance normalisation: a key mechanism for temporal adaptation in natural vision? Vision Research, in press.
[9] Cateau, H & Fukai, T (2003) A stochastic method to predict the consequence of arbitrary forms of spike-timing-dependent plasticity. Neural Computation 15:597-620.
[10] Chechik, G (2003) Spike time dependent plasticity and information maximization. Neural Computation, in press.
[11] Debanne, D, Gähwiler, BH & Thompson, SM (1998) Long-term synaptic plasticity between pairs of individual CA3 pyramidal cells in rat hippocampal slice cultures. Journal of Physiology 507:237-247.
[12] Dong, DW & Atick, JJ (1995) Temporal decorrelation: a theory of lagged and nonlagged responses in the lateral geniculate nucleus. Network: Computation in Neural Systems 6:159-178.
[13] Edelman, S & Weinshall, D (1991) A self-organizing multiple-view representation of 3D objects. Biological Cybernetics 64:209-219.
[14] Egger, V, Feldmeyer, D & Sakmann, B (1999) Coincidence detection and changes of synaptic efficacy in spiny stellate neurons in rat barrel cortex. Nature Neuroscience 2:1098-1105.
[15] Fairhall, AL, Lewen, GD, Bialek, W & de Ruyter van Steveninck, RR (2001) Efficiency and ambiguity in an adaptive neural code. Nature 412:787-792.
[16] Feldman, DE (2000) Timing-based LTP and LTD at vertical inputs to layer II/III pyramidal cells in rat barrel cortex. Neuron 27:45-56.
[17] Földiák, P (1991) Learning invariance from transformed sequences. Neural Computation 3:194-200.
[18] Froemke, RC & Dan, Y (2002) Spike-timing-dependent synaptic modification induced by natural spike trains. Nature 416:433-438.
[19] Ganguly, K, Kiss, L & Poo, M (2000) Enhancement of presynaptic neuronal excitability by correlated presynaptic and postsynaptic spiking. Nature Neuroscience 3:1018-1026.
[20] Gerstner, W & Abbott, LF (1997) Learning navigational maps through potentiation and modulation of hippocampal place cells. Journal of Computational Neuroscience 4:79-94.
[21] Gerstner, W, Kempter, R, van Hemmen, JL & Wagner, H (1996) A neuronal learning rule for sub-millisecond temporal coding. Nature 383:76-81.
[22] Gerstner, W & Kistler, WM (2002) Mathematical formulations of Hebbian learning. Biological Cybernetics 87:404-415.
[23] Hull, CL (1943) Principles of Behavior. New York, NY: Appleton-Century.
[24] Levy, WB & Steward, O (1983) Temporal contiguity requirements for long-term associative potentiation/depression in the hippocampus. Neuroscience 8:791-797.
[25] Markram, H, Lübke, J, Frotscher, M & Sakmann, B (1997) Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science 275:213-215.
[26] Minai, AA & Levy, WB (1993) Sequence learning in a single trial. International Neural Network Society World Congress of Neural Networks II. Portland, OR: International Neural Network Society, 505-508.
[27] Pavlov, IP (1927) Conditioned Reflexes. Oxford, England: OUP.
[28] Porr, B & Wörgötter, F (2003) Isotropic sequence order learning. Neural Computation 15:831-864.
[29] Rao, RP & Sejnowski, TJ (2001) Spike-timing-dependent Hebbian plasticity as temporal difference learning. Neural Computation 13:2221-2237.
[30] Sjöström, PJ, Turrigiano, GG & Nelson, SB (2001) Rate, timing, and cooperativity jointly determine cortical synaptic plasticity. Neuron 32:1149-1164.
[31] Sutton, RS (1988) Learning to predict by the methods of temporal differences. Machine Learning 3:9-44.
[32] Sutton, RS & Barto, AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review 88:135-170.
[33] van Rossum, MC, Bi, GQ & Turrigiano, GG (2000) Stable Hebbian learning from spike timing-dependent plasticity. Journal of Neuroscience 20:8812-8821.
[34] Wallis, G & Baddeley, R (1997) Optimal, unsupervised learning in invariant object recognition. Neural Computation 9:883-894.
[35] Wallis, G & Rolls, ET (1997) Invariant face and object recognition in the visual system. Progress in Neurobiology 51:167-194.
[36] Yu, AJ & Dayan, P (2003) Expected and unexpected uncertainty: ACh & NE in the neocortex. In NIPS 2002. Cambridge, MA: MIT Press.