{"title": "Multiple Timescales of Adaptation in a Neural Code", "book": "Advances in Neural Information Processing Systems", "page_first": 124, "page_last": 130, "abstract": null, "full_text": "Multiple timescales of adaptation in a neural code \n\nAdrienne L. Fairhall, Geoffrey D. Lewen, William Bialek, and Robert R. de Ruyter van Steveninck \n\nNEC Research Institute \n4 Independence Way \nPrinceton, New Jersey 08540 \nadrienne|geoff|bialek|ruyter@research.nj.nec.com \n\nAbstract \n\nMany neural systems extend their dynamic range by adaptation. We examine the timescales of adaptation in the context of dynamically modulated rapidly-varying stimuli, and demonstrate in the fly visual system that adaptation to the statistical ensemble of the stimulus dynamically maximizes information transmission about the time-dependent stimulus. Further, while the rate response has long transients, the adaptation takes place on timescales consistent with optimal variance estimation. \n\n1 Introduction \n\nAdaptation was one of the first phenomena discovered when Adrian recorded the responses of single sensory neurons [1, 2]. Since that time, many different forms of adaptation have been found in almost all sensory systems. The simplest forms of adaptation, such as light and dark adaptation in the visual system, seem to involve just discarding a large constant background signal so that the system can maintain sensitivity to small changes. The idea of Attneave [3] and Barlow [4] that the nervous system tries to find an efficient representation of its sensory inputs implies that neural coding strategies should be adapted not just to constant parameters such as the mean light intensity, but to the entire distribution of input signals [5]; more generally, efficient strategies for processing (not just coding) of sensory signals must also be matched to the statistics of these signals [6]. 
Adaptation to statistics might happen on evolutionary time scales, or, at the opposite extreme, it might happen in real time as an animal moves through the world. There is now evidence from several systems for real time adaptation to statistics [7, 8, 9], and at least in one case it has been shown that the form of this adaptation indeed does serve to optimize the efficiency of representation, maximizing the information that a single neuron transmits about its sensory inputs [10]. \n\nPerhaps the simplest of statistical adaptation experiments, as in Ref. [7] and Fig. 1, is to switch between stimuli that are drawn from different probability distributions and ask how the neuron responds to the switch. When we 'repeat' the experiment we repeat the time dependence of the parameters describing the distribution, but we choose new signals from the same distributions; thus we probe the response or adaptation to the distributions and not to the particular signals. These switching experiments typically reveal transient responses to the switch that have rather long time scales, and it is tempting to identify these long time scales as the time scales of adaptation. On the other hand, one can also view the process of adapting to a distribution as one of learning the parameters of that distribution, or of accumulating evidence that the distribution has changed. Some features of the dynamics in the switching experiments match the dynamics of an optimal statistical estimator [11], but the overall time scale does not: for all the experiments we have seen, the apparent time scales of adaptation in a switching experiment are much longer than would be required to make reliable estimates of the relevant statistical parameters. \n\nIn this work we re-examine the phenomenon of statistical adaptation in the motion sensitive neurons of the fly visual system. 
Specifically, we are interested in adaptation to the variance or dynamic range of the velocity distribution [10]. It has been shown that, in steady state, this adaptation includes a rescaling of the neuron's input/output relation, so that the system seems to encode dynamic velocity signals in relative units; this allows the system, presumably, to deal both with the ~50°/s motions that occur in straight flight and with the ~2000°/s motions that occur during acrobatic flight (see Ref. [12]). Further, the precise form of rescaling chosen by the fly's visual system is that which maximizes information transmission. There are several natural questions: (1) How long does it take the system to accomplish the rescaling of its input/output relation? (2) Are the transients seen in switching experiments an indication of gradual rescaling? (3) If the system adapts to the variance of its inputs, is the neural signal ambiguous about the absolute scale of velocity? (4) Can we see the optimization of information transmission occurring in real time? \n\n2 Stimulus structure and experimental setup \n\nA fly (Calliphora vicina) is immobilized in wax and views a computer controlled oscilloscope display while we record action potentials from the identified neuron H1 using standard methods. The stimulus movie is a random pattern of dark and light vertical bars, and the entire pattern moves along a random trajectory with velocity S(t); since the neuron is motion (and not position) sensitive we refer to this signal as the stimulus. We construct the stimulus S(t) as the product of a normalized white noise s(t), constructed from a random number sequence refreshed every Ts = 2 ms, and an amplitude or standard deviation σ(t) which varies on a characteristic timescale Tσ ≫ Ts. Frames of the movie are drawn every 2 ms. 
For analysis all spike times are discretized at the 2 ms resolution of the movie. \n\nFigure 1: (a) The spike rate measured in response to a square-wave modulated white noise stimulus s(t), averaged over many presentations of s(t), and normalized by the mean and standard deviation, for periods T = 20, 10 and 4 s, plotted against normalised time t/T. (b) Decay time of the rate following an upward switch as a function of switching period T. \n\n3 Spike rate dynamics \n\nSwitching experiments as described above correspond to a stimulus such that the amplitude σ(t) is a square wave, alternating between two values σ1 and σ2, σ1 > σ2. Experiments were performed over a range of switching periods (T = 40, 20, 10, 4 s), with the amplitudes σ1 and σ2 in a ratio of 5:1. Remarkably, the timescales of the response depend strongly on those of the experiment; in fact, the response times rescale by T, as is seen in Fig. 1(a). The decay of the rate in the first half of the experiment is fitted by an exponential, and in Fig. 1(b), the resulting decay time τ(T) is plotted as a function of T; we use an exponential not to insist that this is the correct form, only to extract a timescale. As suggested by the rescaling of Fig. 1(a), the fitted decay times are well described as a linear function of the stimulus period. This demonstrates that the timescale of adaptation of the rate is not absolute, but is a function of the timescale established in the experiment. 
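The stimulus construction described above can be sketched numerically. This is an illustrative reconstruction, not the experimental code: the 2 ms refresh and the 5:1 amplitude ratio come from the text, while the specific amplitudes, period, and seed are placeholders.

```python
import numpy as np

def switching_stimulus(T_period=4.0, n_periods=2, dt=0.002,
                       sigma1=5.0, sigma2=1.0, seed=0):
    """S(t) = sigma(t) * s(t): normalized white noise s(t), refreshed
    every dt = 2 ms, multiplied by a square-wave envelope sigma(t)
    alternating between sigma1 and sigma2 (a 5:1 ratio)."""
    rng = np.random.default_rng(seed)
    n = int(n_periods * T_period / dt)
    t = np.arange(n) * dt
    s = rng.standard_normal(n)                 # fast component s(t)
    high = (t % T_period) < (T_period / 2)     # first half-cycle: high variance
    sigma = np.where(high, sigma1, sigma2)
    return t, sigma, sigma * s

t, sigma, S = switching_stimulus()
```

Repeating the experiment then means drawing a fresh s(t) while keeping the envelope sigma(t) fixed, so that averages over trials probe the response to the distribution rather than to a particular waveform.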
\n\nLarge sudden changes in stimulus variance might trigger special mechanisms, so we turn to a signal that changes variance continuously: the amplitude σ(t) is taken to be the exponential of a sinusoid, σ(t) = exp(a sin(2πkt)), where the period T = 1/k was varied between 2 s and 240 s, and the constant a is fixed such that the amplitude varies by a factor of 10 over a cycle. A typical averaged rate response to the exponential-sinusoidal stimulus is shown in Fig. 2(a). The rate is close to sinusoidal over this parameter regime, indicating a logarithmic encoding of the stimulus variance. Significantly, the rate response shows a phase lead φ with respect to the stimulus. This may be interpreted as the effect of adaptation: at every point on the cycle, the gain of the response is set to a value defined by the stimulus a short time before. \n\nFigure 2: (a) The spike rate measured in response to an exponential-sinusoidal modulation of a white noise stimulus s(t), averaged over presentations of s(t), and normalised by the mean and standard deviation, for several periods (T = 30, 60, 90 and 120 s), plotted against normalised time t/T. (b) The time shift δ between response and stimulus, for a range of periods T. \n\nAs before, the response of the system was measured over a range of periods T. Fig. 2(b) shows the measured time shift δ(T) of the response as a function of T. One observes that the relation is nearly linear over more than one order of magnitude in T; that is, the phase shift is approximately constant. 
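The envelope can be written down directly. One consistent reading of "varies by a factor of 10 over a cycle" is max/min = exp(2a) = 10, i.e. a = ln(10)/2; this is an interpretation, since the text does not state a explicitly.

```python
import numpy as np

# sigma(t) = exp(a * sin(2*pi*k*t)).  Reading "varies by a factor of 10
# over a cycle" as max/min = exp(2a) = 10 gives a = ln(10)/2 (assumed).
a = np.log(10.0) / 2.0
T = 45.0                          # period T = 1/k; one value used in the text
k = 1.0 / T

t = np.linspace(0.0, T, 10001)
sigma = np.exp(a * np.sin(2.0 * np.pi * k * t))
ratio = sigma.max() / sigma.min()   # ~10 over one cycle, by construction
```

Because sigma is the exponential of a sinusoid, log sigma(t) is sinusoidal, which is why a near-sinusoidal rate response indicates logarithmic encoding of the variance.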
Once again there is a strong and simple dependence of the apparent timescale of adaptation on the stimulus parameters. Responses to stimulus sequences composed of many frequencies also exhibit a phase shift, consistent with that observed for the single frequency experiments. \n\n4 The dynamic input-output relation \n\nBoth the switching and sinusoidally modulated experiments indicate that responses to changing the variance of input signals have multiple time scales, ranging from a few seconds to several minutes. Does it really take the system this long to adjust its input/output relation to the new input distribution? In the range of velocities used, and at the contrast level used in the laboratory, spiking in H1 depends on features of the velocity waveform that occur within a window of ~100 ms. After a few seconds, then, the system has had access to several tens of independent samples of the motion signal, and should be able to estimate its variance to within ~20%; after a minute the precision would be better than a few percent. In practice, we are changing the input variance not by a few percent but by a factor of two or ten; if the system were really efficient, these changes would be detected and compensated by adaptation on much shorter time scales. To address this, we look directly at the input/output relation as the standard deviation σ(t) varies in time. \n\nFor simplicity we analyze (as in Ref. [10]) features of the stimulus that modulate the probability of occurrence of individual spikes, P(spike|stimulus); we will not consider patterns of spikes, although the same methods can be easily generalised. The space of stimulus histories of length ~100 ms, discretised at 2 ms, leading up to a spike has a dimensionality ~50, too large to allow adequate sampling of P(spike|stimulus) from the data, so we must begin by reducing the dimensionality of the stimulus description. 
\n\nThe simplest way to do so is to find a subset of directions in stimulus space determined to be relevant for the system, and to project the stimulus onto that set of directions. These directions correspond to linear filters. Such a set of directions can be obtained from the moments of the spike-conditional stimulus; the first such moment is the spike-triggered average, or reverse correlation function [2]. It has been shown [10] that for H1, under these conditions, there are two relevant dimensions: a smoothed version of the velocity, and also its derivative. The rescaling observed in steady state experiments was seen to occur independently in both dimensions, so without loss of generality we will use as our filter the single dimension given by the spike-triggered average. The stimulus projected onto this filter will be denoted by s0. \n\nThe filtered stimulus is then passed through a nonlinear decision process akin to a threshold. To calculate the input/output relation P(spike|s0) [10], we use Bayes' rule: \n\nP(spike|s0)/P(spike) = P(s0|spike)/P(s0).   (1) \n\nThe spike rate r(s0) is proportional to the probability of spiking, r(s0) ∝ P(spike|s0), leading to the relation \n\nr(s0)/r̄ = P(s0|spike)/P(s0),   (2) \n\nwhere r̄ is the mean spike rate. P(s0) is the prior distribution of the projected stimulus, which we know. The distribution P(s0|spike) is estimated from the projected stimulus evaluated at the spike times, and the ratio of the two is the nonlinear input/output relation. \n\nA number of experiments have shown that the filter characteristics of H1 are adaptive, and we see this in the present experiments as well: as the amplitude σ(t) is decreased, the filter changes both in overall amplitude and shape. The filter becomes increasingly extended: the system integrates over longer periods of time under conditions of low velocities. 
Thus the filter depends on the input variance, and we expect that there should be an observable relaxation of the filter to its new steady state form after a switch in variance. We find, however, that within 200 ms following the switch, the amplitude of the filter has already adjusted to the new variance, and further that the detailed shape of the filter has attained its steady state form in less than 1 s. The precise timescale of the establishment of the new filter shape depends on the value of σ: for the change to σ1, the steady state form is achieved within 200 ms. The long tail of the low variance filter for σ2 (≪ σ1) is established more slowly. Nonetheless, these time scales which characterize adaptation of the filter are much shorter than the rate transients seen in the switching experiments, and are closer to what we might expect for an efficient estimator. \n\nWe construct time dependent input/output relations by forming conditional distributions using spikes from particular time slices in a periodic experiment. In Figs. 3.1(b) and 3.1(c), we show the input/output relation calculated in 1 s bins throughout the switching experiment. Within the first second the input/output relation is almost indistinguishable from its steady state form. Further, it takes the same form for the two halves of the experiment: it is rescaled by the standard deviation, as was seen for the steady state experiments. The close collapse or rescaling of the input/output relations depends not only on the normalisation by the standard deviation, but also on the use of the \"local\" adapted filter (i.e. measured in the same time bin). Returning to the sinusoidal experiments, the input/output relations were 
constructed for T = 45 s in 20 non-overlapping bins of width 2.25 s. Once again the functions show a remarkable rescaling which is sharpened by the use of the appropriate local filter: see Figs. 3.2(b) and (c). Finally, we consider an amplitude which varies randomly with correlation time τσ ~ 3 s: σ(t) is a repeated segment of the exponential of a Gaussian random process, pictured in Fig. 3.3(a), with periodicity T = 90 s ≫ τσ. Dividing the stimulus into sequential bins of 2 s in width, we obtain the filters for each timeslice, and calculate the local prior distributions, which are not Gaussian in this case as they are distorted by the local variations of σ(t). Nonetheless, the ratio P(s0|spike)/P(s0) conspires such that the form of the input/output relation is preserved. \n\nFigure 3: Input/output relations for (a) switching, (b) sinusoidal and (c) randomly modulated experiments. Figs. 3.1 show the modulation envelope σ(t), in log for (b) and (c) (solid), and the measured rate (dotted), normalised by mean and standard deviation. Figs. 3.2 show input/output relations calculated in non-overlapping bins throughout the stimulus cycle, with the input s0 in units of the standard deviation of the whole stimulus. Figs. 3.3 show the input/output relations with the input rescaled to units of the local standard deviation. \n\nIn all three cases, our results show that the system rapidly and continuously adjusts its coding strategy, rescaling the input/output relation with respect to the local variance of the input as for steady state stimuli. 
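Eq. (2) suggests a simple recipe for measuring the input/output relation from data: filter, project, histogram, divide. A minimal sketch with synthetic spikes follows; the exponential filter and the sigmoidal spike probability are invented for illustration and are not the measured H1 properties.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
s = rng.standard_normal(n)                     # white noise stimulus

# Stand-in for the spike-triggered average: a unit-norm exponential filter.
flt = np.exp(-np.arange(50) / 10.0)
flt /= np.linalg.norm(flt)
s0 = np.convolve(s, flt, mode="same")          # projected stimulus, variance ~1

# Synthetic spikes from a soft threshold (illustrative nonlinearity).
p_spike = 0.05 / (1.0 + np.exp(-(s0 - 1.0) / 0.3))
spikes = rng.random(n) < p_spike

# Eq. (2): r(s0)/rbar = P(s0|spike)/P(s0), estimated by histograms.
bins = np.linspace(-4.0, 4.0, 41)
p_s0, _ = np.histogram(s0, bins=bins, density=True)
p_s0_spk, _ = np.histogram(s0[spikes], bins=bins, density=True)
io = np.divide(p_s0_spk, p_s0, out=np.zeros_like(p_s0), where=p_s0 > 0)
# io approximates the normalized input/output relation: flat for small s0,
# rising once s0 passes the threshold region.
```

In the experiments, P(s0|spike) is built from the stimulus projected at spike times within a given time slice and P(s0) from the whole slice; the rescaling result is that these curves collapse when s0 is measured in units of the local standard deviation.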
Variance normalisation occurs as rapidly as is measurable, and the system chooses a similar form for the input/output relation in each case. \n\n5 Information transmission \n\nWhat does this mean for the coding efficiency of the neuron? An experiment was designed to track the information transmission as a function of time. We use a small set of N random noise sequences {si(t)}, i = 1, ..., N, each 2 s long, presented in random order at two different amplitudes, σ1 and σ2. We then ask how much information the spike train conveys about (a) which of the random segments si(t) and (b) which of the amplitudes σj was used. Specifically, the experiment consists of a series of trials of length 2 s where the fast component is one of the sequences {si}, and after 1 s, the amplitude switches from σ1 to σ2 or vice versa. N was taken to be 40, so that a 2 hour experiment provides approximately 80 samples for each (si, σj). This allows us to measure the mutual information between the response and either the fast or the slow component of the stimulus as a function of time across the 2 s repeated segment. We use only a very restricted subspace of σ and s: the maximum available information about σ is 1 bit, and about s is log2 N bits. \n\nThe spike response is represented by \"words\" [13], generated from the spike times discretised to timebins of 2 ms, where no spike is represented by 0, and a spike by 1. A word is defined as the binary number formed from 10 consecutive bins, so there are 2^10 possible words. The information about the fast component s in trials of a given σ is \n\nIσ(w(t); s) = H[Pσ(w(t))] − Σj=1..N P(sj) H[Pσ(w(t)|sj)],   (3) \n\nwhere H is the entropy of the word distribution: \n\nH[P(w(t))] = − Σk P(wk(t)) log2 P(wk(t)).   (4) \n\nOne can compare this information for different values of σ. 
Similarly, one can calculate the information about the amplitude using a given probe s: \n\nIs(w(t); σ) = H[Ps(w(t))] − Σj=1..2 P(σj) H[Ps(w(t)|σj)].   (5) \n\nThe amount of information for each sj varies rapidly depending on the presence or absence of spikes, so we average these contributions over the {sj} to give I(w; σ). \n\nFigure 4: Information per spike as a function of time, where σ is switched every 2 s: I(w; s) for the upward and downward switches, and I(w; σ). \n\nThe mutual information as a function of time is shown in Fig. 4, presented as bits/spike. As one would expect, the amount of information transmitted per second about the stimulus details, or s, depends on the ensemble parameter σ: larger velocities allow a higher SNR for velocity estimation, and the system is able to transmit more information. However, when we convert the information rate to bits/spike, we find that the system is transmitting at a constant efficiency of around 1.4 bits/spike. Any change in information rate during a switch from σ1 to σ2 is undetectable. For a switch from σ2 to σ1, the time to recovery is of order 100 ms. This demonstrates explicitly that the system is indeed rapidly maximising its information transmission. Further, the transient \"excess\" of spikes following an upward switch provides information at a constant rate per spike. The information about the amplitude, similarly, remains at a constant level throughout the trial. Thus, information about the ensemble variable is retained at all times: the response is not ambiguous with respect to the absolute scale of velocity. Despite the rescaling of input/output curves, responses within different ensembles are distinguishable. 
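The word-based information estimates of Eqs. (3)-(5) reduce to entropies of empirical word distributions. The sketch below uses synthetic spike trains (the random rate patterns are invented; in the experiment the words come from real responses to the N probe sequences) and computes the plug-in estimate of I(w; s).

```python
import numpy as np
from collections import Counter

def entropy_bits(counts):
    """Entropy (bits) of the distribution defined by a Counter of counts."""
    total = sum(counts.values())
    p = np.array(list(counts.values()), dtype=float) / total
    return float(-(p * np.log2(p)).sum())

def words(spike_bins, L=10):
    """Map each run of L consecutive 2 ms bins to an integer word."""
    kernel = 2 ** np.arange(L)[::-1]
    return np.convolve(spike_bins, kernel, mode="valid")

rng = np.random.default_rng(0)
N, n_trials, n_bins = 4, 200, 500

# Each probe s_j gets its own (random, illustrative) spike-probability pattern.
patterns = rng.uniform(0.02, 0.3, size=(N, n_bins))
labels = rng.integers(0, N, size=n_trials)
trains = (rng.random((n_trials, n_bins)) < patterns[labels]).astype(int)

# Eq. (3): I(w; s) = H[P(w)] - sum_j P(s_j) H[P(w|s_j)]  (plug-in estimate).
pooled, cond_H = Counter(), 0.0
for j in range(N):
    wj = Counter()
    for train in trains[labels == j]:
        wj.update(words(train).tolist())
    cond_H += (labels == j).mean() * entropy_bits(wj)
    pooled.update(wj)
info = entropy_bits(pooled) - cond_H      # bits per word about s
```

The same machinery gives Eq. (5) by conditioning on the amplitude σj instead of the probe sj; note that plug-in estimates of this kind are biased upward for small sample sizes, which is why the experiment collects ~80 repeats per condition.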
\n\n6 Discussion \n\nWe find that the neural response to a stimulus with well-separated timescales S(t) = σ(t)s(t) takes the form of a rate⊗timing code, where the response r(t) may be approximately modelled as \n\nr(t) = R[σ(t)] g(s(t)).   (6) \n\nHere R modulates the overall rate and depends on the slow dynamics of the variance envelope, while the precise timing of a given spike in response to fast events in the stimulus is determined by the nonlinear input/output relation g, which depends only on the normalised quantity s(t). Through this apparent normalisation by the local standard deviation, g, as for steady-state experiments, maximises information transmission about the fast components of the stimulus. The function R modulating the rate varies on much slower timescales, so it cannot be taken as an indicator of the extent of the system's adaptation to a new ensemble. Rather, R appears to function as an independent degree of freedom, capable of transmitting information, at a slower rate, about the slow stimulus modulations. The presence of many timescales in R may itself be an adaptation to the many timescales of variation in natural signals. At the same time, the rapid readjustment of the input/output relation, and the consequent recovery of information after a sudden change in σ, indicate that the adaptive mechanisms approach the limiting speed set by the need to gather statistics. \n\nAcknowledgments \n\nWe thank B. Agüera y Arcas, N. Brenner and T. Adelman for helpful discussions. \n\nReferences \n\n[1] E. Adrian (1928) The Basis of Sensation (London: Christophers). \n[2] F. Rieke, D. Warland, R. de Ruyter van Steveninck and W. Bialek (1997) Spikes: Exploring the Neural Code (Cambridge, MA: MIT Press). \n[3] F. Attneave (1954) Psych. Rev. 61, 183-193. \n[4] H. B. Barlow (1961) in Sensory Communication, W. A. Rosenblith, ed. (Cambridge, MA: MIT Press), pp. 217-234. \n[5] S. B. Laughlin (1981) Z. 
Naturforsch. 36c, 910-912. \n[6] M. Potters and W. Bialek (1994) J. Phys. I France 4, 1755-1775. \n[7] S. Smirnakis, M. Berry, D. Warland, W. Bialek and M. Meister (1997) Nature 386, 69-73. \n[8] J. H. van Hateren (1997) Vision Research 37, 3407-3416. \n[9] R. R. de Ruyter van Steveninck, W. Bialek, M. Potters, and R. H. Carlson (1994) Proc. of the IEEE International Conference on Systems, Man and Cybernetics, 302-307. \n[10] N. Brenner, W. Bialek and R. de Ruyter van Steveninck (2000) Neuron 26, 695-702. \n[11] M. DeWeese and A. Zador (1998) Neural Comp. 10, 1179-1202. \n[12] C. Schilstra and J. H. van Hateren (1999) J. Exp. Biol. 202, 1481-1490. \n[13] S. Strong, R. Koberle, R. de Ruyter van Steveninck and W. Bialek (1998) Phys. Rev. Lett. 80, 197-200. \n", "award": [], "sourceid": 1809, "authors": [{"given_name": "Adrienne", "family_name": "Fairhall", "institution": null}, {"given_name": "Geoffrey", "family_name": "Lewen", "institution": null}, {"given_name": "William", "family_name": "Bialek", "institution": null}, {"given_name": "Robert", "family_name": "van Steveninck", "institution": null}]}