{"title": "Coding of Naturalistic Stimuli by Auditory Midbrain Neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 103, "page_last": 109, "abstract": "", "full_text": "Coding of Naturalistic Stimuli by \n\nAuditory Midbrain Neurons \n\nH. Attias* and C.E. Schreinert \n\nSloan Center for Theoretical Neurobiology and \n\nW.M. Keck Foundation Center for Integrative Neuroscience \n\nUniversity of California at San Francisco \n\nSan Francisco, CA 94143-0444 \n\nAbstract \n\nIt is known that humans can make finer discriminations between \nfamiliar sounds (e.g. syllables) than between unfamiliar ones (e.g. \ndifferent noise segments). Here we show that a corresponding en(cid:173)\nhancement is present in early auditory processing stages. Based on \nprevious work which demonstrated that natural sounds had robust \nstatistical properties that could be quantified, we hypothesize that \nthe auditory system exploits those properties to construct efficient \nneural codes. To test this hypothesis, we measure the informa(cid:173)\ntion rate carried by auditory spike trains on narrow-band stimuli \nwhose amplitude modulation has naturalistic characteristics, and \ncompare it to the information rate on stimuli with non-naturalistic \nmodulation. We find that naturalistic inputs significantly enhance \nthe rate of transmitted information, indicating that auditiory neu(cid:173)\nral responses are matched to characteristics of natural auditory \nscenes. \n\n1 Natural Scene Statistics and the Neural Code \n\nA primary goal of hearing research is to understand how complex sounds that occur \nin natural scenes are processed by the auditory system. However, natural sounds \nare difficult to describe quantitatively and the complexity of auditory responses \nthey evoke makes it hard to gain insight into their processing. 
Hence, most studies of auditory physiology are restricted to pure tones and noise stimuli, resulting in a limited understanding of auditory encoding. In this paper we pursue a novel approach to the study of natural sound encoding in auditory spike trains. Our \n\n* Corresponding author. E-mail: hagai@phy.ucsf.edu. \n† E-mail: chris@phy.ucsf.edu. \n\nFigure 1: Left: amplitude modulation stimulus drawn from a naturalistic stimulus set, and the evoked spike train of an inferior colliculus neuron. Right: amplitude modulation from a non-naturalistic set and the evoked spike train of the same neuron. \n\nmethod consists of measuring statistical characteristics of natural auditory scenes, and incorporating them into simple stimuli in a systematic manner, thus creating 'naturalistic' stimuli which enable us to study the encoding of natural sounds in a controlled fashion. The first stage of this program has been described in (Attias and Schreiner 1997); the second is reported below. \n\nFig. 1 shows two segments of long stimuli and the corresponding spike trains of the same neuron, elicited by pure tones that were amplitude-modulated by these stimuli. While both stimuli appear to be random and to have the same mean, and both spike trains have the same firing rate, one may observe that high and low amplitudes are more likely to occur in the stimulus on the left; indeed, these stimuli are drawn from two stimulus sets with different statistical properties. Our present study of auditory coding focuses on assessing the efficiency of this neural code: for a given stimulus set, how well can the animal reconstruct the input sound and discriminate between similar sound segments based on the evoked spike train, and how are those abilities affected by changing the stimulus statistics? 
We quantify the discrimination capability of auditory neurons in the inferior colliculus of the cat using concepts from information theory (Bialek et al. 1991; Rieke et al. 1997). \n\nThis leads to the issue of optimal coding (Atick 1992). Theoretically, given an auditory scene with particular statistical properties, it is possible to design an encoding scheme that would exploit those properties, resulting in a neural code that is optimal for that scene but is consequently less efficient for other scenes. Here we investigate the hypothesis that the auditory system uses a code that is adapted to natural auditory scenes. This question is addressed by comparing the discrimination capability of auditory neurons between sound segments drawn from a naturalistic stimulus set, to the one for a non-naturalistic set. \n\n2 Statistics of Natural Sounds \n\nAs a first step in investigating the relation between neural responses and auditory inputs, we studied and quantified temporal statistics of natural auditory scenes (Attias and Schreiner 1997). It is well-known that different locations on the basilar membrane respond selectively to different frequency components of the incoming sound x(t) (e.g., Pickles 1988), hence the frequency v corresponds to a spatial coordinate, in analogy with retinal location in vision. We therefore analyzed a large database of sounds, including speech, music, animal vocalizations, and background sounds, using various filter banks covering 0-10kHz. In each frequency band v, the amplitude a(t) ≥ 0 and phase φ(t) of the band-limited signal x_v(t) = a(t) cos(vt + φ(t)) were extracted, and the amplitude probability distribution p(a) and auto-correlation function c(τ) = <a(t)a(t + τ)> were computed, as well as those of the instantaneous frequency dφ(t)/dt. \n\nFigure 2: Log-amplitude distribution in several sound ensembles (piano music, symphonic music, cat vocalizations, bird songs, background sounds). Different curves for a given ensemble correspond to different frequency bands. The low-amplitude peak in the cat plot reflects the abundance of silent segments. The theoretical curve p(ā) (1) is plotted for comparison (dashed line). \n\nThose statistics were found to be nearly identical in all bands and across all examined sounds. In particular, the distribution of the log-amplitude ā = log a, normalized to have zero mean and unit variance, could be well-fitted to the form \n\np(ā) = β exp(βā + α - e^{βā+α}) (1) \n\n(with normalization constants α = -0.578 and β = 1.29), which should, however, be corrected at large amplitude (> 5σ). Several examples are displayed in Fig. 2. The log-amplitude distribution (1) corresponds mathematically to the amplitude distribution of musical instruments and vocalizations, found to be p(a) = e^{-a} (known as the Laplace distribution in speech signal processing), as well as that of background sounds. \n\n3 Experiment \n\n3.1 Stimuli \n\nTo construct the naturalistic stimulus set, the amplitude a(t) was drawn from the exponential distribution p(a ≥ 0) = e^{-a}, p(a < 0) = 0 at each time point t independently, using a cutoff modulation frequency of f_c = 100Hz (i.e., |a(f ≤ f_c)| = const., |a(f > f_c)| = 0, where a(f) is the Fourier transform of a(t)). We also used a non-naturalistic stimulus set where a(t) was chosen from the uniform distribution p(0 ≤ a ≤ b_c) = 1/b_c, p(a > b_c) = 0, with b_c adjusted so that both stimulus sets had the same mean. A short segment from each set is shown in Fig. 1, and the two distributions are plotted in Figs. 3, 4 (right). \n\nStimuli of 15-20min duration were played to ketamine-anesthetized cats. To minimize adaptation effects we alternated between the two sets using 10sec long segments. 
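The Gumbel-like form (1) is exactly what one obtains by taking the log of an exponentially distributed amplitude and standardizing it: note that α ≈ -0.578 is minus Euler's constant and β ≈ 1.29 ≈ π/√6, the standard deviation of log a for a ~ e^{-a}. A minimal numerical check in Python; the sample size and bin width here are illustrative choices, not from the paper:

```python
import math, random

# Sample amplitudes from the exponential distribution p(a) = e^{-a}, a >= 0,
# and normalize the log-amplitude to zero mean and unit variance.
random.seed(0)
n = 200_000
log_a = [math.log(random.expovariate(1.0)) for _ in range(n)]
mu = sum(log_a) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in log_a) / n)
a_bar = [(x - mu) / sigma for x in log_a]

# Theoretical density of the normalized log-amplitude, eq. (1):
#   p(a_bar) = beta * exp(beta*a_bar + alpha - exp(beta*a_bar + alpha))
alpha, beta = -0.578, 1.29

def p_theory(x):
    u = beta * x + alpha
    return beta * math.exp(u - math.exp(u))

# Compare an empirical histogram bin to the theoretical density at a_bar = 0.
h = 0.1
emp = sum(1 for x in a_bar if -h / 2 < x <= h / 2) / (n * h)
print(round(emp, 2), round(p_theory(0.0), 2))
```

The empirical bin density and the theoretical curve agree to within sampling noise, confirming that (1) is the standardized log of a Laplace-distributed amplitude.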
Single-unit recordings were made from the inferior colliculus (IC), a subthalamic auditory processing stage (e.g., Pickles 1988). Each IC unit responds best to a narrow range of sound frequencies, the center of which is called its 'best frequency' (BF). Neighboring units have similar BFs, in accord with the topographic frequency organization of the auditory system. For each unit, stimuli with carrier frequency v at most 500Hz away from the unit's BF were used. Firing rates in response to those stimuli were between 60-100Hz. The stimulus and the electrode signal were recorded simultaneously at a sampling rate of 24kHz. After detecting and sorting the spikes and extracting the stimulus amplitude, both amplitude and spike train were down-sampled to 3kHz. \n\n3.2 Analysis \n\nIn order to assess the ability to discriminate between different inputs based on the observed spike train, we computed the mutual information I_{r,s} between the spike train response r(t) = Σ_i δ(t - t_i), where t_i are the spike times, and the stimulus amplitude s(t). I_{r,s} consists of two terms, I_{r,s} = H_s - H_{s|r}, where H_s is the stimulus entropy (the log-number of different stimuli) and H_{s|r} is the entropy of the stimulus conditioned on the response (the log-number of different stimuli that could elicit a given response, and thus could not be discriminated based on that response, averaged over all responses). Our approach generally follows the ideas of (Bialek et al. 1991; Rieke et al. 1997). \n\nTo simplify the calculation, we first modified the stimuli s(t) to get s'(t) = f(s(t)), where the function f(s) was chosen so that s' was Gaussian. Hence for exponential stimuli f(s) = √2 erf^{-1}(1 - 2e^{-s}) and for uniform stimuli f(s) = √2 erf^{-1}(2s/b_c - 1), where erf^{-1} is the inverse error function. 
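The Gaussianizing map for the exponential set is the probability-integral transform written with the inverse error function: since the standard normal inverse CDF is Φ^{-1}(p) = √2 erf^{-1}(2p - 1) and the exponential CDF is 1 - e^{-s}, f(s) = Φ^{-1}(1 - e^{-s}). A quick sanity check using only the Python standard library (the sample size is an illustrative choice):

```python
import math, random, statistics
from statistics import NormalDist

norm = NormalDist()

def gaussianize_exp(s):
    # f(s) = sqrt(2) * erfinv(1 - 2*exp(-s)), written via the normal inverse CDF
    return norm.inv_cdf(1.0 - math.exp(-s))

# Draw exponential 'stimulus amplitudes' and check that the transformed
# values are approximately standard normal (zero mean, unit variance).
random.seed(1)
samples = [gaussianize_exp(random.expovariate(1.0)) for _ in range(100_000)]
print(round(statistics.fmean(samples), 2), round(statistics.pstdev(samples), 2))
```

The same construction with the uniform CDF s/b_c reproduces the second formula, f(s) = √2 erf^{-1}(2s/b_c - 1).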
This Gaussianization has two advantages: first, the expression for the mutual information I_{r,s'} (= I_{r,s}) is now simpler, being given by the frequency-dependent signal-to-noise ratio SNR(f) (see below), since H_{s'} depends only on the power spectrum of s'(t); second and more importantly, the noise distribution was observed to become closer to Gaussian following this transformation. \n\nTo compute H_{s'|r} we bound it from above by ∫_0^{f_c} df H[s'(f) | r(f)], the calculation of which requires the conditional distribution p[s'(f) | r(f)] (note that these variables are complex, hence this is the joint distribution of the real and imaginary parts). The latter is approximated by a Gaussian with mean s'_r(f) and variance N_r(f). This variance is, in fact, the power spectrum of the noise, N_r(f) = <|n_r(f)|^2>, which we define by n_r(t) = s'(t) - s'_r(t). Computing the mutual information for those Gaussian distributions is straightforward and provides a lower bound on the true I_{r,s}, \n\nI_{r,s} = I_{r,s'} ≥ ∫_0^{f_c} df log_2 SNR(f) . (2) \n\nThe signal-to-noise ratio is given by SNR(f) = S'(f)/<N_r(f)>_r, where S'(f) = <|s'(f)|^2> is the spectrum of the Gaussianized stimulus and the averaging <·>_r is performed over all responses. \n\nFigure 3: Left: signal-to-noise ratio SNR(f) vs. modulation frequency f for naturalistic stimuli. Right: normalized noise distribution (solid line), amplitude distribution of stimuli (dashed line) and of Gaussianized stimuli (dashed-dotted line). \n\nThe main object here is s'_r(f), which is an estimate of the stimulus from the elicited spike train, and would optimally be given by the conditional mean ∫ ds' s' p(s' | r) at each f (Kay 1993). 
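Once SNR(f) is in hand, the bound (2) is a one-dimensional integral. The final step can be sketched as follows; the SNR curve below is a made-up toy shape (peaked near a preferred modulation frequency and decaying to SNR = 1, i.e. no information, at high f), not measured data:

```python
import math

# Lower bound on the information rate, eq. (2):
#   I >= integral_0^{f_c} log2(SNR(f)) df   [bits/sec]
f_c = 100.0   # cutoff modulation frequency, Hz
f_m = 40.0    # preferred modulation frequency of the unit, Hz

def snr(f):
    # illustrative curve only: SNR(f) >= 1 everywhere, peaked at f_m
    return 1.0 + 2.0 * math.exp(-((f - f_m) / 25.0) ** 2)

# Trapezoidal integration of log2 SNR over [0, f_c].
n = 1000
df = f_c / n
rate = sum(
    0.5 * (math.log2(snr(i * df)) + math.log2(snr((i + 1) * df))) * df
    for i in range(n)
)
print(round(rate, 1))  # information rate in bits/sec for this toy SNR curve
```

Frequencies where SNR(f) = 1 contribute nothing to the integral, which is why stimulus components above roughly 60Hz (where the measured SNR approaches 1) carry no recoverable information.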
For Gaussian p(s', r) this estimator, which is generally non-linear, becomes linear in r(f) and is given by h(f)r(f), where h(f) = <s'(f)r*(f)>/<r(f)r*(f)> is the Wiener filter. However, since our distributions were only approximately Gaussian we used the conditional mean, obtained by the kernel estimate \n\ns'_r(f) = √(S'(f)) Σ_i ŝ'_i k(r̂(f) - r̂_i) / Σ_i k(r̂(f) - r̂_i), with ŝ'_i = s'_i/√(S'(f_i)), r̂(f) = r(f)/√(R(f)), (3) \n\nwhere k is a Gaussian kernel, R(f) is the spectrum of the spike train, and i indexes the data points obtained by computing an FFT using a sliding window. The scaling by √(S'), √(R) reflects the assumption that the distributions at all f differ only by their variance, which enables us to use the data points at all frequencies to estimate s'_r at a given f. Our estimate produced a slightly higher SNR(f) than the Wiener estimate used by (Bialek et al. 1991; Rieke et al. 1997) and others. \n\n4 Information on Naturalistic Stimuli \n\nThe SNR(f) for exponential stimuli is shown in Fig. 3 (left) for one of our units. IC neurons have a preferred modulation frequency f_m (e.g., Pickles 1988), which is about 40Hz for this unit; notice that generally SNR(f) ≥ 1, with equality when the stimulus and response are completely independent. Thus, stimulus components at frequencies higher than 60Hz effectively cannot be estimated from the spike train. The stimulus amplitude distribution is shown in Fig. 3 (right, dashed line), together with the noise distribution (normalized to have unit variance; solid line), which is nearly Gaussian. \n\nFigure 4: Left: signal-to-noise ratio SNR(f) vs. modulation frequency f for non-naturalistic stimuli (solid line) compared with naturalistic stimuli (dotted line). Right: normalized noise distribution (solid line), amplitude distribution of stimuli (dashed line) compared with that of naturalistic stimuli (dotted line), and of Gaussianized stimuli (dashed-dotted line). \n\nUsing (2) we obtain an information rate of I_{r,s} ≈ 114bit/sec. For the spike rate of 82spike/sec measured in this unit, this translates into 1.4bit/spike. Averaging across units, we have 1.3 ± 0.2bit/spike for naturalistic stimuli. \n\nAlthough this information rate was computed using the conditional mean estimator (3), it is interesting to examine the Wiener filter h(t), which provides the optimal linear estimator of the stimulus, as discussed in the previous section. This filter is displayed in Fig. 5 (solid line) and has a temporal width of several tens of milliseconds. \n\n5 Information on Non-Naturalistic Stimuli \n\nThe SNR(f) for uniform stimuli is shown in Fig. 4 (left, solid line) for the same unit as in Fig. 3, and is significantly lower than the corresponding SNR(f) for exponential stimuli plotted for comparison (dotted line). For the mutual information rate we obtain I_{r,s} ≈ 77bit/sec, which amounts to 0.94bit/spike. Averaging across units, we have 0.8 ± 0.2bit/spike for non-naturalistic stimuli. \n\nThe stimulus amplitude distribution is shown in Fig. 4 (right, dashed line), together with the exponential distribution (dotted line) plotted for comparison, as well as the noise distribution (normalized to have unit variance). The noise in this case is less Gaussian than for exponential stimuli, suggesting that our calculated lower bound on I_{r,s} may be looser for uniform stimuli. \n\nFig. 5 shows the stimulus reconstruction filter (dashed line). It has a similar time course to the filter for exponential stimuli, but the decay is significantly slower and its temporal width is more than 100msec. 
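The per-spike figures follow directly from dividing the information rate by the firing rate; as an arithmetic check using the numbers reported in the text for this unit:

```python
# Information per spike = information rate / firing rate.
spike_rate = 82.0          # spikes/sec, measured for this unit
rate_naturalistic = 114.0  # bits/sec, exponential (naturalistic) stimuli
rate_uniform = 77.0        # bits/sec, uniform (non-naturalistic) stimuli

print(round(rate_naturalistic / spike_rate, 2))  # -> 1.39 bits/spike (~1.4)
print(round(rate_uniform / spike_rate, 2))       # -> 0.94 bits/spike
```

The naturalistic set thus yields roughly 50% more information per spike than the non-naturalistic set in this unit, in line with the across-unit averages of 1.3 ± 0.2 vs. 0.8 ± 0.2 bit/spike.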
\n\n6 Conclusion \n\nWe measured the rate at which auditory neurons carry information on simple stimuli \nwith naturalistic amplitude modulation, and found that it was higher than for \nstimuli with non-naturalistic modulation. A result along the same lines for the frog \nwas obtained by (Rieke et al. 1995) using Gaussian signals whose spectrum was \nshaped according to the frog call spectrum. Similarly, work in vision (Laughlin 1981; \nField 1987; Atick and Redlich 1990; Ruderman and Bialek 1994; Dong and Atick \n1995) suggests that visual receptive field properties are consistent with optimal \ncoding predictions based on characteristics of natural images. Future work will \nexplore coding of stimuli with more complex natural statistical characteristics and \n\n\fCoding a/Naturalistic Stimuli by Auditory Midbrain Neurons \n\n109 \n\n0 .6,------------------------. \n0 . 4 \n\n0 . 3 \n\n~ \n\n0 . 2 \n\n0 . '\" \n\no 1.-___ -'-=----::;::-.::: v Vrr -~------==---__l \n\n__ -.r..1 \n\n-0.'\" \n-0 .2 L---_-=o=-. ..,=-------::0::-------::0,--. ..,=------' \n\nt \n\nFigure 5: Impulse response of Wiener reconstruction filter for naturalistic stimuli \n(solid line) and non-naturalistic stimuli (dashed line). \n\nwill extend to higher processing stages. \n\nAcknowledgements \n\nWe thank W. Bialek, K. Miller, S. Nagarajan, and F. Theunissen for useful discus(cid:173)\nsions and B. Bonham, M. Escabi, M. Kvale, L. Miller, and H. Read for experimental \nsupport. Supported by The Office of Naval Research (NOOOI4-94-1-0547), NIDCD \n(ROI-02260), and the Sloan Foundation. \n\nReferences \n\nJ.J. Atick and N. Redlich (1990). Towards a theory of early visual processing. \nNeural Comput. 2,308-320. \nJ .J. Atick (1992). Could information theory provide an ecological theory of sensory \nprocessing. Network 3, 213-251. \nH. Attias and C.E. Schreiner (1997). Temporal low-order statistics of natural \nsounds. In Advances in Neural Information Processing Systems 9, MIT Press. \nW. 
Bialek, F. Rieke, R. de Ruyter van Steveninck, and D. Warland (1991). Reading \nthe neural code. Science 252, 1854-1857. \n\nD.W. Dong and J.J. Atick (1995). Temporal decorrelation: a theory of lagged and \nnon-lagged responses in the lateral geniculate nucleus. Network 6, 159-178. \n\nD.J. Field (1987). Relations between the statistics of natural images and the re(cid:173)\nsponse properties of cortical cells. J. Opt. Soc. Am. 4, 2379-2394. \nS.M. Kay (1993). Fundamentals of Statistical Signal Processing: Estimation Theory. \nPrentice-Hall, New Jersey. \nS.B. Laughlin (1981). A simple coding procedure enhances a neuron's information \ncapacity. Z. Naturforsch. 36c, 910-912. \n\nJ.O. Pickles (1988). An introduction to the physiology of hearing (2nd Ed.). San \nDiego, CA: Academic Press. \nF. Rieke, D. Bodnar, and W. Bialek (1995). Naturalistic stimuli increase the rate \nand efficiency of information transmission by primary auditory neurons. Proc. R. \nSoc . Lond. B, 262, 259-265. \nF. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek (1997). Spikes: \nExploring the Neural Code. MIT Press, Cambridge, MA. \nD.L. Ruderman and W. Bialek (1994). Statistics of natural images: scaling in the \nwoods. Phys. Rev. Lett. 73,814-817. \n\n\f", "award": [], "sourceid": 1401, "authors": [{"given_name": "Hagai", "family_name": "Attias", "institution": null}, {"given_name": "Christoph", "family_name": "Schreiner", "institution": null}]}