{"title": "Optimization Principles for the Neural Code", "book": "Advances in Neural Information Processing Systems", "page_first": 281, "page_last": 287, "abstract": null, "full_text": "Optimization Principles for the Neural \n\nCode \n\nMichael DeWeese \n\nSloan Center, Salk Institute \n\nLa Jolla, CA 92037 \ndeweese@salk.edu \n\nAbstract \n\nRecent experiments show that the neural codes at work in a wide \nrange of creatures share some common features. At first sight, these \nobservations seem unrelated. However, we show that these features \narise naturally in a linear filtered threshold crossing (LFTC) model \nwhen we set the threshold to maximize the transmitted information. \nThis maximization process requires neural adaptation to not only \nthe DC signal level, as in conventional light and dark adaptation, \nbut also to the statistical structure of the signal and noise distribu(cid:173)\ntions. We also present a new approach for calculating the mutual \ninformation between a neuron's output spike train and any aspect \nof its input signal which does not require reconstruction of the in(cid:173)\nput signal. This formulation is valid provided the correlations in \nthe spike train are small, and we provide a procedure for checking \nthis assumption. This paper is based on joint work (DeWeese [1], \n1995). Preliminary results from the LFTC model appeared in a \nprevious proceedings (DeWeese [2], 1995), and the conclusions we \nreached at that time have been reaffirmed by further analysis of the \nmodel. \n\n1 \n\nIntroduction \n\nMost sensory receptor cells produce analog voltages and currents which are smoothly \nrelated to analog signals in the outside world. Before being transmitted to the brain, \nhowever, these signals are encoded in sequences of identical pulses called action \npotentials or spikes. We would like to know if there is a universal principle at work \nin the choice of these coding strategies. 
The existence of such a potentially powerful theoretical tool in biology is an appealing notion, but it may not turn out to be useful. Perhaps the function of biological systems is best seen as a complicated compromise among constraints imposed by the properties of biological materials, the need to build the system according to a simple set of developmental rules, and the fact that current systems must arise from their ancestors by evolution through random change and selection. In this view, biology is history, and the search for principles (except for evolution itself) is likely to be futile. Obviously, we hope that this view is wrong, and that at least some of biology is understandable in terms of the same sort of universal principles that have emerged in the physics of the inanimate world. \n\nAdrian noticed in the 1920's that every peripheral neuron he checked produced discrete, identical pulses no matter what input he administered (Adrian, 1928). From the work of Hodgkin and Huxley we know that these pulses are stable non-linear waves which emerge from the non-linear dynamics describing the electrical properties of the nerve cell membrane. These dynamics in turn derive from the molecular dynamics of specific ion channels in the cell membrane. By analogy with other non-linear wave problems, we thus understand that these signals can propagate over a long distance - e.g., roughly one meter from touch receptors in a finger to their targets in the spinal cord - so that every spike arrives with the same shape. This is an important observation since it implies that all information carried by a spike train is encoded in the arrival times of the spikes. Since a creature's brain is connected to all of its sensory systems by such axons, all the creature knows about the outside world must be encoded in spike arrival times. 
\n\nUntil recently, neural codes have been studied primarily by measuring changes in the rate of spike production by different input signals. Recently it has become possible to characterize the codes in information-theoretic terms, and this has led to the discovery of some potentially universal features of the code (Bialek, 1996) (or see (Bialek, 1993) for a brief summary). They are: \n\n1. Very high information rates. The record so far is 300 bits per second in a cricket mechanical sensor. \n\n2. High coding efficiency. In cricket and frog vibration sensors, the information rate is within a factor of 2 of the entropy per unit time of the spike train. \n\n3. Linear decoding. Despite evident non-linearities of the nervous system, spike trains can be decoded by simple linear filters. Thus we can write an estimate of the analog input signal s(t) as s_est(t) = Σ_i K₁(t − t_i), with K₁ chosen to minimize the mean-squared error χ² in the estimate. Adding non-linear K₂(t − t_i, t − t_j) terms does not significantly reduce χ². \n\n4. Moderate signal-to-noise ratios (SNR). The SNR in these experiments was defined as the ratio of power spectra of the input signal to the noise referred back to the input; the power spectrum of the noise was approximated by the χ² defined above. All these examples of high information transmission rates have SNR of order unity over a broad bandwidth, rather than high SNR in a narrow band. \n\nWe will try to tie all of these observations together by elevating the first to a principle: The neural code is chosen to maximize information transmission, where information is quantified following Shannon. We apply this principle in the context of a simple model neuron which converts analog signals into spike trains. 
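The linear decoding in feature 3 amounts to summing one copy of the kernel K₁ per spike, i.e. convolving the spike train with K₁. A minimal sketch, in which the spike times, kernel shape, and time step are all hypothetical choices for illustration rather than values from the experiments:

```python
import numpy as np

# Linear decoding sketch: s_est(t) = sum_i K1(t - t_i), which is the spike
# train convolved with a single kernel K1. All numbers here are illustrative.
dt = 0.001                       # time step in seconds
t = np.arange(0.0, 1.0, dt)     # one second of time

rng = np.random.default_rng(0)
spike_times = np.sort(rng.uniform(0.0, 0.9, size=50))  # hypothetical spike train

# Hypothetical decoding kernel: a causal exponential bump, 100 ms long
tau = 0.02
K1 = np.exp(-np.arange(0.0, 0.1, dt) / tau)

# Bin the spikes into a delta-like train, then decode by convolution
spikes = np.zeros_like(t)
spikes[np.searchsorted(t, spike_times)] += 1.0
s_est = np.convolve(spikes, K1)[: len(t)]
```

In the experiments the kernel is chosen to minimize the mean-squared error χ² between s_est(t) and the true stimulus; here it is fixed by hand just to show the structure of the estimator.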
Before we consider a specific model, we will present a procedure for expanding the information rate of any point process encoding of an analog signal about the limit where the spikes are uncorrelated. We will briefly discuss how this can be used to measure information rates in real neurons. \n\nThis work will also appear in Network. \n\n2 Information Theory \n\nIn the 1940's, Shannon proposed a quantitative definition for \"information\" (Shannon, 1949). He argued first that the average amount of information gained by observing some event z is the entropy of the distribution from which z is chosen, and then showed that this is the only definition consistent with several plausible requirements. This definition implies that the amount of information one signal can provide about some other signal is the difference between the entropy of the first signal's a priori distribution and the entropy of its conditional distribution. The average of this quantity is called the mutual (or transmitted) information. Thus, we can write the amount of information that the spike train, {t_i}, tells us about the time dependent signal, s(t), as \n\nI = ⟨ ∫ Dt_i P[{t_i}|s(·)] log₂( P[{t_i}|s(·)] / P[{t_i}] ) ⟩_s, (1) \n\nwhere ∫ Dt_i is shorthand for integration over all arrival times {t_i} from 0 to T and summation over the total number of spikes, N (we have divided the integration measure by N! to prevent over counting due to equivalent permutations of the spikes, rather than absorb this factor into the probability distribution as we did in (DeWeese [1], 1995)). ⟨...⟩_s = ∫ Ds P[s(·)] ... denotes integration over the space of functions s(t) weighted by the signal's a priori distribution, P[{t_i}|s(·)] is the probability distribution for the spike train when the signal is fixed, and P[{t_i}] is the spike train's average distribution. 
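For discrete variables, Shannon's definition reduces to a small computation: the mutual information is the entropy of the response's marginal distribution minus the signal-averaged entropy of its conditional distribution. A sketch, using a hypothetical binary channel with made-up probabilities:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                  # 0 log 0 = 0 by convention
    return -np.sum(p * np.log2(p))

def mutual_information(p_s, p_r_given_s):
    """I(S;R) = H(R) - <H(R|s)>_s: prior entropy minus mean conditional entropy."""
    p_s = np.asarray(p_s, dtype=float)
    p_r_given_s = np.asarray(p_r_given_s, dtype=float)  # rows: signals, cols: responses
    p_r = p_s @ p_r_given_s                              # marginal response distribution
    h_cond = sum(ps * entropy(row) for ps, row in zip(p_s, p_r_given_s))
    return entropy(p_r) - h_cond

# Hypothetical binary channel: the response flips with probability 0.1
I = mutual_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
```

For this symmetric channel the result is 1 − H₂(0.1) ≈ 0.53 bits; Eq. (1) is the functional-integral version of the same difference of entropies.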
\n\n3 Arbitrary Point Process Encoding of an Analog Signal \n\nIn order to derive a useful expression for the information given by Eq. (1), we need an explicit representation for the conditional distribution of the spike train. If we choose to represent each spike as a Dirac delta function, then the spike train can be defined as \n\np(t) = Σ_{i=1}^{N} δ(t − t_i). (2) \n\nThis is the output spike train for our cell, so it must be a functional of both the input signal, s(t), and all the noise sources in the cell, which we will lump together and call η(t). Choosing to represent the spikes as delta functions allows us to think of p(t) as the probability of finding a spike at time t when both the signal and noise are specified. In other words, if the noise were not present, p would be the cell's firing rate, singular though it is. This implies that in the presence of noise the cell's observed firing rate, r(t), is the noise average of p(t): \n\nr(t) = ∫ Dη P[η(·)|s(·)] p(t) = ⟨p(t)⟩_η. (3) \n\nNotice that by averaging over the conditional distribution for the noise rather than its a priori distribution as we did in (DeWeese [1], 1995), we ensure that this expression is still valid if the noise is signal dependent, as is the case in many real neurons. \n\nFor any particular realization of the noise, the spike train is completely specified, which means that the distribution for the spike train when both the signal and noise are fixed is a modulated Poisson process with a singular firing rate, p(t). We emphasize that this is true even though we have assumed nothing about the encoding of the signal in the spike train when the noise is not fixed. 
One might then assume that the conditional distribution for the spike train for fixed signal would be the noise average of the familiar formula for a modulated Poisson process: \n\nP[{t_i}|s(·)] = ⟨ [Π_{i=1}^{N} p(t_i)] exp(−∫_0^T dt p(t)) ⟩_η. (4) \n\nHowever, this is only approximately true due to subtleties arising from the singular nature of p(t). One can derive the correct expression (DeWeese [1], 1995) by carefully taking the continuum limit of an approximation to this distribution defined for discrete time. The result is the same sum of noise averages over products of p's produced by expanding the exponential in Eq. (4) in powers of ∫ dt p(t), except that all terms containing more than one factor of p(t) at equal times are not present. The exact answer is: \n\nP[{t_i}|s(·)] = ⟨ [Π_{i=1}^{N} p(t_i)] exp(−∫_0^T dt p(t)) ⟩_η^−, (5) \n\nwhere the superscripted minus sign reminds us to remove all terms containing products of coincident p's after expanding everything in the noise average in powers of p. \n\n4 Expanding About the Poisson Limit \n\nAn exact solution for the mutual information between the input signal and spike train would be hopeless for all but a few coding schemes. However, the success of linear decoding coupled with the high information rates seen in the experiments suggests to us that the spikes might be transmitting roughly independent information (see (DeWeese [1], 1995) or (Bialek, 1993) for a more fleshed out argument on this point). If this is the case, then the spike train should approximate a Poisson process. We can explicitly show this relationship by performing a cluster expansion on the right hand side of Eq. (5): \n\nP[{t_i}|s(·)] = [Π_{i=1}^{N} r(t_i)] exp(−∫_0^T dt r(t)) [1 + Σ_{m=2}^{∞} C_η(m)], (6) \n\nwhere we have defined Δp(t) ≡ p(t) − ⟨p(t)⟩_η = p(t) − r(t) and introduced C_η(m), which collects all terms containing m factors of Δp. For example, \n\nC_η(2) = (1/2) Σ_{i≠j} ⟨Δp_i Δp_j⟩_η / (r_i r_j) − ∫ dt' Σ_{i=1}^{N} ⟨Δp' Δp_i⟩_η / r_i + (1/2) ∫ dt' dt'' ⟨Δp' Δp''⟩_η^−, (7) \n\nwhere Δp_i ≡ Δp(t_i), r_i ≡ r(t_i), and primes denote evaluation at the integrated times. Clearly, if the correlations between spikes are small in the noise distribution, then the C_η's will be small, and the spike train will nearly approximate a modulated Poisson process when the signal is fixed. \n\nPerforming the cluster expansion on the signal average of Eq. (5) yields a similar expression for the average distribution for the spike train: \n\nP[{t_i}] = r̄^N exp(−r̄T) [1 + Σ_{m=2}^{∞} C_{η,s}(m)], (8) \n\nwhere T is the total duration of the spike train, r̄ is the average firing rate, and C_{η,s}(m) is identical to C_η(m) with these substitutions: r(t) → r̄, Δp(t) → δp(t) ≡ p(t) − r̄, and ⟨...⟩_η → ⟨⟨...⟩_η⟩_s. In this case, the distribution for a homogeneous Poisson process appears in front of the square brackets, and inside we have 1 + corrections due to correlations in the average spike train. \n\n5 The Transmitted Information \n\nInserting these expressions for P[{t_i}|s(·)] and P[{t_i}] (taken to all orders in Δp and δp, respectively) into Eq. (1), and expanding to second non-vanishing order in r̄τ_c results in a useful expression for the information (DeWeese [1], 1995): \n\nI = ∫_0^T dt ⟨ r(t) log₂[ r(t)/r̄ ] ⟩_s + (first correction), (9) \n\nwhere the first correction is a double time integral built from the pair correlations ⟨Δp Δp'⟩_η and ⟨δp δp'⟩, and we have suppressed the explicit time notation in the correction term inside the double integral. If the signal and noise are stationary then we can replace the ∫_0^T dt in front of each of these terms by T, illustrating that the information does indeed grow linearly with the duration of the spike train. \n\nThe leading term, which is exact if there are no correlations between the spikes, depends only on the firing rate, and is never negative. The first correction is positive when the correlations between pairs of spikes are being used to encode the signal, and negative when individual spikes carry redundant information. 
This correction term is cumbersome, but we present it here because it is experimentally accessible, as we now describe. \n\nThis formula can be used to measure information rates in real neurons without having to assume any method of reconstructing the signal from the spike train. In the experimental context, averages over the (conditional) noise distribution become repeated trials with the same input signal, and averages over the signal are accomplished by summing over all trials. r(t), for example, is the histogram of the spike trains resulting from the same input signal, while r̄(t) is the histogram of all spike trains resulting from all input signals. If the signal and noise are stationary, then r̄ will not be time dependent. ⟨p(t)p(t')⟩_η is in general a 2-dimensional histogram which is signal dependent: It is equal to the number of spike trains resulting from some specific input signal which simultaneously contain a spike in the time bins containing t and t'. If the noise is stationary, then this is a function of only t − t', and it reduces to a 1-dimensional histogram. \n\nIn order to measure the full amount of information contained in the spike train, it is crucial to bin the data in small enough time bins to resolve all of the structure in r(t), ⟨p(t)p(t')⟩_η, and so on. We have assumed nothing about the noise or signal; in fact, they can even be correlated so that the noise averages are signal dependent without changing the experimental procedure. The experimenter can also choose to fix only some aspects of the sensory data during the noise averaging step, thus measuring the mutual information between the spike train and only these aspects of the input. The only assumption we have made up to this point is that the spikes are roughly uncorrelated, which can be checked by comparing the leading term to the first correction, just as we do for the model we discuss in the next section. 
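A sketch of this procedure on simulated data: repeated trials with the same signal stand in for the noise average (the histogram r(t)), and pooling trials across signals stands in for the signal average (r̄). The stimuli, rates, bin size, and trial count below are made up, and the rate-only expression ⟨ r log₂(r/r̄) ⟩ is used as an assumed stand-in for the leading term, which the text says depends only on the firing rate:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_bins, n_trials = 0.005, 200, 500
t = np.arange(n_bins) * dt

# Two hypothetical stimuli, each with its own underlying firing-rate profile (Hz)
true_rates = {"s1": 20 + 15 * np.sin(2 * np.pi * 2 * t),
              "s2": 20 + 15 * np.cos(2 * np.pi * 2 * t)}

# Noise average: the histogram r(t) over repeated trials with the same signal
psth = {}
for name, rate in true_rates.items():
    counts = rng.poisson(rate * dt, size=(n_trials, n_bins))
    psth[name] = counts.mean(axis=0) / dt

# Signal average: pool all trials from all signals to get the mean rate rbar
rbar = np.mean([r.mean() for r in psth.values()])

# Rate-only (Poisson-limit) information estimate in bits per second:
# (1/T) * integral dt < r(t) log2(r(t)/rbar) >_s
T = n_bins * dt
info_rate = np.mean([np.sum(r * np.log2(np.maximum(r, 1e-12) / rbar)) * dt
                     for r in psth.values()]) / T
```

Checking the assumption of roughly uncorrelated spikes would then mean also histogramming ⟨p(t)p(t')⟩_η across the repeated trials and comparing the resulting correction against this leading estimate.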
\n\n6 The Linear Filtered Threshold Crossing Model \n\nAs we reported in a previous proceedings (DeWeese [2], 1995) (and see (DeWeese [1], 1995) for details), the leading term in Eq. (9) can be calculated exactly in the case of a linear filtered threshold crossing (LFTC) model when the signal and noise are drawn from independent Gaussian distributions. Unlike the Integrate and Fire (IF) model, the LFTC model does not have a \"renewal process\" which resets the value of the filtered signal to zero each time the threshold is reached. Stevens and Zador have developed an alternative formulation for the information transmission which is better suited for studying the IF model under some circumstances (Stevens, 1996), and they give a nice discussion of the way in which these two formulations complement each other. \n\nFor the LFTC model, the leading term is a function of only three variables: 1) the threshold height; 2) the ratio of the variances of the filtered signal and the filtered noise, ⟨s²(t)⟩_s/⟨η²(t)⟩_η, which we refer to as the SNR; and 3) the ratio of correlation times of the filtered signal and the filtered noise, τ_s/τ_η, where τ_s² ≡ ⟨s²(t)⟩_s/⟨ṡ²(t)⟩_s, and similarly for the noise. In the equations in this last sentence, and in what follows, we absorb the linear filter into our definitions for the power spectra of the signal and noise. Near the Poisson limit, the linear filter can only affect the information rate through its generally weak influence on the ratios of variances and correlation times of the signal and noise, so we focus on the threshold to understand adaptation in our model cell. \n\nWhen the ratio of correlation times of the signal and noise is moderate, we find a maximum for the information rate near the Poisson limit - the leading term is roughly 10 times the first correction. 
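A toy version of the LFTC mechanism can be simulated directly: filter Gaussian signal and noise, add them, and emit a spike at each upward crossing of the threshold. Everything below (the exponential filter shapes, variances, and threshold grid) is a hypothetical illustration, not the parameters analyzed in the text:

```python
import numpy as np

rng = np.random.default_rng(2)
dt, T = 0.001, 20.0
n = int(T / dt)

def filtered_gaussian(n, tau_bins, rng):
    """Correlated Gaussian trace: white noise passed through an exponential filter."""
    kern = np.exp(-np.arange(10 * tau_bins) / tau_bins)
    white = rng.standard_normal(n + len(kern))
    x = np.convolve(white, kern, mode="full")[len(kern): n + len(kern)]
    return x / x.std()                     # normalize to unit variance

signal = filtered_gaussian(n, tau_bins=50, rng=rng)        # slower "signal"
noise = 0.7 * filtered_gaussian(n, tau_bins=20, rng=rng)   # faster, weaker "noise"
x = signal + noise                                         # input to the threshold unit

def spike_times(x, theta, dt):
    """One spike per upward crossing of the threshold theta (no reset)."""
    above = x >= theta
    idx = np.flatnonzero(~above[:-1] & above[1:]) + 1
    return idx * dt

# Firing rate vs. threshold: the upcrossing rate falls off as the threshold
# moves away from the bulk of the Gaussian input distribution
thetas = np.linspace(0.0, 3.0, 7)
rates = [len(spike_times(x, th, dt)) / T for th in thetas]
```

With an information criterion in place of the raw rate, one would scan the threshold in the same way and pick the setting that maximizes the leading term of Eq. (9) rather than the firing rate itself.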
For the interesting and physically relevant case where the noise is slightly more broadband than the signal as seen through the cell's prefiltering, we find that the maximum information rate is achieved with a threshold setting which does not correspond to the maximum average firing rate, illustrating that this optimum is non-trivial. Provided the SNR is about one or less, linear decoding does well - a lower bound on the information rate based on optimal linear reconstruction of the signal is within a factor of two of the total available information in the spike train. As SNR grows unbounded, this lower bound asymptotes to a constant. In addition, the required timing resolution for extracting the information from the spike train is quite modest - discretizing the spike train into bins which are half as wide as the correlation time of the signal degrades the information rate by less than 10%. However, at maximum information transmission, the information per spike is low - R_max/r̄ ≈ 0.7 bits/spike, much lower than the 3 bits/spike seen in the cricket. This low information per spike drives the efficiency down to 1/3 of the experimental values despite the model's robustness to timing jitter. Aside from the low information rate, the optimized model captures all the experimental features we set out to explain. \n\n7 Concluding Remarks \n\nWe have derived a useful expression for the transmitted information which can be used to measure information rates in real neurons provided the correlations between spikes are shorter range than the average inter-spike interval. We have described a method for checking this hypothesis experimentally. The four seemingly unrelated features that were common to several experiments on a variety of neurons are actually the natural consequences of maximizing the transmitted information. 
\nSpecifically, they are all due to the relation between r̄ and τ_c that is imposed by the optimization. We reiterate our previous prediction (DeWeese [2], 1995; Bialek, 1993): Optimizing the code requires that the threshold adapt not only to cancel DC offsets, but also to the statistical structure of the signal and noise. Experimental hints at adaptation to statistical structure have recently been seen in the fly visual system (de Ruyter van Steveninck, 1994) and in the salamander retina (Warland, 1995). \n\n8 References \n\nM. DeWeese 1995 Optimization Principles for the Neural Code (Dissertation, Princeton University) \n\nM. DeWeese and W. Bialek 1995 Information flow in sensory neurons Il Nuovo Cimento 17D 733-738 \n\nE. D. Adrian 1928 The Basis of Sensation (New York: W. W. Norton) \n\nF. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek 1996 Neural Coding (Boston: MIT Press) \n\nW. Bialek, M. DeWeese, F. Rieke, and D. Warland 1993 Bits and Brains: Information Flow in the Nervous System Physica A 200 581-593 \n\nC. E. Shannon 1949 Communication in the presence of noise Proc. I.R.E. 37 10-21 \n\nC. Stevens and A. Zador 1996 Information Flow Through a Spiking Neuron in M. Hasselmo ed Advances in Neural Information Processing Systems, Vol 8 (Boston: MIT Press) (this volume) \n\nR. R. de Ruyter van Steveninck, W. Bialek, M. Potters, and R. H. Carlson 1994 Statistical adaptation and optimal estimation in movement computation by the blowfly visual system, in IEEE International Conference On Systems, Man, and Cybernetics pp 302-307 \n\nD. Warland, M. Berry, S. Smirnakis, and M. Meister 1995 personal communication \n", "award": [], "sourceid": 1120, "authors": [{"given_name": "Michael", "family_name": "DeWeese", "institution": null}]}