{"title": "Information Rates and Optimal Decoding in Large Neural Populations", "book": "Advances in Neural Information Processing Systems", "page_first": 846, "page_last": 854, "abstract": "Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. 
We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.", "full_text": "Information Rates and Optimal Decoding in Large Neural Populations\n\nKamiar Rahnama Rad Liam Paninski\nDepartment of Statistics, Columbia University\n\n{kamiar,liam}@stat.columbia.edu\n\nhttp://www.stat.columbia.edu/~liam/research/pubs/kamiar-ss-info.pdf\n\nAbstract\n\nMany fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. 
We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.\n\nIntroduction\nIt has long been argued that many key questions in neuroscience can best be posed in information-theoretic terms; the efficient coding hypothesis discussed in [2, 3, 1] represents perhaps the best-known example. Answering these questions quantitatively requires us to compute the Shannon information rate of neural channels, whether numerically using experimental data or analytically in mathematical models. In many cases it is useful to exploit connections with \u201cideal observer\u201d analysis, in which the performance of an optimal Bayesian decoder places fundamental bounds on the performance of any biological system given access to the same neural information. However, the non-linear, non-Gaussian, and correlated nature of neural responses has hampered the development of this theory, particularly in the case of high-dimensional and/or time-varying stimuli.\nThe neural decoding literature is far too large to review systematically here; instead, we will focus our attention on work which has attempted to develop an analytical theory to simplify these complex decoding and information-rate problems. Two limiting regimes have received significant analytical attention in the neuroscience literature. The first is the \u201chigh-SNR\u201d regime, n \u2192 \u221e, where n is the number of neurons encoding the signal of interest; if the information rate of each neuron is bounded away from zero and neurons respond in a conditionally weakly-dependent manner given the stimulus, then the total information provided by the neural population becomes infinite, and the error rate of any reasonable neural decoder tends to zero. 
For discrete stimuli, the Shannon information is effectively determined in this asymptotic limit by a simpler quantity known as the Chernoff information [9]; for continuous stimuli, maximum likelihood estimation is asymptotically optimal, and the asymptotic Shannon information is controlled by the Fisher information [8, 7]. On the other hand, we can consider the \u201clow-SNR\u201d limit, where only a few neurons are observed and each neuron is asymptotically weakly tuned to the stimulus. In this limit, the Shannon information tends to zero, and under certain conditions the optimal Bayesian estimator (which can be strongly nonlinear in general) can be approximated by a simpler linear estimator; see [5] and more recently [16] for details.\nIn this paper, we study information transmission and optimal decoding in what we would argue is a more biologically-relevant \u201cintermediate\u201d regime, where n is large but the total amount of information provided by the population remains finite, and the problem of decoding the stimulus given the population neural activity remains nontrivial.\nLikelihood in the intermediate regime: the inhomogeneous Poisson case\nFor clarity, we begin by analyzing the information in a simple population of neurons, represented as inhomogeneous Poisson processes that are conditionally independent given the stimulus. We will extend our analysis to more general neural populations in the next section. In response to the stimulus, at each time step t neuron i fires with probability \u03bb_i(t)dt, where the rate is given by\n\n\u03bb_i(t) = f[b_i(t) + \u03b5 x_{i,t}(\u03b8)],\n\n(1)\n\nwhere f(.) is a smooth rectifying non-linearity and \u03b5 is a gain factor controlling each neuron\u2019s sensitivity. 
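As a concrete illustration of the rate model in eq. (1), the following sketch simulates conditionally independent spiking in discrete bins. The exponential nonlinearity, the 30 Hz baseline, and the Gaussian stand-in for the stimulus-dependent tuning term are illustrative assumptions, not choices made in the text.

```python
# Minimal sketch of the encoding model in eq. (1); all constants are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, T, dt = 50, 200, 0.001          # population size, time bins, bin width (s)
eps = 1.0 / np.sqrt(n)             # intermediate-regime sensitivity scaling
b = np.full((n, T), np.log(30.0))  # baseline log-rates (~30 Hz), f(.) = exp(.)
x = rng.normal(size=(n, T))        # stand-in for the tuning term x_{i,t}(theta)

lam = np.exp(b + eps * x)          # conditional intensity lambda_i(t)
spikes = rng.random((n, T)) < lam * dt   # Bernoulli approximation of Poisson spiking

mean_rate = spikes.sum() / (n * T * dt)  # empirical rate, near the 30 Hz baseline
```

Because the sensitivity scales as n^{-1/2}, each neuron's rate is only weakly perturbed by the stimulus, which is exactly the regime analyzed below.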
The baseline firing rate is determined by b_i(t) and is independent of the input signal. The true stimulus at time t is defined by \u03b8_t, and \u03b8 abbreviates the time-varying stimulus \u03b8_{0:T} in the time interval [0, T dt]. The term x_{i,t}(\u03b8) summarizes the dependence of the neuron\u2019s firing rate on \u03b8; depending on the setting, this term may represent e.g. a tuning curve or a spatiotemporal filter applied to the stimulus (see examples below).\nThe likelihood includes all the information about the stimulus encoded in the population\u2019s spiking response. Neuron i\u2019s response at time step t is designated by the binary variable r_i(t). The loglikelihood at the parameter value \u03d1 (which may be different from the true parameter \u03b8) is given by the standard point-process formula [21]:\n\nL_\u03d1(r) := log p(r|\u03d1) = \u2211_{i=1}^n \u2211_{t=0}^T [r_i(t) log \u03bb_i(t) \u2212 \u03bb_i(t)dt].\n\n(2)\n\nThis expression can be expanded around \u03b5 = 0:\n\nL_\u03d1(r) = L_\u03d1(r)|_{\u03b5=0} + \u03b5 \u2202L_\u03d1(r)/\u2202\u03b5|_{\u03b5=0} + (1/2)\u03b5^2 \u2202^2L_\u03d1(r)/\u2202\u03b5^2|_{\u03b5=0} + O(n\u03b5^3),\n\nwhere\n\n\u2202L_\u03d1(r)/\u2202\u03b5|_{\u03b5=0} = \u2211_{i,t} x_{i,t}(\u03d1)[r_i(t)(f'/f)(b_i(t)) \u2212 f'(b_i(t))dt]\n\n\u2202^2L_\u03d1(r)/\u2202\u03b5^2|_{\u03b5=0} = \u2211_{i,t} x_{i,t}(\u03d1)^2 [r_i(t)((f''f \u2212 f'^2)/f^2)(b_i(t)) \u2212 f''(b_i(t))dt].\n\nLet r_i denote the vector representation of the ith neuron\u2019s spike train and let1\n\ng_i(r_i) := [r_i(1)(f'/f)(b_i(1)) \u2212 f'(b_i(1))dt, ..., r_i(T)(f'/f)(b_i(T)) \u2212 f'(b_i(T))dt]^T\n\nh_i(r_i) := [r_i(1)((f''f \u2212 f'^2)/f^2)(b_i(1)) \u2212 f''(b_i(1))dt, ..., r_i(T)((f''f \u2212 f'^2)/f^2)(b_i(T)) \u2212 f''(b_i(T))dt]^T\n\nx_i(\u03d1) := [x_{i,1}(\u03d1), x_{i,2}(\u03d1), ..., x_{i,T}(\u03d1)]^T;\n\nthen\n\nL_\u03d1(r) = L_\u03d1(r)|_{\u03b5=0} + \u03b5 \u2211_{i=1}^n x_i(\u03d1)^T g_i(r_i) + (1/2)\u03b5^2 \u2211_{i=1}^n x_i(\u03d1)^T diag[h_i(r_i)] x_i(\u03d1) + O(n\u03b5^3).\n\n1With a slight abuse of notation, we use T for both the total number of time steps and the transpose operation; the difference is clear from the context.\n\nThis second-order loglikelihood expansion is standard in likelihood theory [24]; as usual, the first term is constant in \u03d1 and can therefore be ignored, while the third (quadratic) term controls the curvature of the loglikelihood at \u03b5 = 0, and scales as n\u03b5^2. In the high-SNR regime discussed above, where n \u2192 \u221e and \u03b5 is fixed, the likelihood becomes sharply peaked at \u03b8 (and therefore the Fisher information, which may be understood as the curvature of the log-likelihood at \u03b8, controls the asymptotics of the estimation error in the case of continuous stimuli), and estimation of \u03b8 becomes easy; in the low-SNR regime, we fix n and consider the \u03b5 \u2192 0 limit.\nNow, finally, we can more precisely define the \u201cintermediate\u201d SNR regime: we will focus on the case of large populations (n \u2192 \u221e), but in order to keep the total information in a finite range we need to scale the sensitivity \u03b5 as \u03b5 \u223c n^{\u22121/2}. In this setting, the error term O(n\u03b5^3) = O(n^{\u22121/2}) = o(1) and can therefore be neglected, and the law of large numbers (LLN) implies that\n\n(1/2)\u03b5^2 \u2202^2L_\u03d1(r)/\u2202\u03b5^2|_{\u03b5=0} = (1/2) E_{r|\u03b8}[(1/n) \u2211_i x_i(\u03d1)^T diag[h_i(r_i)] x_i(\u03d1)] + o(1);\n\nconsequently, the quadratic term \u03b5^2 \u2202^2L_\u03d1(r)/\u2202\u03b5^2|_{\u03b5=0} will be independent of the observed spike train and therefore void of information about \u03b8. 
So the first derivative term is the only part of the likelihood that depends both on the neural activity and \u03d1, and may therefore be considered a sufficient statistic in this asymptotic regime: all the information about the stimulus is summarized in\n\n\u03b5 \u2202L_\u03d1(r)/\u2202\u03b5|_{\u03b5=0} = (1/\u221an) \u2211_i x_i(\u03d1)^T g_i(r_i).\n\n(3)\n\nWe may further apply the central limit theorem (CLT) to this sum of independent random vectors to conclude that this term converges to a Gaussian process indexed by \u03d1 (under mild technical conditions that we will ignore here, for clarity). Thus this model enjoys the local asymptotic normality property observed in many parametric statistical models [24]: all of the information in the data can be summarized asymptotically by a sufficient statistic with a sampling distribution that turns out to be Gaussian.\nExample: Linearly filtered stimuli and state-space models\nIn many cases neurons are modeled in terms of simple rectified linear filters responding to the stimulus. We can handle this case easily using the language introduced above, if we let K_i denote the matrix implementing the transformation (K_i\u03b8)_t = x_{i,t}(\u03b8), the projection of the stimulus onto the i-th neuron\u2019s stimulus filter. Then,\n\n\u03b5 \u2202L_\u03d1(r)/\u2202\u03b5|_{\u03b5=0} = \u03d1^T ((1/\u221an) \u2211_i K_i^T (diag[f'_i/f_i] r_i \u2212 f'_i dt)) := \u03d1^T \u2206(r),\n\nwhere f_i stands for the vector version of f[b_i(t)]. Thus all the information in the population spike train can be summarized in the random vector \u2206(r), which is a simple linear function of the observed spike train data. 
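A small numerical check of this sufficient statistic can be run under simplifying assumptions: exponential nonlinearity (so f'/f = 1), instantaneous diagonal filters, and illustrative constants of our own choosing. Averaged over repeated trials, the statistic concentrates around J times the true stimulus, where J is the effective filter matrix described in the text.

```python
# Hedged numerical check: Delta(r) concentrates around J*theta in the
# intermediate regime. f(.) = exp(.), K_i = diag(k_i); constants are assumed.
import numpy as np

rng = np.random.default_rng(1)
n, T, dt, reps = 300, 50, 0.001, 400
eps = 1.0 / np.sqrt(n)
theta = rng.normal(size=T)            # true stimulus (illustrative)
k = rng.normal(size=(n, T))           # per-neuron instantaneous filter gains
b = np.log(50.0)                      # constant baseline log-rate
f0 = np.full(T, np.exp(b))            # f_i = f(b_i), identical across neurons here

# Diagonal of the effective filter J = (1/n) sum_i K_i^T diag[f'^2/f dt] K_i
J = (k**2 * f0 * dt).mean(axis=0)

deltas = np.empty((reps, T))
for rep in range(reps):
    lam = np.exp(b + eps * k * theta)          # lambda_i(t)
    r = rng.random((n, T)) < lam * dt          # Bernoulli approximation of spiking
    # Delta(r) = (1/sqrt(n)) sum_i K_i^T diag[f'/f](r_i - f_i dt); f'/f = 1 here
    deltas[rep] = (k * (r - f0 * dt)).sum(axis=0) / np.sqrt(n)

err = np.abs(deltas.mean(axis=0) - J * theta).max()  # should be small
```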
This vector has an asymptotic Gaussian distribution, with mean and covariance\n\nE_{r|\u03b8}(\u2206(r)) = ((1/n) \u2211_{i=1}^n K_i^T diag[(f'_i)^2/f_i dt] K_i) \u03b8 + O(1/\u221an)\n\nJ := cov_{r|\u03b8}(\u2206(r)) = (1/n) \u2211_{i=1}^n K_i^T diag[f'_i/f_i] cov_{r|\u03b8}[r_i] diag[f'_i/f_i] K_i = (1/n) \u2211_{i=1}^n K_i^T diag[(f'_i)^2/f_i dt] K_i + O(1/\u221an);\n\nthat is, E_{r|\u03b8}(\u2206(r)) = J\u03b8 + O(1/\u221an).\n\nThus, the neural population\u2019s non-linear and temporally dynamic response to the stimulus is as informative in this intermediate regime as a single observation from a standard Gaussian experiment, in which the parameter \u03b8 is filtered linearly by J and corrupted by Gaussian noise. All of the filtering properties of the population are summarized by the matrix J. (Note that if we consider each K_i as a random sample from some distribution of filters, then J will converge by the law of large numbers to a matrix we can compute explicitly.)\nThus in many cases we can perform optimal Bayesian decoding of \u03b8 given the spike trains quite easily. 
For example, if \u03b8 has a zero-mean Gaussian prior distribution with covariance C_\u03b8, then the posterior mean and the maximum a posteriori (MAP) estimate coincide, and are given by the well-known optimal linear estimate (OLE):\n\n\u02c6\u03b8_OLE(r) = E(\u03b8|r) = (J + C_\u03b8^{\u22121})^{\u22121}\u2206(r).\n\n(4)\n\nWe may compute the Shannon information I(\u03b8 : r) between r and \u03b8 in a similarly direct fashion. We know that, asymptotically, the sufficient statistic \u2206(r) is as informative as the full population response r:\n\nI(\u03b8 : r) = I(\u03b8 : \u2206(r)).\n\nIn the case that the prior of \u03b8 is Gaussian, as above, the information can therefore be computed quite explicitly via standard formulas for the linear-Gaussian channel [9]:\n\nI(\u03b8 : \u2206(r)) = (1/2) log det(I + JC_\u03b8).\n\n(5)\n\nTo summarize, when the encodings x_{i,t}(\u03b8) are linear in \u03b8, and we are in the intermediate-SNR regime, and the parameter \u03b8 has a Gaussian prior distribution, then the optimal Bayesian estimate is obtained by applying a linear transformation to the sufficient statistic \u2206(r), which itself is linear in the spike train, and the mutual information between the stimulus and the full population response has a particularly simple form. These results help to extend previous theoretical studies [5, 18, 20, 16] demonstrating that in some cases linear decoding can be optimal, and also shed some light on recent experimental studies indicating that optimal linear and nonlinear Bayesian estimators often have similar performance in practice [13, 12].\nTo work through a concrete example, consider the case that the temporal sequence of parameter values \u03b8_t is generated by an autoregressive process:\n\n\u03b8_{t+1} = A\u03b8_t + \u03b7_t, \u03b7_t \u223c N(0, R),\n\nfor a stable dynamics matrix A and positive-semidefinite covariance matrix R. 
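Equations (4) and (5) can be checked numerically in the linear-Gaussian setting. In the sketch below the diagonal J and the AR(1)-style prior covariance are illustrative assumptions; the only claims taken from the text are the form of the posterior mean and the log-det information formula.

```python
# Sketch of eq. (4) (posterior mean = OLE) and eq. (5) (Shannon information)
# under an assumed diagonal J and an AR(1)-style Gaussian prior.
import numpy as np

rng = np.random.default_rng(2)
T = 40
J = np.diag(rng.uniform(0.5, 2.0, size=T))       # assumed effective filter
idx = np.arange(T)
C = 0.9 ** np.abs(idx[:, None] - idx[None, :])   # prior covariance C_theta

# Eq. (5): I(theta : Delta(r)) = (1/2) log det(I + J C_theta)
info = 0.5 * np.linalg.slogdet(np.eye(T) + J @ C)[1]

# Eq. (4): theta_hat = (J + C^{-1})^{-1} Delta(r), with Delta(r) ~ N(J theta, J)
Cinv = np.linalg.inv(C)
mse_post, mse_prior = 0.0, 0.0
for _ in range(200):
    theta = rng.multivariate_normal(np.zeros(T), C)
    delta = J @ theta + rng.multivariate_normal(np.zeros(T), J)
    theta_hat = np.linalg.solve(J + Cinv, delta)
    mse_post += np.mean((theta_hat - theta) ** 2) / 200
    mse_prior += np.mean(theta ** 2) / 200
```

Averaged over many draws, the posterior-mean error is far below the prior variance, as the positive information value predicts.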
Further assume that the observation matrices K_i act instantaneously, i.e., K_i is block-diagonal with blocks K_{i,t}, and therefore the responses are modeled as\n\nr_i(t) \u223c Poiss[f(b_i(t) + \u03b5K_{i,t}\u03b8_t)dt].\n\nThus \u03b8 and the responses r together represent a state-space model. This framework has been shown to lead to state-of-the-art performance in a wide variety of neural data analysis settings [14]. To understand optimal inference in this class of models in the intermediate SNR regime, we may follow the recipe outlined above: we see that the asymptotic sufficient statistic in this model can be represented as\n\n\u2206_t = J_t\u03b8_t + \u03b5_t, \u03b5_t \u223c N(0, J_t),\n\nwhere the effective filter matrix J defined above is block-diagonal (due to the block-diagonal structure of the filter matrices K_i), with blocks we have denoted J_t. Thus \u2206_t represents observations from a linear-Gaussian state-space model, i.e., a Kalman filter model [17]. Optimal decoding of \u03b8 given the observation sequence \u2206_{1:T} can therefore be accomplished via the standard forward-backward Kalman filter-smoother [10]; see Fig. 1 for an illustration. The information rate lim_{T\u2192\u221e} I(\u03b8_{0:T} : r_{0:T}) = lim_{T\u2192\u221e} I(\u03b8_{0:T} : \u2206(r)_{0:T}) may be computed via similar recursions in the stationary case (i.e., when J_t is constant in time). 
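For a scalar state, the forward-backward Kalman filter-smoother recursion applied to the statistic described above can be sketched directly; the dynamics, noise, and filter constants below are illustrative assumptions.

```python
# Scalar Kalman filter + RTS smoother for the observation model
# Delta_t = Jt * theta_t + e_t, e_t ~ N(0, Jt), with AR(1) prior dynamics.
import numpy as np

rng = np.random.default_rng(3)
T, a, R, Jt = 200, 0.95, 0.1, 2.0      # assumed dynamics and effective filter
theta = np.zeros(T)
for t in range(1, T):
    theta[t] = a * theta[t - 1] + rng.normal(0.0, np.sqrt(R))
delta = Jt * theta + rng.normal(0.0, np.sqrt(Jt), size=T)  # observations

m = np.zeros(T); P = np.zeros(T)       # filtered means / variances
mp, Pp = 0.0, R / (1 - a**2)           # stationary prior at t = 0
for t in range(T):
    S = Jt**2 * Pp + Jt                # innovation variance
    K_gain = Pp * Jt / S
    m[t] = mp + K_gain * (delta[t] - Jt * mp)
    P[t] = (1 - K_gain * Jt) * Pp
    mp, Pp = a * m[t], a**2 * P[t] + R  # one-step prediction

ms = m.copy()                          # RTS backward smoothing pass
for t in range(T - 2, -1, -1):
    G = P[t] * a / (a**2 * P[t] + R)
    ms[t] = m[t] + G * (ms[t + 1] - a * m[t])
```

The smoothed estimate `ms` uses all observations and should reduce the error well below that of the raw normalized observations.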
The result may be expressed most explicitly in terms of a matrix which is the solution of a Riccati equation involving the effective Kalman model parameters; the details are provided in the appendix.\nNonlinear examples: orientation coding, place fields, and small-time expansions\nWhile the linear setting discussed above can handle many examples of interest, it does not seem general enough to cover two well-studied decoding problems: inferring the orientation of a visual stimulus from a population of cortical neurons [19, 4], or inferring position from a population of hippocampal or entorhinal neurons [6]. In the former case, the stimulus is a phase variable, and therefore does not fit gracefully into the linear setting described above; in the latter case, place fields and grid fields are not well-approximated as linear functions of position. If we apply our general theory in these settings, the interpretation of the encoding function x_i(\u03b8) does not change significantly: x_i(\u03b8) could represent the tuning curve of neuron i as a function of the orientation of the visual stimulus, or of the animal\u2019s location in space. However, without further assumptions the limiting sufficient statistic, which is a weighted sum of these encoding functions x_i(\u03b8) (recall eq. 3), may result in an infinite-dimensional Gaussian process, which may be computationally inconvenient.\nTo simplify matters somewhat, we can introduce a mild assumption on the tuning functions x_i(\u03b8). Let\u2019s assume that these functions may be expressed in some low-dimensional basis: x_i(\u03b8) = K_i\u03a6(\u03b8), for some vectors K_i, where \u03a6(\u03b8) is defined to map \u03b8 into an mT-dimensional space whose dimension is usually smaller than dim(\u03b8) = dim(\u03b8_t)T. 
This finite-basis assumption is very natural: in the orientation example, tuning curves are periodic in the angle \u03b8_t and are therefore typically expressed as sums of a few Fourier functions; similarly, two-dimensional finite Fourier or Zernike bases are often used to represent grid or place fields [6]. The key point here is that we may now simply follow the derivation of the last section with \u03a6(\u03b8) in place of \u03b8; we find that the sufficient statistic may be represented asymptotically as an mT-dimensional Gaussian vector with mean J\u03a6(\u03b8) and covariance J, with J defined as in the preceding section.\nWe should note that this nonlinear case does remain slightly more complicated than the linear case in one respect: while the likelihood with respect to \u03a6(\u03b8) reduces to something very simple and tractable, the prior (which is typically defined as a function of \u03b8) might be some complicated function of the remapped variable \u03a6(\u03b8). So in most interesting nonlinear cases we can no longer compute the optimal Bayesian decoder or the Shannon information rate analytically. However, our approach does lead to a major simplification in numerical investigations into theoretical coding issues. For example, to examine the coding efficiency of a population of neurons encoding an orientation variable in this intermediate SNR regime we do not need to simulate the responses of the entire population (which would involve drawing nT random variables, for some large population size n); instead, we only need to draw a single equivalent mT-dimensional Gaussian vector \u2206(r), and quantify the decoding performance based on the approximate loglikelihood\n\nL_\u03d1(r) = L_\u03d1(r)|_{\u03b5=0} + \u03a6(\u03d1)^T \u2206(r) \u2212 (1/2) \u03a6(\u03d1)^T J\u03a6(\u03d1) + O(1/\u221an),\n\nwhich as emphasized above has a simple quadratic form as a function of \u03a6(\u03d1). 
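The orientation example can be sketched concretely: draw one low-dimensional Gaussian vector in place of the full population response, then decode on a grid using the approximate loglikelihood. The basis size, the diagonal J, and the true orientation below are all illustrative assumptions; the quadratic term is taken with a negative sign, matching the negative expected curvature of the loglikelihood at its peak.

```python
# Grid decoding of an orientation from the equivalent mT-dimensional
# Gaussian statistic; Phi is a small Fourier basis, J is assumed.
import numpy as np

rng = np.random.default_rng(4)

def Phi(th):
    # [cos th, sin th, cos 2th, sin 2th]: periodic basis for tuning curves
    return np.array([np.cos(th), np.sin(th), np.cos(2 * th), np.sin(2 * th)])

J = np.diag([30.0, 30.0, 10.0, 10.0])   # assumed effective filter/covariance
theta_true = 1.2                        # true orientation (radians)

# One 4-dimensional Gaussian draw replaces the full n-neuron simulation
delta = rng.multivariate_normal(J @ Phi(theta_true), J)

grid = np.linspace(0.0, 2 * np.pi, 1000)
ll = np.array([Phi(g) @ delta - 0.5 * Phi(g) @ J @ Phi(g) for g in grid])
theta_hat = grid[np.argmax(ll)]

d = abs(theta_hat - theta_true)
err = min(d, 2 * np.pi - d)             # circular decoding error
```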
Since m can typically be chosen to be much smaller than n, this approach can result in significant computational savings.\nWe now switch gears slightly and examine another related intermediate regime in which nonlinear encoding plays a key role: instead of letting the sensitivity \u03b5 of each neuron become small (in order to keep the total information in the population finite), we could instead keep the sensitivity constant and let the time period over which we are observing the population scale inversely with the population size n. This short-time limit is sensible in some physiological and psychophysical contexts [22] and was examined analytically in [15] to study the impact of inter-neuron dependencies on information transmission. Our methods can also be applied to this short-time limit. We begin by writing the loglikelihood of the observed spike count vector r in a single time-bin of length dt:\n\nL_\u03d1(r) := log p(r|\u03d1) = \u2211_i [r_i log f[b_i + x_i(\u03d1)] \u2212 f[b_i + x_i(\u03d1)] dt].\n\nThe second term does not depend on r; therefore, all information in r about \u03b8 resides in the sufficient statistic\n\n\u2206_\u03d1(r) := \u2211_i r_i log f[b_i + x_i(\u03d1)].\n\nSince the i-th neuron fires with probability f[b_i + x_i(\u03b8)] dt, the mean of \u2206_\u03d1(r) scales with n dt, and it is clear that dt = 1/n is a natural scaling of the time bin. With this scaling \u2206_\u03d1(r) converges to a Gaussian stochastic process with mean\n\nE_{r|\u03b8}[\u2206_\u03d1(r)] = (1/n) \u2211_i f[b_i + x_i(\u03b8)] log f[b_i + x_i(\u03d1)]\n\nand covariance\n\ncov_{r|\u03b8}[\u2206_\u03d1(r), \u2206_\u03d1'(r)] = (1/n) \u2211_i f[b_i + x_i(\u03b8)] (log f[b_i + x_i(\u03d1)]) (log f[b_i + x_i(\u03d1')]),\n\nwhere we have used the fact that the variance of a Poisson random variable coincides with its mean. In general, this limiting Gaussian process will be infinite-dimensional. However, if we choose the exponential nonlinearity (f(.) = exp(.)) and the encoding functions x_i(\u03b8) are of the finite-dimensional form considered above, x_i(\u03b8) = K_i^T \u03a6(\u03b8), then the log f[b_i + x_i(\u03d1)] term in the definition of \u2206_\u03d1(r) simplifies: in this case, all information about \u03b8 is captured by the sufficient statistic\n\n\u2206(r) = \u2211_i r_i K_i.\n\nIf we again let dt = 1/n, then we find that \u2206(r) converges to a finite-dimensional Gaussian random vector with mean and covariance\n\nE_{r|\u03b8}[\u2206(r)] = (1/n) \u2211_i f[b_i + K_i^T \u03a6(\u03b8)] K_i; cov_{r|\u03b8}[\u2206(r)] = (1/n) \u2211_i f[b_i + K_i^T \u03a6(\u03b8)] K_i K_i^T;\n\nagain, if the filters K_i are modeled as independent draws from some fixed distribution, then the above normalized sums converge to their expectations, by the LLN. Thus, as in the intermediate-SNR regime, we see that inference can be dramatically simplified in this short-time setting.\nLikelihood in the intermediate regime: non-Poisson effects\nWe conclude by discussing the generalization to non-Poisson networks with interneuronal dependencies and nontrivial correlation structure. We generalize the rate equation (1) to\n\n\u03bb_i(t) = f_i[b_i(t) + \u03b5 x_{i,t}(\u03b8) | H_t],\n\nwhere H_t stands for the spiking activity of all neurons prior to time t: H_t = {r_i(t')}_{t'<t}. In the simulated network used below, the conditional intensity also included a synaptic input term \u2211_j w_{ji} I_j(t) and a multiplicative factor 1_{\u03c4_i(t)>\u03c4_ref}, where I_j(t) is the synaptic input from the j-th cell (generated by convolving the spike train r_j with an exponential of time constant 20 ms), w_{ji} is the synaptic weight matrix coupling the output of neuron j to the input of neuron i, and \u03c4_i(t) is the time since the last spike; therefore, 1_{\u03c4_i(t)>\u03c4_ref} enforces the absolute refractory period \u03c4_ref, which was set to be 2 ms here. 
Since the encoding filters K_i act instantaneously in this model (K_i can be represented as a delta function, weighted by n^{\u22121/2}), the observed spike trains can be considered observations from a state-space model, as described above. The weights w_{ji} were generated randomly from a uniform distribution on the interval [\u22125/n, 5/n], with self-weights w_{ii} = 0, and \u2211_j w_{ji} = 0 to enforce detailed balance in the network. Note that, while the interneuronal coupling is weak in this example, the autocorrelation in these spike trains is quite strong on short time scales, due to the absolute refractory effect.\nWe compared two estimators of \u03b8: the full (nonlinear) MAP estimate \u02c6\u03b8_MAP = arg max_\u03b8 p(\u03b8|r), which we computed using the fast direct optimization methods described in [14], and the limiting optimal estimator \u02c6\u03b8_\u2206 := (J + C_\u03b8^{\u22121})^{\u22121}\u2206(r). Note that J is diagonal; we computed the expectations in the definition of J using the numerical approach described above in this simulation, though in other simulations (with uncoupled renewal-model populations) we checked that the fully-analytical approach gave the correct solution. In addition, C_\u03b8^{\u22121} is tridiagonal in this state-space setting; thus the linear matrix equation in eq. (4) can be solved efficiently in O(T) time using standard tridiagonal matrix solvers. We find that, as predicted, the full nonlinear Bayesian estimator \u02c6\u03b8_MAP approaches the limiting optimal estimator \u02c6\u03b8_\u2206 as n becomes large; n = 20 is basically sufficient in this case, although of course the convergence will be slower for larger values of the gain factor \u03b5 (or, equivalently, larger filters K_i or larger values of the variance of \u03b8_t).\n\n[Figure 1 appears here. Panel columns: stimuli; spike train(s) with 2 ms refractory period, 20 ms synaptic time constant and baseline rate 30 Hz; sufficient statistics \u2206(r). Rows: n = 1, 5, 20; x-axes: time (sec).]\n\nFigure 1: The left panels show the true stimulus (green), MAP estimate (red) and the limiting optimal estimator \u02c6\u03b8_\u2206 := (J + C_\u03b8^{\u22121})^{\u22121}\u2206(r) (blue) for various population sizes n. The middle panels show the spike trains used to compute these estimates. The right panels show the sufficient statistics \u2206(r) used to compute \u02c6\u03b8_\u2206. Note that the same true stimulus was used in all three simulations. As n increases, the linear decoder converges to the MAP estimate, despite the nonlinear and correlated nature of the network model generating the spike trains (see main text for details).\n\nWe conclude with a few comments about these results. First, note that the covariance matrix J we have computed here coincides almost exactly with what we computed previously in the Poisson case. Indeed, we can make this connection much more precise: we can always choose an equivalent Poisson network with rates defined so that the E_{r|\u03b8=0}[(f'_i)^2/f_i] term in the non-Poisson network matches the (f'_i)^2/f_i term in the Poisson network. Since J determines the information rate completely, we conclude that for any weakly-coupled network there is an equivalent Poisson network which conveys exactly the same information in the intermediate regime. 
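The O(T) tridiagonal solve mentioned above can be sketched with a plain Thomas algorithm (forward elimination plus back-substitution). The AR(1)-style tridiagonal precision and the diagonal J below are illustrative assumptions.

```python
# O(T) solve of (J + C_theta^{-1}) x = Delta when the posterior precision
# is tridiagonal; a minimal Thomas-algorithm sketch with assumed constants.
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back-substitution."""
    T = len(diag)
    d, b = diag.astype(float).copy(), rhs.astype(float).copy()
    for t in range(1, T):
        w = lower[t - 1] / d[t - 1]
        d[t] -= w * upper[t - 1]
        b[t] -= w * b[t - 1]
    x = np.empty(T)
    x[-1] = b[-1] / d[-1]
    for t in range(T - 2, -1, -1):
        x[t] = (b[t] - upper[t] * x[t + 1]) / d[t]
    return x

rng = np.random.default_rng(6)
T, a, R = 300, 0.9, 0.2
main = np.full(T, (1 + a**2) / R); main[-1] = 1.0 / R   # AR(1)-style precision diagonal
off = np.full(T - 1, -a / R)                            # off-diagonal of C^{-1}
Jdiag = rng.uniform(0.5, 2.0, size=T)                   # assumed diagonal J
delta = rng.normal(size=T)                              # stand-in for Delta(r)

x_fast = thomas_solve(off, main + Jdiag, off, delta)    # posterior mean in O(T)
```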
However, note that the sufficient statistic \u2206(r) is different in the Poisson and non-Poisson settings, since the f'/f term linearly reweights the observed spikes, depending on how likely they were given the history; thus the optimal Bayesian decoder incorporates non-Poisson effects explicitly.\nA number of interesting questions remain open. For example, while we expect a LLN and CLT to continue to hold in many cases of strong, structured interneuronal coupling, computing the asymptotic mean and covariance of the sufficient statistic \u2206(r) may be more challenging in such cases, and new phenomena may arise.\n\nReferences\n[1] J. Atick. Could information theory provide an ecological theory of sensory processing? Network: Computation in Neural Systems, pages 213\u2013251, May 1992.\n[2] F. Attneave. Some informational aspects of visual perception. Psychological Review, 1954.\n[3] H. B. Barlow. Possible principles underlying the transformation of sensory messages. Sensory Communication, pages 217\u2013234, 1961.\n[4] P. Berens, A. S. Ecker, S. Gerwinn, A. S. Tolias, and M. Bethge. Reassessing optimal neural population codes with neurometric functions. Proceedings of the National Academy of Sciences, 108:4423\u20134428, 2011.\n[5] W. Bialek and A. Zee. Coding and computation with neural spike trains. Journal of Statistical Physics, 59:103\u2013115, 1990.\n[6] E. Brown, L. Frank, D. Tang, M. Quirk, and M. Wilson. A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. Journal of Neuroscience, 18:7411\u20137425, 1998.\n[7] N. Brunel and J.-P. Nadal. Mutual information, Fisher information, and population coding. Neural Computation, 10(7):1731\u20131757, 1998.\n[8] B. Clarke and A. Barron. Information-theoretic asymptotics of Bayes methods. IEEE Transactions on Information Theory, 36:453\u2013471, 1990.\n[9] T. Cover and J. Thomas. Elements of Information Theory. Wiley, New York, 1991.\n[10] J. Durbin and S. Koopman. Time Series Analysis by State Space Methods. Oxford University Press, 2001.\n[11] I. Ginzburg and H. Sompolinsky. Theory of correlations in stochastic neural networks. Physical Review E, 50(4):3171\u20133191, 1994.\n[12] V. Lawhern, W. Wu, N. Hatsopoulos, and L. Paninski. Population decoding of motor cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods, 2011.\n[13] J. Macke, L. B\u00fcsing, J. Cunningham, B. Yu, K. Shenoy, and M. Sahani. Modelling low-dimensional dynamics in recorded spiking populations. COSYNE, 2011.\n[14] L. Paninski, Y. Ahmadian, D. Ferreira, S. Koyama, K. Rahnama Rad, M. Vidne, J. Vogelstein, and W. Wu. A new look at state-space models for neural data. Journal of Computational Neuroscience, 29(1):107\u2013126, 2010.\n[15] S. Panzeri, S. Schultz, A. Treves, and E. Rolls. Correlations and the encoding of information in the nervous system. Proceedings of the Royal Society London B, 266(1423):1001\u20131012, 1999.\n[16] J. Pillow, Y. Ahmadian, and L. Paninski. Model-based decoding, information estimation, and change-point detection in multi-neuron spike trains. Neural Computation, 23(1):1\u201345, January 2011.\n[17] S. Roweis and Z. Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11:305\u2013345, 1999.\n[18] E. Salinas and L. Abbott. Vector reconstruction from firing rates. Journal of Computational Neuroscience, 1:89\u2013107, 1994.\n[19] H. S. Seung and H. Sompolinsky. Simple models for reading neuronal population codes. Proceedings of the National Academy of Sciences, 90:10749\u201310753, 1993.\n[20] H. Snippe. Parameter extraction from population codes: A critical assessment. Neural Computation, 8:511\u2013529, 1996.\n[21] D. Snyder and M. Miller. Random Point Processes in Time and Space. Springer-Verlag, 1991.\n[22] S. Thorpe, D. Fize, and C. Marlot. 
Speed of processing in the human visual system. Nature, 381:520\u2013522, 1996.\n[23] T. Toyoizumi, K. Rahnama Rad, and L. Paninski. Mean-field approximations for coupled populations of generalized linear model spiking neurons with Markov refractoriness. Neural Computation, 21:1203\u20131243, 2009.\n[24] A. van der Vaart. Asymptotic Statistics. Cambridge University Press, Cambridge, 1998.\n", "award": [], "sourceid": 563, "authors": [{"given_name": "Kamiar", "family_name": "Rad", "institution": null}, {"given_name": "Liam", "family_name": "Paninski", "institution": null}]}