{"title": "Optimal prior-dependent neural population codes under shared input noise", "book": "Advances in Neural Information Processing Systems", "page_first": 1880, "page_last": 1888, "abstract": "The brain uses population codes to form distributed, noise-tolerant representations of sensory and motor variables. Recent work has examined the theoretical optimality of such codes in order to gain insight into the principles governing population codes found in the brain. However, the majority of the population coding literature considers either conditionally independent neurons or neurons with noise governed by a stimulus-independent covariance matrix. Here we analyze population coding under a simple alternative model in which latent \"input noise\" corrupts the stimulus before it is encoded by the population. This provides a convenient and tractable description for irreducible uncertainty that cannot be overcome by adding neurons, and induces stimulus-dependent correlations that mimic certain aspects of the correlations observed in real populations. We examine prior-dependent, Bayesian optimal coding in such populations using exact analyses of cases in which the posterior is approximately Gaussian. These analyses extend previous results on independent Poisson population codes and yield an analytic expression for squared loss and a tight upper bound for mutual information. We show that, for homogeneous populations that tile the input domain, optimal tuning curve width depends on the prior, the loss function, the resource constraint, and the amount of input noise. This framework provides a practical testbed for examining issues of optimality, noise, correlation, and coding fidelity in realistic neural populations.", "full_text": "Optimal prior-dependent neural population codes under shared input noise\n\nAgnieszka Grabska-Barwi\u0144ska\nGatsby Computational Neuroscience Unit\nUniversity College London\nagnieszka@gatsby.ucl.ac.uk\n\nJonathan W. 
Pillow\nPrinceton Neuroscience Institute\nDepartment of Psychology\nPrinceton University\npillow@princeton.edu\n\nAbstract\n\nThe brain uses population codes to form distributed, noise-tolerant representations of sensory and motor variables. Recent work has examined the theoretical optimality of such codes in order to gain insight into the principles governing population codes found in the brain. However, the majority of the population coding literature considers either conditionally independent neurons or neurons with noise governed by a stimulus-independent covariance matrix. Here we analyze population coding under a simple alternative model in which latent \u201cinput noise\u201d corrupts the stimulus before it is encoded by the population. This provides a convenient and tractable description for irreducible uncertainty that cannot be overcome by adding neurons, and induces stimulus-dependent correlations that mimic certain aspects of the correlations observed in real populations. We examine prior-dependent, Bayesian optimal coding in such populations using exact analyses of cases in which the posterior is approximately Gaussian. These analyses extend previous results on independent Poisson population codes and yield an analytic expression for squared loss and a tight upper bound for mutual information. We show that, for homogeneous populations that tile the input domain, optimal tuning curve width depends on the prior, the loss function, the resource constraint, and the amount of input noise. This framework provides a practical testbed for examining issues of optimality, tuning width, noise, and correlations in realistic neural populations.\n\n1 Introduction\n\nA substantial body of work has examined the optimality of neural population codes [1\u201318]. However, the classical literature has focused predominantly on codes with independent Poisson neurons and on analyses of unbiased decoders using Fisher information. 
Real neurons, by contrast, exhibit noise correlations (dependencies not introduced by the stimulus), and Fisher information does not accurately quantify information when performance is biased or close to threshold [6, 14, 17]. Moreover, classical population codes with independent Poisson noise predict unreasonably good performance from even a small number of neurons. A variety of studies have shown cases where a small number of independent neurons can outperform an entire animal [19, 20]. For example, a population of only 220 Poisson neurons with tuning width of 60 deg (full width at half height) and tuning amplitude of 10 spikes can match the human orientation discrimination threshold of approximately 1 deg. (See Supplement S1 for derivation.) Even fewer neurons would be required if tuning curve amplitude were higher.\nThe mismatch between this predicted efficiency and animals\u2019 actual behaviour has been attributed to the presence of information-limiting correlations between neurons [21, 22]. However, deviation from independence renders most analytical treatments infeasible, necessitating the use of numerical methods (Monte Carlo simulations) for quantifying the performance of such codes [6, 14].\n\nFigure 1: Bayesian formulation of neural population coding with input noise. (Panels: stimulus prior, input noise, tuning curves, Poisson noise, population response, likelihood, posterior.)\n\nHere we examine a family of population codes for which the posterior is approximately Gaussian, which makes it feasible to perform a variety of analytical treatments. 
In particular, we consider a population with Gaussian tuning curves that \u201ctile\u201d the input domain, and Gaussian stimulus priors. This yields a Gaussian-shaped likelihood and a Gaussian posterior with variance that depends only on the total spike count [2, 15]. We use this formulation to derive tractable expressions for neurometric functions such as mean squared error (MSE) and mutual information (MI), and to analyze optimality without resorting to Fisher information, which can be inaccurate for short time windows or small spike counts [6, 14, 17]. Second, we extend this framework to incorporate shared \u201cinput noise\u201d in the stimulus variable of interest (see Fig. 1). This form of noise differs from many existing models, which assume noise to arise from shared connectivity with no direct relationship to the stimulus coding [5, 14, 17, 23]. (See [15, 24] for related approaches.)\nThis paper is organised as follows. In Sec. 2, we describe an idealized Poisson population code with tractable posteriors, and review classical results based on Fisher information. In Sec. 3, we provide a Bayesian treatment of these codes, deriving expressions for mean squared error (MSE) and mutual information (MI). In Sec. 4, we extend these analyses to a population with input noise. Finally, in Sec. 5, we examine the patterns of pairwise dependencies introduced by input noise.\n\n2 Independent Poisson population codes\n\nConsider an idealized population of Poisson neurons that encode a scalar stimulus s with Gaussian-shaped tuning curves. Under this (classical) model, the response vector r = (r_1, . . . , r_N)^T is conditionally Poisson distributed:\n\nr_i | s ~ Poiss(f_i(s)),   p(r|s) = \prod_{i=1}^N \frac{1}{r_i!} f_i(s)^{r_i} e^{-f_i(s)},   (Poisson encoding) (1)\n\nwhere the tuning curves f_i(s) take the form\n\nf_i(s) = \tau A \exp\left( -\frac{(s - \bar{s}_i)^2}{2\sigma_t^2} \right),   (tuning curves) (2)\n\nwith equally-spaced centers or \u201cpreferred stimuli\u201d \bar{s} = (\bar{s}_1, . . . , \bar{s}_N), tuning width \sigma_t, amplitude A, and time window for counting spikes \tau. We assume that the tuning curves \u201ctile\u201d, i.e., sum to a constant \lambda over the relevant stimulus range:\n\n\sum_{i=1}^N f_i(s) \approx \lambda.   (tiling property) (3)\n\nWe can determine \lambda by integrating the summed tuning curves (eq. 3) over the stimulus space, giving \int ds \sum_{i=1}^N f_i(s) = N A \tau \sqrt{2\pi} \sigma_t = \lambda S, which gives:\n\n\lambda = a \sigma_t / \delta,   (expected total spike count) (4)\n\nwhere \delta = S/N is the spacing between tuning curve centers, and a = \sqrt{2\pi} A \tau is an \u201camplitude constant\u201d that depends on the true tuning curve amplitude and the time window for integrating spikes.\n\nNote that tiling holds almost perfectly if tuning curves are broad compared to their spacing (e.g. \sigma_t > \delta). However, our results hold for a much broader range of \sigma_t (see Supplementary Figs. S3 and S4 for a numerical analysis).\n\nLet R = \sum_i r_i denote the total spike count from the entire population. Due to tiling, R is a Poisson random variable with rate \lambda, regardless of the stimulus: p(R|s) = \frac{1}{R!} \lambda^R e^{-\lambda}.\nFor simplicity, we will consider stimuli drawn from a zero-mean Gaussian prior with variance \sigma_s^2:\n\ns ~ N(0, \sigma_s^2),   p(s) = \frac{1}{\sqrt{2\pi}\sigma_s} e^{-s^2 / 2\sigma_s^2}.   (stimulus prior) (5)\n\nSince \prod_i e^{-f_i(s)} = e^{-\lambda} due to the tiling assumption, the likelihood (eq. 1 as a function of s) and posterior both take Gaussian forms:\n\np(r|s) \propto \prod_i f_i(s)^{r_i} \propto N\left( \frac{1}{R} r^T \bar{s}, \frac{\sigma_t^2}{R} \right),   (likelihood) (6)\n\np(s|r) = N\left( \frac{r^T \bar{s}}{R + \rho}, \frac{\sigma_t^2}{R + \rho} \right),   (posterior) (7)\n\nwhere \rho = \sigma_t^2 / \sigma_s^2 denotes the ratio of the tuning curve variance to prior variance. The maximum of the likelihood (eq. 6) is the so-called center-of-mass estimator, \frac{1}{R} r^T \bar{s}, while the mean of the posterior (eq. 7) is biased toward zero by an amount that depends on \rho. Note that the posterior
Note that the posterior\nvariance does not depend on which neurons emitted spikes, only the total spike count R, a fact that\nwill be important for our analyses below.\n\n2.1 Capacity constraints for de\ufb01ning optimality\n\nDe\ufb01ning optimality for a population code requires some form of constraint on the capacity of the\nneural population, since clearly we can achieve arbitrarily narrow posteriors if we allow arbitrarily\nlarge total spike count R. In the following, we will consider two different biologically plausible\nconstraints:\n\n\u2022 An amplitude constraint, in which we constrain the tuning curve amplitude. Under this\nconstraint, expected spike count will grow as the tuning width t increases (see eq. 4),\nsince more neurons will respond to any stimulus when tuning is broader.\n\n\u2022 An energy constraint, in which we \ufb01x while allowing t and amplitude A to vary. Here,\nwe can make tuning curves wider so that more neurons respond, but must reduce the am-\nplitude so that total expected spike count remains \ufb01xed.\n\nWe will show that the optimal tuning depends strongly on which kind of constraint we apply.\n\n2.2 Analyses based on Fisher Information\n\nNXi=1\n\nNXi=1\n\n\u2318 =\n\nexp\u21e3\n\nThe Fisher information provides a popular, tractable metric for quantifying the ef\ufb01ciency of a neural\ncode, given by E[ @2\n@s2 log p(r|s)], where expectation is taken with respect to encoding distribution\np(r|s). For our idealized Poisson population, the total Fisher information is:\nf0i(s)2\n\n=\n2\nfi(s)\nt\n\n(Fisher info) (8)\n\n(s ?s i)2\n\n(s ?s i)2\n\nIF (s) =\n\n22\nt\n\nt\n\n4\nt\n\n=\n\nA\n\na\n\n,\n\nwhich we can derive, as before, using the tiling property (eq. 3). (See Supplemental Sec. S2 for\nderivation.) The \ufb01rst of the two expressions at right re\ufb02ects IF for the amplitude constraint, where\n varies implicitly as we vary t. 
The second expresses I_F under the energy constraint, where \lambda is constant so that the amplitude constant a varies implicitly with \sigma_t. For both constraints, I_F increases with decreasing \sigma_t [5].\nFisher information provides a well-known bound on the variance of an unbiased estimator \hat{s}(r), the Cram\u00e9r-Rao (CR) bound, namely var(\hat{s}|s) \geq 1/I_F(s). Since Fisher information is constant over s in our idealized setting, this leads to a bound on the mean squared error ([6, 11]):\n\nMSE \u225c E[(\hat{s}(r) - s)^2]_{p(r,s)} \geq E\left[ \frac{1}{I_F(s)} \right]_{p(s)} = \frac{\sigma_t^2}{\lambda} = \frac{\delta \sigma_t}{a},   (9)\n\nFigure 2: Mean squared error as a function of the tuning width \sigma_t, under amplitude constraint (top row) and energy constraint (bottom row), for spacing \delta = 1 and amplitude A = 20 sp/s. Top left: MSE for different prior widths \sigma_s (with \tau = 100 ms), showing that optimal \sigma_t increases with larger prior variance. The Cram\u00e9r-Rao bound (gray solid) is minimized at \sigma_t = 0, whereas the derived bound (eq. 12, gray dashed) accurately captures the shape and location of the minimum. Top right: Similar curves for different time windows \tau for counting spikes (with \sigma_s = 32), showing that optimal \sigma_t increases for lower spike counts. Bottom row: Similar traces under energy constraint (where A scales inversely with \sigma_t so that \lambda = \sqrt{2\pi} \tau A \sigma_t is constant). 
Although the CR bound grossly understates the true MSE for small counting windows (right), the optimal tuning is maximally narrow in this configuration, consistent with the CR curve.\n\nThis bound (eq. 9) is simply the inverse of the Fisher information (eq. 8). Fisher information also provides a (quasi) lower bound on the mutual information, since an efficient estimator (i.e., one that achieves the CR bound) has entropy upper-bounded by that of a Gaussian with variance 1/I_F (see [3]). In our setting this leads to the lower bound:\n\nMI(s, r) \u225c H(s) - H(s|r) \geq \frac{1}{2} \log\left( \sigma_s^2 \frac{a}{\delta \sigma_t} \right) = \frac{1}{2} \log\left( \sigma_s^2 \frac{\lambda}{\sigma_t^2} \right).   (10)\n\nNote that neither of these FI-based bounds applies exactly to the Bayesian setting we consider here, since Bayesian estimators are generally biased, and are inefficient in the regime of low spike counts [6]. We examine them here nonetheless (gray traces in Figs. 2 and 3) due to their prominence in the prior literature ([5, 11, 13]), and to emphasize their limitations for characterizing optimal codes.\n\n2.3 Exact Bayesian analyses\n\nIn our idealized population, the total spike count R is a Poisson random variable with mean \lambda, which allows us to compute the MSE and MI by taking expectations w.r.t. this distribution.\n\nMean Squared Error (MSE)\nThe mean squared error, which equals the average posterior variance (eq. 7), can be computed analytically for this model:\n\nMSE = E\left[ \frac{\sigma_t^2}{R + \rho} \right]_{p(R)} = \sum_{R=0}^{\infty} \frac{\sigma_t^2}{R + \rho} \frac{\lambda^R}{R!} e^{-\lambda} = \sigma_t^2 e^{-\lambda} \Gamma(\rho) \gamma^*(\rho, -\lambda),   (11)\n\nwhere \rho = \sigma_t^2 / \sigma_s^2 and \gamma^*(a, z) = z^{-a} \frac{1}{\Gamma(a)} \int_0^z t^{a-1} e^{-t} dt is the holomorphic extension of the lower incomplete gamma function [25] (see SI for derivation). When the tuning curve is narrower than the prior (i.e., \sigma_t^2 \leq \sigma_s^2), we can obtain a relatively tight lower bound:\n\nMSE \geq \frac{\sigma_t^2}{\lambda + \rho} \left( 1 - e^{-\lambda} \right) + \sigma_s^2 e^{-\lambda}.   (12)\n\nFigure 3: Mutual information as a function of tuning width \sigma_t, directly analogous to the plots in Fig. 2. Note the problems with the lower bound on MI derived from Fisher information (top, gray traces) and the close match of the derived bound (eq. 14, dashed gray traces). The effects are similar to Fig. 2, except that MI-optimal tuning widths are slightly smaller (upper left and right) than for MSE-optimal codes. For both loss functions, optimal width is minimal under an energy constraint.\n\nFigure 2 shows the MSE (and derived bound) as a function of the tuning width \sigma_t over the range where tiling approximately holds. Note the high accuracy of the approximate formula (eq. 12, dashed gray traces) and that the FI-based bound does not actually lower-bound the MSE in the case of narrow priors (darker traces).\nFor the amplitude-constrained setting (top row, obtained by substituting \lambda = a\sigma_t/\delta in eqs. 11 and 12), we observe substantial discrepancies between the true MSE and the FI-based analysis. While FI suggests that the optimal tuning width is near zero (down to the limits of tiling), our analyses reveal that the optimal \sigma_t grows with prior variance (left) and with decreasing time window (right). These observations agree well with the existing literature (e.g. [14, 15]). 
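The expectation in eq. (11) is easy to evaluate numerically. The sketch below is a minimal Python illustration, assuming the Gaussian-posterior formula MSE = E[\sigma_t^2/(R+\rho)] and a lower bound of the form \sigma_t^2/(\lambda+\rho)(1-e^{-\lambda}) + \sigma_s^2 e^{-\lambda}; all parameter values are illustrative, not the paper's.

```python
import math

# Exact Bayesian MSE of the tiling Poisson population, evaluated by truncating
# the expectation over the Poisson total count R, plus the approximate lower
# bound. Parameter values below are illustrative choices.

def mse_exact(sigma_t, sigma_s, lam, r_max=500):
    """MSE = E[ sigma_t^2 / (R + rho) ],  R ~ Poisson(lam),  rho = sigma_t^2/sigma_s^2."""
    rho = sigma_t ** 2 / sigma_s ** 2
    total = 0.0
    for r in range(r_max + 1):
        log_p = r * math.log(lam) - lam - math.lgamma(r + 1)  # log Poisson pmf
        total += math.exp(log_p) * sigma_t ** 2 / (r + rho)
    return total

def mse_bound(sigma_t, sigma_s, lam):
    """Approximate lower bound, for sigma_t <= sigma_s."""
    rho = sigma_t ** 2 / sigma_s ** 2
    return sigma_t ** 2 / (lam + rho) * (1 - math.exp(-lam)) \
        + sigma_s ** 2 * math.exp(-lam)

# Amplitude constraint: lam = a * sigma_t / delta grows with tuning width.
a, delta, sigma_s = 5.0, 1.0, 4.0
for sigma_t in (0.5, 1.0, 2.0, 4.0):
    lam = a * sigma_t / delta
    print(sigma_t, mse_exact(sigma_t, sigma_s, lam), mse_bound(sigma_t, sigma_s, lam))
```

Scanning a grid of tuning widths in this way reproduces the non-monotonic MSE curves of Fig. 2 without Monte Carlo simulation.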
However, if we restrict the average population firing rate (energy constraint, bottom plots), the optimal tuning curves once again approach zero. In this case, FI provides correct intuitions and a better approximation of the true MSE.\n\nMutual Information (MI)\n\nFor a tiling population and Gaussian prior, the mutual information between stimulus and response is:\n\nMI(s, r) = \frac{1}{2} E\left[ \log\left( 1 + R \frac{\sigma_s^2}{\sigma_t^2} \right) \right]_{p(R)},   (13)\n\nwhich has no closed-form solution, but can be calculated efficiently with a discrete sum over R from 0 to some large integer (e.g., R_{max} = \lambda + n\sqrt{\lambda} to capture n standard deviations above the mean). We can derive an upper bound using a Taylor expansion of the log while preserving the exact zeroth-order term:\n\nMI(s, r) \leq \frac{1 - e^{-\lambda}}{2} \log\left( 1 + \frac{\lambda}{1 - e^{-\lambda}} \frac{\sigma_s^2}{\sigma_t^2} \right) = \frac{1 - e^{-a\sigma_t/\delta}}{2} \log\left( 1 + \frac{a\sigma_t/\delta}{1 - e^{-a\sigma_t/\delta}} \frac{\sigma_s^2}{\sigma_t^2} \right).   (14)\n\nOnce again, we investigate the efficiency of population coding, but now in terms of maximal MI. Figure 3 shows MI as a function of the neural tuning width \sigma_t. We observe a similar effect as for the MSE: the optimal tuning widths are different from zero, but only under the amplitude constraint. Under the energy constraint (as with FI) the optimum is maximally narrow tuning.\n\n3 Poisson population coding with input noise\n\nWe can obtain a more general family of correlated population codes by considering \u201cinput noise\u201d, where the stimulus s is corrupted by an additive noise n (see Fig. 
1):\n\ns ~ N(0, \sigma_s^2),   (prior) (15)\nn ~ N(0, \sigma_n^2),   (input noise) (16)\nr_i | s, n ~ Poiss(f_i(s + n)).   (population response) (17)\n\nThe use of Gaussians allows us to marginalise over n analytically, resulting in a Gaussian form for the likelihood and a Gaussian posterior:\n\np(r|s) \propto N\left( \frac{1}{R} r^T \bar{s}, \frac{\sigma_t^2}{R} + \sigma_n^2 \right),   (likelihood) (18)\n\np(s|r) = N\left( \frac{\sigma_s^2 \, r^T \bar{s}}{\sigma_t^2 + R(\sigma_n^2 + \sigma_s^2)}, \frac{(\sigma_t^2 + R \sigma_n^2) \sigma_s^2}{\sigma_t^2 + R(\sigma_n^2 + \sigma_s^2)} \right).   (posterior) (19)\n\nNote that even in the limit of many neurons and large spike counts, the posterior variance is non-zero, converging to \sigma_n^2 \sigma_s^2 / (\sigma_n^2 + \sigma_s^2), a limit defined by the prior and input noise variance [22].\n\n3.1 Population coding characteristics: FI, MSE, & MI\n\nFisher information for a population with input noise can be computed using the fact that the likelihood (eq. 18) is Gaussian:\n\nI_F(s) = E\left[ \frac{R}{\sigma_t^2 + R \sigma_n^2} \right]_{p(R)} = \frac{\lambda e^{-\lambda}}{\sigma_n^2} \Gamma(1 + \rho) \gamma^*(1 + \rho, -\lambda),   (20)\n\nwhere \rho = \sigma_t^2 / \sigma_n^2 and \gamma^*(\u00b7,\u00b7) once again denotes the holomorphic extension of the lower incomplete gamma function. Note that for \sigma_n = 0, this reduces to (eq. 8).\nIt is straightforward to employ the results from Sec. 2.3 for the exact Bayesian analysis of the Gaussian posterior (19):\n\nMSE = \sigma_s^2 E\left[ \frac{\sigma_t^2 + R \sigma_n^2}{\sigma_t^2 + R(\sigma_n^2 + \sigma_s^2)} \right]_{p(R)} = \sigma_s^2 E\left[ \frac{\gamma}{\gamma + R} \right]_{p(R)} + \frac{\sigma_s^2 \sigma_n^2}{\sigma_s^2 + \sigma_n^2} E\left[ \frac{R}{\gamma + R} \right]_{p(R)} = \sigma_s^2 e^{-\lambda} \Gamma(1 + \gamma) \left( \gamma^*(\gamma, -\lambda) + \frac{\sigma_n^2}{\sigma_s^2 + \sigma_n^2} \lambda \, \gamma^*(1 + \gamma, -\lambda) \right),   (21)\n\nMI = \frac{1}{2} E\left[ \log\left( 1 + \frac{R \sigma_s^2}{\sigma_t^2 + R \sigma_n^2} \right) \right]_{p(R)},   (22)\n\nwhere \gamma = \sigma_t^2 / (\sigma_s^2 + \sigma_n^2). Although we could not determine closed-form analytical expressions for MI, it can be computed efficiently by summing over a range of integers [0, . . . , R_{max}] for which P(R) has sufficient support. 
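This discrete sum takes only a few lines of code. A minimal Python sketch with illustrative parameter values; the saturation limit \frac{1}{2}\log(1 + \sigma_s^2/\sigma_n^2) follows from the limiting posterior variance noted above:

```python
import math

# MI for the input-noise model, computed by a finite sum over the total spike
# count R, truncated a few standard deviations above the Poisson mean as
# suggested in the text. Parameter values are illustrative.

def mi_input_noise(sigma_t, sigma_s, sigma_n, lam, n_sd=10):
    r_max = int(lam + n_sd * math.sqrt(lam)) + 1
    mi = 0.0
    for r in range(r_max + 1):
        log_p = r * math.log(lam) - lam - math.lgamma(r + 1)  # log Poisson pmf
        snr = r * sigma_s ** 2 / (sigma_t ** 2 + r * sigma_n ** 2)
        mi += math.exp(log_p) * 0.5 * math.log(1 + snr)
    return mi  # in nats

# Even for a very large expected spike count, MI cannot exceed the limit set
# by the prior and the input noise: 0.5 * log(1 + sigma_s^2 / sigma_n^2).
cap = 0.5 * math.log(1 + 4.0 ** 2 / 1.0 ** 2)
print(mi_input_noise(1.0, 4.0, 1.0, lam=200.0), cap)
```

The truncation point only needs to cover the bulk of P(R); the remaining tail terms are exponentially small.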
This is still a much faster procedure than estimating these values from Monte Carlo simulations.\n\n3.2 Optimal tuning width under input noise\n\nFig. 4 shows the optimal tuning width under the amplitude constraint, that is, the value of \sigma_t that achieves minimal MSE (left) or maximal MI (right) as a function of the prior width \sigma_s, for several different time windows \tau. Blue traces show results for a Poisson population, while green traces correspond to a population with input noise (\sigma_n = 1).\nFor both MSE and MI loss functions, the optimal tuning width decreases for narrower priors. However, under input noise (green traces), the optimal tuning width saturates at a value that depends on the available number of spikes. As the prior grows wider, the growth of the optimal tuning width depends strongly on the choice of loss function: optimal \sigma_t grows approximately logarithmically with \sigma_s for minimizing MSE (left), but it grows much more slowly for maximizing MI (right). Note that for realistic prior widths (i.e. for \sigma_s > \sigma_n), the effects of input noise on optimal tuning width are far more substantial under MI than under MSE.\n\nFigure 4: Optimal tuning width \sigma_t (under amplitude constraint only) as a function of prior width \sigma_s, for classic Poisson populations (blue) and populations with input noise (green, \sigma_n^2 = 1). Different traces correspond to different time windows of integration \tau, for \delta = 1 and A = 20 sp/s. As \sigma_n increases, the optimal tuning width increases under MI, and under MSE when \sigma_s < \sigma_n (traces not shown). 
For MSE, predictions of the Poisson and input-noise models converge for priors \sigma_s > \sigma_n.\n\nWe have not shown plots for energy-constrained population codes because the optimal tuning width sits at the minimum of the range over which tiling can be said to hold, regardless of prior width, input noise level, time window, or choice of loss function. This can be seen easily in the expressions for MI (eqs. 13 and 22), in which each term in the expectation is a decreasing function of \sigma_t for all R > 0. This suggests that, contrary to some recent arguments (e.g., [14, 15]), narrow tuning (at least down to the limit of tiling) really is best if the brain has a fixed energetic budget for spiking, as opposed to a mere constraint on the number of neurons.\n\n4 Correlations induced by input noise\n\nInput noise alters the mean, variance, and pairwise correlations of population responses in a systematic manner that we can compute directly (see Supplement for derivations). In Fig. 5 we show the effects of input noise with standard deviation \sigma_n = 0.5, for neurons with tuning amplitude A = 10. The tuning curve (mean response) becomes slightly flatter (Fig. 5A), while the variance increases, especially at the flanks (Fig. 5B). Fig. 5C shows correlations between two neurons whose tuning curves and variances are shown in panels A-B: one pair with the same preferred orientation at zero (red) and a second with a 4 degree difference in preferred orientation (blue). From these plots, it is clear that the correlation structure depends on both the tuning and the stimulus. Thus, in order to describe such correlations one needs to consider the entire stimulus range, not simply the average correlation marginalized over stimuli.\nFigure 5D shows the pairwise correlations across an entire population of 21 neurons given a stimulus at s = 0. 
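The signal dependence of these correlations is easy to reproduce by simulation. A Monte Carlo sketch in Python, with parameters loosely following Fig. 5 (the specific values, stimulus location, and preferred-stimulus offsets are illustrative choices):

```python
import numpy as np

# Monte Carlo sketch of input-noise correlations: conditionally Poisson
# responses given the jittered stimulus s + n. Three neurons: two sharing a
# preferred stimulus and one offset by 4, probed at a stimulus on the steep
# flank of the tuning curves, where correlations are strongest.

rng = np.random.default_rng(0)
A_tau, sigma_t, sigma_n = 10.0, 2.0, 0.5     # amplitude*window, tuning and noise widths
pref = np.array([0.0, 0.0, 4.0])             # preferred stimuli
s = 2.0                                      # stimulus on the flank

n = rng.normal(0.0, sigma_n, size=(200_000, 1))    # shared input noise draws
rates = A_tau * np.exp(-(s + n - pref) ** 2 / (2 * sigma_t ** 2))
r = rng.poisson(rates)                       # conditionally independent given s + n

c = np.corrcoef(r.T)
print(c[0, 1])   # identically tuned pair: positive correlation
print(c[0, 2])   # neurons on opposite flanks: negative correlation
```

Repeating this over a grid of stimuli and preferred-stimulus pairs traces out the stimulus-dependent correlation structure sketched in panels C and D.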
Although we assumed Gaussian tuning curves here, one can obtain similar plots for arbitrary unimodal tuning curves (see Supplement), which should make it feasible to test our predictions in real data. However, the time scale of the input noise and of basic neural computations is on the order of 10 ms. At such short spike-count windows the available number of spikes is low, and so are the correlations induced by input noise. Combined with other sources of second-order statistics, such as common input gains (e.g. from contrast or adaptation), these correlations might be too subtle to recover [22].\n\n5 Discussion\n\nWe derived exact expressions for mean squared error and mutual information in a Bayesian analysis of: (1) an idealized Poisson population coding model; and (2) a correlated, conditionally Poisson population coding model with shared input noise. These expressions allowed us to examine the optimal tuning curve width under both loss functions, under two kinds of resource constraints. We have confirmed that the optimal \sigma_t diverges from predictions based on Fisher information if the overall spike count is allowed to grow with tuning width (i.e., because more neurons respond to the stimulus when tuning curves become broader). We refer to this as an \u201camplitude constraint\u201d, because the amplitude is fixed independently of tuning width. 
This differs from an \u201cenergy constraint\u201d, in which tuning curve amplitude scales with tuning width so that average total spike count is constant. Under an energy constraint, predictions from Fisher information match those of an exact Bayesian analysis, and we find that optimal tuning width should be narrow (down to the limit at which the tiling assumption applies), regardless of the duration, prior width, or input noise level.\n\nFigure 5: Response statistics of a neural population with input noise, for standard deviation \sigma_n = 0.5. (A) Expected spike responses of two neurons: \bar{s}_1 = 0 (red) and \bar{s}_2 = 4 (blue). The common noise effectively blurs the tuning curves with a Gaussian kernel of width \sigma_n. (B) Variance of neuron 1, with its tuning curve replotted in black for reference. Input noise has the largest influence on variance at the steepest parts of the tuning curve. (C) Cross-correlation of neuron 1 with two others: one sharing the same preference (red), and one with a preferred-stimulus difference of 4 (blue). Note that the correlation of two identically tuned neurons is largest at the steepest part of the tuning curve. (D) Spike count correlations for the entire population of 21 neurons given a fixed stimulus s = 0, illustrating that the pattern of correlations is signal dependent.\n\nWe also derived explicit expressions for the response correlations induced by the input noise. These correlations depend on the shape and amplitude of tuning curves, and on the amount of input noise (\sigma_n). However, under the reasonable assumption that the noise distribution is much narrower than the width of the prior (and of the tuning curves), in which case the mean firing rate changes little, we can derive predictions for the covariances directly from the measured tuning curves. An important direction for future work will be to examine the detailed structure of correlations measured in large populations. We feel that the input noise model \u2014 which describes exactly those correlations that are most harmful for decoding \u2014 has the potential to shed light on the factors affecting the coding capacity in optimal neural populations [22].\nFinally, we can return to the introductory example involving orientation discrimination, to ask how the number of neurons necessary to reach the human discrimination threshold of 1 degree changes in the presence of input noise. As \sigma_n approaches this threshold, the number of neurons required goes rapidly to infinity (see Supp. Fig. S1).\n\nAcknowledgments\nThis work was supported by the McKnight Foundation (JP), NSF CAREER Award IIS-1150186 (JP), NIMH grant MH099611 (JP) and the Gatsby Charitable Foundation (AGB).\n\nReferences\n[1] HS Seung and H. Sompolinsky. 
Simple models for reading neuronal population codes. Proceedings of\n\nthe National Academy of Sciences, 90(22):10749\u201310753, 1993.\n\n[2] R. S. Zemel, P. Dayan, and A. Pouget. Probabilistic interpretation of population codes. Neural Comput,\n\n10(2):403\u2013430, Feb 1998.\n\n8\n\n\f[3] Nicolas Brunel and Jean-Pierre Nadal. Mutual information, \ufb01sher information, and population coding.\n\nNeural Computation, 10(7):1731\u20131757, 1998.\n\n[4] Kechen Zhang and Terrence J. Sejnowski. Neuronal tuning: To sharpen or broaden? Neural Computation,\n\n11(1):75\u201384, 1999.\n\n[5] A. Pouget, S. Deneve, J. Ducom, and P. E. Latham. Narrow versus wide tuning curves: What\u2019s best for a\n\npopulation code? Neural Computation, 11(1):85\u201390, 1999.\n\n[6] M. Bethge, D. Rotermund, and K. Pawelzik. Optimal short-term population coding: When \ufb01sher infor-\n\nmation fails. Neural computation, 14(10):2317\u20132351, 2002.\n\n[7] P. Series, P. E. Latham, and A. Pouget. Tuning curve sharpening for orientation selectivity: coding\n\nef\ufb01ciency and the impact of correlations. Nature Neuroscience, 7(10):1129\u20131135, 2004.\n\n[8] W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget. Bayesian inference with probabilistic population\n\ncodes. Nature Neuroscience, 9:1432\u20131438, 2006.\n\n[9] Marcelo A Montemurro and Stefano Panzeri. Optimal tuning widths in population coding of periodic\n\nvariables. Neural computation, 18(7):1555\u20131576, 2006.\n\n[10] R. Haefner and M. Bethge. Evaluating neuronal codes for inference using \ufb01sher information. Neural\n\nInformation Processing Systems, 2010.\n\n[11] D. Ganguli and E. P. Simoncelli. Implicit encoding of prior probabilities in optimal neural populations.\n\nIn Adv. Neural Information Processing Systems, volume 23, May 2010.\n\n[12] Xue-Xin Wei and Alan Stocker. Ef\ufb01cient coding provides a direct link between prior and likelihood in\n\nperceptual bayesian inference. In Adv. Neur. Inf. Proc. Sys. 
25, pages 1313\u20131321, 2012.\n\n[13] Z Wang, A Stocker, and A Lee. Optimal neural tuning curves for arbitrary stimulus distributions: Dis-\n\ncrimax, infomax and minimum lp loss. In Adv. Neur. Inf. Proc. Sys. 25, pages 2177\u20132185, 2012.\n\n[14] P. Berens, A.S. Ecker, S. Gerwinn, A.S. Tolias, and M. Bethge. Reassessing optimal neural population\ncodes with neurometric functions. Proceedings of the National Academy of Sciences, 108(11):4423, 2011.\n[15] Steve Yaeli and Ron Meir. Error-based analysis of optimal tuning functions explains phenomena observed\n\nin sensory neurons. Frontiers in computational neuroscience, 4, 2010.\n\n[16] J. M. Beck, P. E. Latham, and A. Pouget. Marginalization in neural circuits with divisive normalization.\n\nJ Neurosci, 31(43):15310\u201315319, Oct 2011.\n\n[17] Stuart Yarrow, Edward Challis, and Peggy Seri`es. Fisher and shannon information in \ufb01nite neural popu-\n\nlations. Neural Computation, 24(7):1740\u20131780, 2012.\n\n[18] D Ganguli and E P Simoncelli. Ef\ufb01cient sensory encoding and Bayesian inference with heterogeneous\nneural populations. Neural Computation, 26(10):2103\u20132134, Oct 2014. Published online: 24 July 2014.\n[19] E. Zohary, M. N. Shadlen, and W. T. Newsome. Correlated neuronal discharge rate and its implications\n\nfor psychophysical performance. Nature, 370(6485):140\u2013143, Jul 1994.\n\n[20] Keiji Miura, Zachary Mainen, and Naoshige Uchida. Odor representations in olfactory cortex: distributed\n\nrate coding and decorrelated population activity. Neuron, 74(6):1087\u20131098, 2012.\n\n[21] Jakob H. Macke, Manfred Opper, and Matthias Bethge. Common input explains higher-order correlations\nand entropy in a simple model of neural population activity. Phys. Rev. Lett., 106(20):208102, May 2011.\n[22] R. Moreno-Bote, J. Beck, I. Kanitscheider, X. Pitkow, P.E. Latham, and A. Pouget. Information-limiting\n\ncorrelations. Nat Neurosci, 17(10):1410\u20131417, Oct 2014.\n\n[23] K. Josic, E. 
Shea-Brown, B. Doiron, and J. de la Rocha. Stimulus-dependent correlations and population\n\ncodes. Neural Comput, 21(10):2774\u20132804, Oct 2009.\n\n[24] G. Dehaene, J. Beck, and A. Pouget. Optimal population codes with limited input information have \ufb01nite\n\ntuning-curve widths. In CoSyNe, Salt Lake City, Utah, February 2013.\n\n[25] NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov/, Release 1.0.9 of 2014-08-29.\n[26] David C Burr and Sally-Ann Wijesundra. Orientation discrimination depends on spatial frequency. Vision\n\nResearch, 31(7):1449\u20131452, 1991.\n\n[27] P. Seri`es, A. A. Stocker, and E. P. Simoncelli. Is the homunculus \u2019aware\u2019 of sensory adaptation? Neural\n\nComputation, 21(12):3271\u20133304, Dec 2009.\n\n[28] R. L. De Valois, E. W. Yund, and N. Hepler. The orientation and direction selectivity of cells in macaque\n\nvisual cortex. Vision research, 22(5):531\u2013544, 1982.\n\n9\n\n\f", "award": [], "sourceid": 1024, "authors": [{"given_name": "Agnieszka", "family_name": "Grabska-Barwinska", "institution": "Gatsby Unit, UCL"}, {"given_name": "Jonathan", "family_name": "Pillow", "institution": "UT Austin"}]}