{"title": "Demixing odors - fast inference in olfaction", "book": "Advances in Neural Information Processing Systems", "page_first": 1968, "page_last": 1976, "abstract": "The olfactory system faces a difficult inference problem: it has to determine what odors are present based on the distributed activation of its receptor neurons. Here we derive neural implementations of two approximate inference algorithms that could be used by the brain. One is a variational algorithm (which builds on the work of Beck et al., 2012), the other is based on sampling. Importantly, we use a more realistic prior distribution over odors than has been used in the past: we use a \"spike and slab\" prior, for which most odors have zero concentration. After mapping the two algorithms onto neural dynamics, we find that both can infer correct odors in less than 100 ms, although it takes ~500 ms to eliminate false positives. Thus, at the behavioral level, the two algorithms make very similar predictions. However, they make different assumptions about connectivity and neural computations, and make different predictions about neural activity. Thus, they should be distinguishable experimentally. If so, that would provide insight into the mechanisms employed by the olfactory system, and, because the two algorithms use very different coding strategies, that would also provide insight into how networks represent probabilities.", "full_text": "Demixing odors \u2014 fast inference in olfaction\n\nAgnieszka Grabska-Barwi\u0144ska\n\nGatsby Computational Neuroscience Unit\n\nUCL\n\nagnieszka@gatsby.ucl.ac.uk\n\nJeff Beck\n\nDuke University\n\njeff@gatsby.ucl.ac.uk\n\nAlexandre Pouget\nUniversity of Geneva\n\nAlexandre.Pouget@unige.ch\n\nPeter E. 
Latham\n\nGatsby Computational Neuroscience Unit\n\nUCL\n\npel@gatsby.ucl.ac.uk\n\nAbstract\n\nThe olfactory system faces a difficult inference problem: it has to determine what\nodors are present based on the distributed activation of its receptor neurons. Here\nwe derive neural implementations of two approximate inference algorithms that\ncould be used by the brain. One is a variational algorithm (which builds on the\nwork of Beck et al., 2012), the other is based on sampling. Importantly, we use\na more realistic prior distribution over odors than has been used in the past: we\nuse a \u201cspike and slab\u201d prior, for which most odors have zero concentration. After\nmapping the two algorithms onto neural dynamics, we find that both can infer\ncorrect odors in less than 100 ms. Thus, at the behavioral level, the two algorithms\nmake very similar predictions. However, they make different assumptions\nabout connectivity and neural computations, and make different predictions about\nneural activity. Thus, they should be distinguishable experimentally. If so, that\nwould provide insight into the mechanisms employed by the olfactory system,\nand, because the two algorithms use very different coding strategies, that would\nalso provide insight into how networks represent probabilities.\n\n1 Introduction\n\nThe problem faced by the sensory system is to infer the underlying causes of a set of input spike\ntrains. For the olfactory system, the input spikes come from a few hundred different types of olfactory\nreceptor neurons, and the problem is to infer which odors caused them. As there are more than\n10,000 possible odors, and more than one can be present at a time, the search space for mixtures of\nodors is combinatorially large. Nevertheless, olfactory processing is fast: organisms can typically\ndetermine what odors are present in a few hundred ms.\nHere we ask how organisms could do this. 
Our focus is on inference, not learning: we assume\nthat the olfactory system has learned both the statistics of odors in the world and the mapping\nfrom those odors to olfactory receptor neuron activity. We then choose a particular model for both,\nand compute, via Bayes\u2019 rule, the full posterior distribution. This distribution is, however, highly\ncomplex: it tells us, for example, the probability of coffee at a concentration of 14 parts per million\n(ppm), and no bacon, and a rose at 27 ppm, and acetone at 3 ppm, and no apples and so on, where\nthe \u201cso on\u201d is a list of thousands more odors. It is unlikely that such detailed information is useful\nto an organism. It is far more likely that organisms are interested in marginal probabilities, such\nas whether or not coffee is present independent of all the other odors. Unfortunately, even though\nwe can write down the full posterior, calculation of marginal probabilities is intractable due to the\nsum over all possible combinations of odors: the number of terms in the sum is exponential in the\nnumber of odors.\nWe must, therefore, consider approximate algorithms. Here we consider two: a variational approximation,\nwhich naturally generates approximate posterior marginals, and sampling from the posterior,\nwhich directly gives us the marginals. Our main goal is to determine which, if either, is capable of\nperforming inference on ecologically relevant timescales using biologically plausible circuits. We\nbegin by introducing a generative model for spikes in a population of olfactory receptor neurons. We\nthen describe the variational and sampling inference schemes. Both descriptions lead very naturally\nto network equations. We simulate those equations, and find that both the variational and sampling\napproaches work well, and require less than 100 ms to converge to a reasonable solution. 
Therefore,\nfrom the point of view of speed and accuracy \u2013 things that can be measured from behavioral\nexperiments \u2013 it is not possible to rule out either of them. However, they do make different predictions\nabout activity, and so it should be possible to tell them apart from electrophysiological experiments.\nThey also make different predictions about the neural representation of probability distributions. If\none or the other could be corroborated experimentally, that would provide valuable insight into how\nthe brain (or at least one part of the brain) codes for probabilities [1].\n\n2 The generative model for olfaction\n\nThe generative model consists of a probabilistic mapping from odors (which for us are high-level\npercepts, such as coffee or bacon, each of which consists of a mixture of many different chemicals) to\nodorant receptor neurons, and a prior over the presence or absence of odors and their concentrations.\nIt is known that each odor, by itself, activates a different subset of the olfactory receptor neurons;\ntypically on the order of 10%-30% [2]. Here we assume, for simplicity, that activation is linear, so\nthat the activity of odorant receptor neuron i, denoted ri, is linearly related to the concentrations,\ncj, of the various odors which are present in a given olfactory scene, plus some background rate, r0.\nAssuming Poisson noise, the response distribution has the form\n\nP(r|c) = \\prod_i \\frac{(r_0 + \\sum_j w_{ij} c_j)^{r_i}}{r_i!} e^{-(r_0 + \\sum_j w_{ij} c_j)} . \\qquad (2.1)\n\nIn a nutshell, ri is Poisson with mean r_0 + \\sum_j w_{ij} c_j.\n\nIn contrast to previous work [3], which used a smooth prior on the concentrations, here we use\na spike and slab prior. With this prior, there is a finite probability that the concentration of any\nparticular odor is zero. 
This prior is much more realistic than a smooth one, as it allows only a\nsmall number of odors (out of \u223c10,000) to be present in any given olfactory scene. It is modeled by\nintroducing a binary variable, sj, which is 1 if odor j is present and 0 otherwise. For simplicity we\nassume that odors are independent and statistically homogeneous,\n\nP(c|s) = \\prod_j [ (1 - s_j) \\delta(c_j) + s_j \\Gamma(c_j|\\alpha_1, \\beta_1) ] \\qquad (2.2a)\n\nP(s) = \\prod_j \\pi^{s_j} (1 - \\pi)^{1 - s_j} \\qquad (2.2b)\n\nwhere \u03b4(c) is the Dirac delta function and \u0393(c|\u03b1, \u03b2) is the Gamma distribution: \\Gamma(c|\\alpha, \\beta) =\n\\beta^{\\alpha} c^{\\alpha - 1} e^{-\\beta c} / \\Gamma(\\alpha), with \u0393(\u03b1) the ordinary Gamma function, \\Gamma(\\alpha) = \\int_0^{\\infty} dx x^{\\alpha - 1} e^{-x}.\n\n3 Inference\n\n3.1 Variational inference\n\nBecause of the delta-function in the prior, performing efficient variational inference in our model is\ndifficult. Therefore, we smooth the delta-function, and replace it with a Gamma distribution. With\nthis manipulation, the approximate (with respect to the true model, Eq. (2.2a)) prior on c is\n\nP_{var}(c|s) = \\prod_j [ (1 - s_j) \\Gamma(c_j|\\alpha_0, \\beta_0) + s_j \\Gamma(c_j|\\alpha_1, \\beta_1) ] . \\qquad (3.1)\n\nThe approximate prior allows absent odors to have nonzero concentration. We can partially\ncompensate for that by setting the background firing rate, r0, to zero, and choosing \u03b10 and \u03b20 such that\nthe effective background firing rate (due to the small concentration when sj = 0) is equal to r0; see\nSec. 4.\nAs is typical in variational inference, we use a factorized approximate distribution. 
This distribution,\ndenoted Q(c, s|r), was set to Q(c|s, r)Q(s|r), where\n\nQ(c|s, r) = \\prod_j [ (1 - s_j) \\Gamma(c_j|\\alpha_{0j}, \\beta_{0j}) + s_j \\Gamma(c_j|\\alpha_{1j}, \\beta_{1j}) ] \\qquad (3.2a)\n\nQ(s|r) = \\prod_j \\lambda_j^{s_j} (1 - \\lambda_j)^{1 - s_j} . \\qquad (3.2b)\n\nIntroducing auxiliary variables, as described in Supplementary Material, and minimizing the\nKullback-Leibler distance between Q and the true posterior augmented by the auxiliary variables\nleads to a set of nonlinear equations for the parameters of Q. To simplify those equations, we set \u03b11\nto \u03b10 + 1, resulting in\n\n\\alpha_{0j} = \\alpha_0 + \\sum_i \\frac{r_i w_{ij} F_j(\\lambda_j, \\alpha_{0j})}{\\sum_k w_{ik} F_k(\\lambda_k, \\alpha_{0k})} \\qquad (3.3a)\n\nL_j \\equiv \\log \\frac{\\lambda_j}{1 - \\lambda_j} = L_{0j} + \\log(\\alpha_{0j}/\\alpha_0) + \\alpha_{0j} \\log(\\beta_{0j}/\\beta_{1j}) \\qquad (3.3b)\n\nwhere\n\nL_{0j} \\equiv \\log \\frac{\\pi}{1 - \\pi} - \\alpha_0 \\log(\\beta_0/\\beta_1) + \\log(\\beta_1/\\beta_{1j}) \\qquad (3.3c)\n\nF_j(\\lambda, \\alpha) \\equiv \\exp[ (1 - \\lambda)(\\Psi(\\alpha) - \\log \\beta_{0j}) + \\lambda(\\Psi(\\alpha + 1) - \\log \\beta_{1j}) ] \\qquad (3.3d)\n\nand \u03a8(\u03b1) \u2261 d log \u0393(\u03b1)/d\u03b1 is the digamma function. The remaining two parameters, \u03b20j and \u03b21j,\nare fixed by our choice of weights and priors: \\beta_{0j} = \\beta_0 + \\sum_i w_{ij} and \\beta_{1j} = \\beta_1 + \\sum_i w_{ij}.\n\nTo solve Eqs. (3.3a-b) in a way that mimics the kinds of operations that could be performed by\nneuronal circuitry, we write down a set of differential equations that have fixed points satisfying\nEq. 
(3.3),\n\n\\tau_\\rho \\frac{d\\rho_i}{dt} = r_i - \\rho_i \\sum_j w_{ij} F_j(\\lambda_j, \\alpha_{0j}) \\qquad (3.4a)\n\n\\tau_\\alpha \\frac{d\\alpha_{0j}}{dt} = \\alpha_0 + F_j(\\lambda_j, \\alpha_{0j}) \\sum_i \\rho_i w_{ij} - \\alpha_{0j} \\qquad (3.4b)\n\n\\tau_\\lambda \\frac{dL_j}{dt} = L_{0j} + \\log(\\alpha_{0j}/\\alpha_0) + \\alpha_{0j} \\log(\\beta_{0j}/\\beta_{1j}) - L_j . \\qquad (3.4c)\n\nNote that we have introduced an additional variable, \u03c1i. This variable is proportional to ri, but\nmodulated by divisive inhibition: the fixed point of Eq. (3.4a) is\n\n\\rho_i = \\frac{r_i}{\\sum_k w_{ik} F_k(\\lambda_k, \\alpha_{0k})} . \\qquad (3.5)\n\nClose scrutiny of Eqs. (3.4) and (3.3d) might raise some concerns: (i) \u03c1 and \u03b1 are reciprocally\nand symmetrically connected; (ii) there are multiplicative interactions between F (\u03bbj, \u03b10j) and \u03c1;\nand (iii) the neurons need to compute nontrivial nonlinearities, such as logarithm, exponent and a\nmixture of digamma functions. However: (i) reciprocal and symmetric connectivity exists in the\nearly olfactory processing system [4, 5, 6]; (ii) although multiplicative interactions are in general\nnot easy for neurons, the divisive normalization (Eq. (3.5)) has been observed in the olfactory bulb\n[7]; and (iii) the nonlinearities in our algorithms are not extreme (the logarithm is defined only on the\npositive range (\u03b10j > \u03b10, Eq. (3.3a)), and the function Fj(\u03bb, \u03b1) is a soft-thresholded linear function;\nsee Fig. S1). Nevertheless, a realistic model would have to approximate Eqs. (3.4a-c), and thus\ndegrade slightly the quality of the inference.\n\n3.2 Sampling\n\nThe second approximate algorithm we consider is sampling. 
To sample efficiently from our model,\nwe introduce a new set of variables, c\u0303j,\n\nc_j = \\tilde{c}_j s_j . \\qquad (3.6)\n\nWhen written in terms of c\u0303j rather than cj, the likelihood becomes\n\nP(r|\\tilde{c}, s) = \\prod_i \\frac{(r_0 + \\sum_j w_{ij} \\tilde{c}_j s_j)^{r_i}}{r_i!} e^{-(r_0 + \\sum_j w_{ij} \\tilde{c}_j s_j)} . \\qquad (3.7)\n\nBecause the value of c\u0303j is unconstrained when sj = 0, we have complete freedom in choosing\nP (c\u0303j|sj = 0), the piece of the prior corresponding to the absence of odor j. It is convenient to set it\nto the same prior we use when sj = 1, which is \u0393(c\u0303j|\u03b11, \u03b21). With this choice, c\u0303 is independent of\ns, and the prior over c\u0303 is simply\n\nP(\\tilde{c}) = \\prod_j \\Gamma(\\tilde{c}_j|\\alpha_1, \\beta_1) . \\qquad (3.8)\n\nThe prior over s, Eq. (2.2b), remains the same. Note that this set of manipulations does not change\nthe model: the likelihood doesn\u2019t change, since by definition c\u0303jsj = cj; when sj = 1, c\u0303j is drawn\nfrom the correct prior; and when sj = 0, c\u0303j does not appear in the likelihood.\nTo sample from this distribution we use Langevin sampling on c and Gibbs sampling on s. The\nformer is standard,\n\n\\tau_c \\frac{d\\tilde{c}_j}{dt} = \\frac{\\partial \\log P(\\tilde{c}, s|r)}{\\partial \\tilde{c}_j} + \\xi(t) = \\frac{\\alpha_1 - 1}{\\tilde{c}_j} - \\beta_1 + s_j \\sum_i w_{ij} ( \\frac{r_i}{r_0 + \\sum_k w_{ik} \\tilde{c}_k s_k} - 1 ) + \\xi(t) \\qquad (3.9)\n\nwhere \u03be(t) is delta-correlated white noise with variance 2\u03c4: \\langle \\xi_j(t) \\xi_{j'}(t') \\rangle = 2\\tau \\delta(t - t') \\delta_{jj'}.\nBecause the ultimate goal is to implement this algorithm in networks of neurons, we need a Gibbs\nsampler that runs asynchronously and in real time. This can be done by discretizing time into steps\nof length dt, and computing the update probability for each odor on each time step. 
This is a valid\nGibbs sampler only in the limit dt \u2192 0, where no more than one odor can be updated per time step;\nthat\u2019s the limit of interest here. The update rule is\n\nT(s'_j|\\tilde{c}, s, r) = \\nu_0 dt P(s'_j|\\tilde{c}, s, r) + (1 - \\nu_0 dt) \\Delta(s'_j - s_j) \\qquad (3.10)\n\nwhere s'_j \u2261 sj(t + dt), s and c\u0303 should be evaluated at time t, and \u2206(s) is the Kronecker delta:\n\u2206(s) = 1 if s = 0 and 0 otherwise. As is straightforward to show, this update rule has the correct\nequilibrium distribution in the limit dt \u2192 0 (see Supplementary Material).\nComputing P(s'_j = 1|c\u0303, s, r) is straightforward, and we find that\n\nP(s'_j = 1|\\tilde{c}, s, r) = \\frac{1}{1 + \\exp[-\\Phi_j]}\n\n\\Phi_j = \\log \\frac{\\pi}{1 - \\pi} + \\sum_i [ r_i \\log \\frac{r_0 + \\sum_{k \\neq j} w_{ik} \\tilde{c}_k s_k + w_{ij} \\tilde{c}_j}{r_0 + \\sum_{k \\neq j} w_{ik} \\tilde{c}_k s_k} - \\tilde{c}_j w_{ij} ] . \\qquad (3.11)\n\nEquations (3.9) and (3.11) raise almost exactly the same concerns that we saw for the variational\napproach: (i) c and s are reciprocally and symmetrically connected; (ii) there are multiplicative\ninteractions between c\u0303 and s; and (iii) the neurons need to compute nontrivial nonlinearities, such as\nlogarithm and divisive normalization. Additionally, the noise in the Langevin sampler (\u03be in Eq. (3.9))\nhas to be white and have exactly the right variance. Thus, as with the variational approach, we\nexpect a biophysical model to introduce approximations, and, therefore \u2014 as with the variational\nalgorithm \u2014 degrade slightly the quality of the inference.\n\nFigure 1: Priors over concentration. The true priors \u2013 the ones used to generate the data \u2013 are shown\nin red and magenta; these correspond to \u03b4(c) and \u0393(c|\u03b11, \u03b21), respectively. 
The variational prior in\nthe absence of an odor, \u0393(c|\u03b10, \u03b20) with \u03b10 = 0.5 and \u03b20 = 20, is shown in blue.\n[Figure 1 axes: \u03b4(c), P0(c) and P1(c) versus concentration.]\n\n4 Simulations\n\nTo determine how fast and accurate these two algorithms are, we performed a set of simulations\nusing either Eq. (3.4) (variational inference) or Eqs. (3.9)-(3.11) (sampling). For both algorithms,\nthe odors were generated from the true prior, Eq. (2.2). We modeled a small olfactory system, with\n40 olfactory receptor types (compared to approximately 350 in humans and 1000 in mice [8]). To\nkeep the ratio of identifiable odors to receptor types similar to the one in humans [8], we assumed\n400 possible odors, with 3 odors expected to be present in the scene (\u03c0 = 3/400). If an odor was\npresent, its concentration was drawn from a Gamma distribution with \u03b11 = 1.5 and \u03b21 = 1/40.\nThe background spike count, r0, was set to 1. The connectivity matrix was binary and random,\nwith a connection probability, pc (the probability that any particular element is 1), set to 0.1 [2]. All\nnetwork time constants (\u03c4\u03c1, \u03c4\u03b1, \u03c4\u03bb, \u03c4c and 1/\u03bd0, from Eqs. (3.4), (3.9) and (3.10)) were set to 10 ms.\nThe differential equations were solved using the Euler method with a time step of 0.01 ms. Because\nwe used \u03b11 = \u03b10 + 1, the choice \u03b11 = 1.5 forced \u03b10 to be 0.5. Our remaining parameter, \u03b20, was\nset to ensure that, for the variational algorithm, the absent odors (those with sj = 0) contributed a\nbackground firing rate of r0 on average. This average background rate is given by \\sum_j \\langle w_{ij} \\rangle \\langle c_j \\rangle =\npc Nodors \u03b10/\u03b20. Setting this to r0 yields \u03b20 = pc Nodors \u03b10/r0 = 0.1 \u00d7 400 \u00d7 0.5/1 = 20. The true\n(Eq. (2.2)) and approximate (Eq. (3.1)) prior distributions over concentration are shown in Fig. 1.\nFigure 2 shows how the inference process evolves over time for a typical set of odors and\nconcentrations. The top panel shows concentration, with variational inference on the left (where we plot\nthe mean of the posterior distribution over concentration, (1 \u2212 \u03bbj)\u03b10j(t)/\u03b20j(t) + \u03bbj\u03b11j(t)/\u03b21j(t);\nsee Eq. (3.2)) and sampling on the right (where we plot c\u0303j, the output of our Langevin sampler; see\nEq. (3.9)) for a case with three odors present. The three colored lines correspond to the odors that\nwere presented, with solid lines for the inferred concentrations and dashed lines for the true ones.\nBlack lines are the odors that were not present. At least in this example, both algorithms converge\nrapidly to the true concentration.\n\nFigure 2: Example run for the variational algorithm (left) and sampling (right); see text for details.\nIn the bottom left panel the green, blue and red lines go to a probability of 1 (log probability of 0)\nwithin about 50 ms. In sampling, the initial value of concentrations is set to the most likely value\nunder the prior (c\u0303(0) = (\u03b11 \u2212 1)/\u03b21). The dashed lines are the true concentrations.\n[Figure 2 panels: Variational (left) and Sampling (right); concentrations versus time [sec] on top, log-probabilities (left) and odors (right) versus time [sec] below.]\n\nIn the bottom left panel of Fig. 2 we plot the log-probability that each of the odors is present, \u03bbj(t).\nThe present odors quickly approach probabilities of 1; the absent odors all have probabilities below\n10^{-4} within about 200 ms. The bottom right panel shows samples from sj for all the odors, with\ndots denoting present odors (sj(t) = 1) and blanks absent odors (sj(t) = 0). 
Beyond about 500 ms,\nthe true odors (the colored lines at the bottom) are on continuously, and for the odors that were not\npresent, sj is still occasionally 1, but relatively rarely.\nIn Fig. 3 we show the time course of the probability of odors when between 1 and 5 odors were\npresented. We show only the first 100 ms, to emphasize the initial time course. Again, variational\ninference is on the left and sampling is on the right. The black lines are the average values of the\nprobability of the correct odors; the gray regions mark the 25th\u201375th percentiles. Ideally, we would like\nto compare these numbers to those expected from a true posterior. However, due to its intractability,\nwe must seek different means of comparison. Therefore, we plot the probability of the most likely\nnon-presented odor (red); the average probability of the non-presented odors (green), and the\nprobability of guessing the correct odors via simple template matching (dashed; see Fig. 3 legend for\ndetails).\nAlthough odors are inferred relatively rapidly (they exceed template matching within 20 ms), there\nwere almost always false positives. Even with just one odor present, both algorithms consistently\nreport the existence of another odor (red). This problem diminishes with time if fewer odors are\npresented than the expected three, but it persists for more complex mixtures. The false positives\nare in fact consistent with human behavior: humans have difficulty correctly identifying more than one\nodor in a mixture, with the most common problem being false positives [9].\nFinally, because the two algorithms encode probabilities differently (see Discussion below), we also\nlook into the time courses of the neural activity. In Fig. 4, we show the log-probability, L (left),\nand probability, \u03bb (right), averaged across 400 scenes containing 3 odors (see Supplementary Fig. 2\nfor the other odor mixtures). 
The probability of absent odors drops from 3/400 \u2248 e^{-5} (the\nprior) to e^{-12} (the final inferred probability). For the variational approach, this represents a drop\nin activity of 7 log units, comparable to the increase of about 5 log units for the present odors\n(whose probability is inferred to be near 1). For the sampling based approach, on the other hand,\nthis represents a very small drop in activity. Thus, for the variational algorithm the average activity\nassociated with the absent odors exhibits a large drop, whereas for the sampling based approach the\naverage activity associated with the absent odors starts small and stays small.\n\n5 Discussion\n\nWe introduced two algorithms for inferring odors from the activity of the odorant receptor neurons.\nOne was a variational method; the other sampling based. We mapped both algorithms onto dynamical\nsystems, and, assuming time constants of 10 ms (plausible for biophysically realistic networks),\ntested the time course of the inference.\nThe two algorithms performed with striking similarity: they both inferred odors within about 100 ms\nand they both had about the same accuracy. However, since the two methods encode probabilities\ndifferently (linear vs logarithmic encoding), they can be differentiated at the level of neural activity.\nAs can be seen by examining Eqs. (3.4a) and (3.4c), for variational inference the log probability of\nconcentration and presence/absence are related to the dynamical variables via\n\n\\log Q(c_j) \\sim \\alpha_{1j} \\log c_j - \\beta_{1j} c_j \\qquad (5.1a)\n\n\\log Q(s_j) \\sim L_j s_j \\qquad (5.1b)\n\nwhere \u223c indicates equality within a constant. If we interpret \u03b10j and Lj as firing rates, then these\nequations correspond to a linear probabilistic population code [10]: the log probability inferred by\nthe approximate algorithm is linear in firing rate, with a parameter-dependent offset (the term \u2212\u03b21jcj\nin Eq. (5.1a)). 
For the sampling-based algorithm, on the other hand, activity generates samples from\nthe posterior; an average of those samples codes for the probability of an odor being present. Thus,\nif the olfactory system uses variational inference, activity should code for log probability, whereas\nif it uses sampling, activity should code for probability.\n\nFigure 3: Inference by networks \u2014 initial 100 ms. Black: average value of the probability of correct\nodors; red: probability of the most likely non-presented odor; green: average probability of the\nnon-presented odors. Shaded areas represent 25th\u201375th percentile of values across 400 olfactory scenes.\nIn the variational approach, values are often either 0 or 1, which makes it possible for the mean to\nland outside of the chosen percentile range; this happens whenever the odors are guessed correctly\nmore than 75% of the time, in which case the 25th\u201375th percentile collapses to 1, or less than 25%\nof the time, in which case the 25th\u201375th percentile collapses to 0. The left panel shows variational\ninference, where we plot \u03bbj(t); the right one shows sampling, where we plot sk(t) averaged over 20\nrepetitions of the algorithm (to avoid arbitrariness in decoding/smoothing/averaging the samples).\nBoth methods exceed template matching within 20 ms (dashed line). (Template matching finds odors\n(the j\u2019s) that maximize the dot product between the activity, ri, and the weights, wij, associated\nwith odor j; that is, it chooses j\u2019s that maximize \\sum_i r_i w_{ij} / ( \\sum_i r_i^2 \\sum_i w_{ij}^2 )^{1/2}. The number of\nodors chosen by template matching was set to the number of odors presented.) For more complex\nmixtures, sampling is slightly more efficient at inferring the presented odors. See Supplementary\nMaterial for the time course out to 1 second and for mixtures of up to 10 odors.\n[Figure 3 panels: mean p(s=1) versus time [ms] for 1, 2, 3, 4 and 5 odors; Variational (left column) and Sampling (right column).]\n\nFigure 4: Average time course of log(p(s)) (left) and p(s) (right, same as in Fig. 3). For the variational\nalgorithm, the activity of the neurons codes for log probability (relative to some background\nto keep firing rates non-negative). For this algorithm, the drop in probability of the non-presented\nodors from about e^{-5} to e^{-12} corresponds to a large drop in firing rate. For the sampling based\nalgorithm, on the other hand, activity codes for probability, and there is almost no drop in activity.\n\nThere are two ways to determine which. One is to note that for the variational algorithm there is\na large drop in the average activity of the neurons coding for the non-present odors (Fig. 4 and\nSupplementary Figure 2). This drop could be detected with electrophysiology. The other focuses on\nthe present odors, and requires a comparison between the posterior probability inferred by an animal\nand neural activity. The inferred probability can be measured by so-called \u201copt-out\u201d experiments\n[11]; the latter by sticking an electrode into an animal\u2019s head, which is by now standard.\nThe two algorithms also make different predictions about the activity coding for concentration. For\nthe variational approach, activity, \u03b10j, codes for the parameters of a probability distribution.\nImportantly, in the variational scheme the mean and variance of the distribution are tied \u2013 both are\nproportional to activity. Sampling, on the other hand, can represent arbitrary concentration\ndistributions. 
These two schemes could, therefore, be distinguished by separately manipulating average\nconcentration and uncertainty \u2013 by, for example, showing either very similar or very different odors.\nUnfortunately, it is not clear where exactly one needs to stick the electrode to record the trace of the\nolfactory inference. A good place to start would be the olfactory bulb, where odor representations\nhave been studied extensively [12, 13, 14]. For example, the dendro-dendritic connections observed\nin this structure [4] are particularly well suited to meet the symmetry requirements on wij. We note\nin passing that these connections have been the subject of many theoretical studies. Most, however,\nconsidered single odors [15, 6, 16], for which one does not need a complicated inference process.\nAn early notable exception to the two-odor standard was Zhaoping [17], who proposed a model\nfor serial analysis of complex mixtures, whereby higher cortical structures would actively adapt the\nalready recognized components and send a feedback signal to the lower structures. Exactly how her\nnetwork relates to our inference algorithms remains unclear. We should also point out that although\nthe olfactory bulb is a likely location for at least part of our two inference algorithms, both are\nsufficiently complicated that they may need to be performed by higher cortical structures, such as\nthe anterior piriform cortex [18, 19].\n\nFuture directions. We have made several unrealistic assumptions in this analysis. For instance,\nthe generative model was very simple: we assumed that concentrations added linearly, that weights\nwere binary (so that each odor activated a subset of the olfactory receptor neurons at a finite value,\nand did not activate the rest at all), and that noise was Poisson. None of these are likely to be exactly\ntrue. And we considered priors such that all odors were independent. 
This too is unlikely to be true \u2013\nfor instance, the odors one expects in a restaurant are very different from the ones one expects\nin a toxic waste dump, consistent with the fact that responses in the olfactory bulb are modulated\nby task-relevant behavior [20]. Taking these effects into account will require a more complicated,\nalmost certainly hierarchical, model. We have also focused solely on inference: we assumed that\nthe network knew perfectly both the mapping from odors to odorant receptor neurons and the priors.\nIn fact, both have to be learned. Finally, the neurons in our network had to implement relatively\ncomplicated nonlinearities: logs, exponents, and digamma and quadratic functions, and neurons had\nto be reciprocally connected. Building a network that can both exhibit the proper nonlinearities\n(at least approximately) and learn the reciprocal weights is challenging. While these issues are\nnontrivial, they do not appear to be insurmountable. We expect, therefore, that a more realistic\nmodel will retain many of the features of the simple model we presented here.\n\nReferences\n[1] J. Fiser, P. Berkes, G. Orban, and M. Lengyel. Statistically optimal perception and learning: from behavior to neural representations. Trends Cogn. Sci. (Regul. Ed.), 14(3):119\u2013130, Mar 2010.\n\n[2] R. Vincis, O. Gschwend, K. Bhaukaurally, J. Beroud, and A. Carleton. Dense representation of natural odorants in the mouse olfactory bulb. Nat. Neurosci., 15(4):537\u2013539, Apr 2012.\n\n[3] J. Beck, K. Heller, and A. Pouget. Complex inference in neural circuits with probabilistic population codes and topic models. In NIPS, 2012.\n\n[4] W. Rall and G. M. Shepherd. Theoretical reconstruction of field potentials and dendrodendritic synaptic interactions in olfactory bulb. J. 
Neurophysiol., 31(6):884\u2013915, Nov 1968.\n\n[5] G. M. Shepherd, W. R. Chen, and C. A. Greer. The synaptic organization of the brain, volume 4, chapter Olfactory bulb, pages 165\u2013216. Oxford University Press, Oxford, 2004.\n\n[6] A. A. Koulakov and D. Rinberg. Sparse incomplete representations: a potential role of olfactory granule cells. Neuron, 72(1):124\u2013136, Oct 2011.\n\n[7] S. Olsen, V. Bhandawat, and R. Wilson. Divisive normalization in olfactory population codes. Neuron, 66(2):287\u2013299, 2010.\n\n[8] P. Mombaerts. Genes and ligands for odorant, vomeronasal and taste receptors. Nat. Rev. Neurosci., 5(4):263\u2013278, Apr 2004.\n\n[9] D. G. Laing and G. W. Francis. The capacity of humans to identify odors in mixtures. Physiol. Behav., 46(5):809\u2013814, Nov 1989.\n\n[10] W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget. Bayesian inference with probabilistic population codes. Nat. Neurosci., 9(11):1432\u20131438, Nov 2006.\n\n[11] R. Kiani and M. N. Shadlen. Representation of confidence associated with a decision by neurons in the parietal cortex. Science, 324(5928):759\u2013764, May 2009.\n\n[12] G. Laurent, M. Stopfer, R. W. Friedrich, M. I. Rabinovich, A. Volkovskii, and H. D. Abarbanel. Odor encoding as an active, dynamical process: experiments, computation, and theory. Annu. Rev. Neurosci., 24:263\u2013297, 2001.\n\n[13] H. Spors and A. Grinvald. Spatio-temporal dynamics of odor representations in the mammalian olfactory bulb. Neuron, 34(2):301\u2013315, Apr 2002.\n\n[14] K. Cury and N. Uchida. Robust odor coding via inhalation-coupled transient activity in the mammalian olfactory bulb. Neuron, 68(3):570\u2013585, 2010.\n\n[15] Z. Li and J. J. Hopfield. Modeling the olfactory bulb and its neural oscillatory processings. Biol. Cybern., 61(5):379\u2013392, 1989.\n\n[16] Y. Yu, T. S. McTavish, M. L. Hines, G. M. Shepherd, C. Valenti, and M. Migliore. 
Sparse distributed representation of odors in a large-scale olfactory bulb circuit. PLoS Comput. Biol., 9(3):e1003014, 2013.\n\n[17] Z. Li. A model of olfactory adaptation and sensitivity enhancement in the olfactory bulb. Biol. Cybern., 62(4):349\u2013361, 1990.\n\n[18] J. Chapuis and D. Wilson. Bidirectional plasticity of cortical pattern recognition and behavioral sensory acuity. Nat. Neurosci., 15(1):155\u2013161, 2012.\n\n[19] K. Miura, Z. Mainen, and N. Uchida. Odor representations in olfactory cortex: distributed rate coding and decorrelated population activity. Neuron, 74(6):1087\u20131098, 2012.\n\n[20] R. A. Fuentes, M. I. Aguilar, M. L. Aylwin, and P. E. Maldonado. Neuronal activity of mitral-tufted cells in awake rats during passive and active odorant stimulation. J. Neurophysiol., 100(1):422\u2013430, Jul 2008.\n", "award": [], "sourceid": 995, "authors": [{"given_name": "Agnieszka", "family_name": "Grabska-Barwinska", "institution": "Gatsby Unit, UCL"}, {"given_name": "Jeff", "family_name": "Beck", "institution": "Gatsby Unit, UCL"}, {"given_name": "Alexandre", "family_name": "Pouget", "institution": "University of Geneva"}, {"given_name": "Peter", "family_name": "Latham", "institution": "Gatsby Unit, UCL"}]}