{"title": "A Dual Algorithm for Olfactory Computation in the Locust Brain", "book": "Advances in Neural Information Processing Systems", "page_first": 2276, "page_last": 2284, "abstract": "We study the early locust olfactory system in an attempt to explain its well-characterized structure and dynamics. We first propose its computational function as recovery of high-dimensional sparse olfactory signals from a small number of measurements. Detailed experimental knowledge about this system rules out standard algorithmic solutions to this problem. Instead, we show that solving a dual formulation of the corresponding optimisation problem yields structure and dynamics in good agreement with biological data. Further biological constraints lead us to a reduced form of this dual formulation in which the system uses independent component analysis to continuously adapt to its olfactory environment to allow accurate sparse recovery. Our work demonstrates the challenges and rewards of attempting detailed understanding of experimentally well-characterized systems.", "full_text": "A Dual Algorithm for Olfactory Computation in the\n\nLocust Brain\n\nSina Tootoonian\n\nst582@eng.cam.ac.uk\n\nM\u00b4at\u00b4e Lengyel\n\nm.lengyel@eng.cam.ac.uk\n\nComputational & Biological Learning Laboratory\n\nDepartment of Engineering, University of Cambridge\n\nTrumpington Street, Cambridge CB2 1PZ, United Kingdom\n\nAbstract\n\nWe study the early locust olfactory system in an attempt to explain its well-\ncharacterized structure and dynamics. We \ufb01rst propose its computational function\nas recovery of high-dimensional sparse olfactory signals from a small number\nof measurements. Detailed experimental knowledge about this system rules out\nstandard algorithmic solutions to this problem. Instead, we show that solving a\ndual formulation of the corresponding optimisation problem yields structure and\ndynamics in good agreement with biological data. 
Further biological constraints lead us to a reduced form of this dual formulation in which the system uses independent component analysis to continuously adapt to its olfactory environment to allow accurate sparse recovery. Our work demonstrates the challenges and rewards of attempting detailed understanding of experimentally well-characterized systems.\n\n1 Introduction\n\nOlfaction is perhaps the most widespread sensory modality in the animal kingdom, often crucial for basic survival behaviours such as foraging, navigation, kin recognition, and mating. Remarkably, the neural architecture of olfactory systems across phyla is largely conserved [1]. Such convergent evolution suggests that what we learn studying the problem in small model systems will generalize to larger ones. Here we study the olfactory system of the locust Schistocerca americana. While we focus on this system because it is experimentally well-characterized (Section 2), we expect our results to extend to other olfactory systems with similar architectures. We begin by observing that most odors are mixtures of hundreds of molecular species, with typically only a few of these dominating in concentration, i.e. odors are sparse in the space of molecular concentrations (Fig. 1A). We introduce a simple generative model of odors and their effects on odorant receptors that reflects this sparsity (Section 3). Inspired by recent experimental findings [2], we then propose that the function of the early olfactory system is maximum a posteriori (MAP) inference of these concentration vectors from receptor inputs (Section 4). This is essentially a sparse signal recovery problem, but the wealth of biological evidence available about the system rules out standard solutions. 
We are then led by these constraints to propose a novel solution to this problem in terms of its dual formulation (Section 5), and further to a reduced form of this solution (Section 6) in which the circuitry uses ICA to continuously adapt itself to the local olfactory environment (Section 7). We close by discussing predictions of our theory that are amenable to testing in future experiments, and future extensions of the model to deal with readout and learning simultaneously, and to provide robustness against noise corrupting sensory signals (Section 8).\n\nFigure 1: Odors and the olfactory circuit. (A) Relative concentrations of ~70 molecules in the odor of the Festival strawberry cultivar, demonstrating sparseness of odor vectors. (B,C) Diagram and schematic of the locust olfactory circuit. Inputs from 90,000 ORNs converge onto ~1000 glomeruli, are processed by the ~1000 cells (projection neurons, PNs, and local interneurons, LNs) of the antennal lobe, and read out in a feedforward manner by the 50,000 Kenyon cells (KCs) of the mushroom body, whose activity ultimately is read out to produce behavior. (D,E) Odor responses of a PN (D) and a KC (E) to 7 trials of 44 mixtures of 8 monomolecular components (colors), demonstrating cell- and odor-specific responses. The odor presentation window is in gray. PN responses are dense and temporally patterned. KC responses are sparse and are often sensitive to single molecules in a mixture. Panel A is reproduced from [8], B from [6], and D-E from the dataset in [2].\n\n2 Biological background\n\nA schematic of the locust olfactory system is shown in Figure 1B-C. Axons from ~90,000 olfactory receptor neurons (ORNs), each thought to express one type of olfactory receptor (OR), converge onto approximately 1000 spherical neuropilar structures called 'glomeruli', presumably by the '1-OR-to-1-glomerulus' rule observed in flies and mice. 
The functional role of this convergence is thought to be noise reduction through averaging.\n\nThe glomeruli are sampled by the approximately 800 excitatory projection neurons (PNs) and 300 inhibitory local interneurons (LNs) of the antennal lobe (AL). LNs are densely connected to other LNs and to the PNs; PNs are connected to each other only indirectly via their dense connections to LNs [3]. In response to odors, the AL exhibits 20 Hz local field potential oscillations and odor- and cell-specific activity patterns in its PNs and LNs (Fig. 1D). The PNs form the only output of the AL and project densely [4] to the 50,000 Kenyon cells (KCs) of the mushroom body (MB). The KCs decode the PNs in a memoryless fashion every oscillation cycle, converting the dense and promiscuous PN odor code into a very sparse and selective KC code [5], often sensitive to a single component in a complex odor mixture [2] (Fig. 1E). KCs make axo-axonal connections with neighbouring KCs [6] but otherwise only communicate with one another indirectly via global inhibition mediated by the giant GABAergic neuron [7]. Thus, while the AL has rich recurrency, there is no feedback from the KCs back to the AL: the PN-to-KC circuit is strictly feedforward. As we shall see below, this presents a fundamental challenge to theories of AL-MB computation.\n\n3 Generative model\n\nNatural odors are mixtures of hundreds of different types of molecules at various concentrations (e.g. [8]), and can be represented as points in R^N_+, where each dimension represents the concentration of one of the N molecular species in 'odor space'. Often a few of these will be at a much higher concentration than the others, i.e. natural odors are sparse. 
Because the AL responds similarly across concentrations [9], we will ignore concentration in our odor model and consider odors as binary vectors x \u2208 {0, 1}^N. We will also assume that molecules appear in odor vectors independently of one another with probability k/N, where k is the average complexity of odors (# of molecules/odor, equivalently the Hamming weight of x) in odor space.\n\nWe assume a linear noise-free observation model y = Ax for the M-dimensional glomerular activity vector (we discuss observation noise in Section 7). A is an M \u00d7 N affinity matrix representing the response of each of the M glomeruli to each of the N molecular odor components, and has elements drawn i.i.d. from a zero-mean Gaussian with variance 1/M. Our generative model for odors and observations is summarized as\n\nx = {x1, . . . , xN}, xi ~ Bernoulli(k/N), y = Ax, Aij ~ N(0, 1/M)    (1)\n\n4 Basic MAP inference\n\nInspired by the sensitivity of KCs to monomolecular odors [2], we propose that the locust olfactory system acts as a spectrum analyzer which uses MAP inference to recover the sparse N-dimensional odor vector x responsible for the dense M-dimensional glomerular observations y, with M \u226a N, e.g. O(1000) vs. O(10,000) in the locust. Thus, the computational problem is akin to one in compressed sensing [10], which we will exploit in Section 5. 
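As a concrete illustration of the generative model in Eq. 1, the following NumPy sketch samples a sparse binary odor and its noise-free glomerular observation. The function name and default parameter values are ours, chosen only to match the locust-scale numbers quoted above:

```python
import numpy as np

def sample_odor_and_observation(N=1000, M=100, k=3, seed=0):
    """Sample (x, A, y) from the generative model of Eq. 1 (illustrative)."""
    rng = np.random.default_rng(seed)
    # Sparse binary odor: each molecule present independently with prob. k/N.
    x = (rng.random(N) < k / N).astype(float)
    # Affinity matrix with iid zero-mean Gaussian entries of variance 1/M.
    A = rng.normal(0.0, np.sqrt(1.0 / M), size=(M, N))
    # Noise-free glomerular responses y = A x.
    y = A @ x
    return x, A, y

x, A, y = sample_odor_and_observation()
```

On average only k of the N entries of x are nonzero, while y is a dense M-dimensional vector, which is the sparse-recovery setting described in the text.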
We posit that each KC encodes the presence of a single molecular species in the odor, so that the overall KC activity vector represents the system\u2019s estimate of the odor that produced the observations y.\n\nTo perform MAP inference on binary x from y given A, a standard approach is to relax x to the positive orthant R^N_+ [11], smooth the observation model with isotropic Gaussian noise of variance \u03c32, and perform gradient ascent on the log posterior\n\nlog p(x|y, A, k) = C \u2212 \u03b2||x||_1 \u2212 (1/(2\u03c32)) ||y \u2212 Ax||_2^2    (2)\n\nwhere \u03b2 = log((1 \u2212 q)/q), q = k/N, ||x||_1 = \u2211_{i=1}^N xi for x \u2265 0, and C is a constant. The gradient of the posterior determines the x dynamics:\n\n\u02d9x \u221d \u2207x log p = \u2212\u03b2 sgn(x) + (1/(2\u03c32)) AT (y \u2212 Ax)    (3)\n\nGiven our assumed 1-to-1 mapping of KCs to (decoded) elements of x, these dynamics fundamentally violate the known biology for two reasons. First, they stipulate KC dynamics where there are none. Second, they require all-to-all connectivity of KCs via AT A where none exists. In reality, the dynamics in the circuit occur in the lower (~M)-dimensional measurement space of the antennal lobe, and hence we need a way of solving the inference problem there rather than directly in the high (N)-dimensional space of KC activities.\n\n5 Low dimensional dynamics from duality\n\nTo compute the MAP solution using lower-dimensional dynamics, we consider the following compressed sensing (CS) problem:\n\nminimize ||x||_1, subject to ||y \u2212 Ax||_2^2 = 0    (4)\n\nwhose Lagrangian has the form\n\nL(x, \u03bb) = ||x||_1 + \u03bb ||y \u2212 Ax||_2^2    (5)\n\nwhere \u03bb is a scalar Lagrange multiplier. This is exactly the equation for our (negative) log posterior (Eq. 2) with the constants absorbed by \u03bb. We will assume that because x is binary, the two systems will have the same solution, and will henceforth work with the CS problem.\n\nTo derive low dimensional dynamics, we first reformulate the constraint and solve\n\nminimize ||x||_1, subject to y = Ax    (6)\n\nwith Lagrangian\n\nL(x, \u03bb) = ||x||_1 + \u03bbT (y \u2212 Ax)    (7)\n\nwhere now \u03bb is a vector of Lagrange multipliers. Note that we are still solving an N-dimensional minimization problem with M \u226a N constraints, while we need M-dimensional dynamics. Therefore, we consider the dual optimization problem of maximizing g(\u03bb), where g(\u03bb) = inf_x L(x, \u03bb) is the dual Lagrangian of the problem. If strong duality holds, the primal and dual objectives have the same value at the solution, and the primal solution can be found by minimizing the Lagrangian at the optimal value of \u03bb [11]. Were x \u2208 R^N, strong duality would hold for our problem by Slater\u2019s sufficiency condition [11]. The binary nature of x robs our problem of the convexity required for this sufficiency condition to be applicable. Nevertheless we proceed assuming strong duality holds.\n\nThe dual Lagrangian has a closed-form expression for our problem. To see this, let b = AT \u03bb. Then, exploiting the form of the 1-norm and x being binary, we obtain the following:\n\ng(\u03bb) \u2212 \u03bbT y = inf_x (||x||_1 \u2212 bT x) = inf_x \u2211_{i=1}^N (|xi| \u2212 bi xi) = \u2211_{i=1}^N inf_{xi} (|xi| \u2212 bi xi) = \u2212\u2211_{i=1}^N [bi \u2212 1]+    (8)\n\nor, in vector form, g(\u03bb) = \u03bbT y \u2212 1T [b \u2212 1]+, where [\u00b7]+ is the positive rectifying function. Maximizing g(\u03bb) by gradient ascent yields M-dimensional dynamics in \u03bb:\n\n\u02d9\u03bb \u221d \u2207\u03bb g = y \u2212 A \u03b8(AT \u03bb \u2212 1)    (9)\n\nwhere \u03b8(\u00b7) is the Heaviside function. 
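A minimal numerical sketch of the dual dynamics of Eq. 9, integrated with forward Euler steps, is given below. The step size, iteration count, and problem sizes are arbitrary choices of ours; as the Supplementary Material notes, simulating these piecewise linear dynamics requires some care, so this is illustrative rather than a faithful reimplementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, k = 100, 1000, 2
A = rng.normal(0.0, np.sqrt(1.0 / M), size=(M, N))  # affinity matrix
x_true = np.zeros(N)
x_true[:k] = 1.0                                    # a k-sparse binary odor
y = A @ x_true                                      # glomerular observations

def theta(u):
    """Heaviside function applied elementwise."""
    return (u > 0).astype(float)

lam = np.zeros(M)   # PN activity vector (the Lagrange multipliers)
dt = 0.05
for _ in range(5000):
    # Euler step of the dual dynamics, Eq. 9.
    lam += dt * (y - A @ theta(A.T @ lam - 1.0))

x_hat = theta(A.T @ lam - 1.0)  # binary KC readout at convergence
```

The dynamics live in the M-dimensional PN space while the readout recovers an N-dimensional binary vector, which is the dimensionality reduction the dual formulation buys.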
The solution to the CS problem \u2013 the odor vector that produced the measurements y \u2013 is then read out at the convergence of the dynamics of Eq. 9, \u02d9\u03bb \u221d y \u2212 A \u03b8(AT \u03bb \u2212 1), to \u03bb* as\n\nx* = argmin_x L(x, \u03bb*) = \u03b8(AT \u03bb* \u2212 1)    (10)\n\nA natural mapping of equations 9 and 10 to antennal lobe dynamics is for the output of the M glomeruli to represent y, the PNs to represent \u03bb, and the KCs to represent (the output of) \u03b8, and hence eventually x*. Note that this would still require the connectivity between PNs and KCs to be negative reciprocal (and determined by the affinity matrix A). We term the circuit under this mapping the full dual circuit (Fig. 2B). These dynamics allow neuronal firing rates to be both positive and negative, hence they can be implemented in real neurons as e.g. deviations relative to a baseline rate [12], which is subtracted out at readout.\n\nWe measured the performance of a full dual network of M = 100 PNs in recovering binary odor vectors containing an average of k = 1 to 10 components out of a possible N = 1000. The results in Figure 2E (blue) show that the dynamics exhibit perfect recovery.1 For comparison, we have included the performance of the purely feedforward circuit (Fig. 2A), in which the glomerular vector y is merely scaled by the k-specific amount that yields minimum error before being read out by the KCs (Fig. 2E, black). In principle, no recurrent circuit should perform worse than this feedforward network, otherwise we have added substantial (energetic and time) costs without computational benefits.\n\n6 The reduced dual circuit\n\nThe full dual antennal lobe circuit described by Equations 9 and 10 is in better agreement with the known biology of the locust olfactory system than the primal dynamics of Equation 3, for a number of reasons:\n\n1. 
Dynamics are in the lower dimensional space of the antennal lobe PNs (\u03bb) rather than the mushroom body KCs (x).\n\n2. Each PN \u03bbi receives private glomerular input yi.\n\n3. There are no direct connections between PNs; their only interaction with other PNs is indirect via inhibition provided by \u03b8.\n\n4. The KCs serve merely as a readout stage and are not interconnected.2\n\n1See the Supplementary Material for considerations when simulating the piecewise linear dynamics of Eq. 9.\n\nFigure 2: Performance of the feedforward and the dual circuits. (A-C) Circuit schematics. Arrows (circles) indicate excitatory (inhibitory) connections. (D) Example PN and LN odor-evoked dynamics for the reduced dual circuit. Top: PNs receive cell-specific excitation or inhibition whose strength is changed as different LNs are activated, yielding cell-specific temporal patterning. Bottom: The LNs whose corresponding KCs encode the odor (red) are strongly excited and eventually breach the threshold (dashed line), causing changes to the dynamics (time points marked with dots). The excitation of the other LNs (pink) remains subthreshold. (E) Hamming distance between recovered and true odor vector as a function of odor density k. The dual circuits generally outperform the feedforward system over the entire range tested. Points are means, bars are s.e.m., computed for 200 trials (feedforward) and all trials from 200 attempts in which the steady-state solution was found (dual circuits, greater than 90%).\n\nHowever, there is also a crucial disagreement of the full dual dynamics with biology: the requirement for feedback from the KCs to the PNs. The mapping of \u03bb to PNs and \u03b8 to the KCs in Equation 9 implies negative reciprocal connectivity of PNs and KCs, i.e. a feedforward connection of Aij from PN i to KC j, and a feedback connection of \u2212Aij from KC j to PN i. 
This latter connection from\nKCs to PNs violates biological fact \u2013 no such direct and speci\ufb01c connectivity from KCs to PNs exists\nin the locust system, and even if it did, it would most likely be excitatory rather than inhibitory, as\nKCs are excitatory.\nAlthough KCs are not inhibitory, antennal lobe LNs are and connect densely to the PNs. Hence they\ncould provide the feedback required to guide PN dynamics. Unfortunately, the number of LNs is on\nthe order of that of the PNs, i.e. much fewer than the number of the KCs, making it a priori unlikely\nthat they could replace the KCs in providing the detailed pattern of feedback that the PNs require\nunder the full dual dynamics.\nTo circumvent this problem, we make two assumptions about the odor environment. The \ufb01rst is\nthat any given environment contains a small fraction of the set of all possible molecules in odor\nspace. This implies the potential activation of only a small number of KCs, whose feedback patterns\n(columns of A) could then be provided by the LNs. The second assumption is that the environment\nchanges suf\ufb01ciently slowly that the animal has time to learn it, i.e. that the LNs can update their\nfeedback patterns to match the change in required KC activations.\nThis yields the reduced dual circuit, in which the reciprocal interaction of the PNs with the KCs via\nthe matrix A is replaced with interaction with the M LNs via the square matrix B. 
The activity of the LNs represents the activity of the KCs encoding the molecules in the current odor environment, and the columns of B are the corresponding columns of the full A matrix:\n\n\u02d9\u03bb \u221d y \u2212 B \u03b8(BT \u03bb \u2212 1), x = \u03b8(AT \u03bb \u2212 1)    (11)\n\n2Although axo-axonal connections between neighbouring KC axons in the mushroom body peduncle are known to exist [6], see also Section 2.\n\nNote that instantaneous readout of the PNs is still performed by the KCs as in the full dual. The performance of the reduced dual is shown in red in Figure 2E, demonstrating better performance than the feedforward circuit, though not the perfect recovery of the full dual. This is because the solution sets of the two equations are not the same. Suppose that B = A:,1:M, and that y = \u2211_{i=1}^k A:,i. The corresponding solution set for the reduced dual is \u039b1(y) = {\u03bb : (B:,1:k)T \u03bb > 1 \u2227 (B:,k+1:M)T \u03bb < 1}, equivalently \u039b1(y) = {\u03bb : (A:,1:k)T \u03bb > 1 \u2227 (A:,k+1:M)T \u03bb < 1}. On the other hand, the solution set for the full dual is \u039b0(y) = {\u03bb : (A:,1:k)T \u03bb > 1 \u2227 (A:,k+1:M)T \u03bb < 1 \u2227 (A:,M+1:N)T \u03bb < 1}. Note the additional requirement that the projection of \u03bb onto columns M + 1 to N of A must also be less than 1. Hence any solution to the full dual is a solution to the reduced dual, but not necessarily vice versa: \u039b0(y) \u2286 \u039b1(y). Since only the former are solutions to the full problem, not all solutions to the reduced dual will solve it, leading to the reduced performance observed. 
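The reduced dual dynamics of Eq. 11 can be sketched numerically in the same style. Here B is idealized as the first M columns of A (i.e. a perfectly learned local environment, as in the solution-set argument above), the odor uses only molecules known to B, and all concrete sizes and step parameters are our choices:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, k = 60, 300, 2
A = rng.normal(0.0, np.sqrt(1.0 / M), size=(M, N))  # full affinity matrix
B = A[:, :M]                      # LN feedback matrix: a learned subset of A's columns
x_true = np.zeros(N)
x_true[:k] = 1.0                  # odor drawn from the current (known) environment
y = A @ x_true                    # glomerular observations

def theta(u):
    """Heaviside function applied elementwise."""
    return (u > 0).astype(float)

lam = np.zeros(M)                 # PN activity
dt = 0.05
for _ in range(5000):
    # PN dynamics driven by LN feedback through the square matrix B (Eq. 11).
    lam += dt * (y - B @ theta(B.T @ lam - 1.0))

x_hat = theta(A.T @ lam - 1.0)    # instantaneous KC readout still uses the full A
```

The dynamics only ever touch the M-dimensional matrix B, while the KC readout projects onto all N columns of A, mirroring the division of labor between LNs and KCs described in the text.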
This analysis also implies that increasing (or decreasing) the number of columns in B, so that it is no longer square, will improve (worsen) the performance of the reduced dual, by making its solution set a smaller (larger) superset of \u039b0(y).\n\n7 Learning via ICA\n\nFigure 2 demonstrates that the reduced dual has reasonable performance when the B matrix is correct, i.e. it contains the columns of A for the KCs that would be active in the current odor environment. How would this matrix be learned before birth, when presumably little is known about the local environment, or as the animal moves from one odor environment to another?\n\nRecall that, according to our generative model (Section 3) and the additional assumptions made for deriving the reduced dual circuit (Section 6), molecules appear independently at random in odors of a given odor environment and the mapping from odors x to glomerular responses y is linear in x via the square mixing matrix B. Hence, our problem of learning B is precisely that of ICA (or more precisely, sparse coding, as the observation noise variance is assumed to be \u03c32 > 0 for inference), with binary latent variables x. We solve this problem using MAP inference via EM with a mean-field variational approximation q(x) to the posterior p(x|y, B) [13], where q(x) := \u220f_{i=1}^M Bernoulli(xi; qi) = \u220f_{i=1}^M qi^xi (1 \u2212 qi)^(1\u2212xi). The E-step, after observing that for binary x, x^2 = x, is\n\n\u2206q \u221d \u2212\u03b3 \u2212 log(q/(1 \u2212 q)) + (1/\u03c32) BT y \u2212 (1/\u03c32) Cq\n\nwith \u03b3 = \u03b21 + (1/(2\u03c32)) c, \u03b2 = log((1 \u2212 q0)/q0), q0 = k/M, the vector c = diag(BT B), and C = BT B \u2212 diag(c), i.e. C is BT B with the diagonal elements set to zero. To yield more plausible neural dynamics, we change variables to v = log(q/(1 \u2212 q)). By the chain rule, \u02d9v = diag(\u2202vi/\u2202qi) \u02d9q. As vi is monotonically increasing in qi, the corresponding partial derivatives are all positive and the resulting diagonal matrix is positive definite, so we can ignore it in performing gradient descent and still minimize the same objective. Hence we have\n\n\u2206v \u221d \u2212\u03b3 \u2212 v + (1/\u03c32) BT y \u2212 (1/\u03c32) Cq(v), q(v) = 1/(1 + exp(\u2212v)),    (12)\n\nwith the obvious mapping of v to LN membrane potentials, and q as the sigmoidal output function representing graded voltage-dependent transmitter release observed in locust LNs.\n\nThe M-step update is made by changing B to increase log p(B) + Eq log p(x, y|B), yielding\n\n\u2206B \u221d \u2212(1/M) B + (1/\u03c32) (rqT + B diag(q(1 \u2212 q))), r := y \u2212 Bq.    (13)\n\nNote that this update rule takes the form of a local learning rule.\n\nEmpirically, we observed convergence within around 10,000 iterations using a fixed step size of dt \u2248 10^-2, and \u03c3 \u2248 0.2 for M in the range of 20\u2013100 and k in the range of 1\u20135. In cases when the algorithm did not converge, lowering \u03c3 slightly typically solved the problem. The performance of the algorithm is shown in Figure 3. Although the B matrix is learned to high accuracy, it is not learned exactly. The resulting algorithmic noise renders the performance of the dual shown in Fig. 2E an upper bound, since there the exact B matrix was used.\n\nFigure 3: ICA performance for M = 40, k = 1, dt = 10^-2. (A) Time course of mean squared error between the elements of the estimate B and their true values for 10 different random seeds. \u03c3 = 0.162 for six of the seeds, 0.15 for three, and 0.14 for one. (B,C) Projection of the columns of Btrue into the basis of the columns of B before (B) and after learning (C), for one of the random seeds. 
Plotted values before learning are clipped to the -1\u20131 range.\n\n8 Discussion\n\n8.1 Biological evidence and predictions\n\nOur work is consistent with much of the known anatomy of the locust olfactory system, e.g. the\nlack of connectivity between PNs and dense connectivity between LNs, and between LNs and PNs\n[3]; direct ORN inputs to LNs (observed in \ufb02ies [14]; unknown in locust); dense connectivity from\nPNs to KCs [4]; odor-evoked dynamics in the antennal lobe [2], vs. memoryless readout in the KCs\n[5]. In addition, we require gradient descent PN dynamics (untested directly, but consistent with PN\ndynamics reaching \ufb01xed-points upon prolonged odor presentation [15]), and short-term plasticity in\nthe antennal lobe for ICA (a direct search for ICA has not been performed, but short-term plasticity\nis present in trial-to-trial dynamics [16]).\nOur model also makes detailed predictions about circuit connectivity. First, it predicts a speci\ufb01c\nstructure for the PN-to-KC connectivity matrix, namely AT , the transpose of the af\ufb01nity matrix.\nThis is super\ufb01cially at odds with recent work in \ufb02ies suggesting random connectivity between PNs\nand KCs (detailed connectivity information is not present in the locust). Murthy and colleagues\n[17] examined a small population of genetically identi\ufb01able KCs and found no evidence of response\nstereotypy across \ufb02ies, unlike that present at earlier stages in the system. Our model is agnostic\nto permutations of the output vector as these reassign the mapping between KCs and molecules\nand affect neither information content nor its format, so our results would be consistent with [17]\nunder animal-speci\ufb01c permutations. Caron and co-workers [18] analysed the structural connectiv-\nity of single KCs to glomeruli and found it consistent with random connectivity conditioned on a\nglomerulus-speci\ufb01c connection probability. 
This is also consistent with our model, with the observed randomness reflecting that of the affinity matrix itself. Our model would predict (a) that repeated connectivity motifs would be observed if enough KCs (across animals) were sampled, and (b) that each connectivity motif corresponds to the (binarized) glomerular response vector evoked by a particular molecule. In addition we predict symmetric inhibitory connectivity between LNs (BT B), and negative reciprocal connectivity between PNs and LNs (Bij from PN i to LN j and \u2212Bij from LN j to PN i).\n\n8.2 Combining learning and readout\n\nWe have presented two mechanisms above \u2013 the reduced dual for readout and ICA for learning \u2013 both of which need to be at play to guarantee high performance. In fact, these two mechanisms must be active simultaneously in the animal. Here we sketch a possible mechanism for combining them. The key is equation 12, which we repeat below, augmented with an additional term from the PNs:\n\n\u2206v \u221d \u2212v + [\u2212\u03b3 + (1/\u03c32) BT y \u2212 (1/\u03c32) C q(v)] + [BT \u03bb \u2212 1] = \u2212v + Ilearning + Ireadout.\n\nFigure 4: Effects of noise. (A) As in Figure 2E but with a small amount of additive noise in the observations. The full dual still outperforms the feedforward circuit, which in turn outperforms the reduced dual over nearly half the tested range. (B) The feedback surface hinting at noise sensitivity. PN phase space is colored according to activation of each of the KCs and a 2D projection around the origin is shown. 
The average size of a zone with a uniform color is quite small, suggesting that small perturbations would change the configuration of KCs activated by a PN, and hence the readout performance.\n\nSuppose that (a) the two input channels were segregated, e.g. on separate dendritic compartments, (b) the readout component was fast but weak, (c) the learning component was slow but strong, and (d) the v time constant was faster than both. Early after odor presentation, the main input to the LN would be from the readout circuit, driving the PNs to their fixed point. The input from the learning circuit would eventually catch up and dominate that of the readout circuit, driving the LN dynamics for learning. Importantly, if B has already been learned, then the output of the LNs, q(v), would remain essentially unchanged throughout, as both the learning and readout circuits would produce the same (steady-state) activation vector in the LNs. If the matrix is incorrect, then the readout is likely to be incorrect already, and so the important aspect is the learning update, which would eventually dominate. This is just one possibility for combining learning and readout. Indeed, even the ICA updates themselves are non-trivial to implement. We leave the details of both to future work.\n\n8.3 Noise sensitivity\n\nAlthough our derivations of the inference and learning rules assumed observation noise, the data that we provided to the models contained none. Adding a small amount of noise reduces the performance of the dual circuits, particularly that of the reduced dual, as shown in Figure 4A. Though this may partially be attributed to numerical integration issues (Supplementary Material), there is likely a fundamental theoretical cause underlying it. This is hinted at by the plot in Figure 4B of a 2D projection in PN space of the overlaid halfspaces defined by the activation of each of the N KCs. 
In the central void no KC is active and \u03bb can change freely along \u02d9\u03bb. As \u03bb crosses into a halfspace, the corresponding KC is activated, changing \u02d9\u03bb and the trajectory of \u03bb. The different colored zones indicate different patterns of KC activation and correspondingly different changes to \u02d9\u03bb. The small size of these zones suggests that small changes in the trajectory of \u03bb caused e.g. by noise could result in very different patterns of KC activation. For the reduced dual, most of these halfspaces are absent from the dynamics since B has only a small subset of the columns of A, but are present during readout, exacerbating the problem. How the biological system overcomes this apparently fundamental sensitivity is an important question for future work.\n\nAcknowledgements This work was supported by the Wellcome Trust (ST, ML).\n\nReferences\n\n[1] Eisthen HL. Why are olfactory systems of different animals so similar?, Brain, Behavior and Evolution 59:273, 2002.\n\n[2] Shen K, et al. Encoding of mixtures in a simple olfactory system, Neuron 80:1246, 2013.\n\n[3] Jortner RA. Personal communication.\n\n[4] Jortner RA, et al. A simple connectivity scheme for sparse coding in an olfactory system, The Journal of Neuroscience 27:1659, 2007.\n\n[5] Perez-Orive J, et al. Oscillations and sparsening of odor representations in the mushroom body, Science 297:359, 2002.\n\n[6] Leitch B, Laurent G. GABAergic synapses in the antennal lobe and mushroom body of the locust olfactory system, The Journal of Comparative Neurology 372:487, 1996.\n\n[7] Papadopoulou M, et al. Normalization for sparse encoding of odors by a wide-field interneuron, Science 332:721, 2011.\n\n[8] Jouquand C, et al. 
A sensory and chemical analysis of fresh strawberries over harvest dates and seasons reveals factors that affect eating quality, Journal of the American Society for Horticultural Science 133:859, 2008.\n\n[9] Stopfer M, et al. Intensity versus identity coding in an olfactory system, Neuron 39:991, 2003.\n\n[10] Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing. Springer, 2013.\n\n[11] Boyd SP, Vandenberghe L. Convex Optimization. Cambridge University Press, 2004.\n\n[12] Dayan P, Abbott L. Theoretical Neuroscience. Massachusetts Institute of Technology Press, 2005.\n\n[13] Neal RM, Hinton GE. In Learning in Graphical Models, 355, 1998.\n\n[14] Ng M, et al. Transmission of olfactory information between three populations of neurons in the antennal lobe of the fly, Neuron 36:463, 2002.\n\n[15] Mazor O, Laurent G. Transient dynamics versus fixed points in odor representations by locust antennal lobe projection neurons, Neuron 48:661, 2005.\n\n[16] Stopfer M, Laurent G. Short-term memory in olfactory network dynamics, Nature 402:664, 1999.\n\n[17] Murthy M, et al. Testing odor response stereotypy in the Drosophila mushroom body, Neuron 59:1009, 2008.\n\n[18] Caron SJ, et al. Random convergence of olfactory inputs in the Drosophila mushroom body, Nature 497:113, 2013.\n", "award": [], "sourceid": 1214, "authors": [{"given_name": "Sina", "family_name": "Tootoonian", "institution": "University of Cambridge"}, {"given_name": "Mate", "family_name": "Lengyel", "institution": "University of Cambridge"}]}