
Submitted by Assigned_Reviewer_4
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary: This paper attempts to link sparse optimization methodology to the anatomical structure of the locust's early olfactory system. The work is motivated by the observation that odorant molecules are sparsely represented by the population of Kenyon cells. The authors first mathematically formulate the olfactory system as a MAP decoder, and give the standard solution to the problem without considering biological constraints. Next, to make the solution more biologically plausible, the authors reformulate the olfactory system model as a decoder of a compressive sensing problem, and provide two standard solutions to the dual problem. Then, the authors argue that each component of the solution can be mapped to, or interpreted as, a unit of the biological structure in the olfactory system. However, these mappings are described without strong justification, and there are conceptual problems in linking the math with the biology.
Further clarification on the following points would help readers to gain better insights:
 While LNs and KCs are modeled using a Heaviside function, modeling that is widely accepted in the literature, the PN output is modeled as the Lagrange multiplier. Such modeling is not intuitive and needs to be justified. In particular, under such modeling, the output of the PNs is computed by the gradient descent method in Eq. (10) for the full dual algorithm, and in Eq. (16) for the reduced dual, which would require iteration to reach an optimal solution. However, it is not clear how PNs perform gradient descent or what the iteration mechanism is. The major problem, however, is that the solution to Eq. (10) might be negative, whereas the PN output rate (lambda) has to be positive!
 The B matrix is proposed to replace the A matrix so that the role of LNs can be accommodated into the solution. However, it is not clear why the solution to Eq. (12) (with the B matrix) is the same as the solution to Eq. (10) (with the A matrix). It needs to be shown that the lambda computed by Eq. (12) converges to the one computed by Eq. (10). Also, the B matrix is square, implicitly indicating that the numbers of PNs and LNs are the same. However, this is not true since there are more PNs than LNs. What would be the performance of the reduced dual circuit if the B matrix is not square but rectangular, and the ratio of its two dimensions follows the ratio of the numbers of LNs and PNs?
 More on the B matrix. The B matrix is not fixed, and needs to be learned every time a new input is given. As shown in Fig. 3, the proposed method cannot always learn the B matrix faithfully. What is the performance of the reduced dual circuit when the B matrix is poorly learned? Furthermore, B needs to be learned before being used for readout by KCs. Although the authors described a procedure to deal with this issue of combining learning and readout, there is neither a justification nor simulations of such a procedure, and it is hard to evaluate its correctness.
Finally, the authors suggest that the connectivity between the PNs and the KCs is, at least structurally, related to the A matrix. Now, in the recently published article: Caron, S.J.C., Ruta, V., Abbott, L.F. and Axel, R. (2013) Random Convergence of Afferent Olfactory Inputs in the Drosophila Mushroom Body. Nature 497:113-117, it is argued that the same connectivity is random. Is there a reason to believe that this does not also apply to the locust?
Originality: The paper attempts to bridge an optimization method and the neural circuit of the insect olfactory system.
Clarity: The paper seems at times to have been prepared in a rush. The main message is not clear at first glance.
Quality: The authors need to demonstrate that the mapping between the sparse optimization solution and neural circuit makes sense, and the proposed methods indeed solve the decoding problem.
Notes:
line 43: continuosly -> continuously
line 155: multpliers -> multipliers
line 233: inhibtory -> inhibitory
line 234: odor-evoekd -> odor-evoked
line 259: hte -> the
line 314: intepreted -> interpreted
line 348: perfomance -> performance, in correct -> incorrect
line 351: (B) -> (C)
line 352: matrics -> matrices
line 398: out performs -> outperforms
line 418: alotted -> allotted
Q2: Please summarize your review in 1-2 sentences
The paper falls short both mathematically and in its questionable linking of the math with the biology.

Submitted by Assigned_Reviewer_6
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
NOTE: Due to the range in reviews, we were asked to discuss this paper. As a result of that discussion, we'd like the authors to address the three following points:
1) How are the firing rates (or some other physiological measure of activity) mapped to the variables y and, in particular, lambda? Lambda, in the current formulation, has no bounds. How does this affect the interpretation?
2) There is a need for "negative reciprocal" connectivity between the PNs and LNs (this comes out of the use of B and B_transpose) in Equation 12. Similarly, there is a need for the same symmetry between the measurements of the sparse vector x and the projections up to the KCs. Is there evidence of this symmetry in the locust anatomy or physiology? Can the model work without it?
3) So far the main evidence that this model describes the olfactory circuit in locust is that the odor input is sparse in terms of molecules per input, and that somewhere along the way the sparse signal is measured by the A matrix such that it is projected into a lower-dimensional, denser representation. (It should be clarified where this happens... we believe it happens at the level of the ORs, such that each OR responds to several molecules and each molecule binds with several ORs.) Then there is some processing in this lower dimension via the B matrix (PNs and LNs), which is then projected back up into a higher dimension through A to the KCs. We feel that while there is some structure of the biological circuit in the model, this paper needs to provide more evidence. Is there evidence of gradient descent dynamics? Is there evidence in the plasticity/processing between LNs and PNs that can be mapped to the form of ICA presented here?
THE TEXT BELOW IS MY ORIGINAL REVIEW:
Firstly, I love seeing a theoretical paper which attempts to model a very particular circuit. Typically, we see attempts to create general neural networks or to formally characterize systems which are not well characterized from a biological/experimental point of view. Here we have the opposite, which seems like a more fruitful approach, that is, if one's goal is to understand nervous systems.
In this paper, the authors derive a mathematical model of processing in the locust olfactory system. The input is a bit vector representing an odor where each entry encodes the presence or absence of a particular molecule. Because many ORNs, each with the same OR, converge on a single glomerulus, the input can be taken as noise-free. In the low-dimensional glomerular layer, the mixed signals from the ORNs interact with each other via local interneurons and PNs. Finally, the PNs project up into the higher-dimensional space of KCs, where each KC codes for a particular molecule in the input.
The authors first describe the problem of recovering the sparse, high-dimensional input via the KCs as a MAP optimization, but then reformulate the problem in the compressed sensing context. Next, the dual of the compressed sensing problem is presented as a model since it can more easily be interpreted in terms of the locust nervous system. This model is then changed again to further take knowledge about the biological circuit into account by ensuring that any dynamics happen on the M-dimensional signal. This requires the circuit to learn an M-dimensional subspace of the full N-dimensional space to operate in, which is implemented using ICA.
This paper is highly original in the way it brings together the compressed sensing problem, ICA, and a particular neural circuit. Also, the presentation of the model is clear and easy to follow. However, I would like a clearer explanation of the biological circuit and how it maps back to the model. Is it that each OR can bind with several molecules such that the glomeruli each represent a mixture of molecules? Then the PNs read out the glomeruli and also take part in the dynamics, which ultimately allows the readout by the KCs, which sample a mixture of PNs? The basic question is where exactly the mixing happens. After reading this paper and reviewing references, I'm still not completely certain. I think where I get lost is whether each OR binds with several molecules or only one. It would be helpful to really nail down how the model and biology map to each other, since this paper's strongest aspect is that it models a particular circuit. Maybe to make room you can condense the sections where you present models that are ultimately not used.
As it stands, the current model is very interesting and I believe that with some clarifications it is motivated by the biology. However, the real key to making this a significant paper would be to show that the dynamics of the model (either the gradient descent dynamics or something about the ICA dynamics) match the dynamics of the true circuit. So far you've shown that this model can perform sparse regression/compressed sensing and that it has some structural similarities to the biological circuit. I'd like to see a figure that compares a simulation of this model with the true circuit, even if it's only that individual neurons' responses match physiological data. Or maybe there is evidence of an ICA implementation in the locust olfactory system? Do the dynamics of ICA seem comparable to any behavioral measurements from locust? Or maybe the ability to discriminate odors would show that performance was more like the reduced dual than the full dual circuit? I would love to see something like this addressed in a revised version of this paper or in a future paper.
Q2: Please summarize your review in 1-2 sentences
This paper presents a novel model for a particular circuit, the locust olfactory circuit, which combines ideas from compressed sensing and ICA. While stronger evidence that this model in fact captures the physiology of the biological circuit would greatly increase the significance of the work, it would still be of interest to the NIPS community as it stands.

Submitted by Assigned_Reviewer_21
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary:
The authors study the olfaction circuit in the locust. They propose that KC neurons reconstruct estimates of odor vectors via dynamics in the antennal lobe. Specifically, they propose that LN-PN connections are updated via ICA, while the PN dynamics are determined by gradient descent of the dual of a compressed-sensing optimization problem.
They show that this "reduced dual" circuit offers reasonable performance compared to a simple feedforward circuit.
Quality:
The reduced dual circuit is an intriguing hypothesis. If true, this would be an exceptional paper. The mathematical development flows from 1) basic MAP inference for identifying odors to 2) low-dimensional dynamics of the problem's dual to 3) the reduced dual circuit and finally to 4) learning LN representations by ICA. This step-by-step development is elegant and is thus very appealing.
However, the plausibility of this hypothesis does not seem sufficiently explored. The paper contains no compelling comparisons to biological data. PN neurons are known to exhibit transient dynamics in response to odors. Do the dynamics of eqn 12 yield sensible dynamical trajectories? Can anything be said about physiological/behavioral adaptations of locusts to new odor environments, and whether such experimental observations comply with expectations of eqn 16? If no such data are available, it would be useful to discuss which expectations of the reduced dual circuit can be tested with existing data and which are currently hypothetical. Overall, these concerns are underdeveloped in the paper.
What is the motivation to focus specifically on the locust? It seems that the locust is only constraining the choice of M and N.
The paper has one unfinished figure (line 294).
Clarity:
In section 4 some notation is confusing. x is overloaded to represent both the odor vector and the KC estimate of the odor. I believe this is motivated by developing the general mathematical result first and adding biological mappings later, but it was slightly confusing. Another example: A represents both the ORN->glom map and the PN->KC map.
Otherwise the paper is clear and well-written.
Originality:
To my knowledge this is an especially original paper.
Significance:
The significance of this study seems to rely on experimental verification. Conditioned on that, I think it has potential to be a high-impact paper.
Q2: Please summarize your review in 1-2 sentences
Simple biological constraints motivate an alternative formulation for a standard optimization problem (compressed sensing) in locust olfaction. The development of these ideas is elegant, but the paper would be strengthened by verifying expectations of the model with current experimental data.

Submitted by Assigned_Reviewer_37
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes an inferential and readout circuit model for decoding odor in locusts. In this circuit, the PN and LN neurons in the antennal lobe engage in inferential computation and the Kenyon cells in the mushroom body perform readout of the distinct odors. Biological data were taken into consideration in the development of a mathematical model of the circuit's computation, leading to the formulation of the reduced dual circuit, which uses a recurrent circuit between PN cells (lambda), glomerulus activity (y) and LN neurons, which implement the B matrix, to infer and represent the odor in distributed form. The projection from PN activity to KCs allows the odor to be read out in a sparse representation, with one KC representing roughly one or a few odors. Although some of these ideas about the olfactory circuit have been suggested earlier by G. Laurent and colleagues, this might be the first time the entire computational circuit has been put together and mapped into a plausible functional and mathematical form. The insight of using PN and LN cells in a recurrent circuit to perform inference is forced by a serious consideration of the biological facts. This model thus provides significant insight into our understanding of olfactory computation, with a fairly plausible and rigorous mathematical framework. The hypothetical connections from LN to PN and their relationship to the projection from PN to KC also provide concrete testable predictions of the model. While parts of this dense paper might need clarification, the work is original and important.
Q2: Please summarize your review in 1-2 sentences
A novel mathematical model of the olfactory computation circuit that takes both computational functions and biological facts into account. It provides deep insight into the computations of the olfactory circuits.

Q1: Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their detailed feedback, which we found very useful for improving the clarity and the potential impact of our submission. Below we address the reviewers' comments. We've answered questions specific to a single reviewer individually. Several reviewers asked for clarification of the mapping to the biological circuit, which we will provide in our revised submission. Other questions common to several reviewers are addressed in our response to the Meta Review.
* Reviewer_21
1. "What is the motivation to focus specifically on the locust?"
We selected the locust for concreteness. We expect the same principles to apply to other insects since the relative disparity between M and N is the same.
* Reviewer_4
1. "not clear how PNs perform gradient descent"
The PN dynamics themselves (eq. 10/12) perform gradient descent, since the rate of change of the PN activity vector is proportional to the gradient of the dual Lagrangian. When the dynamics reach a fixed point, the solution has been found. We will clarify the mapping of gradient descent to dynamics in our submission. A mapping of these dynamics in the model to experimentally measured time courses of PN activities is a logical extension of our work; we will expand on this in section 8.1, where we already make some relevant suggestions (see also point 5 and meta review point 3 below).
2. convergence of lambda
The solutions of eq. 10 and 12 are similar but not identical; otherwise the performance of the full dual and reduced dual systems would be identical, since readout by KCs is performed identically for both. We will further describe the relationship between the two sets of solutions in our updated submission.
3. square B matrix and the numbers of the PNs and LNs
We assumed a square B matrix mainly for mathematical convenience. The performance of the reduced dual circuit using the actual ratio of LNs to PNs would be the same, since all that is required is that B contain the columns of A that were combined to produce the observations y. Since the sparseness assumption makes the number of columns that were combined low (<10), this requirement can still be met. Nevertheless, this is an important point and we will discuss it in our updated submission.
4. performance of the reduced dual circuit with poorly learned B matrix
ICA as used in our paper can learn the B matrix up to permutation and sign ambiguity. The latter ambiguity is caused by our relaxation of x from binary to real-valued when performing ICA. We've performed further work since our submission which allows x to retain its binary nature during ICA, resolving this ambiguity and allowing B to be learned perfectly. We will include this updated implementation of ICA in our final submission.
5. justification for our suggestion for combining learning and readout
Our suggested procedure exploits changes in the electrical properties of LNs during a cycle that would facilitate ICA updates and readout. Establishing the viability of this scheme would require a mapping of this work to spiking neurons, which is an important goal for future work (see also point 1 above, and meta review point 3 below).
6. relation to Caron et al. 2013 demonstrating that PN to KC connectivity is random
This is a very interesting point that we also considered before submission. In fact, the same analysis used by Caron et al., applied to a binarized version of our connectivity matrix, would also yield random connectivity, as we will show in the supplemental information for our revised submission, so there is no inconsistency between the two lines of work.
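The permutation and sign ambiguity mentioned in point 4 above can be illustrated with a minimal NumPy sketch. It is not the ICA implementation from the submission: the sizes `M` and `K`, the random stand-in for `B`, and the helper name `ambiguous_copy` are all assumptions made for illustration. The sketch shows only that permuting and sign-flipping the columns of a mixing matrix, with a matching change to the sources, leaves the observations y unchanged, so the observations alone cannot distinguish the two factorizations.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 6, 6                      # hypothetical sizes, chosen only for illustration
B = rng.normal(size=(M, K))      # stand-in mixing matrix, not the paper's B
x = rng.normal(size=K)           # real-valued sources (the relaxed, non-binary case)
y = B @ x                        # observations


def ambiguous_copy(B, x, rng):
    """Return (B_alt, x_alt) with permuted, sign-flipped columns such that
    B_alt @ x_alt equals B @ x."""
    K = B.shape[1]
    perm = rng.permutation(K)
    signs = rng.choice([-1.0, 1.0], size=K)
    B_alt = B[:, perm] * signs   # column j of B_alt is signs[j] * B[:, perm[j]]
    x_alt = x[perm] * signs      # matching sources, so the signs cancel in the product
    return B_alt, x_alt


B_alt, x_alt = ambiguous_copy(B, x, rng)
print(np.allclose(B_alt @ x_alt, y))
```

If x is restricted to values in {0, 1}, a flipped sign on an active component produces an entry of -1 in `x_alt`, which is one way to see why keeping x binary would remove the sign ambiguity while leaving the permutation ambiguity in place.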
* Meta Reviews
1. mapping of firing rates to variables
In practice, neither the glomerular activity y nor the PN activity lambda grows very large due to the effects of averaging random tunings, and the increased inhibition by LNs, respectively. The fact that these rates can be both positive and negative while the physiological rates are strictly nonnegative can be interpreted as deviations from a baseline rate, as is common in theoretical models (see e.g. Dayan and Abbott 2001). We will include a discussion of this point.
2. "negative reciprocal" connectivity between the PNs and LNs
Only structural information is available about the PN to LN connectivity, specifically that the average PN-to-LN and LN-to-PN connectivity probability is 50-70% (Roni Jortner, personal communication). This is consistent with negative reciprocal connectivity, but experiments must be performed to measure the strength of the connectivity directly. There are also no data available on the tunings of receptors in the locust, and the symmetry between these and the projections to the KCs is a key prediction of our paper. Future work will determine whether these symmetry conditions can be relaxed. We will make these predictions crisper in our discussion.
3. evidence for gradient descent dynamics / additional biological evidence for the model
The projection to a lower dimension is indeed carried out at the ORs via the affinity matrix, as the reviewers describe. Figure 2D shows temporal patterning in the evoked firing rates of some PNs, consistent with, though not to the same degree as, that seen in real PNs. The current evidence for gradient descent, ICA, etc. is very limited because experiments have not explicitly looked for these computations yet, and a contribution of our work is to make specific, experimentally testable predictions regarding these. Supporting evidence for gradient descent is that upon extended odor presentation the PN activity vector settles to a fixed point (Mazor, Laurent 2005). There is also evidence for significant plasticity in the AL (Stopfer, Laurent 1999), as required by our proposal, but whether this is consistent with what we predict for ICA has yet to be tested. We will highlight the current biological evidence and the future experiments needed to test the predictions of our model in the discussion.
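The fixed-point argument in point 3 above (and in the response to Reviewer_4, point 1) can be sketched numerically. The code below does not reproduce eq. 10 or eq. 12 of the submission, which are not restated in this thread. As a stand-in it uses a textbook problem, minimize beta*||x||_1 + (gamma/2)*||x||^2 subject to A x = y, and runs gradient ascent on its dual variable `lam`. The problem sizes, the constants `beta` and `gamma`, the random `A`, and the function names are all assumptions. The point is only that the update to `lam` is proportional to the constraint residual y - A x(lam), so the dynamics stop moving exactly when the constraint is met.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise shrinkage: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def dual_ascent(A, y, beta=0.5, gamma=1.0, n_steps=5000):
    """Gradient ascent on the dual of
         min_x beta*||x||_1 + (gamma/2)*||x||^2   subject to   A x = y.
    Returns the final multiplier, the primal estimate, and the residual norms."""
    lam = np.zeros(A.shape[0])
    # The dual gradient is Lipschitz with constant ||A||_2^2 / gamma; use step 1/L.
    step = gamma / np.linalg.norm(A, 2) ** 2
    residuals = []
    for _ in range(n_steps):
        x = soft_threshold(A.T @ lam, beta) / gamma  # minimizer of the Lagrangian in x
        r = y - A @ x                                # gradient of the dual at lam
        residuals.append(np.linalg.norm(r))
        lam = lam + step * r                         # lam is at rest only when r == 0
    x = soft_threshold(A.T @ lam, beta) / gamma
    return lam, x, np.array(residuals)


rng = np.random.default_rng(1)
M, N, k = 20, 50, 3                       # hypothetical sizes, not the locust values
A = rng.normal(size=(M, N)) / np.sqrt(M)  # random stand-in for the affinity matrix
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = 1.0
y = A @ x_true
lam, x_hat, residuals = dual_ascent(A, y)
print(residuals[0], residuals[-1])
```

In this sketch the residual norm should shrink as the iteration approaches a fixed point. Nothing in the update constrains the sign or magnitude of the entries of `lam`, which is the situation the reviewers' question about bounds on lambda refers to; the baseline-deviation reading given under meta review point 1 is one interpretation of such signed values.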
 