Paper ID: 870
Title: Perfect Associative Learning with Spike-Timing-Dependent Plasticity
Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors consider associative learning in networks of spiking neurons, and argue that a form of STDP with postsynaptic hyper-polarization is equivalent to the perceptron learning algorithm. The basic form of STDP proposed by the authors relies on traces (similarly to Morrison, Diesmann & Gerstner, “Phenomenological models of synaptic plasticity based on spike timing”, Biol Cybern, 2008, 98, 459-478, which should have been mentioned here), and allows for both potentiation and depression of the synapse. The authors then introduce the perceptron learning rule (PLR) for binary variables, in a form where the weighted sum of inputs is compared to a threshold in order to determine the update. As is well known, the PLR is a supervised learning algorithm requiring a target to be specified at the post-synaptic site. Therefore, when the authors move to establish the equivalence of the PLR to STDP, they demand that the post-synaptic neuron is forced to spike for a target of 1, and prevented from spiking when the target is 0. Using a hyper-polarization mechanism, and taking into account a second post-synaptic event, they show that the supervised PLR can be translated into an unsupervised update rule, given in eq. (14). Finally, the authors introduce an additional depression term into the STDP update, in order to mitigate the effects of post-synaptic spikes following inputs which should not lead to a spike.
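For orientation, here is a minimal sketch of such a threshold-based perceptron update with a margin (Python; the function name, variable names, and the margin parameter are illustrative and not taken from the manuscript):

```python
import numpy as np

def plr_step(w, theta, x, y_target, lr=0.1, margin=0.0):
    """One update of a margin perceptron learning rule (illustrative only).

    w        : weight vector (numpy array)
    theta    : firing threshold
    x        : binary input pattern, entries in {0, 1}
    y_target : desired binary output, 0 or 1
    lr       : learning rate
    margin   : required distance of the weighted sum from the threshold
    """
    h = np.dot(w, x)                       # weighted sum of the inputs
    if y_target == 1 and h < theta + margin:
        w = w + lr * x                     # potentiate active synapses
    elif y_target == 0 and h > theta - margin:
        w = w - lr * x                     # depress active synapses
    return w                               # unchanged once the margin is satisfied
```

Training repeats such steps over all patterns until no weight changes occur, i.e., until every pattern is classified with the desired margin.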
Overall the authors have presented an interesting approach to mapping between physiologically motivated models such as STDP and their computational interpretation within a supervised learning context. As such the paper is interesting and, as far as I am aware, novel. Several comments/questions are in order.
(1) The authors refer to learning throughout the paper. However, they do not address learning but rather memorization, as there is no discussion of generalization beyond the training set. Note also that the PLR depends on the initial conditions, as it stops updating once perfect classification is achieved.

(2) The authors take care to introduce the various time scales (dendritic, axonal), but no discussion is devoted to their impact on performance. Moreover, it is not clear to me how robust the results in section 4 are to the choice of these parameters (and the many other parameters of the model).

(3) The algorithm is defined in a context where the target for each neuron is known, even though it is translated into an associative form using a newly suggested form of hyper-polarization. It is not clear to me how network effects would influence performance. In this sense, it seems that reward-based update rules, based on the so-called three-component Hebbian updates (which have been experimentally supported), would be more effective, as they directly relate to a global error signal, absent in the present formalism.
Q2: Please summarize your review in 1-2 sentences
A spike based learning algorithm requiring post-synaptic hyper-polarization is presented which is claimed to be equivalent to the perceptron algorithm. This suggests an interesting functional interpretation to the suggested plasticity.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper presents a learning rule for a simple spiking neuron model
based on STDP, where the STDP for excitatory synapses is anti-Hebbian
and STDP for inhibitory synapses is Hebbian (i.e., it is reversed in
terms of the classical rule). The authors provide a proof that the
learning rule implements the Perceptron Learning Rule (PLR) in a
specific setup with synchronous input spikes and a kind of teacher
forcing. They also show that a modified version of the rule is able to
learn spatio-temporal spike patterns to some extent, relating it to
tempotron learning. The authors claim that this learning scheme is
biologically plausible.

General assessment:

Quality: The paper addresses an interesting topic. There seems to be a
small bug in the theory, but this does not cause a big problem. One
problem I see is that the manuscript does not contain a comparison
to the simplified tempotron learning rule, which is in my opinion at
least as biologically plausible as the proposed learning
scheme. Another concern is that the simulations for spatio-temporal
patterns are done with a modified neuron and plasticity model with
some quite ad hoc modifications, so it is unclear how well the
initial theoretical analysis carries over to these simulations.

Clarity: The paper is reasonably well written, but clarity could be
improved substantially (see below).

Originality: The approach is quite probably novel, although there
exists some previous work that has not been cited.

Significance: The paper seems to be of some relevance. Since the
authors do not point to any specific system where this setup could be
implemented, I currently do not see any biological evidence for it
apart from the fact that anti-Hebbian STDP has been reported.

Specific points:

1) Clarity of the presentation
1.1) In line 90, it is stated that neurons are recurrently connected. Yet, I don't see any place in the paper where this is the case. If I understand correctly, both the theory and the simulations consider simple feed-forward architectures. This is confusing as well as an overstatement and should be corrected.
1.2) It is unclear to me why the authors first introduce the PLR with a {-1,1} coding and then switch to the {0,1} coding. This is just confusing, since additional variable symbols are needed. Just start with the {0,1} coding. I believe that in many textbooks it is defined in this way anyway, and the mapping between the codings is close to trivial.
1.3) Figure 2 is misleading: the PSPs in Fig. 2 (red traces) are alpha-shaped with a finite rise time, whereas according to the definition the PSPs should really be single exponentials. This also does not match the definitions of ~h and ~x in lines 184-186.
1.4) Parameters alpha and beta are used in two different contexts in lines 189 and 310-311.

2) Formal proof of equivalence
There is one point which maybe I did not understand correctly.
I understand that the margin \epsilon for the case of a desired output spike (y_0=1) can be achieved by the after-spike depolarization. This means that during training, because of the teacher spike, the membrane potential has to be somewhat larger than during testing.
But I do not see how the analogous case works if the neuron should not spike (y_0=0). In this case, the depolarization needed to elicit a spike during training is exactly the same as during testing. Hence, the neuron cannot learn to bring the depolarization below (threshold - margin). Therefore, using "s" on the right-hand side of the equation in line 204 is probably not correct. It is again misleading that in line 198 the authors suggest that beta can be chosen freely; I think this is not the case.
I actually see no way to correct for this. The only thing that can be done is to state that the margin for the PLR can only be learned for positive examples (i.e. y_0=1).
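To restate the concern schematically (the symbols \vartheta for the firing threshold, \epsilon for the margin, and w_i, \tilde{x}_i for the weights and input traces are assumed notation, not necessarily the manuscript's):

```latex
% Schematic restatement of the concern (assumed notation): for a desired
% spike the updates only stop once the input drive exceeds the threshold
% plus the margin, whereas for desired silence no analogous shift of the
% effective threshold exists during training, so the updates stop without
% establishing any margin below threshold.
\[
  y_0 = 1:\;\; \sum_i w_i \tilde{x}_i \,\ge\, \vartheta + \epsilon
  \qquad\text{vs.}\qquad
  y_0 = 0:\;\; \sum_i w_i \tilde{x}_i \,<\, \vartheta .
\]
```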

3) Comparison to tempotron learning
The authors compare their learning rule to tempotron learning in the text, but they do not present an experimental comparison. First, the authors state that the tempotron learning rule is biologically unrealistic (line 55), but this applies only to the full rule, not to the simplified one. In order to see whether the current rule has any advantages over that one, an experimental comparison is needed (possibly also a comparison to the full tempotron rule, to see how much one loses).

4) Learning of spatio-temporal patterns
4.1) Several modifications to the original setup are used here which are not well motivated. For example, it is unclear why the hyperpolarization variable nu is introduced in line 284. I suspect it is needed because one wants to avoid that the output neuron spikes more than twice. Also, the modification of the learning rule in eq. (18) is not clear. It looks like a hack, and I assume that the parameters have to be tuned carefully if something changes in the input statistics. I did not understand the explanation in lines 300-303.
4.2) Related to the last point: I am not sure whether the initial setup works for spatio-temporal patterns (hence, probably, the modifications). The reason is that the second spike of the output neuron acts as a teacher for the following spikes. So even if the desired output is y_0=0, if the neuron erroneously spiked at time t, then for input spikes at t'>t it looks as if this were actually a positive pattern. What is the value of "T" in line 320? This value seems critical to me.

5) Relevance
The learning setup is that the trained neuron is forced to fire at the same time as the synchronous inputs or before a spatio-temporal input pattern. The former condition (for synchronous input) seems somewhat plausible in some contexts, the latter much less so. In general, if the teacher spike sometimes comes after the input spikes (which seems to be the more plausible situation in vivo) then the system breaks down.
Anti-Hebbian STDP has been reported for distal excitatory synapses in pyramidal cells, but on the other hand ref [10] of the manuscript also reported that depolarization would abolish that or make it Hebbian. This would quite probably destroy the properties needed here.

Minor points:

6) Literature:
The authors may want to compare their rule to ReSuMe (e.g. Ponulak and Kasinski, Neural Computation 2010) and discuss the relation to the investigations in (Legenstein et al., Neural Computation 2005).

7) It is a bit problematic to define the PLR via gradient descent on eq. (7), since this error is not differentiable at E=0.
8) At one point x_0^i >= 0 is used, later x_0^i \in {0,1}.
9) line 360: "biologically highly plausible" is an overstatement.
10) line 367: "substantial capacity". This is not clear. What counts as a substantial capacity in this case? A comparison with the tempotron capacity etc. would be needed.
Q2: Please summarize your review in 1-2 sentences
The paper addresses an interesting topic. One
problem I see is that the manuscript does not contain a comparison
to the simplified tempotron learning rule. Some simulations are done with a neuron and plasticity model with some quite ad hoc modifications. The paper does not contain much information to assess the biological relevance of the proposed learning scheme.


Submitted by Assigned_Reviewer_7

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This is an interesting and well written paper describing how reverse spike-timing-dependent plasticity, in combination with spike afterhyperpolarization, can provide a biological mechanism for both Tempotron and Perceptron learning. This approach has two advantages. First, it does not rely on an explicit teacher signal as in Ref. 12. Second, it does not require neurons to solve the credit assignment problem with respect to the contributions to their membrane potential at a certain time in the past (as in the original Tempotron paper). No weaknesses are noted.
Q2: Please summarize your review in 1-2 sentences
A well written and interesting paper describing the importance of reverse spike-timing-dependent plasticity rules for learning. These mechanisms have been observed in the brain (refs 9-11) for both inhibitory and excitatory neurons. This paper helps suggest a computational function for these empirical observations.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their feedback, especially reviewer #2, who provided a lot of detailed and helpful critique. Reviewer #2 pointed out an error in the proof of equivalence: for y_0 = 0 no margin is learned. We found a surprisingly simple solution to this problem: introducing a subthreshold LTD term into the learning rule establishes the margin also for y_0 = 0, and at the same time unifies the learning rules used in both sections of the manuscript.

We now address the other questions in the order of their appearance.
To reviewer #1:
(1) The learning rule we present inherits all the properties of the PLR, and terminates as soon as the margin is reached. The margin ensures a certain degree of generalization, which is independent of the initial conditions.
(2) \tau_a and \tau_d do influence the margin, as they set \tilde x and \tilde y. As long as \tau_pre and \tau_post are much larger, the influence of \tau_a and \tau_d is negligible (see the illustrative sketch below). Originally, we did not check robustness against variation of the parameters in section 4. In later simulations, however, we now find that learning is quite robust.
(3) Learning in many neuronal systems certainly depends also on reward signals. Here we demonstrate the general possibility of purely associative learning with a biologically realistic mechanism, and provide a strict proof of equivalence to the Perceptron Rule for a specific case. It would be very easy to also include some modulatory signal; however, this would obscure our central point. We decided to remove the mention of recurrent connections in section 2, as we only employ feed-forward networks. This should remove the confusion regarding network effects.
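As an illustration of the point about the time constants in (2) above, a rough sketch follows; the single-exponential trace form and the assignment of the delays to the traces are our assumptions for illustration, not necessarily the exact definitions in the manuscript:

```latex
% Illustrative only: assuming single-exponential pre- and postsynaptic
% traces that decay during the dendritic/axonal delays \tau_d, \tau_a
% before they are read out, the trace values entering the update are
\[
  \tilde{x} \approx e^{-\tau_d/\tau_{\mathrm{pre}}}, \qquad
  \tilde{y} \approx e^{-\tau_a/\tau_{\mathrm{post}}},
\]
% so for \tau_{\mathrm{pre}}, \tau_{\mathrm{post}} \gg \tau_a, \tau_d both
% factors are close to 1 and the delays have only a negligible influence
% on the learned margin.
```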

Impact Score: To our knowledge, this is the first time the artificial PLR has been realized with plausible neurobiological mechanisms (see rev. #2 and #3).

To reviewer #2:
General assessment, Quality:

Provided that by the simplified tempotron learning rule the reviewer means the implementation using a voltage convolution, we admit that we did not treat this rule. In fact, even though its capacity is reduced by a factor of ~2, it is still higher with this rule than in our realization of tempotron learning. However, we do not agree on the biological plausibility: in the simplified tempotron rule the teacher signal is unspecified and presumably has to be convolved, while in our rule it is provided by a simple spike with subsequent hyperpolarization, which is a well-known phenomenon.

Regarding the modifications in section 4: Having the subthreshold LTD improves convergence for the tempotron-like learning, and, as mentioned above, we now use it in the PLR proof as well. We decided to separate the hyperpolarization and the membrane time constant to allow for substantial weight changes over the whole interval (T = 500 ms, which is not critical), while keeping a realistically shaped EPSP.

Significance: Our learning rule is a novel concept that depends on the interplay of a synaptic plasticity rule and somatic dynamics. Experimental studies in general treat those mechanisms separately, and even when they are considered in conjunction in plasticity experiments, the membrane potential is simply clamped at a fixed value. This necessarily leads to a neglect of dynamic effects. Therefore, if successful, our idea should spark new research looking into this interplay. It also underlines the relevance of core biological mechanisms (STDP, hyperpolarization) for learning in real neuronal systems.

Specific points:
1), 1.1) to 1.4) are correct, and we will change them in the revised version.

2) See above.

3) The parameters in section 4 (EPSP shape, T) are very similar to those used by Gütig and Sompolinsky, 2006. We did this to make the results comparable. Our capacity is close to 1/4 N, which is about one order of magnitude below the original tempotron capacity of 3 N, and therefore also smaller than with the simplified rule. However, 0.3 N is still substantial (sufficiently different from zero), e.g. when compared to the capacity of the Hopfield network.
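As a back-of-the-envelope check of these numbers (the Hopfield figure of roughly 0.138 N is the standard value, quoted here only for context):

```latex
% Back-of-the-envelope comparison; the Hopfield capacity is the standard
% value of about 0.138 N and is quoted here only for context.
\[
  \frac{3\,N}{0.25\,N} = 12 \;\approx\; \text{one order of magnitude},
  \qquad
  0.25\text{--}0.3\,N \;>\; 0.138\,N \;\;(\text{Hopfield}).
\]
```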

4) 4.1) For \nu and subthreshold LTD, see above. We will provide an improved explanation.

4.2) and 5) After submission we found that the teacher spike can hold any fixed position in the interval. This condition also lifts the requirement of a long hyperpolarization. With these changes and the subthreshold LTD, learning in the simulations is quite robust against the choice of parameters. In this setup we suppress erroneous second spikes by a strong bias towards LTD.

We thank the reviewers for the literature suggestions. We will look into these studies and discuss them as necessary.