NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4223
Title: Spike-Train Level Backpropagation for Training Deep Recurrent Spiking Neural Networks

Reviewer 1

Main concerns: The main motivation of the paper, solving backpropagation in spiking neurons, is not an open problem in computational neuroscience. In fact, as recent work shows, learning in spiking neural networks with standard methods is not a problem at all. It has been demonstrated multiple times that backprop can be applied without much change by using pseudo-derivatives to circumvent the non-differentiable spikes; see [6-8]. This works very well in practice and scales up to mid-scale benchmark problems (and possibly beyond) without performance loss compared to classical (analog) neural networks. In this context it is hard to pinpoint the main innovation of the manuscript.

The presentation of the model mixes existing ideas with details that are original to the present paper. For example, the outline of the spiking backpropagation in Figure 2 is very hard to decode. It is unclear what the dashed lines represent without additional knowledge. A caption, labels and/or a legend would help. The figure is also not sufficiently explained in the main text. The derivations on pages 4-6 could be compressed a great deal since they are very standard. Additional details could be moved to a supplement, making space for the relevant details.

Finally, it is unclear why the proposed model is more biologically plausible than previous models. The learning rules that are presented are applications of standard backprop to spiking networks. This was shown before. For example, the model presented in ref. 4 of the manuscript could also be applied to spiking recurrent networks (not just LSTMs). Other recent approaches that use feedback weights (synthetic gradients) for deep learning in SNNs [1,2], and their recent extension to recurrent neural networks [3], do not need to unroll the network over time. This prior related work should be discussed and cited. What is the advantage of the proposed model compared to this prior related work?

References:
[1] Lillicrap, T. P., Cownden, D., Tweed, D. B., and Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7:13276.
[2] Samadi, A., Lillicrap, T. P., and Tweed, D. B. (2017). Deep learning with dynamic spiking neurons and fixed feedback weights. Neural Computation, 29(3):578–602.
[3] Bellec, G., Scherr, F., Hajek, E., Salaj, D., Legenstein, R., and Maass, W. Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets.
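
Editorial illustration of the pseudo-derivative idea mentioned in Reviewer 1's main concerns: the spike itself is a step function, so the backward pass substitutes a smooth stand-in for its derivative. The sketch below is not taken from the paper under review or from the works cited as [6-8]; the function names, the triangular surrogate shape, and the `threshold` and `width` parameters are assumptions chosen for illustration.

```python
# Illustrative sketch only; names and the triangular surrogate are assumptions,
# not the method of the reviewed paper or of any cited reference.

def spike(v, threshold=1.0):
    """Forward pass: Heaviside step, 1.0 once the membrane potential reaches threshold."""
    return 1.0 if v >= threshold else 0.0


def pseudo_derivative(v, threshold=1.0, width=1.0):
    """Backward pass: triangular stand-in for d(spike)/dv.

    The true derivative is zero almost everywhere and undefined at threshold.
    This surrogate peaks at threshold and falls linearly to zero at a
    distance `width` from it.
    """
    return max(0.0, 1.0 - abs(v - threshold) / width) / width


def grad_wrt_weight(x, w, upstream=1.0, threshold=1.0):
    """Chain rule for one neuron with potential v = w * x, using the surrogate."""
    v = w * x
    return upstream * pseudo_derivative(v, threshold) * x
```

With such a surrogate, a neuron whose potential is close to threshold receives a nonzero gradient even though its output is binary, which is what lets standard backprop machinery run unchanged.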

Reviewer 2

The paper introduces a method for applying backpropagation to fit the parameters of spiking neural networks. The approach relies on the firing rates of spike trains to define the errors and uses what seems fairly similar to standard backpropagation, with some exceptions for the recurrent case.

I find two main issues in the paper:

1. In event-based processing, you typically want to treat events dynamically. The proposed algorithm processes a chunk of data at a time, since it needs a history of spikes to define the error, and is therefore not really event based.
2. This point probably rephrases the former in some way. Essentially, it appears that the algorithm is not a new backpropagation algorithm but rather a new model for spike data that allows processing with standard backpropagation (except maybe in the recurrent case). Therefore, in terms of separating the data model from the optimization/fitting method, the work is not presented clearly.

Questions: You cite the papers [23, 38, 4, 33] as providing competitive performance using an approach similar to yours; however, it appears that you do not compare with any of them. Why is that? Please clarify or modify.

Minor comments: The exposition of your algorithm could probably be simplified by some restructuring that avoids superscripts/subscripts. It just seems too verbose at the moment.

Update after review: The rebuttal clarifies considerably, both in terms of the comparison with BPTT and for the event-based processing. I will modify the score accordingly.

Reviewer 3

The paper addresses the problem of training spiking neural networks (SNNs), in particular recurrent SNNs, with a new form of backpropagation (BP). BP on SNNs is hard because of temporal effects and the discontinuous nature of spikes. The topic of deriving BP rules for SNNs has received increasing attention in recent years, and the most relevant works are referenced. The present paper is valuable in extending the scope to recurrent networks and in giving more attention to temporal effects, which are neglected in other approaches. So the topic itself is not original, but the paper is a valuable extension of the state of the art.

The main contribution is the ST-RSBP learning rule, which backpropagates over spike trains rather than unrolling recurrent networks completely over time. Spike-train level post-synaptic potentials (S-PSPs) are introduced, which accumulate the contributions of the pre-synaptic neuron to the PSP right before the spike time of the post-synaptic neuron. From this, an approximation of the spike count is derived, which in turn is used as the basis for the backpropagation derivation.

The following remains unclear to me:
- The BP algorithm requires a desired output firing count y (in (3)). How is this determined? Is a constant firing rate assumed for the correct output, and if so, what is it?
- When is the BP update performed? Is it a batch-mode update, or can online learning be performed as well?
- How suitable is the learning rule for hardware implementations of SNNs on neuromorphic or digital platforms?
- How are firing thresholds set? Section 4.1 says it is "depending on the layer", but does not give more results.

Experiments are performed on 4 datasets, and in every case an improvement over previous BP-based SNN training methods is shown. There is also a comparison to non-spiking approaches, but I am not sure how the authors picked the references. For example, for Fashion MNIST there are much better solutions available than the ones reported here; there are even multiple tutorials that show how to reach 93% or more.
- I strongly recommend also reporting the non-spiking state of the art for each example, to give a fair comparison and not over-sell the SNN results.
- I am not sure why a recurrent model was chosen for the (static) Fashion MNIST dataset.

Overall I think this is a good contribution, and the presentation is OK, although a lot of the derivations can only be understood with the help of the supplementary material. Recurrent SNN training is not a major topic at NeurIPS, but it is of interest to the community. As a final suggestion, I would recommend reworking the abstract, because it states a lot of very generic facts before discussing the actual contributions of the paper.

===================
Update after author feedback: I want to thank the authors for addressing my and the other reviewers' questions in their author feedback. My main concerns have been addressed, and I think this paper should be accepted; accordingly, I will raise my score to 8. Still, I think there are some improvements to be made, as raised in the reviewer concerns and the author feedback:
- Please make explicitly clear that you are not addressing CNN-type networks.
- I think it would be great if you could include an outlook on the hardware implementation on FPGAs, because I think this is an important point for this rule.
- Please make the original contributions compared to existing spiking backprop variants clearer.
- Please include all hyperparameters (thresholds, learning rates, ...) in the supplementary material as well, and not just in the code.
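
Editorial illustration of Reviewer 3's question about the desired output firing count y: a spike-count error compares, per output neuron, the number of spikes emitted in a time window with a target count. The sketch below assumes a squared-error form; equation (3) of the paper is not reproduced in this review, so the loss shape, the function names, and the example target counts are all hypothetical.

```python
# Illustrative sketch only. The squared-error form and all names and numbers
# are assumptions; they are not taken from the reviewed paper.

def spike_count(spike_times, t_start, t_end):
    """Number of spikes falling in the window [t_start, t_end)."""
    return sum(1 for t in spike_times if t_start <= t < t_end)


def count_loss(output_counts, desired_counts):
    """Assumed loss: 0.5 * sum over output neurons of (o - y)^2."""
    return 0.5 * sum((o - y) ** 2 for o, y in zip(output_counts, desired_counts))


def count_loss_grad(output_counts, desired_counts):
    """Gradient of the assumed loss with respect to each output count: o - y."""
    return [o - y for o, y in zip(output_counts, desired_counts)]


# Hypothetical usage: two output neurons, the first should fire 5 times.
o = [spike_count([1.0, 5.0, 12.0], 0.0, 10.0), 0]
y = [5, 0]
print(count_loss(o, y), count_loss_grad(o, y))
```

Because the counts are only known once the whole window has elapsed, a loss of this kind is naturally computed per chunk of input, which is also the concern Reviewer 2 raises about event-based processing.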