NeurIPS 2019
Sun Dec 8th – Sat Dec 14th, 2019, Vancouver Convention Center
Paper ID: 9068
Title: Universality and individuality in neural dynamics across large populations of recurrent networks

Reviewer 1

UPDATE after rebuttal: the authors have addressed some of my concerns, so I'm updating my score to 8.

To summarize, this paper aims to shed light on the connections between artificial recurrent neural networks and biological networks, in order to gain insight into neural circuit function through studying RNNs. More specifically, the paper comments on the ability of RNNs to mimic the behavior of SNNs and neural recordings despite vast differences in their underlying architectures. Such a phenomenon may suggest neural invariants that act universally (in the context of a task) either across all RNN and SNN architectures, or across broader groups containing various architectures. The paper does not look at neural recordings or SNNs, but instead trains 96 RNNs with various combinations of architectures, activations, network sizes, and L2 regularizations on three separate tasks (discrete memory, pattern formation, and analog memory) common to computational neuroscience. For each task, singular value canonical correlation analysis (SVCCA) and MDS are used to determine the representational geometry of the RNNs, and a numerical approach to dynamical systems analysis (again visualized with MDS) is used to gain insight into the topological stability structure.

Comments:

- Three low-dimensional tasks are used: the 3-bit flip-flop is 3D, the sine wave is 2D, and CDI is 2D. Given the intrinsic structure of these problems, it is not surprising that the trained RNNs find low-dimensional solutions.
- Details about training accuracy are not provided (please do). I'm assuming all RNNs achieved near 100% performance?
- The effectiveness of the analyses (SVCCA & fixed-point graph) is presented only visually using MDS. Although this is convincing in most cases, it is qualitative only. I suggest computing some similarity metric that summarizes these results (I am not suggesting a hypothesis test); one possible sketch is given after this review.
- Line 193: MDS is very weak support for this claim.
- Since the vanilla RNN is a special case of all the other architectures (which should be mentioned), any difference found indicates a non-trivial difference in the solutions. Yet the analysis suggests that there is little difference except in CDI. If the vanilla RNN can solve the task (I don't know, since performance is not reported in the paper), these differences are due to which solutions are easier to train from the initial conditions, right? Do the gated RNNs diverge from the tanh RNN solution if initialized close by?
- For the sine wave task, was there always just one fixed point per fixed input? If not, how would one create a topology of fixed points? In theory, the RNN can find solutions without input-dependent fixed points, right? Also, the key feature here is the limit cycle, not the fixed point. This seems to be a major limitation of the method.
- Fig. 2f does not mean much to me. Am I missing something? Why is this interesting?
- Line 212: what is the implication of Fig. 2e?
- For the CDI analysis, a line attractor is presumed. If there was always one fixed point per condition, it is very likely that every network would give an identical "topology" due to the task structure, not due to what the network learned.
- Fig. 1c needs a colorbar.
- Fig. 1d: at first glance I couldn't tell that these are two different MDS visualizations.
- Fig. 1e: the fixed-point graph is too small to see.
- Line 86: "Supplement".
- Line 156: "Fig. 1b" should be "1c".
- Line 238: "These results show that the all".
- Line 220: "of architecture (right) and activation (left)" — these should be reversed.
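For concreteness, the kind of scalar summary I have in mind could look roughly like the following (a sketch only, not necessarily the authors' exact SVCCA procedure; the function name and the cutoff k are placeholders):

```python
import numpy as np

def svcca_similarity(X, Y, k=10):
    """Mean SVCCA correlation between two hidden-state trajectories.

    X, Y: (time, units) arrays of hidden states from two networks run
    on the same stimuli; k is an assumed number of singular vectors to
    keep before the CCA step.
    """
    # Center each trajectory over time.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Keep the top-k left singular vectors (the "SV" step of SVCCA).
    Ux = np.linalg.svd(X, full_matrices=False)[0][:, :k]
    Uy = np.linalg.svd(Y, full_matrices=False)[0][:, :k]
    # Canonical correlations between the two subspaces are the singular
    # values of the cross-product of their orthonormal bases.
    rho = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return float(rho.mean())
```

Reporting such a number (or its distribution) for within-architecture versus between-architecture pairs would make the clustering claims quantitative rather than purely visual.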

Reviewer 2

POST-REBUTTAL UPDATE: I thank the authors for their detailed rebuttal. I agree with the argument for the CCA usage.

Artificial neural networks – both feedforward and recurrent – are increasingly being used in neuroscience as hypothesis generators. That is, networks are trained on cognitive-like tasks, and then aspects of the artificial network are compared to biology. Despite the increasing usage of this approach, there are very few systematic efforts to understand the "rules of the game". The present contribution is thus a very timely, systematic investigation of which aspects of trained RNNs are variant and which are invariant. Specifically, the authors compare a "tensor" of (architecture × task × analysis method). They conclude that network geometry, as assessed by CCA, is highly architecture dependent. In contrast, fixed-point topology is mostly task dependent.

Major comments:

1. The tasks used seem to have only one possible solution strategy, and thus the topology result could be somewhat trivial. Is that indeed the case? Can different tasks break this?
2. CCA assumes linearity, but RNNs are nonlinear. Did you try a nonlinear method?

Minor comments:

3. Related work and lines 60-61, "theoretical clarity… completely lacking": while there is very little theory, there is some that could be relevant [1]–[3].
4. The page footer says NeurIPS 2018.
5. K-bit flip-flop: input statistics (rate) are not specified.
6. MDS graphs (e.g. Figure 1D): perhaps I'm missing something, but I expected to see the same distribution of points colored differently in the left and right parts of the panel. The text says that you used a similarity matrix between all networks to construct the MDS space and then projected to 2D. This should give a scatter of uncolored points that you can later color according to tanh/relu or architecture (see the sketch after this review).
7. Figure 1C: axis labels and colorbar – either on the figure or in the caption. Are the networks ordered according to clustering? According to MDS?
8. Figure 1E: the graph looks undirected, which is misleading.
9. Line 211, Figure 2F, "systematic differences": are there error bars? Is this reproducible? If so, can you hypothesize about mechanisms or implications?
10. Line 238: "that THE all".
11. Figure 4: was this effect similar for all configurations (task, architecture, unit type)? Or just relu/tanh in the contextual integration task?

[1] A. Rivkind and O. Barak, "Local Dynamics in Trained Recurrent Neural Networks," Phys. Rev. Lett., vol. 118, no. 25, p. 258101, Jun. 2017.
[2] F. Mastrogiuseppe and S. Ostojic, "Linking Connectivity, Dynamics, and Computations in Low-Rank Recurrent Neural Networks," Neuron, vol. 99, no. 3, pp. 609-623.e29, Aug. 2018.
[3] F. Mastrogiuseppe and S. Ostojic, "A Geometrical Analysis of Global Stability in Trained Feedback Networks," Neural Comput., vol. 31, no. 6, pp. 1139–1182, Apr. 2019.
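Regarding minor comment 6, the kind of figure I expected is sketched below (variable names are assumptions on my part; D would be the precomputed network-by-network dissimilarity matrix, and the label lists are purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

def plot_shared_mds(D, arch_labels, act_labels):
    """One MDS embedding, colored two different ways.

    D: (n_networks, n_networks) dissimilarity matrix (e.g. 1 - mean CCA);
    arch_labels / act_labels: per-network strings used only for coloring.
    """
    # The embedding is computed once from the full dissimilarity matrix...
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)
    fig, axes = plt.subplots(1, 2, figsize=(9, 4))
    # ...so the same scatter of points appears in both panels; only the
    # coloring (architecture vs. activation) differs between them.
    for ax, labels, title in [(axes[0], arch_labels, "architecture"),
                              (axes[1], act_labels, "activation")]:
        for lab in sorted(set(labels)):
            idx = np.array([i for i, l in enumerate(labels) if l == lab])
            ax.scatter(coords[idx, 0], coords[idx, 1], s=15, label=lab)
        ax.set_title("Colored by " + title)
        ax.legend()
    fig.tight_layout()
    plt.show()
```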

Reviewer 3

UPDATE AFTER REBUTTAL: I want to thank the authors for the rebuttal letter and for their willingness to extend the discussion motivating the choice of tasks in terms of the computational primitives that they expose.

- Originality: This type of systematic investigation is sorely needed in the literature on neural networks, and RNNs in particular, even if it is not particularly original in the strict sense of the term.
- Quality: The methodology seems solid and nicely takes into account quantitative methods previously developed in the literature.
- Clarity: The paper is clear and well written. The motivation is laid out very convincingly.
- Significance: Besides being significant for its results, the paper is significant in that it exemplifies a type of scientific approach that is sorely missing in the literature.