Reviews: Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

Reviewer 1

UPDATE after reading author rebuttal: I look forward to the changes in the final version of the paper.

Detailed comments:

1. Understanding of RNNs for the sentiment classification task, with theoretical analysis backed by empirical observations: This work takes up the sentiment classification task. The authors identify fixed points of the trained RNNs and center their analysis around them. The RNN states can be cast onto a 1-dimensional manifold of these fixed points. PCA of the RNN states across examples reveals that training leads the RNNs to a lower-dimensional representation. Interestingly, movement along this low-dimensional manifold is minimal in the absence of inputs or in the presence of neutral/uninformative words, whereas polarity-bearing words produce larger movements, so the sentiment classes become linearly separable along this 1-D manifold. Further analysis using eigenvalue decomposition also supports the low-dimensional manifold argument. However, the authors also state that approximating the full nonlinear LSTM by linear dynamics is not reliable in the long run, even though the approximation may be close enough for some steps.

2. Theoretically and empirically, the paper shows that even the complex dynamics of RNNs can be approximated by linear, low-dimensional dynamics that are more human-interpretable: The presence of a low-dimensional manifold around fixed points is backed by multiple theoretical analyses and empirical observations: (1) PCA of RNN states and (2) eigenvalue decomposition, as shown in Figures 2 and 3. Moreover, the linear separability driven by polarity-bearing words is demonstrated in Figure 4. Figure 6 shows consistency across different variants of RNNs on different datasets. A very positive aspect of this paper is that the authors refrain from overstating the significance of their contributions and pave the way for future research by (a) pointing out that negated polarity-bearing phrases such as 'not bad' may not be fully explained by this framework and (b) noting that sentiment classification is one representative task and that it remains to be seen how this mechanism works in other settings.

3. Some critical comments:
a) The paper is not very well written and is a little hard to read. It would be better to add a 'preliminary' section before Section 3 that sets up the notation and definitions (fixed points, line attractor dynamics, etc.) for more novice readers.
b) Many of the mathematical results (for example, Eq. 3) could be structured as a theorem, lemma, or result.
c) Many mathematical observations are simply stated inside paragraphs and get lost; some degree of highlighting is essential to get the main message across.
d) The figures could be explained better and in more detail in relation to the mathematical results.

Overall, my scores (out of 10) are: Originality: 8, Quality: 7, Clarity: 5, Significance: 8
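
To make the linearization argument in comment 1 concrete, here is a minimal sketch of the general technique, not the authors' code. It assumes a hypothetical 3-unit vanilla RNN with update h_next = tanh(W h + b) and no input; the weights, the helper names (`rnn_step`, `jacobian_at`, `time_constants`), and the chosen eigenvalues are all illustrative assumptions. The sketch checks that a candidate state is a fixed point, linearizes the update there, and converts each eigenvalue into a decay time using tau = |1 / ln|lambda||, which follows from a mode's amplitude shrinking as |lambda|^t.

```python
import numpy as np

def rnn_step(h, W, b):
    """One input-free update of a toy vanilla RNN: h_next = tanh(W h + b)."""
    return np.tanh(W @ h + b)

def jacobian_at(h, W, b):
    """Jacobian of the update with respect to h, evaluated at h."""
    pre = W @ h + b
    # Row i of W is scaled by the tanh derivative of unit i.
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def time_constants(eigvals):
    """Decay time (in steps) of each linearized mode: tau = |1 / ln|lambda||."""
    return np.abs(1.0 / np.log(np.abs(eigvals)))

# Hypothetical weights: one slow mode (0.999) and two fast modes (0.5, 0.3).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
W = Q @ np.diag([0.999, 0.5, 0.3]) @ Q.T
b = np.zeros(3)

# tanh(0) = 0, so the origin is a fixed point of this toy network.
h_star = np.zeros(3)

# "Speed" at the candidate point; it is zero exactly at a fixed point.
q = 0.5 * np.sum((rnn_step(h_star, W, b) - h_star) ** 2)

# Linearize around the fixed point and sort modes from slowest to fastest.
J = jacobian_at(h_star, W, b)
eigvals = np.linalg.eigvals(J)
eigvals = eigvals[np.argsort(-np.abs(eigvals))]
taus = time_constants(eigvals)

print("q at fixed point:", q)
print("eigenvalue magnitudes:", np.abs(eigvals))
print("time constants (steps):", taus)
```

In this toy setting, a single mode with a time constant far longer than the others is the signature one would expect of approximately one-dimensional, integrator-like dynamics; a trained LSTM or GRU would need its own Jacobian and a numerical fixed-point search.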

Reviewer 2

Although this paper relies heavily on the well-established field of dynamical systems analysis, I found it refreshingly innovative in that it takes these tools and applies them in a new way. This is a refreshingly research-focused paper compared to the other papers I've reviewed for NeurIPS this year. I like that the authors tackle a difficult problem, the interpretability of RNNs, in a very principled manner. The paper's quality is good in that it explores a variety of important avenues (e.g., multiple data sets, multiple models, multiple measures of the dynamics). The paper is very clear and easy to read; I commend the authors for this. As for significance, interpretability is of key relevance in academia, business, and government. This paper provides a new lens for looking at this problem.

Reviewer 3

POST REBUTTAL UPDATE: The authors answered my concerns, and I'm increasing the score to 8.

The authors train RNNs on a basic NLP task: sentiment classification. They then use dynamical systems tools to show that the network implements this task as a line attractor, perhaps the simplest model of evidence accumulation. Every word is projected onto the line attractor according to its valence and moves the dynamics towards the correct decision. This mechanism was previously shown in neuroscience-inspired tasks [1], and it is an important contribution to show that it also arises in tasks that are “pure” machine learning.

Major comments:

1. There is inherent variability in the dynamical objects observed:
a. Different architectures have different input projection separation (LSTM on IMDB, for instance).
b. Different points on the line attractor have different q values (not shown, but likely given prior work [2], [3]).
c. Different points on the line attractor have different time constants (Fig. 3c, for instance).
d. Different points on the line attractor have different linearized dynamics error (Fig. 5b).
All this variability can be harnessed to try to understand which factors contribute to performance [3]. For instance, if the drift is suddenly larger, do you see that evidence accumulates faster at these points?

2. Bigrams are mentioned but not analyzed. It could be that this analysis is complex or that the results are inconclusive, but this should be reported. At the very least, show what happens at the dynamical level for the expression “not bad”.

Minor comments:

3. Appendix A2 shows that bag of words is not always worse than trained RNNs (lines 238-239).
4. Figure 1 is not clear. Is this an individual neuron for many documents? Many neurons for one document?
5. Section 3.1: add a reference to Appendix A2.
6. Line 111, “no input”: is the natural choice zero input, the average of all neutral words, or the average of all words?
7. Line 114: Figure 1D does not exist.
8. Line 123: “that THAT the”.
9. LSTM vs. VRNN on SST seem to show an opposite trend in their performance compared to their input projections.

[1] V. Mante, D. Sussillo, K. V. Shenoy, and W. T. Newsome, “Context-dependent computation by recurrent dynamics in prefrontal cortex,” Nature, vol. 503, no. 7474, pp. 78–84, Nov. 2013.
[2] D. Sussillo and O. Barak, “Opening the Black Box: Low-dimensional dynamics in high-dimensional recurrent neural networks,” Neural Comput., vol. 25, no. 3, pp. 626–649, 2013.
[3] D. Haviv, A. Rivkind, and O. Barak, “Understanding and Controlling Memory in Recurrent Neural Networks,” arXiv:1902.07275 [cs, stat], Feb. 2019.
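
As a toy illustration of the evidence-accumulation picture and of major comment 2, the sketch below integrates per-word valences along a single axis. It is a hand-built caricature, not the trained networks from the paper: the `VALENCE` table, its numbers, and the `accumulate` helper are hypothetical, and the model sees each word in isolation, which is exactly why it cannot represent negation.

```python
# Hypothetical valences: the signed push each word gives along the attractor.
VALENCE = {"great": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}

def accumulate(tokens, leak=1.0):
    """Integrate word valences along a 1-D line attractor.

    leak=1.0 is a perfect integrator: the state holds its value when a
    neutral word arrives. Words missing from VALENCE contribute zero.
    """
    x = 0.0
    trajectory = []
    for tok in tokens:
        x = leak * x + VALENCE.get(tok, 0.0)
        trajectory.append(x)
    return trajectory

review = "the plot was good and the acting was great".split()
traj = accumulate(review)
label = "positive" if traj[-1] > 0 else "negative"
print(traj, label)

# A purely unigram accumulator treats "not" as neutral, so "not bad"
# ends on the negative side of the axis.
negated = accumulate("not bad".split())
print(negated)
```

The second call shows the failure mode I have in mind: any mechanism that only sums per-word projections will read “not bad” as negative, so it would be informative to see how the trained RNN state actually moves on this bigram.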

Paper ID:	9119
Title:	Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
