This is a nice, illuminating study of how RNNs learn. The reviewers were all very positive about this paper, for the following reasons: 1) it is very well written and presented, 2) it makes clear theoretical contributions and offers insights into the dynamics of RNNs throughout training, and 3) it provides a nice analytical treatment and a useful framework that could be applied more generally. The rebuttal included new analyses on a more complex task, SST-2, with impressive results. I suggest the authors try to fit these into the main text, if possible. The findings should be of broad interest to the NeurIPS audience, and I strongly recommend acceptance.