This paper makes a nice connection between standard ways of regularizing the dynamics of SGD and those of RNNs. Although the reviewers disagree somewhat about the theoretical justification, the contribution is of interest to the NeurIPS audience.