Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
After considering the author response and discussing the paper, all reviewers agreed that the paper presents interesting and novel approaches to the problem. It is, however, unclear what advantages the approach provides given its extra complexity and computational burden (over 2.2x, due to the ODE solver used). The approach did not significantly affect accuracy, but it did produce less jumpy attention that tended to form coherent blocks. The proposed TLT metric captures this effect; however, the submission presents the metric as indicating greater interpretability. It makes general sense that less jumpy attention would lead to greater interpretability, but I encourage the authors to clarify this point in future revisions (even more so than in the rebuttal). While it does not revolutionize attention, this work proposes an interesting direction and delivers a useful measure (TLT) for evaluating the stability of attention over time.