This paper received mixed reviews: R2 & R4 recommended accept (score 7), R1 recommended weak reject (score 5), and R5 recommended a clear reject (score 3). However, R5 stated that they are unfamiliar with the topic and with prior work in this domain, so I am discarding R5's review. R2 & R4 acknowledged that this paper makes a strong theoretical contribution to optimizing information transfer in human-like visual attention mechanisms, praising the mathematical construction as "well-founded," "solid," and "inspiring" (R1 also agreed with this after the rebuttal/discussion phase). The main weakness of this paper is arguably insufficient empirical evidence; all three remaining reviewers (R1, R2 & R4) raised similar concerns on this point. However, given the strong theoretical justifications and analyses provided in the paper (e.g., the arduous derivation of the fourth-order dynamics to justify the second-order approximation, and the ambitious attempt to combine several computationally difficult tasks in one framework), I think this paper is of sufficient quality to be presented at NeurIPS.