NeurIPS 2020

RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning


Meta Review

The paper received two accept reviews and one borderline reject [R1]. R1's main concern is that the paper relies on simple, not the most recent, approaches for both captioning and continual learning. The other reviewers and I agree with that assessment, but believe that for one of the first papers on continual learning for captioning this is reasonable, even if not optimal. R1 did not respond after the rebuttal.

The reviewers appreciate the paper's contributions, including:
1) the first work on continual learning for image captioning;
2) the experimental evaluation, both automatic and with human judges;
3) an effective approach to attention masking (adapting HAT to RNNs);
4) a captioning model that, while not state of the art, is acceptable because of its simplicity and representativeness.

I agree with this evaluation and recommend acceptance. However, I expect the authors to include the clarifications and improvements suggested by the reviewers and made in the author response, including clearly describing the technical differences from HAT early in the paper. As also suggested by all reviewers, I encourage the authors to include results with a more recent and powerful captioning model (e.g., based on region features instead of convolutional features) in the final version.

PS: The authors should consider including a discussion of recent/concurrent works in this space:
https://arxiv.org/pdf/1909.08745.pdf
https://nips2018vigil.github.io/static/papers/accepted/18.pdf
https://arxiv.org/pdf/2001.01578.pdf
https://arxiv.org/pdf/2005.00785.pdf