NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
After feedback and reviewer discussion, this paper received final ratings of 6, 7 and 7. Although the novelty of the proposed model is relatively minor given previous work on Adaptive Computation Time (Graves 2016), the reviewers were impressed by the empirical performance and praised the detailed ablation studies, including the additional single-headed attention experiments in the author feedback, which were important in reaching the reviewers' final consensus to accept this paper. We encourage the authors to follow R1's suggestion (cut down the space devoted to standard captioning components in Secs 3.2.1, 3.2.2 and 3.2.5) to make room in the final version for the experiments from the author feedback.