Review for NeurIPS paper: Neural Execution Engines: Learning to Execute Subroutines

NeurIPS 2020

Neural Execution Engines: Learning to Execute Subroutines

Meta Review

After reading each other's reviews and discussing the author rebuttal, the reviewers' opinions on this submission sit around the borderline. The hesitation towards acceptance comes largely from confusion about the key motivations of the work, the number of important details residing in the supplement, and uncertainty about the relationship to prior work in program synthesis. The work focuses on strong generalization in neural networks trained to perform algorithmic reasoning. The experiments examine generalization in a fairly narrow domain: algorithmic subroutines such as finding the minimum of a list, merging two sorted lists, and taking a sum. Two types of generalization are examined: extending the length of the input list by scaling up the associated problem (sorting, MST, shortest path), and generalizing to never-before-seen numbers or number combinations. The paper demonstrates poor generalization along these dimensions for standard transformers under identical input/output schemes. A modification to the architecture and number embedding is suggested and shown to generalize to significantly longer inputs and to new numbers. The supplement provides a more detailed ablation of these changes for the sorting setting. The main experiments section lacks the standard old-vs-ours comparison between the proposed NEE and a vanilla transformer; however, the previous section establishes the vanilla model's relative weakness, although the sorting setting used there seems slightly different from the find_min setting used in the final experiments. Direct comparisons with vanilla transformers throughout would nonetheless strengthen the paper's presentation and provide additional data points confirming the earlier result. Overall, the work suffers perhaps more in accessibility and presentation than in conception or execution: it leaves many details for the supplement, does not provide a clear comparison with the baseline, and does not contrast itself with prior work [3] that is closely related in motivation. The paper is likely to be of interest to a group of researchers; however, it could be made stronger with some revision to improve the presentation.