Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
There is some disagreement among the reviewers of this paper. Two of the reviewers believe that the introduction of the memory layer with product keys, and its incorporation into the Transformer architecture, is novel and interesting, and could open doors to new use cases for efficient memory-augmented neural networks. The other reviewer believes that the memory layer is merely an implementation detail and has not been shown to be useful for applications beyond large-scale language modeling. I believe that the contributions of the paper are significant enough to warrant acceptance, especially given how important language modeling has become in modern NLP. Furthermore, because the proposed architecture is quite different from common NLP and computer vision architectures, I recommend acceptance as a spotlight. Please improve the clarity of the technical details and address the reviewers' comments.