NeurIPS 2020

Untangling tradeoffs between recurrence and self-attention in artificial neural networks


Meta Review

The paper provides a theoretical analysis of self-attention and vanishing gradients. The experiments are on toy problems with non-SOTA results, but they validate the paper's main theoretical contributions.