NeurIPS 2020

Limits to Depth Efficiencies of Self-Attention

Meta Review

All reviewers agree that this paper provides a rigorous analysis of the depth-width tradeoffs of a simplified Transformer model, which explains, to some extent, the observed mild effect of depth on Transformers. While reviewers raised some concerns about the lack of further experimental verification of the results, all reviewers agree that the paper makes enough interesting theoretical contributions and suggest acceptance. I agree with this, and note to the authors that it is imperative they make the changes and clarifications described in the reviews and the author response.