NeurIPS 2020

Deep Transformers with Latent Depth

Meta Review

The paper introduces a probabilistic approach that learns which layers of a deep Transformer to use for each language pair in a multilingual translation setup. The reviewers found the approach interesting and its potential applications useful. I recommend that the paper be accepted. The authors should address the reviewers' comments in the final version.