Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper proposes a new model for the deep learning pipeline in vision by replacing convolutions with self-attention layers. A new stand-alone architecture is proposed and supported by good experiments. Initially, two reviewers recommended acceptance, while one reviewer asked for some improvements and explanation. The reviewers exchanged several comments, and all agree that the answers in the rebuttal were convincing. Therefore, the meta-reviewer's final rating is also “accept”.