I congratulate the authors for taking simple ideas (namely, adversarially motivated improvements to ensembles) and showing that they work well. Ensembles are a strong baseline for uncertainty estimation and natural out-of-distribution shift, and this work takes an important step further by highlighting the benefits of averaging multiple predictions for adversarial robustness (a minimal sketch of this averaging appears after the reference list below). All reviewers found the paper well written and convincing against baselines.

As 3 of 4 reviewers noted, I recommend adding a discussion that makes the method's computational cost more explicit.

Finally, as R3 states and as the authors promised in the revision, the paper is significantly lacking in discussion and citation of related work. Many works apply collections of neural networks to out-of-distribution uncertainty and generalization, from ensembles specifically to Bayesian methods and distributions over neural networks more broadly. This work carves out its own contribution by addressing adversarial robustness with algorithmic changes, but situating it within this broader context would strengthen it. Papers I recommend discussing are:

* Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy. Ludmila I. Kuncheva & Christopher J. Whitaker. https://link.springer.com/article/10.1023/A:1022859003006
* Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell. https://arxiv.org/abs/1612.01474
* BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning. Yeming Wen, Dustin Tran, Jimmy Ba. https://arxiv.org/abs/2002.06715
* Weight Uncertainty in Neural Networks. Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra. https://arxiv.org/abs/1505.05424
* Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors. Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran
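
For concreteness, a minimal sketch of the prediction averaging referred to above, in the style of deep ensembles (Lakshminarayanan et al., cited above). The function names and NumPy setup are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the class-probability outputs of independently trained
    ensemble members. `models` is any iterable of callables returning
    arrays of shape (batch, classes); names here are hypothetical."""
    probs = np.stack([m(x) for m in models])  # shape: (members, batch, classes)
    return probs.mean(axis=0)                 # uniform mixture over members

def predictive_entropy(p, eps=1e-12):
    """Entropy of the averaged distribution, a common uncertainty score."""
    return -(p * np.log(p + eps)).sum(axis=-1)
```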