Reviews: Adversarial training for free!

Reviewer 1

This paper proposes a new, more efficient method for adversarial training. The proposed training protocol performs comparably to state-of-the-art adversarial training, while being efficient enough to adversarially train an ImageNet model on a single workstation. Experimental results are presented on CIFAR-10, CIFAR-100 and ImageNet.

Originality: The idea of reusing the backward pass that training already requires to also compute adversarial samples does indeed seem novel. Projected gradient descent (PGD) adversarial samples require multiple backward passes. To obtain strong adversarial samples for training, the same minibatch is replayed several consecutive times: each replay updates the model and advances the PGD perturbation by one step, using the gradients from the same backward pass. The total number of training epochs is divided by the number of replays per minibatch, so that the number of training iterations matches that of natural training. As a result, the computation time of the proposed protocol ends up comparable to that of natural training.

Quality: The idea of "warm starting" each new minibatch with the perturbation from the previous minibatch is not well motivated, and no justification or ablation study is provided to analyze the impact of this choice. What happens when no warm start is used? How much does the final perturbation differ from its initial value? Is this pushing the attack towards some notion of universal perturbation? The paper strongly emphasizes that the proposed protocol is designed for untargeted adversarial training. It would be good to see a comparison with previous (targeted) results on ImageNet from [Xie et al., 2019] and [Kannan et al., 2018]. Some aspects of the experimental section are not fully convincing, as the attacks used for evaluation are arguably not very strong.
The attacks against ImageNet (Table 3) seem to use $\epsilon=2$ (out of 255?), which is too small a value to reflect the robustness of the model. Moreover, I was not able to find the exact parameters used when testing against the C&W attack (Table 1), and this attack was only evaluated on CIFAR-10. In most cases, evaluation against the PGD attack does not seem to use random restarts (except for one configuration in Table 1), although restarts are known to make the attack considerably stronger. The paper mentions the SPSA black-box attack in the experimental section, but then does not compare against it, claiming that it would not perform well anyway. The number of repetitions $m$ of the same minibatch seems to have a strong impact on both clean and adversarial accuracy (a trade-off). How would one tune it efficiently in practice?

Clarity: The paper is overall well written. Using both the K-PGD and PGD-K notations can be a source of confusion.

Significance: Provided that the proposed method is sound and achieves the claimed performance, it would offer a more efficient way to train robust models.

Minor remarks:
- Lines 74-75: The main difference between BIM and PGD is actually the projection step, which is performed by PGD (and gives the method its name) but not by BIM.
- Line 164: Possibly incorrect reference to Section 4.
- Line 193: Extra word "reference".
- Alg. 1, line 8: Extra square bracket.
- Alg. 1, line 12: This should probably be a projection operation rather than clipping, in order to generalize beyond $L_\infty$.

[UPDATE] I would like to thank the authors for their detailed explanations and additional experiments. These have provided additional clarity and should be included in the paper. In view of the rebuttal, some concerns still remain. I believe that testing the proposed adversarial training strategy against stronger attacks (e.g., a high-confidence C&W attack, or a larger $\epsilon$ budget for the other attacks) would demonstrate the robustness of the obtained model beyond doubt. However, in view of the rebuttal, I am increasing my rating from 5 to 7.
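Two of the remarks above, the clipping in Alg. 1, line 12 and the lack of random restarts in most PGD evaluations, can be illustrated with a minimal sketch. It assumes a hypothetical toy linear binary classifier with logistic loss standing in for a neural network; all function names are illustrative and are not taken from the paper or its code. For an $L_\infty$ ball, clipping coincides with Euclidean projection, but for an $L_2$ ball the projection has to rescale the perturbation instead; each random restart reruns PGD from a new random point in the ball and the worst case is kept.

```python
import math
import random


def project_linf(delta, eps):
    # Euclidean projection onto the L_inf ball: coordinate-wise clipping.
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    # Euclidean projection onto the L2 ball: rescale only if outside the ball.
    norm = math.sqrt(sum(d * d for d in delta))
    return list(delta) if norm <= eps else [d * eps / norm for d in delta]


def logistic_loss(w, b, x, y):
    # Numerically stable log(1 + exp(-y * z)) for z = w.x + b, y in {-1, +1}.
    t = -y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def input_grad(w, b, x, y):
    # d loss / d x = -y * sigmoid(-y * z) * w for the toy linear model.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    s = 1.0 / (1.0 + math.exp(y * z))
    return [-y * s * wi for wi in w]


def pgd_attack(w, b, x, y, eps, step, iters, restarts, norm="linf", seed=0):
    """Untargeted PGD with random restarts; returns the worst-case delta found."""
    rng = random.Random(seed)
    project = project_linf if norm == "linf" else project_l2
    best_delta, best_loss = [0.0] * len(x), logistic_loss(w, b, x, y)
    for _ in range(restarts):
        # Random start inside the ball (not uniform for L2, which is fine here).
        delta = project([rng.uniform(-eps, eps) for _ in x], eps)
        for _ in range(iters):
            g = input_grad(w, b, [xi + di for xi, di in zip(x, delta)], y)
            if norm == "linf":
                direction = [(gi > 0) - (gi < 0) for gi in g]  # signed gradient
            else:
                gnorm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
                direction = [gi / gnorm for gi in g]  # normalized gradient
            delta = project([di + step * ui for di, ui in zip(delta, direction)], eps)
        loss = logistic_loss(w, b, [xi + di for xi, di in zip(x, delta)], y)
        if loss > best_loss:
            best_delta, best_loss = delta, loss
    return best_delta, best_loss


w, b, x, y = [1.0, -2.0], 0.0, [0.5, 0.1], 1
clipped = project_linf([3.0, 4.0], 1.0)  # [1.0, 1.0], which lies outside the unit L2 ball
projected = project_l2([3.0, 4.0], 1.0)  # approximately [0.6, 0.8]
delta_inf, loss_inf = pgd_attack(w, b, x, y, eps=0.1, step=0.05, iters=10, restarts=5)
delta_l2, loss_l2 = pgd_attack(w, b, x, y, eps=0.1, step=0.05, iters=50, restarts=3, norm="l2")
```

For a linear model the optimal perturbation is known in closed form ($-y\epsilon\,\mathrm{sign}(w)$ for $L_\infty$, $-y\epsilon\,w/\lVert w\rVert_2$ for $L_2$), which makes the sketch easy to check; for a neural network the loss is not concave in the perturbation, which is why restarts can find stronger attacks.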

Reviewer 2

Originality: The paper has essentially one original idea: using the backward pass of the backpropagation algorithm to also compute the adversarial example. On the one hand, this is very impactful, because the authors show empirically that it speeds up training while maintaining the same robustness to adversarial attacks; on the other hand, the idea itself is not particularly outstanding.

Quality: The paper verifies the idea experimentally and claims to achieve state-of-the-art robustness on the CIFAR datasets. It also reports detailed experimental results, such as the training time, and shows that this time is indeed close to that of natural training. A further section shows that the loss surface of models trained with this technique is flat and smooth, and that the adversarial examples for these models look like the actual target class. These properties are also observed with standard adversarial training, so the technique behaves like standard adversarial training in these respects as well. Overall, the quality of the paper is good.

Clarity: The paper is well written.

Significance: The significance would be very high, because training robust models would become almost as fast as training non-robust models. This would greatly benefit research on robust machine learning. That said, beyond this one idea, the paper makes no other contributions.
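The idea the reviewers describe (one backward pass yields both the parameter gradient and the input gradient; each minibatch is replayed $m$ times; the perturbation is warm-started across minibatches; the epoch count is divided by $m$) can be made concrete with a minimal sketch on a hypothetical toy linear logistic model in pure Python. This is not the authors' code: the data, model, learning rate, and the signed perturbation step of size $\epsilon$ followed by clipping are assumptions made for illustration.

```python
import math
import random


def sign(v):
    # Sign with sign(0) == 0, so a zero input gradient leaves delta unchanged.
    return (v > 0) - (v < 0)


def batch_grads(w, b, xs, ys, deltas):
    """One forward/backward pass of the mean logistic loss on x + delta.

    Returns gradients w.r.t. the weights, the bias and every input; the input
    gradients come out of the same pass, which is what makes them 'free'.
    """
    n = len(xs)
    gw, gb, gx = [0.0] * len(w), 0.0, []
    for x, y, d in zip(xs, ys, deltas):
        xa = [xi + di for xi, di in zip(x, d)]
        z = sum(wi * xi for wi, xi in zip(w, xa)) + b
        s = 1.0 / (1.0 + math.exp(y * z))  # sigmoid(-y * z); dL/dz = -y * s
        gw = [g - y * s * xi / n for g, xi in zip(gw, xa)]
        gb -= y * s / n
        gx.append([-y * s * wi / n for wi in w])
    return gw, gb, gx


def free_adversarial_training(batches, dim, epochs, m, eps, lr):
    w, b = [0.0] * dim, 0.0
    # One perturbation slot per example position in a minibatch; it is carried
    # over from one minibatch to the next ("warm start") and never reset.
    delta = [[0.0] * dim for _ in batches[0][0]]
    updates = 0
    for _ in range(epochs // m):  # epochs divided by the replay count m
        for xs, ys in batches:
            for _ in range(m):  # replay the same minibatch m times
                gw, gb, gx = batch_grads(w, b, xs, ys, delta)
                w = [wi - lr * g for wi, g in zip(w, gw)]  # descent on the model
                b -= lr * gb
                # Signed ascent step of size eps on delta, clipped to the L_inf ball.
                delta = [[max(-eps, min(eps, di + eps * sign(gi)))
                          for di, gi in zip(d, g)]
                         for d, g in zip(delta, gx)]
                updates += 1
    return w, b, delta, updates


def make_toy_batches(n=64, batch_size=8, seed=0):
    # Hypothetical data: the label is the sign of the first coordinate and the
    # second coordinate is pure noise; labels alternate so batches are balanced.
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        xs.append([y * (1.0 + rng.uniform(-0.2, 0.2)), rng.uniform(-1.0, 1.0)])
        ys.append(y)
    return [(xs[k:k + batch_size], ys[k:k + batch_size]) for k in range(0, n, batch_size)]


batches = make_toy_batches()
w, b, delta, updates = free_adversarial_training(batches, dim=2, epochs=8, m=4, eps=0.1, lr=0.5)
clean_acc = sum((sum(wi * xi for wi, xi in zip(w, x)) + b) * y > 0
                for xs, ys in batches for x, y in zip(xs, ys)) / 64
print(updates, clean_acc)  # updates: 2 passes * 8 minibatches * 4 replays = 64
```

Each replay costs a single backward pass, so the 64 updates here cost about as much as 8 epochs of natural training on the same 8 minibatches.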

Reviewer 3

I really enjoyed this paper. The idea of simultaneously computing the gradients with respect to the model parameters and the input images is simple and elegant. It is surprising that it works so well. I am also glad to see that the same PGD-based technique works well on both CIFAR and ImageNet. A few comments:
1) Since your method is so much faster, it would be *great* to have error bars on your main tables/results. I expect the variance to be small, but it would be nice to have it in the paper; it is also good practice for the community.
2) I am not sure what Figure 3 is supposed to show. It might be nice to contrast it with a non-robust model, to really show how the landscapes of the two "robust" models (free and PGD) differ from that of a non-robust one.
3) You say there is *no* cost to increasing m in the CIFAR-100 section, but this is only true as long as m is less than the total number of epochs. I presume these algorithms would not converge well with a low number of epochs (say, 1 epoch). In fact, it would be good to have that plot/table.
4) It would be nice to see the code, to understand how the fast gradient computation worked :)
5) Again, since your method is so much faster, it would be excellent to see how much more robustness you get on ImageNet by training adversarially on the *entire* ImageNet-11k (10 million examples) training set. Recent work, "Unlabeled Data Improves Adversarial Robustness" by Carmon et al., has shown that adding more unlabeled data can improve robustness, and "Adversarially Robust Generalization Requires More Data" by Schmidt et al. postulates that robust generalization requires more data. Adversarial training on all 10 million ImageNet images, which have roughly the same label quality as the standard 1.2 million training set, might therefore greatly improve performance.
6) It would be good to try the "stronger" attacks presented in https://github.com/MadryLab/cifar10_challenge, just to see whether these break your model in any way.
7) A plot like Figure 7 in https://arxiv.org/pdf/1901.10513.pdf would be great, as another test to make sure your models are truly robust.
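Comment 3 can be made concrete with simple bookkeeping. The sketch below assumes that free training runs total_epochs // m passes over the data (floor division is an assumption) and uses a hypothetical budget of 40 epochs of 100 minibatches each; these numbers are illustrative, not the paper's. The update count stays fixed as m grows only while m divides the epoch budget, and at m equal to the budget the model sees the data in a single pass.

```python
def free_training_schedule(total_epochs, m, batches_per_epoch):
    """Return (passes over the data, parameter updates) for free training.

    Assumes total_epochs // m passes, each replaying every minibatch m times.
    The update count equals that of total_epochs natural epochs only while m
    divides total_epochs; once m exceeds total_epochs no full pass is left.
    """
    passes = total_epochs // m
    updates = passes * m * batches_per_epoch
    return passes, updates


# Hypothetical budget: 40 epochs of 100 minibatches each.
for m in (1, 3, 4, 8, 20, 40, 80):
    passes, updates = free_training_schedule(40, m, 100)
    print(f"m={m:>2}: {passes:>2} passes, {updates:>4} updates")
# m=3 loses 100 updates to rounding; m=80 leaves 0 passes and 0 updates.
```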

Paper ID:	1853