NeurIPS 2020

Adversarial Weight Perturbation Helps Robust Generalization

Review 1

Summary and Contributions: In this work, the authors propose adversarial weight perturbation, which perturbs both the data inputs and the model weights during training in order to regularize the flatness of the weight loss landscape.

Strengths: The method is simple yet novel and easily understood. The empirical results show consistent improvement across different datasets.

Weaknesses: There is no theoretical justification provided as to why adversarial weight perturbation (AWP) works. I am also concerned about convergence: if the weights of a model are constantly perturbed in the worst-case direction during training, the model may find it difficult to learn the task.

Correctness: The empirical methodology seems correct.

Clarity: The paper is clear and well written.

Relation to Prior Work: Yes, the authors describe how their work differs from prior works in the field.

Reproducibility: Yes

Additional Feedback: Could you provide some justification as to why AWP works, and why the model still learns while its weights are being perturbed during training? Could you explain how the methods that perform implicit weight perturbation differ from AWP? What sort of perturbations do these methods apply to the weights of the model? Suggestion: could you check whether the weights of a model learned using AWP differ from those learned by AT? You could try this on a simple model and compare the differences in the weights learned by the two methods.
========================================= I acknowledge that I read the rebuttal and thank the authors for providing explanations to the questions and concerns I had.

Review 2

Summary and Contributions: In this work, the authors propose an Adversarial Weight Perturbation (AWP) method, yielding a modified adversarial training scheme with adversarial perturbation on both inputs and weights. The authors identify a strong connection between the flatness of the weight loss landscape and the robust generalization gap. The authors also perform extensive experiments to demonstrate the effectiveness of the method at enhancing adversarial robustness compared to several prior works.
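As I understand the procedure summarized above, each batch alternates an input attack, a weight attack, and an update taken at the perturbed weights. A minimal NumPy sketch of one such step on a linear model may make this concrete; all names, step sizes, and the single-step attacks here are my own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def loss_and_grads(w, x, y):
    # Squared-error loss of a linear model; returns loss, dL/dw, dL/dx.
    pred = x @ w
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)
    grad_w = x.T @ err / len(y)
    grad_x = np.outer(err, w) / len(y)
    return loss, grad_w, grad_x

def awp_step(w, x, y, eps=0.1, gamma=0.05, lr=0.1):
    # 1) Adversarial input perturbation (a single signed-gradient step).
    _, _, gx = loss_and_grads(w, x, y)
    x_adv = x + eps * np.sign(gx)
    # 2) Adversarial weight perturbation v, scaled relative to ||w||.
    _, gw, _ = loss_and_grads(w, x_adv, y)
    v = gamma * np.linalg.norm(w) * gw / (np.linalg.norm(gw) + 1e-12)
    # 3) Descend at the perturbed weights w + v, then remove v.
    _, gw_pert, _ = loss_and_grads(w + v, x_adv, y)
    return (w + v) - lr * gw_pert - v

# Toy data: recover a known linear map under the perturbed training loop.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w
w = np.zeros(4)
for _ in range(50):
    w = awp_step(w, x, y)
```

Note the last line of `awp_step`: the update is evaluated at `w + v` and `v` is then removed, so the weight perturbation shapes where the gradient is taken without accumulating in the weights themselves.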

Strengths:
1. The authors present a new idea for performing adversarial training.
2. Part of the results show adversarial robustness improvement compared to the vanilla counterparts.

Weaknesses:
1. The proposed adversarial training with AWP lacks theoretical support.
2. Only CIFAR-10 and a single neural network are tested; results on larger-scale datasets are expected.
3. The experimental results show a trade-off between natural accuracy and post-attack accuracy.
4. The experimental configuration given in the manuscript is a little bit messy.

Correctness: The proposed method seems empirically correct, but the experimental results are not strong enough.

Clarity: Overall, the manuscript is well written. The experimental part may need better organization.

Relation to Prior Work: The contribution of this work is well discussed and differentiated from prior works.

Reproducibility: Yes

Additional Feedback:
1. Does the middle figure of Fig. 1(a) show the loss landscape with the alpha value embedded during training or tuned post-training? If alpha is tuned at the post-training stage, this looks more like achieving a lower post-attack loss via gradient obfuscation.
2. In Table 2, the proposed AWP-based method shows performance improvement on all the metrics for TRADES, MART, and Pre-training. However, for AT and RST, it seems the post-attack accuracy is enhanced at the cost of natural accuracy. The authors stated that AT (i.e., adversarial training) is still the most effective approach.
3. What is the experimental configuration for Table 2? Is it A=1, K_1=10, K_2=1?
4. Based on the configuration of A=1, K_1=10, K_2=1, is the adversarial training overhead only 8%? Although K_2 is set to 1, the reviewer thinks the overhead should be doubled, since extra back-propagation on the weights is required.
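A back-of-envelope pass count may clarify the overhead question in point 4. This is a sketch under my own assumptions (one forward+backward pass per attack step, per weight-perturbation step, and per parameter update; K_1 and K_2 follow the review's notation), not the paper's actual accounting:

```python
# Hypothetical cost model: each input-attack step, each weight-perturbation
# step, and the final update each cost one forward+backward pass.
def passes_per_batch(k1_attack_steps, k2_weight_steps=0):
    return k1_attack_steps + k2_weight_steps + 1

vanilla_at = passes_per_batch(10)      # standard AT with K_1 = 10 -> 11 passes
awp_at = passes_per_batch(10, 1)       # AWP with K_2 = 1          -> 12 passes
overhead = (awp_at - vanilla_at) / vanilla_at  # 1/11, roughly 9%
```

Under this counting, setting K_2 = 1 adds one extra pass out of eleven (about 9%, in the vicinity of the reported 8%) rather than doubling the cost; the cost would double only if the entire K_1-step attack procedure were repeated at the perturbed weights.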

Review 3

Summary and Contributions: The paper studies the relationship between the robust generalization gap and the weight loss landscape. More specifically, the authors observe a correlation between the flatness of the weight loss landscape and a smaller robust generalization gap. Based on this observation, adversarial weight perturbation (AWP) is proposed to flatten the weight loss landscape. Combining AWP with adversarial training (AT), the paper proposes a training algorithm that alternately perturbs inputs and weights adversarially within each training batch. Extensive experiments and comparisons with existing AT-based methods show that AWP improves adversarial robustness on top of adversarial training and flattens the weight loss landscape after training. The proposed AWP defense has the potential to improve many other existing methods, giving a boost to mitigating the risk of adversarial examples. While it is interesting to draw an empirical connection between the flatness of the weight loss landscape and robustness, there is a lack of theoretical discussion to substantiate the paper's claim about this relationship, which is important for extending it to other types of adversarial examples or to new defenses.
--Update-- I have read the response and increased my score since the authors have addressed my concerns.

Strengths:
- Detailed empirical observations on the relationship between the robust generalization gap and the flatness of the weight loss landscape in adversarially trained models. This observation may lay the foundation for more theoretical work or inspire other defenses.
- Extensive experiments evaluating the proposed AWP to validate the claim that it improves robustness.
- Though previous studies have sought to observe the link between the weight loss landscape and the generalization gap in standard and robust training, this paper successfully concludes that a flatter landscape correlates with a smaller generalization gap for adversarially trained models.
- To my knowledge, AWP is novel and is shown here to improve robustness across a variety of AT-based methods, showing its wide-reaching potential to improve other defenses.

Weaknesses: The main weakness and concern I have about this paper is the lack of theoretical study/results to support the claim that flatness of the weight loss landscape leads to a smaller robust generalization gap and to the robustness gain of AWP. If the authors can include a discussion of this, I am willing to reconsider my score.

Correctness: The claims are supported by the empirical results presented in the paper.

Clarity: Good organization of content, albeit with awkward phrasing at times.
- Line 277: “it implies that the perturbation size cannot be too small to ineffectively regularize the flatness of weight loss landscape and also cannot be too large to make DNNs hard to train.” Suggested edit: “it implies that the perturbation size cannot be too small to effectively regularize the flatness of weight loss landscape and also cannot be too large, which makes DNNs hard to train.”
- Line 164: “Why Need Weight Loss Landscape?” is an ungrammatical phrase.

Relation to Prior Work: There is a related work that explores the theory between robustness and weight perturbations, which I recommend the authors discuss: “Towards Certificated Model Robustness Against Weight Perturbations,” AAAI 2020.

Reproducibility: Yes

Additional Feedback: A theoretical treatment of the findings would better support the paper's claims. What would be the effect of combining AWP with non-AT defenses (such as certified defenses)? It would be interesting to see if it also improves their robustness.

Review 4

Summary and Contributions: This paper experimentally reveals that a flat loss surface in weight parameter space can improve the generalization performance of adversarial training. The authors thoroughly investigate the flatness of the loss surface before and after overfitting in adversarial training and show that a flat weight loss surface can achieve good generalization performance in terms of adversarial robust accuracy. From this observation, the authors propose adversarial weight perturbation (AWP), which adversarially perturbs the weight parameters in addition to the input data during adversarial training. Adversarial training with AWP outperforms naive adversarial training, and AWP also improves recent strong defense methods.

Strengths:
1. The proposed method (AWP) can broadly improve previous defense methods. Thus, this paper reveals that flattening the loss surface in parameter space is a promising research direction.
2. Since the loss surface in weight space is not well explored in previous studies, this paper is impactful and provides insightful results. In addition, the generalization performance of adversarial robustness is an important problem, because it is still clearly inferior to that of clean accuracy.
3. This paper provides insightful experimental results that can inspire researchers to propose new defense methods or to analyze the generalization gap in adversarial robust accuracy.

Weaknesses:
1. This paper does not provide any theoretical contributions, and thus the claims of this paper are not supported by theoretical results.
2. The computation cost of AWP can be more than twice that of naive adversarial training.

Correctness: I think the claims of this paper are well supported by thorough experiments. To investigate the relation between flatness and the generalization performance of adversarial training, the authors use several datasets and models. To show the effectiveness of AWP, the authors evaluate the combination of AWP with adversarial training, TRADES, MART, Pre-training, and RST. Furthermore, the authors provide ablation studies, and their results also support the claims. My only concern is that \rho(w+v)-\rho(w) in equation (6) might not be a criterion of flatness, because minimizing this term can simply increase \rho(w) and decrease \rho(w+v). To make this an explicit flatness regularization term, I think it should be |\rho(w+v)-\rho(w)| or {\rho(w+v)-\rho(w)}^2. However, this might be a minor issue, since the experimental results show that AWP makes the loss surface flat.
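One way to frame this concern (my own decomposition, not a derivation from the paper): since \rho(w) does not depend on v, it can be pulled out of the inner maximization, so the AWP objective already contains the flatness gap as an additive term:

```latex
\min_{w}\ \max_{\|v\| \le \gamma \|w\|} \rho(w+v)
  \;=\; \min_{w}\ \Big[\, \rho(w) \;+\; \max_{\|v\| \le \gamma \|w\|} \big( \rho(w+v) - \rho(w) \big) \,\Big]
```

Because v = 0 is feasible, the inner maximum is nonnegative, so minimizing the adversarially perturbed loss jointly minimizes the loss itself and a nonnegative flatness gap; this may explain why the paper optimizes \rho(w+v) directly rather than the absolute or squared difference.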

Clarity: This paper is well written and easy to follow.

Relation to Prior Work: The difference between this work and previous work is well discussed. However, the difference between this paper and [44, 63] is not so clear. Though they seem to be the most closely related work, the authors do not explain the distinctions in detail.

Reproducibility: Yes

Additional Feedback: Why don't you use |\rho(w+v)-\rho(w)| or {\rho(w+v)-\rho(w)}^2 as the regularization term? I think regularization using such a term is more reasonable than solving the mini-max problem with respect to the parameters. In equation (11), why did you use the gradient with respect to w+v and then add and subtract v? This update can equivalently be written as a gradient step directly on w, since the gradient with respect to w equals the gradient with respect to w+v, making the addition and subtraction of v unnecessary.
======= I have read the author feedback, and my minor concerns are addressed. Regarding the weak points of this paper, it is difficult to verify the theoretical justification from author feedback alone during the discussion period, and I would like to see an experimental comparison of runtime. Although weak points still remain, I think this paper exceeds the acceptance threshold, so I maintain my score.