NeurIPS 2020

### Review 1

Summary and Contributions: This paper first formulates a data-dependent form of operator norm regularization on the Jacobian of a feed-forward neural network. The authors then show that this regularization scheme is equivalent to the well-known adversarial training paradigm of Madry et al. They empirically show that both adversarial training and their regularization scheme shrink the singular values of the Jacobian of a trained model, and that models trained with these schemes are significantly more linear around the data than regularly trained models.

Strengths:
+ The connection between the well-known data-_independent_ spectral normalization scheme of Miyato et al. and the data-_dependent_ scheme introduced here is both simple and compelling.
+ Given recent progress toward understanding the success of PGD and the prevalence of adversarial examples, this work is well-motivated.
+ Theorem 1 is quite compelling. I have seen many works that try to explain why PGD is one of the few defenses that holds up well against a variety of attacks, and this paper makes the best case for PGD that I have seen.
+ The experiments are very good. The authors made a clear effort to be thorough, and it shows: a variety of experimental settings are considered. It may not be surprising that, for example, data-dependent SNR dampens the singular values of the Jacobian more effectively than data-independent SNR (which admittedly relies on a looser bound). However, verifying this was important to the fidelity of the claims, and the authors do so successfully.
- The discussion in the appendix about Frobenius norm regularization is necessary, as it addresses a natural question one might ask.
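The gap between the two schemes can be illustrated with a minimal NumPy sketch, assuming a hypothetical one-hidden-layer ReLU network $f(x) = W_2\,\mathrm{relu}(W_1 x + b_1)$ with random weights (not the paper's models; the helper name is also hypothetical). Data-independent spectral norm regularization controls the per-layer spectral norms, whose product bounds $\|J_f(x)\|_2$ at every input; the data-dependent variant targets $\|J_f(x)\|_2$ at the data points directly, and that local value is never larger than, and can be smaller than, the global bound.

```python
import numpy as np

def relu_net_jacobian(x, W1, b1, W2):
    """Jacobian of f(x) = W2 @ relu(W1 @ x + b1) at the input x.

    Within the linear region containing x, the ReLU acts as a fixed 0/1
    diagonal mask D (the activation pattern), so J_f(x) = W2 @ D @ W1.
    """
    mask = (W1 @ x + b1 > 0).astype(float)   # activation pattern at x
    return W2 @ (mask[:, None] * W1)          # W2 @ diag(mask) @ W1

# Hypothetical random weights, for illustration only.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 32))
b1 = rng.standard_normal(64)
W2 = rng.standard_normal((10, 64))
x = rng.standard_normal(32)

# Data-dependent quantity: spectral norm of the Jacobian at this particular x.
local_norm = np.linalg.norm(relu_net_jacobian(x, W1, b1, W2), ord=2)
# Data-independent bound: product of layer spectral norms (||D||_2 <= 1),
# valid for every input x.
global_bound = np.linalg.norm(W1, ord=2) * np.linalg.norm(W2, ord=2)

print(f"||J_f(x)||_2 = {local_norm:.2f} <= {global_bound:.2f}")
```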

Weaknesses:
- The theorem may slightly overstate its result in the following way: for the correspondence between adversarial training and the proposed regularization scheme to hold, $\epsilon$ seemingly must be quite small. That is, we assume that all points in an $\epsilon$-ball around a data point $x$ are mapped by the model to the same activation pattern $\phi_x$ (i.e. $B_\epsilon^p(x) \subset X(\phi_x)$). I would imagine that this does not always hold for "realistic" values of $\epsilon$ (e.g. 8/255). My concern is that, while the theorem is certainly compelling, it may hold only for values of $\epsilon$ too small to matter in practice. Perhaps the authors can clarify. I see there is an experiment to this effect in Section 7.16, but it seems to cover only one data point. [EDIT, post-rebuttal: Based on the authors' response and a closer look at Section 5.4, I'm satisfied that the authors looked into this potential weakness and added an explanation of its implications.]
- Figure 1 is too small to be useful, and it is not clear what the arrows represent. A larger, more detailed figure would be appreciated.
- The notation describing the power iteration is a bit strange. This is a small thing, but I think it would make more sense to rearrange the steps. For example, in (6) it would be clearer to write $\tilde{u} \gets ...$, then $u_k \gets ...$, then $\tilde{v} \gets ...$, and finally $v_k \gets ...$, so that the steps are written in the order in which they are applied.
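The reordering suggested for (6) can be sketched as a minimal NumPy power iteration for the largest singular value of a Jacobian $J$, with the four updates written in the order they are applied. This assumes $J$ is available as an explicit matrix (a random stand-in here; with a real network one would presumably use Jacobian-vector and vector-Jacobian products instead), and the function name is hypothetical.

```python
import numpy as np

def power_iteration(J, num_iters=100, seed=0):
    """Estimate the largest singular value of J and its singular vectors.

    Each iteration applies the updates in this order:
      u_tilde <- J v_{k-1},  u_k <- u_tilde / ||u_tilde||,
      v_tilde <- J^T u_k,    v_k <- v_tilde / ||v_tilde||.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(J.shape[1])
    v /= np.linalg.norm(v)                  # random unit start vector v_0
    for _ in range(num_iters):
        u_tilde = J @ v                     # u_tilde <- J v_{k-1}
        u = u_tilde / np.linalg.norm(u_tilde)
        v_tilde = J.T @ u                   # v_tilde <- J^T u_k
        v = v_tilde / np.linalg.norm(v_tilde)
    sigma = float(u @ J @ v)                # estimate of sigma_max(J)
    return sigma, u, v

J = np.random.default_rng(1).standard_normal((10, 32))  # stand-in Jacobian
sigma, u, v = power_iteration(J)
print(sigma, np.linalg.svd(J, compute_uv=False)[0])
```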

Correctness: I looked through the proofs in the appendix and everything seems sound to me.

Clarity: This paper is very well written. The sections are clearly defined and the narrative flows well from one section to the next. One typo I found, on page 3 near the bottom: "totherther" should be "together".

Relation to Prior Work: The related work section is somewhat brief. A more detailed related work section should probably be included in the final version.

Reproducibility: Yes

Additional Feedback: I really enjoyed this paper. The writing was very good, the ideas were compelling, and the contribution is impressive. A clear accept from my perspective.

### Review 2

Summary and Contributions: This paper establishes a theoretical link between adversarial training and operator norm regularization. Specifically, it provides a data-dependent variant of spectral norm regularization and proves that $l_p$-norm constrained PGD with an $l_q$-norm loss is equivalent to data-dependent (p,q) operator norm regularization. This reveals a connection between a network's sensitivity to adversarial examples and its spectral properties. Experiments support the theoretical findings.
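The identity behind this equivalence can be sketched in simplified form, assuming a piecewise-linear (e.g. ReLU) network $f$ for which the whole $\epsilon$-ball around $x$ lies in one linear region, i.e. $B^p_\epsilon(x) \subset X(\phi_x)$. This is an illustration under that assumption, not the paper's exact statement of the theorem.

```latex
\begin{aligned}
&\|J\|_{p,q} := \max_{\|v\|_p \le 1} \|J v\|_q, \\
&B^p_\epsilon(x) \subset X(\phi_x) \;\Longrightarrow\;
  f(x+\delta) = f(x) + J_f(x)\,\delta \quad \forall\, \|\delta\|_p \le \epsilon, \\
&\max_{\|\delta\|_p \le \epsilon} \|f(x+\delta) - f(x)\|_q
  = \max_{\|\delta\|_p \le \epsilon} \|J_f(x)\,\delta\|_q
  = \epsilon \max_{\|v\|_p \le 1} \|J_f(x)\,v\|_q
  = \epsilon\,\|J_f(x)\|_{p,q}.
\end{aligned}
```

The third step substitutes $\delta = \epsilon v$ and uses the homogeneity of the norm, so an exact inner maximization of an $l_q$ logit loss over the $l_p$ ball evaluates $\epsilon$ times the data-dependent (p,q) operator norm of the Jacobian at $x$.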

Strengths: This paper presents a global spectral norm regularization for training models that are robust against adversarial examples. It shows that adversarial training (using an $l_q$-norm loss on the output logits) is a form of operator norm regularization, and confirms that a network's sensitivity to adversarial examples is tied to its spectral properties. Experiments are conducted to support the theoretical findings.

Weaknesses: The theoretical results are not particularly exciting. Essentially, this paper only shows the connection between adversarial training and data-dependent operator norm regularization, while it remains unclear how such regularization affects the training of robust classifiers and how it characterizes the model's sensitivity to adversarial examples. The assumption made in this paper is not consistent with practice. In particular, the derived theory requires the loss function to be the $l_q$ norm between the logits of the clean and perturbed inputs, whereas in practice people prefer cross-entropy or KL-divergence based losses. Additionally, Theorem 1 requires an extremely small $\epsilon$ such that $B^p_\epsilon(x) \subset X(\phi_x)$ throughout the entire training period, which is also not practical.
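To gauge how restrictive the condition $B^p_\epsilon(x) \subset X(\phi_x)$ is, here is a minimal sketch for the special case of a one-hidden-layer ReLU network and $p = \infty$ (both are assumptions; the helper names and random weights are hypothetical). In this case the condition can be checked exactly: over the $l_\infty$ ball, unit $i$'s pre-activation moves by at most $\epsilon\|w_i\|_1$, so the pattern stays constant when $\epsilon < \min_i |z_i(x)|/\|w_i\|_1$ and changes for any larger $\epsilon$. For deeper networks the later pre-activations are no longer affine in $x$ over the ball, so this formula does not carry over directly.

```python
import numpy as np

def pattern(x, W1, b1):
    """First-layer ReLU activation pattern (which units are active) at x."""
    return W1 @ x + b1 > 0

def max_eps_same_pattern(x, W1, b1):
    """Threshold eps* for a ONE-hidden-layer ReLU net and the l_inf ball.

    Over B_eps(x), unit i's pre-activation ranges over z_i(x) +/- eps * ||w_i||_1
    (l_1 is the dual norm of l_inf). No unit changes sign for eps < eps*,
    and some unit does for any eps > eps*.
    """
    z = W1 @ x + b1
    return np.min(np.abs(z) / np.abs(W1).sum(axis=1))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 32))   # hypothetical random first layer
b1 = rng.standard_normal(64)
x = rng.standard_normal(32)
eps_max = max_eps_same_pattern(x, W1, b1)
print(f"pattern constant on B_eps(x) for eps < {eps_max:.4f} (cf. 8/255 = {8/255:.4f})")
```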

Correctness: They are correct.

Clarity: The paper is clearly written and easy to follow.

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback: In the experiments, the authors show the local linearity and activation patterns of the neural network after training. Could you also plot the same figures for the model throughout training? It would be interesting to explore whether adversarial training or operator-norm-regularized training helps stabilize activation pattern changes against adversarial examples.

########## After reading the authors' response

1. I agree that it is also necessary to account for a small number of activation pattern changes in Theorem 1. For example, a better way of presenting Theorem 1 would be to show the connection between adversarial training and operator norm regularization under the assumption that the number of activation pattern changes is upper bounded by some small quantity. What if we only consider the target logit? Then the Jacobian matrix reduces to a vector; would the theoretical result still hold? Would the same singular value spectrum be observed in the experiments?

########## After reading the authors' rebuttal

Thanks for pointing out related works that use $l_q$-norm losses for adversarial training. One caveat: logit pairing algorithms still use standard adversarial examples rather than generating them by maximizing an $l_q$ norm. I agree that it is better to move the discussion of the cross-entropy loss into the main part of the paper.

### Review 3

Summary and Contributions: This paper establishes a link between adversarial training and operator norm regularization for neural networks, and shows that $l_p$-norm constrained projected-gradient-ascent-based adversarial training with an $l_q$-norm loss on the logits of clean and perturbed inputs is equivalent to data-dependent (p,q) operator norm regularization. Experimental results support the theoretical discussion.

**I have read all the reviews and the authors' rebuttal. After the discussion, I believe that my evaluation is fair and appropriate.**

Strengths: This paper proposes a data-dependent variant of spectral norm regularization that directly regularizes large singular values of a neural network. Theorem 1 proves that $l_p$-norm constrained projected gradient ascent with an $l_q$-norm loss on the logits of clean and perturbed inputs is equivalent to data-dependent (p, q) operator norm regularization.

Weaknesses: This paper only discusses and proves the case of adversarial training with an $l_q$-norm loss; for other types of losses, the connection remains unclear. The practical applicability therefore appears narrow.

Correctness: The paper uses only the CIFAR10 dataset to verify the theoretical discussion and derivations. Validating the effectiveness of an algorithm on a single dataset seems insufficient; more empirical validation is needed.

Clarity: The paper is clearly presented overall. There are several typos and formatting issues that need to be corrected, such as the inconsistent formatting of the conference references [26] and [27].

Relation to Prior Work: Yes. This paper discusses its differences from prior work.

Reproducibility: Yes

Additional Feedback: Experiments on more datasets are needed to validate the theoretical analysis in this paper.