Review for NeurIPS paper: Robust Pre-Training by Adversarial Contrastive Learning

NeurIPS 2020

Review 1

Summary and Contributions: The paper incorporates adversarial training into the pre-training step, which makes pre-training more robustness-aware. It can be seen as an extension of SimCLR with adversarial training added.
Post-rebuttal update: I appreciate the authors' response, which clarifies and answers most of my concerns. I believe that with a minor revision, the paper can be accepted to NeurIPS. Therefore, I raised my score to 7, an accept.

Strengths: The idea to introduce adversarial robustness into self-supervised learning is novel. One clear advantage of this is to obtain robust models with unlabeled data, which is more readily available.

Weaknesses: More baselines should be included. The paper mainly compares the proposed method to SimCLR, the base model, which does not consider any adversarial perturbations during pre-training. Therefore, the results are somewhat trivial (adversarial pre-training should outperform standard pre-training).
- Since this is the first work to propose adversarial-aware pre-training (as the authors claim), I think it is appropriate to compare its adversarial robustness to some supervised learning methods (such as [1]), or even to incorporate these supervised learning methods into the fine-tuning phase (instead of standard adversarial training).
- Also, robustness against unforeseen attacks (l1, l2, etc.) should be evaluated. For standard pre-training methods (e.g. SimCLR), every attack is unseen, but those still show improved robustness. I wonder whether the adversarial robustness-aware pre-training method can help the model be even more robust against other unseen attacks.
[1] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in Proceedings of the 36th International Conference on Machine Learning, 2019.

Correctness: The related work section doesn't discuss methods that tackle the adversarial robustness problem (including supervised learning or semi-supervised learning). I believe these methods are important in the discussion and can also act as baselines.

Clarity: There are some typos in the paper. These do not hurt readability, but they make the paper feel less well written.
- Line 138: in in -> in
- Line 315: intoduing -> introducing
- Line 317: discusses -> discuss
- Line 219: to aligned
- Line 242: from \theta_bn to \theta_bn,adv -> from \theta_bn,adv to \theta_bn
- Line 182: except impulse noise -> except Gaussian noise (according to Figure 2)
- etc.
The proofreading could have been better.

Relation to Prior Work: The authors adequately discussed the related works.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This work shows that, as a form of unsupervised pre-training, contrasting features under random and adversarial perturbations for consistency can further benefit robustness-aware pre-training. This idea, naturally motivated by a cause of adversarial fragility, yields state-of-the-art results for adversarial defense. On CIFAR-10/100, the authors present well-executed experiments on pre-training (for supervised fine-tuning) and semi-supervised learning (down to very low label rates).

Strengths: The paper’s main idea is easy to follow: extending SimCLR [2], a recently successful contrastive learning framework, to adversarial training. While SimCLR is already popular for a number of tasks, exploring its use for adversarial defense appears to be new and original. The authors explain why SimCLR might be particularly suitable for the goal of adversarial robustness: one cause of adversarial fragility is the lack of feature invariance to small input perturbations, and SimCLR learns representations by maximizing feature invariance under differently augmented views. That makes this paper well motivated and grounded. The main technical part of this paper explores options for formulating the contrastive task. As adversarial perturbation is a very strong and hostile form of data augmentation, how to balance its role with SimCLR’s own random augmentations is an important question to address. The authors compare three options: Adversarial-to-Adversarial (A2A), Adversarial-to-Standard (A2S), and Dual Stream (DS), which includes two contrasting pairs (one standard, the other adversarial) and appears to be the best default option. One more contribution is to extend this framework to semi-supervised learning, following the idea of [9]. The experiments are very thorough, and the results are strong. As pre-training for supervised adversarially robust models, this new method outperforms a very recent self-supervised robust pre-training approach [1] by large margins, making it a new state of the art. It shows that, by injecting adversarial augmentations, contrastive pre-training indeed contributes to learning data-efficient robust models. As a pre-training approach (with no extra data), this method has a plug-and-play nature and can work with many more adversarial defense methods. The robustness gains even extend to unforeseen attacks (Figure 2), as shown by evaluating against the 19 unforeseen attacks from [25].
The authors further demonstrate that their pre-training can improve over SOTA semi-supervised adversarial training such as UAT++ [9]. In a challenging low-label setting with a 1% label rate, Table 2 shows that this method sees only a minor decrease (~1%) compared to the full 100% label rate, while other baselines drop 13% - 30%. This result, if reliable, is remarkable and could spark strong community interest in studying unlabeled data for robust training, e.g., does data-only (with few to no labels) suffice for good robustness?
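
For readers less familiar with the contrastive objective discussed above, the following is a minimal sketch of a SimCLR-style NT-Xent loss and of one plausible way to combine a standard pair and an adversarial pair in a Dual Stream setup. It is an illustration only, not the authors' implementation: the function names (`cosine`, `nt_xent`, `dual_stream_loss`), the temperature value, and the equal-weight average of the two streams are assumptions, and the adversarial features are simply passed in rather than generated by an attack.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """SimCLR-style NT-Xent loss for N paired feature vectors.

    view_a[i] and view_b[i] are features of two views of the same image;
    every other feature in the batch acts as a negative.
    """
    feats = view_a + view_b          # 2N features in total
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)      # index of the matching view
        logits = [
            cosine(feats[i], feats[k]) / temperature
            for k in range(2 * n)
            if k != i                # a sample is never contrasted with itself
        ]
        pos_logit = cosine(feats[i], feats[pos]) / temperature
        # Cross-entropy with the positive pair as the correct "class".
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)


def dual_stream_loss(std_a, std_b, adv_a, adv_b, temperature=0.5):
    """Assumed combination: average of a standard-pair and an adversarial-pair loss."""
    standard = nt_xent(std_a, std_b, temperature)
    adversarial = nt_xent(adv_a, adv_b, temperature)
    return 0.5 * (standard + adversarial)
```

In this sketch, the loss is lower when paired views have similar features than when they are mismatched, which is the feature-invariance property the review refers to. In an actual adversarial setup, the adversarial views would be produced by an attack that tries to increase this loss.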

Weaknesses: I’m in general positive about this paper: the contributions are clear, and the motivation, logic, and experiments all appear thoughtful and convincing. A few nitpicks/suggestions:
- I’m intrigued by your very good semi-supervised results at the low 1% label rate. Could you include more comparison methods, and could you elaborate on why your method is particularly successful compared to others?
- It would be nice to include more backbones besides ResNet18: currently that is the only one used.
- Another great thing to add would be ImageNet-scale experiments, although I understand this is very hard or impossible within the rebuttal window. But please do consider it, as your method is likely to show benefits there too.

Correctness: Yes

Clarity: The writeup is excellent, and the explanation is clear. Figure 1 helped me understand the technical content. I also quite like Section 4, which lays out a well-organized roadmap, connecting the dots from previous literature and clearly linking them to this work.

Relation to Prior Work: Yes

Reproducibility: Yes

Additional Feedback: I have read the authors' rebuttal and decided not to change my score.

Review 3

Summary and Contributions: The paper presents Adversarial Contrastive Learning (ACL), which combines adversarial training and contrastive representation learning. Empirical experiments verify the effectiveness of the presented approach.
Post-rebuttal update: The rebuttal does a good job. Nonetheless, I would like to point out that although the changes are easy to make in a revision, there are many of them. I have updated my rating from 3 to 5 by giving less weight to the imprecise presentation in the paper.

Strengths: The presented approach is reasonable and may benefit the field of adversarial robustness. The sets of experiments also seem to be comprehensive.

Weaknesses: First, the paper is not easy to follow. It requires readers to have substantial background in adversarial robustness. Second, the authors omit many details of the presented approach. Third, the presentation and discussion in the experimental section should be further improved. Details are provided under Clarity.

Correctness: I did not find any errors in the paper.

Clarity: First, the authors assume the readers are experts in adversarial robustness, which leads to many presentation flaws. For instance, the authors mention PGD very early, in Figure 1, but do not explain what PGD is until line 96. Another example is Section 2.2.1: in lines 131-132, the authors skip all the details of supervised adversarial training and only refer to [1]. Second, the structure of the presentation is messy. Equation (3) explains step iii of the semi-supervised loss, but the authors do not provide the supervised loss or steps i and ii of the semi-supervised loss. I can hardly follow the presentation. The authors also do not discuss what the distilling model in Equation (3) is. Third, in the experimental section, the authors should explain the evaluation metrics beforehand. For instance, I was surprised when I first saw "TA, RA" in line 176, and only later realized that the definition of "TA, RA" is hidden in Table 1. Another example is the lambda in line 171: where can I find its definition? Furthermore, the authors should mention that the ablation study for ACL (Section 3.2) is performed on supervised contrastive learning.

Relation to Prior Work: The paper provides reasonable discussions on prior work.

Reproducibility: No

Additional Feedback: In lines 71-72, are the authors revealing their identity?