Review for NeurIPS paper: Contrastive Learning with Adversarial Examples

NeurIPS 2020

Contrastive Learning with Adversarial Examples

Review 1

Summary and Contributions: This paper proposes using adversarial techniques to create positive and negative examples that are more difficult to classify correctly for a model trained in a self-supervised fashion with a contrastive loss. This, in effect, makes the SSL task more difficult, and the authors demonstrate empirically that their method leads to modest accuracy gains on common image classification datasets. The main contribution is a novel adaptation of adversarial techniques to perturb positive and negative example pairs so that they are harder to classify correctly.

Strengths: The strength of this paper is a convincing presentation of a novel idea with clear benefits as demonstrated by experiments. The authors show that this method can be applied to various different contrastive learning tasks and models. I think this paper would be interesting to the NeurIPS community.

Weaknesses: One weakness is that the authors do not evaluate their methods on larger models and datasets that most of the other SSL papers (such as SimCLR used as a baseline in this paper) commonly use. Therefore, it is difficult to say whether this approach scales to bigger model and data sizes.

Correctness: Based on the authors' evaluation, the claims appear to hold and the proposed method appears to work.

Clarity: The paper is well written; however, it is relatively dense, with a lot of notation that (at least for me) was not the easiest to read. Nonetheless, I believe the authors managed to get their point across well. I would suggest adapting the main figure in the paper (Figure 1) to make it clearer what is going on there, and perhaps adjusting the colors.

Relation to Prior Work: I found that this paper situates itself quite well in the literature, though I am less familiar with the adversarial learning literature.

Reproducibility: Yes

Additional Feedback: I've read the author rebuttal and thank the authors for their clarifications. I believe my rating is still appropriate for this work.

Review 2

Summary and Contributions: This paper enhances various unsupervised learning methods with an additional adversarial data generation path and the associated loss. Contrastive learning is based on distinguishing different augmented versions of an image from other independently chosen images. This work proposes tweaking the augmented examples adversarially so that they are even harder to distinguish from other images (that are not augmented versions of the same image), that is, "fooling" the classifier that distinguishes the augmented versions of the image from other images. The paper assesses the quality of the pre-trained model by evaluating it on the "downstream" supervised classification task.
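To make the mechanism summarized above concrete, here is a minimal sketch of a single signed-gradient (FGSM-style) step against a contrastive loss. It is an illustration under strong assumptions, not the authors' implementation: the encoder is taken to be the identity, so the perturbation is applied directly to a toy embedding rather than to an image; the loss is a generic InfoNCE-style loss; the gradient is approximated by finite differences; and all names (`info_nce`, `fgsm_positive`, `epsilon`, `temperature`) are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor: -log softmax score of the positive."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


def grad_wrt_positive(anchor, positive, negatives, h=1e-5):
    """Central finite-difference gradient of the loss w.r.t. the positive view."""
    grad = []
    for i in range(len(positive)):
        up = list(positive)
        down = list(positive)
        up[i] += h
        down[i] -= h
        diff = info_nce(anchor, up, negatives) - info_nce(anchor, down, negatives)
        grad.append(diff / (2 * h))
    return grad


def fgsm_positive(anchor, positive, negatives, epsilon=0.1):
    """One signed-gradient step that makes the positive harder to match."""
    grad = grad_wrt_positive(anchor, positive, negatives)
    sign = [(g > 0) - (g < 0) for g in grad]  # elementwise sign in {-1, 0, 1}
    return [p + epsilon * s for p, s in zip(positive, sign)]


# Toy vectors standing in for embeddings (assumption: identity encoder).
anchor = [1.0, 0.0, 0.5]
positive = [0.9, 0.1, 0.4]
negatives = [[-0.5, 1.0, 0.0], [0.0, -1.0, 0.3]]

adv_positive = fgsm_positive(anchor, positive, negatives)
clean_loss = info_nce(anchor, positive, negatives)
adv_loss = info_nce(anchor, adv_positive, negatives)
print(clean_loss, adv_loss)
```

In this toy setting the perturbed positive yields a higher contrastive loss than the clean one, which is the sense in which the adversarial path makes the pretext task harder. In a real pipeline the gradient would be taken with respect to the input image through the encoder using automatic differentiation.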

Strengths: This is a clever method that works in several completely unsupervised setups with small but consistent improvements across the board. The authors verify on downstream tasks for several datasets that the additional adversarial training improves the final classification score when the model is not fine-tuned. This is a strong indication that the method has true merits. The method is evaluated on multiple datasets, is used to augment multiple unsupervised approaches, and exhibits a similar pattern in each case: there is a sweet spot for the size of the adversarial perturbation, and the improvements diminish if the perturbation exceeds that magnitude. The experiments are well designed and convincing, and the method is novel.

Weaknesses: The paper shows compelling evidence that the adversarial training path helps the final downstream classification results when the model is not fine-tuned for the supervised task. However, any theoretical justification is absent. In general, the idea is not extremely creative, but it is certainly novel.

Correctness: The methodology has no obvious flaws and the results look consistent with the rest of the literature. The numbers look good and give clear support for the quality of the idea. However, no mathematical analysis or explanation of the observed results is attempted.

Clarity: The paper is nicely written and motivated. Especially Figure 1 gives a very concise and intuitive description of the flow of the algorithm. The writing style is terse but easy to understand and focuses on the main features of the method. Overall, this paper is easy to follow and is convincing. In general, a pleasure to read.

Relation to Prior Work: The related work section is somewhat limited, but it is extensive enough to highlight the differences from and similarities to previous methods. The method seems novel and interesting.

Reproducibility: Yes

Additional Feedback: The broader impact section misses the point; however, I don't see any obvious ethical concerns regarding this research.

Review 3

Summary and Contributions: The paper improves contrastive learning by generating adversarial examples. It generates more challenging positive pairs and harder negative pairs. Experiments are conducted on three image classification datasets.

Strengths: 1. The ablation studies look sufficient. 2. The paper is well organized.

Weaknesses:
[1] Adversarial Contrastive Estimation. ACL 2018
[2] Self-supervised Approach for Adversarial Robustness. CVPR 2020
1. Some related work is missing. The paper states that "no attention has been previously devoted to the design of adversarial attacks for SSL". [2] is devoted to the design of adversarial attacks for SSL. Also, [1] "view contrastive learning as an abstraction of all such methods and augment the negative sampler into a mixture distribution containing an adversarially learned sampler. The resulting adaptive sampler finds harder negative examples, which forces the main model to learn a better representation of the data." This is not a problem considering the NIPS deadline; I am just noting it for reference. It would be good if the authors could explain the difference in a later version.
2. Since the method inserts adversarial noise into the data augmentation process of contrastive learning, it should be compared against a naive baseline that adds Gaussian noise. Other kinds of noise should be considered too.
3. Since the improvement on CIFAR-10 looks trivial, it would be great to see a bigger improvement on more datasets, e.g. ImageNet, fine-grained datasets, etc. Is there a computation cost concern when applying your method to a larger dataset?
4. Previous contrastive learning work reports results on transfer to a downstream task (ImageNet pre-training and detection on COCO). Since you compare with the naive contrastive learning method, it would be great if you could include this comparison.
5. I understand that the computation cost of this method may be high. But since your major competitors are contrastive learning methods, it would be better to compare with them on the various datasets that previous methods have used. SimCLR has results on ImageNet, Food, CIFAR10, CIFAR100, Birdsnap, SUN397, Cars, Aircraft, VOC2007, DTD, Pets, Caltech-101, and Flowers.
6. If you cannot show superior results on these datasets, why is this method useful, especially when your method is more complicated and takes more computation time?
7. The experiments do not seem very solid or promising to me. This is the major reason for my overall score.
-------------After rebuttal-----------------
I appreciate the authors' rebuttal. It addresses some of my concerns. I would raise my rating to 4: An okay submission. Adding adversarial samples in SSL is not very novel, so I would expect a paper with solid experiments and extensive ablation studies. In the rebuttal, the authors conduct experiments on a small version of ImageNet due to hardware limitations, so whether the method works on a large-scale dataset remains unknown. SSL is sensitive to parameters, as shown in Fig. 5. From the results on ImageNet100, I am not convinced that there would be a clear improvement on a large-scale dataset.
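For the Gaussian-noise baseline requested in item 2, one possible setup (an assumption on my part, not something described in the paper or the rebuttal) is to rescale the random noise so that its largest absolute component equals the same budget `epsilon` used for the adversarial perturbation. The two augmentations then differ only in direction, not in magnitude. The function name `gaussian_baseline` and the toy values are hypothetical.

```python
import random


def gaussian_baseline(x, epsilon, rng):
    """Add Gaussian noise rescaled so its max-norm equals epsilon.

    Assumption: matching the L-infinity budget of a signed-gradient
    perturbation makes the comparison with the adversarial path fair.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    scale = max(abs(n) for n in noise)
    if scale == 0.0:  # degenerate draw: leave the input unchanged
        return list(x)
    return [xi + epsilon * n / scale for xi, n in zip(x, noise)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [0.2, 0.5, 0.7, 0.1]  # toy stand-in for flattened pixel values
noisy = gaussian_baseline(image, epsilon=0.03, rng=rng)
print(noisy)
```

If the adversarial variant still outperforms this matched-budget random baseline, the gain can be attributed to the direction of the perturbation rather than to the mere presence of noise.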

Correctness: yes

Clarity: yes

Relation to Prior Work: no

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: This paper introduces an adversarial attack mechanism to contrastive learning. By doing so, the model is exposed to more challenging examples within a batch. The authors also perform experiments on several datasets against baseline models.

Strengths: The paper is clearly written. The proposed method is simple but reasonable. The experiments show an improvement over baseline models.

Weaknesses: The choice of FGSM as the adversarial method needs further discussion: why was this method chosen rather than others? The authors may need to compare with, or comment on, other baselines such as https://arxiv.org/pdf/1907.13625.pdf. Is it possible to also have adversarial examples on x_i?

Correctness: Yes

Clarity: yes

Relation to Prior Work: No, the authors do not discuss other NCE-enhanced algorithms much.

Reproducibility: Yes

Additional Feedback: