NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 253
Title: Metric Learning for Adversarial Robustness

Reviewer 1

The paper proposes using a triplet loss to achieve more desirable geometric relationships between an example, its adversarial counterpart, and examples from other classes. This is an intuitive proposal, and the triplet loss has proven very useful in other ML contexts. The paper is generally well written, and the triplet loss proposal for adversarial examples is original, to the best of my knowledge. The experimental results indicate that the proposed loss is very promising for improving adversarial robustness.

It does not seem at all surprising that the latent representation of an adversarial example is shifted towards the false class (or away from the true class). After all, isn't this the basis of the optimization used to generate adversarial examples in the first place? Furthermore, using t-SNE to visualize the behaviour of adversarial examples is not a new idea and has been used in a similar way in a variety of papers, see e.g.:
- Generalizability vs. Robustness: Adversarial Examples for Medical Imaging
- Defend Deep Neural Networks Against Adversarial Examples via Fixed and Dynamic Quantized Activation Functions
- Improving the Generalization of Adversarial Training with Domain Adaptation
For these reasons, I do not believe this component of the contribution is as significant as the proposal of the new loss function.

A question for the future: could the underlying idea be further refined using the popular quadruplet loss, which is often used to extend the triplet loss?
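For concreteness, the geometric relationship discussed above can be illustrated with a generic hinge-style triplet loss. This is only a minimal sketch under assumptions: treating the adversarial example as the anchor, the clean example as the positive, and an other-class example as the negative, as well as the Euclidean distance, the margin value, the toy 2-D embeddings, and the function names, are illustrative choices and are not taken from the paper.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical 2-D embeddings (assumed values, for illustration only).
adv = [1.0, 0.0]          # adversarial example, used as the anchor
clean = [0.0, 0.0]        # clean example of the true class, used as the positive
other_far = [4.0, 0.0]    # other-class example far from the anchor
other_near = [1.5, 0.0]   # other-class example close to the anchor

loss_far = triplet_loss(adv, clean, other_far)    # margin satisfied, no penalty
loss_near = triplet_loss(adv, clean, other_near)  # margin violated, positive penalty
```

In this toy setting the loss only penalizes the case where the adversarial embedding sits closer to the other class than the margin allows, which is the situation the reviewed paper's latent-space analysis describes.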

Reviewer 2

This paper analyzes the properties of the high-dimensional latent representation of adversarial examples, finding that the attack moves the embedding closer to the false class, so that the adversarial and natural images become almost indistinguishable. The authors implemented the proposed method on several datasets, including MNIST, CIFAR-10, and Tiny ImageNet, and achieved good performance. They also compared against different adversarial attacks, including FGSM, BIM, C&W, PGD, and so on. The paper is in general well written. The authors provide many visualizations of their analysis and results. They also uploaded their code in the supplementary material, and the experiments seem sufficient and strongly support their claims.

Some major concerns are listed as follows:
1. As a general framework, the authors implemented their idea on only one model structure. This seems insufficient. More experiments on a different model structure may make the paper more convincing.
2. Although the paper provides many experiments and visualizations, the authors analyze adversarial examples only empirically; more theoretical analysis may be needed.

Reviewer 3

a) Originality: This paper proposes a new method based on metric learning, and related work is cited adequately.
b) Quality: The paper is well written in general. It has an intuitive explanation of the motivation, the theoretical analysis, and the experimental results. The experimental results support the authors' claims.
c) Clarity: The paper is written clearly and is easy to understand.
d) Significance: The paper provides a new method which is easy to understand and implement.
e) Some concerns:
1) Unlike other works, which visualize the last layer, this paper visualizes the latent representation. The authors say that the penultimate layer tends to carry more information, but the proposed triplet loss is added on the output layer. Is there a gap between the visualization and the proposed algorithm TLA?
2) Why do the authors focus on the infinity-norm adversarial attack? Is this a limitation of the proposed method? Can the method be applied to other norms?
3) Choosing only AT and ALP as baselines is too few.
4) There is a mistake in Fig. 3: the figure does not agree with the notation used in the paper.
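To make the question about attack norms concrete, the sketch below contrasts how a perturbation is projected back onto an infinity-norm ball versus an L2 ball, which is one of the steps that differs between norm-bounded iterative attacks. It is a minimal illustration under assumptions: the function names, the budget `eps`, and the toy perturbation are hypothetical and do not come from the paper or its code.

```python
import math

def project_linf(delta, eps):
    """Clip each coordinate to [-eps, eps] (projection onto the infinity-norm ball)."""
    return [max(-eps, min(eps, d)) for d in delta]

def project_l2(delta, eps):
    """Rescale the vector so its Euclidean norm is at most eps (projection onto the L2 ball)."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]

# Hypothetical perturbation and budget (assumed values, for illustration only).
delta = [0.3, -0.4]
eps = 0.1

linf = project_linf(delta, eps)  # each coordinate is clipped independently
l2 = project_l2(delta, eps)      # the whole vector is shrunk, keeping its direction
l2_norm = math.sqrt(sum(d * d for d in l2))
```

Whether a defense trained against one of these threat models transfers to the other is an empirical question that the sketch does not answer.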