NeurIPS 2020

DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles

Review 1

Summary and Contributions: The paper introduces DVERGE, a novel adversarial defence method that diversifies the adversarial weaknesses of the models in an ensemble, i.e., an adversarial input that successfully attacks one model in the ensemble is unlikely to be effective against the others. The method is evaluated against black-box and white-box attacks on CIFAR-10 classifiers.

Strengths:
1. Clarity of the prose. The paper is well written and its logic is easy to follow. It provides a good walk-through of prior research in this area.
2. Focus on black-box attacks. I believe black-box attacks are of more practical concern to practitioners. It is commendable that the method is designed specifically against black-box attacks; the training objective itself is, in a way, designed to prevent transferability.
3. Simple and effective method. The idea of training models with adversarial examples from the other models is intuitive and straightforward. It seems quite easy to implement.

Weaknesses: It would be nice if the work covered (1) computational efficiency and scalability and (2) more analysis of N, the number of models in the ensemble. See "Additional Feedback" for the questions.

Correctness: The conceptual claims and the experimental settings look good to me.

Clarity: The paper is well written.

Relation to Prior Work: Relation to prior work is discussed well. Experiments include suitable baseline methods to compare against.

Reproducibility: Yes

Additional Feedback: I have a few questions.
1. How efficient is DVERGE compared to baseline methods such as ADP and GAL? In Figure 4, does DVERGE achieve its improved adversarial robustness with equal or less compute than the compared baselines?
2. Will this method scale well with data complexity (e.g., greater input resolution) and data size? For example, when training an ensemble on ImageNet, would it be possible to employ, e.g., model parallelism across many GPUs to train a robust set of models within a reasonable time budget? Can the models be trained asynchronously in this setup? Please describe the scalability in a distributed setting; this would increase the attractiveness of the method.
3. Diversification is an important goal in ensemble learning. Does DVERGE also improve performance on the original task compared to a baseline ensemble? I believe DVERGE is likely to induce orthogonal decision boundaries and thereby help diversify the cues used for recognition itself.
4. It would be nice to see an analysis with respect to the number of models (N). Will adversarial robustness saturate with increasing N? Will this also be the case for higher input resolutions (e.g., 224 x 224), where the adversary has more resources with which to attack all N models in the ensemble?
***POST REBUTTAL*** I have read the rebuttal and the other reviews. I am happy with the rebuttal, particularly the additional experiments requested by me and the other reviewers.

Review 2

Summary and Contributions: In this paper, the authors study how to train a more robust ensemble by promoting the diversity of its individual members. Unlike other works, diversity is measured by the non-overlap of non-robust features among the ensemble members. Maximizing this diversity diversifies the vulnerabilities of the individual models, so when their predictions are averaged, the ensemble achieves better adversarial robustness. The authors also conduct extensive experiments demonstrating the effectiveness of DVERGE in defending against adversarial attacks.

Strengths: The motivation of the proposed method, DVERGE, is clearly presented. Figure 1 convinces me that the proposed method can train an ensemble whose individual members share minimal non-robust features. The key contribution of this paper is vulnerability diversity. This metric is very intuitive and easy to implement in practice. Although it is not bounded, it can be relaxed into another, stronger bound during training, which is friendly to optimization. In the experiments, DVERGE is fairly compared to ADP and GAL, two recently proposed adversarial defense methods that also promote diversity in an ensemble, and it exhibits a significant improvement over both. More importantly, the improvement is fully interpretable: Figure 3 shows that it comes from each ensemble member having a high chance of defending against attacks transferred from the other members. The rolling diversity and rolling transferability results also show the effectiveness of DVERGE in diversifying vulnerabilities. In the appendix, the authors demonstrate that DVERGE maintains good performance with respect to transferability up to an ensemble size of 8. How to promote ensemble diversity is a significant research question even outside the adversarial robustness domain. The proposed approach (defining a diversity metric on the intermediate feature space of the individual members) could inspire many other research works in reliable deep learning.

Weaknesses: The computational overhead is not explained clearly. The proposed method requires an additional forward pass on a sampled batch plus a number of PGD steps, which leads to at least 2X more compute than training a standard deterministic deep network. This is acceptable, because most methods that improve adversarial robustness also incur computational overhead, but it would be better if the authors provided a table comparing the computational time of DVERGE, ADP, and GAL. Also, the appendix mentions that DVERGE starts from the trained baseline ensembles. What happens if DVERGE is trained from scratch? In general, diversity-promoting regularizers have a diminishing effect as the ensemble size increases. The appendix shows the transferability plot up to an ensemble size of 8. The paper would be more convincing with a plot of white-box/black-box accuracy (at fixed \epsilon) vs. ensemble size, so that one can compare the scalability of DVERGE, ADP, and GAL with respect to ensemble size. The layer l is chosen randomly during training to prevent overfitting. Another design choice would be to use batch-norm statistics to define diversity, i.e., to replace the activation in Equation 1 with batch-norm statistics, which reduces the dimensionality of the objective in the PGD step. It would be interesting if the authors discussed such design choices, or even ran an ablation comparing a fixed layer l with a randomly sampled l during training. The clean test accuracy is reported in the appendix, but only for the ensemble. A table including each individual member's accuracy (compared with ADP and GAL) would support the claim that DVERGE diversifies the vulnerabilities.
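To make the "at least 2X" estimate concrete, here is a back-of-envelope cost model in Python. Its assumptions are ours, not the paper's: cost is counted in units of one forward pass of one member on one batch, a backward pass is taken to cost about twice a forward pass, each PGD step is charged a full forward and backward pass (an upper bound, since distillation only needs the network up to layer l), and each member is trained on the distilled batches of the N - 1 other members. The function name `per_batch_cost` and the PGD step count are hypothetical.

```python
def per_batch_cost(n_models, pgd_steps, bwd_to_fwd=2.0):
    """Back-of-envelope cost of one training batch, in units of one forward pass.

    Returns (plain ensemble training, DVERGE-style cross training) under the
    assumptions stated in the lead-in; this is an estimate, not a measurement.
    """
    fwd_bwd = 1.0 + bwd_to_fwd  # one forward plus one backward pass
    # Plain ensemble: each member does one forward/backward pass on the batch.
    plain = n_models * fwd_bwd
    # Distillation: one forward pass on x for the target features, then
    # pgd_steps forward/backward passes per member (upper bound: full network).
    distill = n_models * (1.0 + pgd_steps * fwd_bwd)
    # Cross training: each member trains on the n_models - 1 distilled batches.
    cross = n_models * (n_models - 1) * fwd_bwd
    return plain, distill + cross


for n in (3, 5, 8):
    plain, dverge = per_batch_cost(n_models=n, pgd_steps=10)
    print(f"N={n}: ~{dverge / plain:.1f}x the cost of plain ensemble training")
```

Under these assumptions the ratio is (1 + 3K + 3(N - 1)) / 3 for K PGD steps, so it exceeds 2X for any K >= 1 and grows linearly with the ensemble size N, which is why a measured timing table would be informative.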

Correctness: The claims and method are correct and the empirical methodology looks correct to me.

Clarity: Yes

Relation to Prior Work: The paper discusses related works on improving ensemble adversarial robustness by promoting its diversity.

Reproducibility: Yes

Additional Feedback: Please see the above strengths and weaknesses. Overall, I think this is a good paper with strong empirical results. It also has potential to inspire many future works. POST REBUTTAL: Most of my concerns have been answered. I raised my score to 7.

Review 3

Summary and Contributions: The paper proposes using distillation and diverse outputs to subvert adversarial attacks on an ensemble of networks, and the defense gets better as more networks are added to the ensemble. The paper proposes a well-motivated objective that is interesting and novel in the context of adversarial learning, but has some similarity to previous work (see the second point under Weaknesses). The objective is intuitive and practical. The empirical evaluation is thorough and very convincing.

Strengths:
- Well-motivated objective in eqn
- Fig 1 and Alg 1 are very nice and detailed.
- Multiple baselines are chosen, which appear to be SOTA in the field.
- The empirical performance is very convincing compared with the baselines.
- As the number of models in the ensemble increases, the baselines seem to deteriorate, but the proposed method improves. This makes it a very practical improvement.

Weaknesses:
- I understand the paper is not about ensembles per se but about ensembles for adversarial robustness; still, I would like to see papers that discuss ensembles and their variants cited in the related work, since ensembles follow a very rich and established line of research, namely [1,2,3,4,5].
- Eqns 3 and 4 are similar (to an extent) to the motivation of the proposed objective in [4]. I would like to see this addressed, including how they differ.
- NIT: in Alg 1, I prefer \alpha instead of "lr" to denote the learning rate.
- ResNet-20 is not a standard ResNet variant to my knowledge. Is it similar to ResNet-18?
- Ensembles are very closely related to Bayesian NNs; I would like to see a comment about this.
- White-box accuracy is very similar to the baselines, but still good.
- Fig 5 is also interesting, but would it be possible to show the other baselines with adversarial training on the same plot?
- The absence of error bars in all of the plots is very concerning to me. I would much prefer seeing error bars, to look beyond just the mean performance.
[1] [2] [3] [4] [5]

Correctness:
- Yes, but I would like to see error bars on all the plots.
- Otherwise, the experiments are thorough.

Clarity: The paper is well written, and I like the clarity of both Fig 1 and Alg 1. The results are presented in an understandable and intuitive form, which makes it easy for the reader to compare the performance against the baselines. It would be better if dashed and dotted lines were used in Fig 4 for accessibility.

Relation to Prior Work: I would like to see more thorough related work on ensemble networks, since they follow a very rich line of research that should be mentioned. I would also like to see a (small) discussion of the differences between eqns 3 and 4 and [1]. [1]

Reproducibility: Yes

Additional Feedback: I am happy to raise my score if my suggestions are incorporated in the rebuttal. POST REBUTTAL: My questions have been answered. I have increased my rating. This is a good paper.

Review 4

Summary and Contributions: The paper proposes to improve the robustness of CNN ensembles against adversarial attacks. The intuition is to give the individual CNNs diversified adversarial vulnerabilities. To achieve this, the authors distill adversarial examples from each sub-model and train the other sub-models on the generated examples. Experiments on CIFAR-10 show the effectiveness of the proposed method.
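The distillation step described here can be sketched with NumPy. The sketch assumes a DVERGE-style feature-distillation objective: starting from a second input x_s, find z within an L-infinity ball of radius eps around x_s whose layer-l features match those of x, using sign-gradient projected gradient descent (PGD). The linear map `W` standing in for the layer-l features, the helper names, and the values of `eps`, `step`, and `n_steps` are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np


def feature_distill(feat, feat_grad, x, x_s, eps=0.07, step=0.007, n_steps=10):
    """Sign-gradient PGD for a feature-distillation objective (illustrative sketch).

    Searches for z with ||z - x_s||_inf <= eps and z in [0, 1] that minimises
    ||feat(z) - feat(x)||^2, and returns the best iterate found with its loss.
    """
    target = feat(x)

    def loss(z):
        return float(np.sum((feat(z) - target) ** 2))

    z = x_s.copy()
    best_z, best_loss = z.copy(), loss(z)
    for _ in range(n_steps):
        z = z - step * np.sign(feat_grad(z, target))  # descend on the feature mismatch
        z = np.clip(z, x_s - eps, x_s + eps)           # project onto the eps-ball around x_s
        z = np.clip(z, 0.0, 1.0)                       # keep a valid input range
        current = loss(z)
        if current < best_loss:
            best_z, best_loss = z.copy(), current
    return best_z, best_loss


# Hypothetical toy "layer-l feature extractor": a fixed linear map W.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))


def feat(z):
    return W @ z


def feat_grad(z, target):
    # Gradient of ||W z - target||^2 with respect to z.
    return 2.0 * W.T @ (W @ z - target)


x = rng.uniform(size=16)    # input whose features are imitated
x_s = rng.uniform(size=16)  # input the result must stay close to
init_loss = float(np.sum((feat(x_s) - feat(x)) ** 2))
z, final_loss = feature_distill(feat, feat_grad, x, x_s)
print(f"feature mismatch: {init_loss:.3f} -> {final_loss:.3f}")
```

Keeping the best iterate rather than the last one is a design choice of this sketch; it guarantees the returned mismatch never exceeds the starting one. In a real network, `feat_grad` would come from automatic differentiation rather than a closed form, and the cost of these PGD steps per sub-model is what the computational-overhead question below concerns.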

Strengths: Although only CIFAR-10 is used for the experiments, the authors provide extensive analysis and comparisons of the proposed method. The intuition is clear and sounds solid. The presentation is good overall.

Weaknesses:
- An analysis of the training computation overhead would help justify the proposed method. As the adversarial counterpart (i.e., Eq 1) is obtained by PGD for each sub-model, it might be computationally expensive for larger datasets such as ImageNet. This may be worse when combining the proposed method with adversarial training.
- The epsilon is sensitive to the number of sub-models (in Table 2 of the Appendix, 32.3%-40.0% vs 57.9%-52.4% for 3 or 8 sub-models). This may make choosing the hyperparameter difficult, especially for other, larger datasets.
- For the black-box transfer attack, it would be better to add more baselines in addition to ADP and GAL, for example, some of those in [a].
- A discussion and comparison of how the proposed method differs from naive adversarial ensemble training would help justify the proposed method. It would also be helpful to discuss the theoretical and empirical differences between Eq 6 of the Appendix and Eq 5 of the paper without the j!=i constraint in the summation.
[a] Kurakin et al., Adversarial Attacks and Defences Competition
--- post rebuttal updates --- I appreciate the response from the authors, and most of my concerns were addressed. Thus, I am updating the score to 6.
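The j!=i question about Eq 5 can be seen on a toy loss matrix. The sketch below assumes, without confirming it against the paper, that the cross-training loss of member i sums its loss on examples distilled from the other members (j != i); dropping the constraint adds the diagonal term, i.e., member i's loss on its own distilled example. The matrix values and the function name `cross_training_losses` are hypothetical, and Eq 6 of the Appendix is not modeled here.

```python
import numpy as np


def cross_training_losses(loss_matrix, exclude_self=True):
    """Per-member training loss from a matrix of pairwise losses.

    loss_matrix[i, j] is the (hypothetical) loss of member i on the example
    distilled from member j, evaluated with the source label y_s.
    With exclude_self=True the sum runs over j != i; otherwise over all j.
    """
    losses = np.asarray(loss_matrix, dtype=float)
    if exclude_self:
        losses = losses - np.diag(np.diag(losses))  # zero out the j == i terms
    return losses.sum(axis=1)


L = np.array([[0.1, 2.0, 1.5],
              [1.8, 0.2, 1.1],
              [0.9, 1.4, 0.3]])
print(cross_training_losses(L))                      # sum over j != i
print(cross_training_losses(L, exclude_self=False))  # sum over all j
```

With the constraint, a member is never trained on its own distilled examples; whether adding that diagonal term changes robustness in practice is the empirical question raised above.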

Correctness: The proposed method and experiments sound technically correct.

Clarity: From the context, it is not clear what the colors in Figure 1 represent. Do they indicate the predicted label?

Relation to Prior Work: Yes, ADP and GAL are discussed and compared in the paper.

Reproducibility: Yes

Additional Feedback: