NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID 5860: Improving Black-box Adversarial Attacks with a Transfer-based Prior

### Reviewer 1

**Originality:** Although a previous work [17] addresses the same setting, this paper offers a more theoretical and general analysis, which is a clear point in its favor.

**Quality:** The overall paper quality is good. However, I have a comment on the gradient-estimation experiments: the average cosine similarity is very low, which suggests the estimated gradient may not be very useful. I would suggest adding another experiment to demonstrate the effectiveness (see Improvements).

**Clarity:** The paper is well written and easy to follow.

**Significance:** Combining transfer-based attacks with query-based black-box attacks could be very useful in practice. Although the performance is not significantly better than the previous state-of-the-art bandit attack, it would be a good supplement.

However, I have some concerns regarding the experimental results. The RGF method should be very similar to, or exactly the same as, NES; yet the results show RGF consistently outperforming NES, which does not make sense.

### Reviewer 2

### Post response comments

I would like to thank the authors for carefully addressing my concerns, especially for validating the advantage of setting $\lambda$ adaptively over using a fixed value. One remaining regret concerns the estimation of the cosine similarity (my Concern 2). Although the response gives the specific value of $S$, it is still not explained **when** and **how often** the cosine similarity is estimated (see lines 197–198). This should have an important impact on query complexity, but it is ignored in the experiments. I suggest making the similarity estimation clear in the final version.

***

Transfer-based attacks and query-based attacks are two common types of black-box adversarial attack, and combining them is a reasonable idea. The paper proposes a simple method in which the gradient of a surrogate model is used as a prior for the true gradient. My concerns are as follows.

**Concern 1.** My main concern lies in the novelty of the idea. The previous paper *Guessing smart: Biased sampling for efficient black-box adversarial attacks* already discusses using the gradient of a surrogate model as a prior. The formulations are almost the same, except that this paper proposes an "adaptive" way to set the weight $\lambda^*$.

**Concern 2.** To set the weight $\lambda^*$, the proposed method must solve another estimation problem: estimating the cosine similarity between the surrogate gradient and the true gradient (or the gradient norm). This estimation is quite heuristic and lacks the necessary analysis; it should be analyzed how the estimation influences the resulting estimate of the true gradient. More importantly, this estimation is completely neglected in the experiment section. It is necessary to empirically investigate how to estimate the gradient norm, e.g., how to set $S$.

**Concern 3.** When investigating whether setting $\lambda^*$ adaptively is necessary, $\lambda = 0.5$ is not a good choice. As Figure 2(b) shows (it would be better to plot the distribution of $\lambda^*$), to obtain a higher cosine similarity, $\lambda$ should be much smaller than $0.5$. It would be better to compare against P-RGF ($\lambda = 0.05$ (?)) rather than P-RGF ($\lambda = 0.5$ (?)).
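For concreteness, the estimator under discussion (a surrogate gradient used as a prior inside an RGF-style finite-difference scheme, with a weight $\lambda$ on the prior direction) can be sketched as below. This is an illustrative reconstruction, not the paper's implementation: the function name, the fixed `lam` (which the paper instead sets adaptively as $\lambda^*$), and the query budget are all assumptions.

```python
import numpy as np

def prior_guided_grad_estimate(loss_fn, x, v, lam=0.05, num_queries=10, sigma=1e-4):
    """Sketch of a prior-guided random gradient-free (RGF-style) estimator.

    loss_fn     : black-box loss oracle (each call is one query to the target model)
    x           : current input point (flat numpy array)
    v           : surrogate (transfer) gradient used as the prior direction
    lam         : weight on the prior; the paper sets this adaptively via an
                  estimated cosine similarity, here it is fixed for illustration
    num_queries : number of finite-difference queries averaged over
    sigma       : finite-difference step size
    """
    d = x.size
    v = v / np.linalg.norm(v)          # normalize the prior direction
    base = loss_fn(x)
    grad = np.zeros_like(x)
    for _ in range(num_queries):
        # draw a random direction and bias it toward the prior v
        xi = np.random.randn(d)
        xi -= xi.dot(v) * v            # keep only the component orthogonal to v
        xi /= np.linalg.norm(xi)
        u = np.sqrt(lam) * v + np.sqrt(1.0 - lam) * xi   # unit-norm biased probe
        # one-sided finite difference along u
        grad += (loss_fn(x + sigma * u) - base) / sigma * u
    return grad / num_queries
```

With `lam=0` this reduces to plain RGF (uniform random directions), and with `lam=1` it queries only along the surrogate gradient, which is the trade-off Concerns 2 and 3 are about.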

### Reviewer 3

**[Edit after the author feedback]:** I thank the authors for addressing my comments during the author feedback. I have read the authors' response as well as the other reviews. The response, especially the updated attack results on $\ell_{\infty}$ adversarially trained models (Table B), addresses my concerns about the effectiveness of P-RGF. Overall, I think this submission is interesting and provides an efficient and effective adversarial attack approach. I am happy to raise my rating.

***

**Summary:** To improve attack success rates and query efficiency of black-box adversarial attacks, this paper proposes a prior-guided method that better estimates the gradient in a high-dimensional space under the black-box scenario. Theoretically, the paper establishes the optimal coefficient for the proposed algorithm. Empirically, the algorithm achieves higher attack success rates with fewer queries than previous approaches.

Pros:
- This paper derives the optimal coefficient, i.e., $\lambda^{*}$, in the proposed algorithmic framework, which makes the algorithm more efficient and effective. The algorithm is also simple and easy to implement.
- The proposed attack framework is general and able to incorporate various data-dependent prior information.
- The empirical results demonstrate the effectiveness of the proposed method, especially when attacking defensive models.

Limitations & Questions:
- As described in Table 1, the improvement over the RGF method seems not significant.
- Since adversarial training, e.g., [22], has been shown to be an effective defense against adversarial attacks, it would be better to add attack results on adversarially trained defensive models to Section 4.3, e.g., attacking an $\ell_{2}$ adversarially trained model with $\epsilon = 0.5$.