Paper ID: | 6368 |
---|---|

Title: | Theoretical evidence for adversarial robustness through randomization |

-- The two theoretical results are, to the best of my knowledge, novel and make use of well-established ideas from information and probability theory in an interesting manner for their proofs. My concern with both the results is their significance and interpretation. The first theorem claims to show that the use of the exponential family guarantees robustness, but the actual content of the theorem concerns the growth of the divergence with the parameters of the added noise. The probability of this divergence being large is what actually controls the risk. The relationship between the noise parameters and this growth is not elucidated in the paper, making it hard to see how the robustness actually comes about theoretically. Further, the theorem concerns a quantity that has been derived from the true risk on line 99, but the connection between misclassification (the true risk) and the magnitude of the divergence used is not clear. -- For Theorem 2, from the plots in Figure 1, the accuracy guarantee falls to 0 even with a small adversarial budget, especially for smaller noise magnitudes. It would be instructive to provide a discussion of how the accuracy guarantee can be tightened since, for real classifiers, the accuracy is much higher. Further, the gap in Theorem 2 is between two true risks, so referring to it as â€˜generalizationâ€™, which is a term usually used for the gap in performance during training and test, is a misnomer. -- The evaluation results are interesting but need more clarity in their presentation. For example, in Figure 1, how is the accuracy guarantee with no adversary calculated when from Theorem 2, only the gap can be computed? Is that an empirically obtained quantity? If yes, then that has to be mentioned. -- In general, related work has been cited adequately but a reference to Ford et al. (Adversarial Examples are a Natural Consequence of Test Error in Noise) is missing, in spite of it being very closely related.

Many adversarial defense techniques have been developed recently for improving the robustness of neural networks against different adversarial attacks. This paper focuses on defending neural networks by injecting random noise drawn from a subclass of the exponential family of distributions during both training and testing. As listed above, the paper makes significant novel contributions to formalizing robustness in the considered setting and provides both experimental and theoretical evidence for increased robustness. However, I have a few concerns below I hope the authors can address during rebuttal: 1. Line 72-74 states that the paper provides results on certified robustness. However, the setting considered in this paper is probabilistic and does not provide absolute guarantees on the absence of "any" adversarial example in an adversarial region. There has been considerable work on training with certified defenses, see [1][2] and [3]. It would be good if the authors can compare and separate their work from those based on certified defenses. 2. In table 2, it is not clear what conclusion can be drawn from the reduced success rates for the two attacks wrt Madry et al. 2018. Can it be the case that the addition of noise make the attack problem harder for both attacks and thus they are unable to find adversarial examples while the number of adversarial examples remains the same as Madry et al. 2018? References: [1] Provable defenses against adversarial examples via the convex outer adversarial polytope. ICML'18 [2]Differentiable Abstract Interpretation for Provably Robust Neural Networks, ICML 2018. [3] On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models, Arxiv 2018.

The paper considers adversarial robustness for models with random noises added, and provides some theoretical insights to understand the improvement in robustness under this setting. The main theorem in this paper basically says that: if the output of a deterministic function f(.) can be bounded by \Delta when the input is perturbed within a ball of radius \alpha, the Renyi divergence (a generalized K-L divergence) of the output of the network, plus some exponential family noise, are epsilon-close even under perturbation. In the Gaussian noise setting, robustness is in proportion to the minimum singular value of the covariance matrix of Gaussian distribution, which makes sense. The authors also conduct experiments using different noise distributions in exponential family. Several questions: 1. The authors defined d_p(y)-(\alpha,\epsilon,\gamma) robustness in a probabilistic manner. However it seems the main theorem actually gives a deterministic bound (the probability \gamma = 0). Is it possible to improve the main theorem to a probabilistic version, and uncover the relationship between gamma and epsilon? 2. The theoretical lower bound on accuracy under attack of Theorem 2 needs the entropy term. It can be estimated using a Monte Carlo method however there is no guarantee that the sampled entropy is close to the true entropy, thus there is no real guarantee here. The derived bound thus cannot give any provable robustness certificate for a given network. Overall, this is a good paper that proposes something promising to explain the robustness improvements from randomization. I recommend to accept this paper, however the authors should make sure to address the issues above. *************** After reading the rebuttal, my first question has been addressed. However I still have concerns with the sampling method to obtain the entropy - although theoretically it is asymptotically converging but it might need a large amount of examples, especially the experiments were done using only a few samples. Overall, I am okay to recommend accepting the paper, but this paper certainly has some limitations and still has room for improvement.