{"title": "Adversarial vulnerability for any classifier", "book": "Advances in Neural Information Processing Systems", "page_first": 1178, "page_last": 1187, "abstract": "Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible, adversarial perturbations. This vulnerability has proven empirically to be very intricate to address. In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of the robustness also provides insights onto key properties of generative models, such as their smoothness and dimensionality of latent space. We conclude with numerical experimental results showing that our bounds provide informative baselines to the maximal achievable robustness on several datasets.", "full_text": "Adversarial vulnerability for any classi\ufb01er\n\nAlhussein Fawzi\n\nDeepMind\n\nafawzi@google.com\n\nHamza Fawzi\n\nDepartment of Applied Mathematics\n\n& Theoretical Physics\nUniversity of Cambridge\n\nh.fawzi@damtp.cam.ac.uk\n\nOmar Fawzi\nENS de Lyon\u2217\n\nomar.fawzi@ens-lyon.fr\n\nAbstract\n\nDespite achieving impressive performance, state-of-the-art classi\ufb01ers remain highly\nvulnerable to small, imperceptible, adversarial perturbations. This vulnerability\nhas proven empirically to be very intricate to address. In this paper, we study the\nphenomenon of adversarial perturbations under the assumption that the data is\ngenerated with a smooth generative model. 
We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of the robustness also provides insights into key properties of generative models, such as their smoothness and the dimensionality of their latent space. We conclude with numerical experiments showing that our bounds provide informative baselines for the maximal achievable robustness on several datasets.

1 Introduction

Deep neural networks are powerful models that achieve state-of-the-art performance across several domains, such as bioinformatics [1, 2], speech [3], and computer vision [4, 5]. Though deep networks have exhibited very good performance in classification tasks, they have recently been shown to be unstable to adversarial perturbations of the data [6, 7]. In fact, very small and often imperceptible perturbations of the data samples are sufficient to fool state-of-the-art classifiers and result in incorrect classification. This discovery of the surprising vulnerability of classifiers to perturbations has led to a large body of work that attempts to design robust classifiers [8, 9, 10, 11, 12, 13]. However, advances in designing robust classifiers have been accompanied by stronger perturbation schemes that defeat such defenses [14, 15, 16].

In this paper, we assume that the data distribution is defined by a smooth generative model (mapping latent representations to images), and study theoretically the existence of small adversarial perturbations for arbitrary classifiers. We summarize our main contributions as follows:

• We show fundamental upper bounds on the robustness of any classifier to perturbations, which provide a baseline for the maximal achievable robustness.
When the latent space of the data distribution is high dimensional, our analysis shows that any classifier is vulnerable to very small perturbations. Our results further suggest the existence of a tight relation between robustness and linearity of the classifier in the latent space.

• We prove the existence of adversarial perturbations that transfer across different classifiers. This provides theoretical justification for previous empirical findings that highlighted the existence of such transferable perturbations.

∗Univ Lyon, ENS de Lyon, CNRS, UCBL, LIP, F-69342, Lyon Cedex 07, France

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

• We quantify the difference between the robustness to adversarial examples in the data manifold and unconstrained adversarial examples, and show that the two notions of robustness can be precisely related: for any classifier f with in-distribution robustness r, there exists a classifier f̃ that achieves unconstrained robustness r/2. This further provides support for the empirical observations in [17, 18].

• We evaluate our bounds in several experimental setups (CIFAR-10 and SVHN), and show that they yield informative baselines for the maximal achievable robustness.

Our robustness analysis provides in turn insights into desirable properties of generative models capturing real-world distributions. In particular, the intriguing generality of our analysis implies that when the data distribution is modeled through a smooth generative model with high-dimensional latent space, there exist small-norm perturbations of images that fool humans for any discriminative task defined on the data distribution.
If, on the other hand, it is the case that the human visual system is inherently robust to small perturbations (e.g., in ℓp norm), then our analysis shows that a distribution over natural images cannot be modeled by a smooth and high-dimensional generative model. Going forward in modeling complex natural image distributions, our results hence suggest that low latent dimensionality and non-smoothness are important constraints for capturing the real-world distribution of images; not satisfying such constraints can lead to small adversarial perturbations for any classifier, including the human visual system.

2 Related work

It was proven in [19, 20] that for certain families of classifiers, there exist adversarial perturbations that cause misclassification of magnitude O(1/√d), where d is the data dimension, provided the robustness to random noise is fixed (which is typically the case if, e.g., the data is normalized). In addition, fundamental limits on the robustness of classifiers were derived in [19] for some simple classification families. Other works have instead studied the existence of adversarial perturbations under strong assumptions on the data distribution [18, 21]. In this work, motivated by the success of generative models mapping latent representations with a normal prior to images, we instead study the existence of robust classifiers under this general data-generating procedure, and derive bounds on the robustness that hold for any classification function. A large number of techniques have recently been proposed to improve the robustness of classifiers to perturbations, such as adversarial training [8], robust optimization [9, 10], regularization [11], distillation [12], stochastic networks [13], etc. Unfortunately, such techniques have been shown to fail whenever a more complex attack strategy is used [14, 15], or when they are evaluated on a more complex dataset.
Other works have recently studied procedures and algorithms to provably guarantee a certain level of robustness [22, 23, 24, 25, 26], which have been applied to small datasets (e.g., MNIST). For large-scale, high-dimensional datasets, the problem of designing robust classifiers is entirely open. We finally note that adversarial examples for generative models have recently been considered in [27]; our aim here is however different, as our goal is to bound the robustness of classifiers when the data comes from a generative model.

3 Definitions and notations

Let g be a generative model that maps latent vectors z ∈ Z := R^d to the space of images X := R^m, with m denoting the number of pixels. To generate an image according to the distribution of natural images µ, we generate a random vector z ∼ ν according to the standard Gaussian distribution ν = N(0, Id), and we apply the map g; the resulting image is then g(z). This data-generating procedure is motivated by numerous previous works on generative models, whereby natural-looking images are obtained by transforming normal vectors through a deep neural network [28], [29], [30], [31], [32].² Let f : R^m → {1, . . . , K} be a classifier mapping images in R^m to discrete labels {1, . . . , K}. The classifier f partitions X into K sets Ci = {x ∈ X : f(x) = i}, each of which corresponds to a different predicted label. The relative proportion of points in class i is equal to P(Ci) = ν(g⁻¹(Ci)), the Gaussian measure of g⁻¹(Ci) in Z.

²Instead of sampling from N(0, Id) in Z, some generative models sample from the uniform distribution in [−1, 1]^d. The results of this paper can be easily extended to such generative procedures.

The goal of this paper is to study the robustness of f to additive perturbations under the assumption that the data is generated according to g. We define two notions of robustness.
These effectively measure the minimum distance one has to travel in image space to change the classification decision.

• In-distribution robustness: For x = g(z), we define the in-distribution robustness rin(x) as follows:

rin(x) = min_{r ∈ Z} ‖g(z + r) − x‖ s.t. f(g(z + r)) ≠ f(x),

where ‖ · ‖ denotes an arbitrary norm on X. Note that the perturbed image g(z + r) is constrained to lie in the image of g, and hence belongs to the support of the distribution µ.

• Unconstrained robustness: Unlike the in-distribution setting, we measure here the robustness to arbitrary perturbations in the image space; that is, the perturbed image is no longer constrained to belong to the data distribution µ:

runc(x) = min_{r ∈ X} ‖r‖ s.t. f(x + r) ≠ f(x).

This notion of robustness corresponds to the widely used definition of adversarial perturbations. It is easy to see that this notion of robustness is smaller than the in-distribution robustness; i.e., runc(x) ≤ rin(x).

In this paper, we assume that the generative model is smooth, in the sense that it satisfies a modulus of continuity property, defined as follows:

Assumption 1. We assume that g admits a monotone invertible modulus of continuity ω; i.e.,³

∀z, z′ ∈ Z, ‖g(z) − g(z′)‖ ≤ ω(‖z − z′‖₂).   (1)

Note that the above assumption is milder than assuming Lipschitz continuity. In fact, the Lipschitz property corresponds to choosing ω(t) to be a linear function of t. In particular, the above assumption does not require that ω(0) = 0, which potentially allows us to model distributions with disconnected support.⁴

It should be noted that generator smoothness is a desirable property of generative models.
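The modulus in Eq. (1) can be probed empirically for a given generator. The sketch below is our own illustration (not the paper's estimation procedure, which is described in the supplementary material): it buckets sampled latent pairs by latent-space distance and records the worst observed image-space distance in each bucket, using the circle generator g(z) = (cos(2πz), sin(2πz)) of Fig. 1, whose modulus is bounded by the Lipschitz line ω(t) = 2πt.

```python
# Empirical probe (ours) of the modulus of continuity in Eq. (1): sample latent
# pairs, bucket them by latent distance, and keep the largest image-space
# distance seen per bucket. The generator is the toy circle of Fig. 1, for
# which the true modulus is bounded by omega(t) = 2*pi*t.
import numpy as np

rng = np.random.default_rng(0)

def g(z):
    return np.stack([np.cos(2 * np.pi * z), np.sin(2 * np.pi * z)], axis=-1)

def estimate_modulus(n_pairs=20000, n_buckets=20, t_max=0.5):
    z = rng.normal(size=n_pairs)                      # z ~ N(0, 1)
    zp = z + rng.uniform(-t_max, t_max, size=n_pairs)
    t = np.abs(z - zp)                                # latent-space distances
    d = np.linalg.norm(g(z) - g(zp), axis=-1)         # image-space distances
    edges = np.linspace(0.0, t_max, n_buckets + 1)
    omega_hat = np.zeros(n_buckets)
    for b in range(n_buckets):
        mask = (t >= edges[b]) & (t < edges[b + 1])
        if mask.any():
            omega_hat[b] = d[mask].max()              # worst case in the bucket
    return edges[1:], omega_hat                       # right edges, estimates

ts, omega_hat = estimate_modulus()
```

For this toy generator the estimated curve stays below 2πt, as expected. For a trained network, the same bucketed worst-case curve yields an empirical ω̂, essentially the high-probability variant of Assumption 1 used in the experiments of Section 5.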
This\nproperty is often illustrated empirically by generating images along a straight path in the latent space\n[30], and verifying that the images undergo gradual semantic changes between the two endpoints. In\nfact, smooth transitions is often used as a qualitative evidence that the generator has learned relevant\nfactors of variation.\nFig. 1 summarizes the problem setting and notations. Assuming that the data is generated according\nto g, we analyze in the remainder of the paper the robustness of arbitrary classi\ufb01ers to perturbations.\n\n4 Analysis of the robustness to perturbations\n\n4.1 Upper bounds on robustness\n\nWe state a general bound on the robustness to perturbations and derive two special cases to make\nmore explicit the dependence on the distribution and number of classes.\nTheorem 1. Let f : Rm \u2192 {1, . . . , K} be an arbitrary classi\ufb01cation function de\ufb01ned on the image\nspace. Then, the fraction of datapoints having robustness less than \u03b7 satis\ufb01es:\n\n(\u03a6(a(cid:54)=i + \u03c9\u22121(\u03b7)) \u2212 \u03a6(a(cid:54)=i)) ,\n\n(2)\n\nP (rin(x) \u2264 \u03b7) \u2265 K(cid:88)\n\nwhere \u03a6 is the cdf of N (0, 1), and a(cid:54)=i = \u03a6\u22121\n\ni=1\n\n(cid:32)\n\nP\n\n(cid:32)(cid:83)\n\nj(cid:54)=i\n\n(cid:33)(cid:33)\n\nCj\n\n.\n\n3This assumption can be extended to random z (see C.2 in the supp. material). For ease of exposition\n\nhowever, we use here the deterministic assumption.\n\n4In this paper, we use the term smooth generative models to denote that the function \u03c9(\u03b4) takes small values\n\nfor small \u03b4.\n\n3\n\n\fFigure 1: Setting used in this paper. The data distribution is obtained by mapping N (0, Id) through\ng (we set d = 1 and g(z) = (cos(2\u03c0z), sin(2\u03c0z)) in this example). The thick circle indicates the\nsupport of the data distribution \u00b5 in Rm (m = 2 here). 
The binary discriminative function f separates the data space into two classification regions (red and blue colors). While the in-distribution perturbed image is required to belong to the data support, this is not necessarily the case in the unconstrained setting. In this paper, we do not put any assumption on f, resulting in a potentially arbitrary partitioning of the data space. While the existence of very small adversarial perturbations seems counter-intuitive in this low-dimensional illustrative example (i.e., rin and runc can be large for some choices of f), we show in the next sections that small perturbations do exist in high dimensions.

In particular, if for all i, P(Ci) ≤ 1/2 (the classes are not too unbalanced), we have

P(rin(x) ≤ η) ≥ 1 − √(π/2) e^{−ω⁻¹(η)²/2}.   (3)

To see the dependence on the number of classes more explicitly, consider the setting where the classes are equiprobable, i.e., P(Ci) = 1/K for all i, with K ≥ 5. Then

P(rin(x) ≤ η) ≥ 1 − √(π/2) e^{−ω⁻¹(η)²/2} e^{−ω⁻¹(η)√(log(K²/(4π log K)))}.   (4)

This theorem is a consequence of the Gaussian isoperimetric inequality, first proved in [33] and [34]. The proofs can be found in the supplementary material.

Remark 1. Interpretation. For ease of interpretation, we assume that the function g is Lipschitz continuous, in which case ω⁻¹(η) is replaced with η/L, where L is the Lipschitz constant. Then, Eq. (3) shows the existence of perturbations of norm η ∝ L that can fool any classifier. This norm should be compared to the typical norm given by E‖g(z)‖. By normalizing the data, we can assume E‖g(z)‖ = E‖z‖₂ without loss of generality.⁵ As z has a normal distribution, we have E‖z‖₂ ∈ [√(d−1), √d], and thus the typical norm of an element of the data set satisfies E‖g(z)‖ ≥ √(d−1). Now if we plug in η = 2L, we obtain that the robustness is less than 2L with probability exceeding 0.8. This should be compared to the typical norm, which is at least √(d−1). Our result therefore shows that when d is large and g is smooth (in the sense that L ≪ √d), there exist small adversarial perturbations that can fool arbitrary classifiers f. Fig. 2 provides an illustration of the upper bound in the case where ω is the identity function.

Remark 2. Dependence on K. Theorem 1 shows an increasing probability of misclassification with the number of classes K. In other words, it is easier to find adversarial perturbations when the number of classes is large than for a binary classification task.⁶ This dependence confirms empirical results whereby the robustness is observed to decrease with the number of classes. The dependence on K captured in our bounds is in contrast to previous bounds that showed a decreasing probability of fooling the classifier for larger numbers of classes [20].

Remark 3. Classification-agnostic bound. Our bounds hold for any classification function f, and are not specific to a family of classifiers. This is unlike the work of [19], which establishes bounds on the robustness for specific classes of functions (e.g., linear or quadratic classifiers).

⁵Without this assumption, the following discussion applies if we replace the Lipschitz constant with the normalized Lipschitz constant L′ = L · E‖z‖₂ / E‖g(z)‖.

⁶We assume here equiprobable classes.

Remark 4. How tight is the upper bound on robustness in Theorem 1?
Assuming that the smoothness assumption in Eq. (1) is an equality, let the classifier f be such that f ∘ g separates the latent space into B1 = g⁻¹(C1) = {z : z1 ≥ 0} and B2 = g⁻¹(C2) = {z : z1 < 0}. Then, it follows that

P(rin(x) ≤ η) = P(∃r : ‖g(z + r) − g(z)‖ ≤ η, f(g(z + r)) ≠ f(g(z)))
= P(∃r : ‖r‖₂ ≤ ω⁻¹(η), sgn(z1 + r1) sgn(z1) < 0)
= P(z ∈ B1, z1 < ω⁻¹(η)) + P(z ∈ B2, z1 ≥ −ω⁻¹(η)) = 2(Φ(ω⁻¹(η)) − Φ(0)),

which precisely corresponds to Eq. (2). In this case, the bound in Eq. (2) is therefore an equality. More generally, this bound is an equality if the classifier induces linearly separable regions in the latent space.⁷ This suggests that classifiers are maximally robust when the induced classification boundaries in the latent space are linear. We stress that boundaries in the Z-space can be very different from the boundaries in the image space. In particular, as g is in general non-linear, f might be a highly non-linear function of the input space while z ↦ (f ∘ g)(z) is a linear function of z. We provide an explicit example in the supplementary material illustrating this remark.

Figure 2: Upper bound (Theorem 1) on the median of the normalized robustness rin/√d for different values of the number of classes K, in the setting where ω(t) = t. We assume that classes have equal measure (i.e., P(Ci) = 1/K).

Remark 5. Adversarial perturbations in the latent space. While the quantities introduced in Section 3 measure the robustness in the image space, an alternative is to measure the robustness in the latent space, defined as rZ = min_r ‖r‖₂ s.t. f(g(z + r)) ≠ f(g(z)).
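With ω the identity, the bound of Theorem 1 is straightforward to evaluate numerically. The following sketch (our own helper functions, assuming K equiprobable classes) computes the lower bound of Eq. (2) and the resulting upper bound on the median robustness, in the spirit of the curves of Fig. 2.

```python
# Numerical evaluation of the Theorem 1 lower bound, for omega the identity and
# K equiprobable classes: P(r_in <= eta) >= K * (Phi(a + eta) - Phi(a)) with
# a = Phi^{-1}(1 - 1/K). Helper names are ours, not the paper's.
from statistics import NormalDist

_std = NormalDist()  # standard normal N(0, 1)

def fooling_prob_lower_bound(eta: float, K: int) -> float:
    """Lower bound on the fraction of points with robustness <= eta."""
    a = _std.inv_cdf(1.0 - 1.0 / K)  # a_{!=i} = Phi^{-1}(P(union_{j!=i} Cj))
    return min(K * (_std.cdf(a + eta) - _std.cdf(a)), 1.0)

def median_robustness_upper_bound(K: int, tol: float = 1e-9) -> float:
    """Smallest eta whose bound reaches 1/2: an upper bound on the median."""
    lo, hi = 0.0, 10.0  # bisection; the bound is monotone in eta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fooling_prob_lower_bound(mid, K) >= 0.5:
            hi = mid
        else:
            lo = mid
    return hi

# For K = 2 the bound reduces to 2*(Phi(eta) - Phi(0)), as in Remark 4.
```

Dividing `median_robustness_upper_bound(K)` by √d gives the normalized quantity plotted in Fig. 2; the bound decreases with K, consistent with Remark 2.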
For natural images, latent vectors provide a decomposition of images into meaningful factors of variation, such as features of the objects in the image, illumination, etc. Hence, perturbations of vectors in the latent space measure the amount of change one needs to apply to such meaningful latent features to cause misclassification. A bound on the magnitude of the minimal perturbation in the latent space (i.e., on rZ) can be directly obtained from Theorem 1 by setting ω to the identity (i.e., ω(t) = t). Importantly, note that no assumptions on the smoothness of the generator g are required for our bounds to hold when considering this notion of robustness.

Relation between in-distribution robustness and unconstrained robustness.

While the previous bound specifically concerns the in-distribution robustness, in many cases we are interested in achieving unconstrained robustness; that is, the perturbed image is not constrained to belong to the data distribution (or, equivalently, to the range of g). It is easy to see that any bound derived for the in-distribution robustness rin(x) also holds for the unconstrained robustness runc(x), since runc(x) ≤ rin(x). One may wonder whether it is possible to get a better upper bound on runc(x) directly. We show here that this is not possible if we require our bound to hold for any general classifier. Specifically, we construct a family of classifiers for which runc(x) ≥ rin(x)/2, which we now present:

⁷In the case where Eq.
(1) is an inequality, we will not exactly achieve the bound, but we get closer to it when f ∘ g is linear.

For a given classifier f in the image space, define the classifier f̃ constructed with a nearest-neighbour strategy:

f̃(x) = f(g(z∗)) with z∗ = arg min_z ‖g(z) − x‖.   (5)

Note that f̃ behaves exactly in the same way as f on the image of g (in particular, it has the same risk and in-distribution robustness). We show here that it has an unconstrained robustness that is at least half the in-distribution robustness of f.

Theorem 2. For the classifier f̃, we have runc(x) ≥ rin(x)/2.

This result shows that if a classifier has in-distribution robustness r, then we can construct a classifier with unconstrained robustness r/2 through a simple modification of the original classifier f. Hence, classification-agnostic limits derived for both notions of robustness are essentially the same. It should further be noted that the procedure in Eq. (5) provides a constructive method to increase the robustness of any classifier to unconstrained perturbations. Such a nearest-neighbour strategy is useful when the in-distribution robustness is much larger than the unconstrained robustness, and permits the latter to match the former. This approach has recently been found to be successful in increasing the robustness of classifiers when accurate generative models can be learned [35]. Other techniques [17] build on this approach, and further use methods to increase the in-distribution robustness.

4.2 Transferability of perturbations

One of the most intriguing properties of adversarial perturbations is their transferability across different models [6, 36].
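The construction of Eq. (5) above is easy to instantiate. The following sketch is our own toy instantiation: the circle generator of Fig. 1, an arbitrary sign classifier of our choosing, and a grid search standing in for the arg min (in practice this would be a gradient-based search over z). It illustrates how f̃ first projects an input onto the range of g before classifying.

```python
# Toy sketch (ours) of the nearest-neighbour classifier f~ of Eq. (5):
# project x onto the range of g, then apply f. The generator is the circle
# g(z) = (cos 2*pi*z, sin 2*pi*z) of Fig. 1; the arg min over z is
# approximated by a dense grid search.
import numpy as np

def g(z):
    return np.stack([np.cos(2 * np.pi * z), np.sin(2 * np.pi * z)], axis=-1)

def f(x):
    """An arbitrary binary classifier on image space (sign of the first pixel)."""
    return int(x[0] >= 0)

def f_tilde(x, grid_size=10000):
    """f~(x) = f(g(z*)) with z* = arg min_z ||g(z) - x||."""
    zs = np.linspace(0.0, 1.0, grid_size, endpoint=False)  # one full circle
    dists = np.linalg.norm(g(zs) - x, axis=-1)
    z_star = zs[np.argmin(dists)]
    return f(g(z_star))

# On the data support, f~ agrees with f; off-support points are first
# projected, which is what yields the factor-1/2 guarantee of Theorem 2.
x_on = g(np.array(0.1))        # a point on the circle
x_off = np.array([3.0, 0.2])   # an off-support point
assert f_tilde(x_on) == f(x_on)
```

Off the support, an adversary must now move x past the projection of the decision boundary onto the range of g, which is the mechanism behind Theorem 2.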
Under our data model distribution, we study the existence of transferable adversarial perturbations, and show that two models with approximately zero risk will have shared adversarial perturbations.

Theorem 3 (Transferability of perturbations). Let f, h be two classifiers. Assume that P(f ∘ g(z) ≠ h ∘ g(z)) ≤ δ (e.g., if f and h have a risk bounded by δ/2 on the data set generated by g). In addition, assume that P(Ci(f)) + δ ≤ 1/2 for all i.⁸ Then,

P{∃v : ‖v‖₂ ≤ η and f(g(z) + v) ≠ f(g(z)) and h(g(z) + v) ≠ h(g(z))} ≥ 1 − √(π/2) e^{−ω⁻¹(η)²/2} − 2δ.   (6)

Compared to Theorem 1, which bounds the robustness to adversarial perturbations, the extra price to pay here to find transferable adversarial perturbations is the 2δ term, which is small if the risk of both classifiers is small. Hence, our bounds provide a theoretical explanation for the existence of transferable adversarial perturbations, which were previously shown to exist in [6, 36]. The existence of transferable adversarial perturbations across several models with small risk has important security implications, as adversaries can, in principle, fool different classifiers with a single, classifier-agnostic perturbation. The existence of such perturbations significantly reduces the difficulty of attacking (potentially black-box) machine learning models.

4.3 Approximate generative model

In the previous results, we have assumed that the data distribution is exactly described by the generative model g (i.e., µ = g∗(ν), where g∗(ν) is the pushforward of ν via g). However, in many cases, such generative models only provide an approximation to the true data distribution µ.
In this section, we specifically assume that the generated distribution g∗(ν) provides an approximation to the true underlying distribution in the 1-Wasserstein sense on the metric space (X, ‖ · ‖), i.e., W(g∗(ν), µ) ≤ δ, and derive upper bounds on the robustness. This assumption is in line with recent advances in generative models, whereby the generator provides a good approximation (in the Wasserstein sense) to the true distribution, but does not exactly fit it [31]. We show here that similar upper bounds on the robustness (in expectation) hold, as long as g∗(ν) provides an accurate approximation of the true distribution µ.

⁸This assumption is only to simplify the statement; a general statement can easily be derived in the same way.

Theorem 4. We use the same notations as in Theorem 1. Assume that the generator g provides a δ-approximation of the true distribution µ in the 1-Wasserstein sense on the metric space (X, ‖ · ‖); that is, W(g∗(ν), µ) ≤ δ (where g∗(ν) is the pushforward of ν via g). Then, provided ω is concave, the following inequality holds:

E_{x∼µ} runc(x) ≤ ω( Σ_{i=1}^{K} ( −a≠i Φ(−a≠i) + e^{−a≠i²/2}/√(2π) ) ) + δ,

where runc(x) is the unconstrained robustness in the image space. In particular, for K ≥ 5 equiprobable classes, we have

E_{x∼µ} runc(x) ≤ ω( log(4π log(K)) / √(2 log(K)) ) + δ.

In words, when the data is defined according to a distribution which can be approximated by a smooth, high-dimensional generative model, our results show that arbitrary classifiers will have small adversarial examples in expectation.
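The K-dependence of the second bound can be read off directly. This sketch (our own helper, not the paper's code) evaluates it for the identity modulus ω(t) = t and a perfect generator (δ = 0).

```python
# Evaluation (ours) of the equiprobable-class bound of Theorem 4 with
# omega(t) = t and delta = 0: E[r_unc] <= log(4*pi*log K) / sqrt(2*log K).
import math

def expected_robustness_bound(K: int) -> float:
    """Upper bound on the expected unconstrained robustness, K >= 5."""
    logK = math.log(K)
    return math.log(4 * math.pi * logK) / math.sqrt(2 * logK)
```

For instance, the bound at K = 10 exceeds the bound at K = 10⁶, but only by a modest factor, reflecting the purely logarithmic dependence on K.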
We also note that as K grows, this bound decreases and even goes to zero, under the sole condition that ω is continuous at 0. Note, however, that the decrease is slow, as it is only logarithmic in K.

5 Experimental evaluation

We now evaluate our bounds on the SVHN dataset [37], which contains color images of house numbers; the task is to classify the digit at the center of the image. Throughout this section, perturbations are computed using the algorithm in [38].⁹ The dataset contains 73,257 training images and 26,032 test images (we do not use the images in the 'extra' set). We train a DCGAN [30] generative model on this dataset, with a latent vector of dimension d = 100, and further consider several neural network architectures for classification.¹⁰ For each classifier, the empirical robustness is compared to our upper bound.¹¹ In addition to reporting the in-distribution and unconstrained robustness, we also report the robustness in the latent space: rZ = min_r ‖r‖₂ s.t. f(g(z + r)) ≠ f(g(z)). For this robustness setting, note that the upper bound exactly corresponds to Theorem 1 with ω set to the identity map. Results are reported in Table 1.

Observe first that the upper bound on the robustness in the latent space is of the same order of magnitude as the empirical robustness computed in the Z-space for the different tested classifiers. This suggests that the isoperimetric inequality (which is the only source of inequality in our bound, once smoothness is factored out) provides a reasonable baseline that is on par with the robustness of the best classifiers. In the image space, the theoretical prediction from our classifier-agnostic bounds is one order of magnitude larger than the empirical estimates.
Note however that our bound is still non-vacuous, as it predicts the norm of the required perturbation to be approximately 1/3 of the norm of the images (i.e., a normalized robustness of 0.36). This potentially leaves room for improving the robustness in the image space. Moreover, we believe that the bound on the robustness in the image space is not tight (unlike the bound in the Z space), as the smoothness assumption on g can be conservative.

⁹Note that in order to estimate robustness quantities (e.g., rin), we do not need the ground-truth label, as the definition only involves a change of the estimated label. Estimation of the robustness can therefore readily be done for automatically generated images.

¹⁰For the SVHN and CIFAR-10 experiments, we show examples of generated and perturbed images in the supplementary document (Section C.3). Moreover, we provide in C.1 details on the architectures of the used models.

¹¹To evaluate the upper bound numerically, we used a probabilistic version of the modulus of continuity, where the property is not required to be satisfied for all z, z′, but rather with high probability, and accounted for the error probability in the bound. We refer to the supp. material for the detailed optimization used to estimate the smoothness parameters.

                             Upper bound     2-Layer LeNet   ResNet-18     ResNet-101
Error rate                   -               11%             4.8%          4.2%
Robustness in the Z-space    16 × 10⁻³       6.1 × 10⁻³      6.1 × 10⁻³    6.6 × 10⁻³
In-distribution robustness   36 × 10⁻²       3.3 × 10⁻²      3.1 × 10⁻²    3.1 × 10⁻²
Unconstrained robustness     36 × 10⁻²       0.39 × 10⁻²     1.1 × 10⁻²    1.4 × 10⁻²

Table 1: Experiments on the SVHN dataset. We report the 25% percentile of the normalized robustness in each cell, where probabilities are computed either theoretically (for the upper bound) or empirically. More precisely, for the upper bound column we report the following quantities. For the robustness in the Z space: t/E(‖z‖₂) such that P(min_r ‖r‖₂ s.t. f(g(z + r)) ≠ f(g(z)) ≤ t) ≥ 0.25, using Theorem 1 with ω taken as the identity. For the robustness in image space: t/E(‖g(z)‖₂) such that P(rin(x) ≤ t) ≥ 0.25, using Theorem 1 with ω estimated empirically (Section C.2 in supp. material).

                             Upper bound   VGG [40]      Wide ResNet [41]   Wide ResNet + Adv. training [10, 15]
Error rate                   -             5.5%          3.9%               16.0%
Robustness in the Z-space    0.016         2.5 × 10⁻³    3.0 × 10⁻³         3.6 × 10⁻³
In-distribution robustness   0.10          4.8 × 10⁻³    5.9 × 10⁻³         8.3 × 10⁻³
Unconstrained robustness     0.10          0.23 × 10⁻³   0.20 × 10⁻³        2.0 × 10⁻³

Table 2: Experiments on CIFAR-10 (same setting as in Table 1). See supp. material for details about the models.

Further comparison of the in-distribution and unconstrained robustness figures in the image space interestingly shows that for the simple LeNet architecture, a large gap exists between these two quantities. However, with more complex classifiers (ResNet-18 and ResNet-101), the gap between in-distribution and unconstrained robustness gets smaller. Recall that Theorem 2 says that any classifier can be modified in such a way that the in-distribution robustness and unconstrained robustness only differ by a factor 2, while preserving the accuracy.
But this modification may result in a more complicated classifier compared to the original one; for example, starting with a linear classifier, the modified classifier will in general not be linear. This interestingly matches our numerical values for this experiment, as the multiplicative gap between in-distribution and unconstrained robustness approaches 2 as we make the classification function more complex (e.g., an in-distribution robustness of 3.1 × 10⁻² versus an unconstrained robustness of 1.4 × 10⁻² for ResNet-101).

We now consider the more complex CIFAR-10 dataset [39], which consists of 10 classes of 32 × 32 color natural images. Similarly to the previous experiment, we used a DCGAN generative model with d = 100, and tested the robustness of state-of-the-art deep neural network classifiers. Quantitative results are reported in Table 2. Our bounds notably predict that, for 25% of the datapoints in the distribution, any classifier defined on this task will have perturbations not exceeding 1/10 of the norm of the image. Note that with the PGD adversarial training strategy of [10] (which constitutes one of the most robust models to date [15]), the robustness is significantly improved, despite still being about one order of magnitude smaller than the baseline of 0.1 for the in-distribution robustness. The construction of more robust classifiers, alongside better empirical estimates of the quantities involved in the bound and improved bounds, will hopefully lead to a convergence of these two quantities, hence guaranteeing optimality of the robustness of our classifiers.

6 Discussion

We have shown the existence of a baseline robustness that no classifier can surpass whenever the distribution is approximable by a generative model mapping latent representations to images.
The bounds lead to informative numerical results: for example, on the CIFAR-10 task (with a DCGAN approximator), our upper bound shows that a significant portion of datapoints can be fooled with a perturbation whose magnitude is 10% that of an image. Existing classifiers, however, do not match the derived upper bound. Moving forward, we expect the design of more robust classifiers to move closer to this bound. The existence of a baseline robustness is fundamental in that context, in order to measure the progress made and to compare against the optimal robustness we can hope to achieve.

In addition to providing a baseline, this work has several practical implications on the robustness front. To construct classifiers with better robustness, our analysis suggests that these should have linear decision boundaries in the latent space; in particular, classifiers with multiple disconnected classification regions will be more prone to small perturbations. We further provided a constructive way to provably close the gap between unconstrained robustness and in-distribution robustness.

Our analysis at the intersection of classifiers' robustness and generative modeling has also led to insights into generative models, owing to its generality. If we take as a premise that the human visual system requires large-norm perturbations to be fooled (which is implicitly assumed in many works on adversarial robustness, though see [42]), our work shows that natural image distributions cannot be modeled by very high-dimensional smooth mappings. While the latent-space dimensions currently in use (e.g., d = 100) do not contradict this premise (as the resulting upper bounds are sufficiently large), moving to higher dimensions for more complex datasets might lead to very small bounds.
To model such datasets, the prior distribution, smoothness, and dimension properties should therefore be carefully set to avoid contradictions with the premise. For example, conditional generative models can be seen as non-smooth generative models, as different generating functions are used for each class. We finally note that the derived results bound the norm of the perturbation, not its human perceptibility, which is much harder to quantify. We leave as an open question the derivation of bounds on more perceptual metrics.

Acknowledgments

A.F. would like to thank Seyed Moosavi, Wojtek Czarnecki, Neil Rabinowitz, Bernardino Romera-Paredes and the DeepMind team for useful feedback and discussions.

References

[1] M. Spencer, J. Eickholt, and J. Cheng, “A deep learning network approach to ab initio protein secondary structure prediction,” IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 12, no. 1, pp. 103–112, 2015.

[2] D. Chicco, P. Sadowski, and P. Baldi, “Deep autoencoder neural networks for gene ontology annotation predictions,” in ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 533–540, 2014.

[3] G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012.

[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.

[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105, 2012.

[6] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R.
Fergus, “Intriguing properties of neural networks,” in International Conference on Learning Representations (ICLR), 2014.

[7] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, “Evasion attacks against machine learning at test time,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402, 2013.

[8] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in International Conference on Learning Representations (ICLR), 2015.

[9] U. Shaham, Y. Yamada, and S. Negahban, “Understanding adversarial training: Increasing local stability of neural nets through robust optimization,” arXiv preprint arXiv:1511.05432, 2015.

[10] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083, 2017.

[11] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier, “Parseval networks: Improving robustness to adversarial examples,” in International Conference on Machine Learning, pp. 854–863, 2017.

[12] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” arXiv preprint arXiv:1511.04508, 2015.

[13] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, “Deep variational information bottleneck,” arXiv preprint arXiv:1612.00410, 2016.

[14] N. Carlini and D. Wagner, “Adversarial examples are not easily detected: Bypassing ten detection methods,” in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, ACM, 2017.

[15] J. Uesato, B. O’Donoghue, A. v. d. Oord, and P.
Kohli, “Adversarial risk and the dangers of evaluating against weak attacks,” arXiv preprint arXiv:1802.05666, 2018.

[16] J. Rauber and W. Brendel, “The robust vision benchmark.” http://robust.vision, 2017.

[17] A. Ilyas, A. Jalal, E. Asteri, C. Daskalakis, and A. G. Dimakis, “The robust manifold defense: Adversarial training using generative models,” arXiv preprint arXiv:1712.09196, 2017.

[18] J. Gilmer, L. Metz, F. Faghri, S. S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow, “Adversarial spheres,” arXiv preprint arXiv:1801.02774, 2018.

[19] A. Fawzi, O. Fawzi, and P. Frossard, “Analysis of classifiers’ robustness to adversarial perturbations,” CoRR, vol. abs/1502.02590, 2015.

[20] A. Fawzi, S. Moosavi-Dezfooli, and P. Frossard, “Robustness of classifiers: from adversarial to random noise,” in Neural Information Processing Systems (NIPS), 2016.

[21] T. Tanay and L. Griffin, “A boundary tilting persepective on the phenomenon of adversarial examples,” arXiv preprint arXiv:1608.07690, 2016.

[22] M. Hein and M. Andriushchenko, “Formal guarantees on the robustness of a classifier against adversarial manipulation,” in Advances in Neural Information Processing Systems, pp. 2263–2273, 2017.

[23] J. Peck, J. Roels, B. Goossens, and Y. Saeys, “Lower bounds on the robustness to adversarial perturbations,” in Advances in Neural Information Processing Systems, pp. 804–813, 2017.

[24] A. Sinha, H. Namkoong, and J. Duchi, “Certifiable distributional robustness with principled adversarial training,” arXiv preprint arXiv:1710.10571, 2017.

[25] A. Raghunathan, J. Steinhardt, and P. Liang, “Certified defenses against adversarial examples,” arXiv preprint arXiv:1801.09344, 2018.

[26] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P.
Kohli, “A dual approach to scalable verification of deep networks,” arXiv preprint arXiv:1803.06567, 2018.

[27] J. Kos, I. Fischer, and D. Song, “Adversarial examples for generative models,” arXiv preprint arXiv:1702.06832, 2017.

[28] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.

[29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.

[30] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.

[31] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International Conference on Machine Learning, pp. 214–223, 2017.

[32] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, “Improved training of Wasserstein GANs,” in Advances in Neural Information Processing Systems, pp. 5769–5779, 2017.

[33] C. Borell, “The Brunn-Minkowski inequality in Gauss space,” Inventiones mathematicae, vol. 30, no. 2, pp. 207–216, 1975.

[34] V. N. Sudakov and B. S. Tsirel’son, “Extremal properties of half-spaces for spherically invariant measures,” Journal of Soviet Mathematics, vol. 9, no. 1, pp. 9–18, 1978.

[35] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: Protecting classifiers against adversarial attacks using generative models,” in International Conference on Learning Representations, 2018.

[36] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” arXiv preprint arXiv:1611.02770, 2016.

[37] Y. Netzer, T. Wang, A. Coates, A.
Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

[38] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: a simple and accurate method to fool deep neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[39] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Master’s thesis, Department of Computer Science, University of Toronto, 2009.

[40] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2014.

[41] S. Zagoruyko and N. Komodakis, “Wide residual networks,” arXiv preprint arXiv:1605.07146, 2016.

[42] G. F. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, and J. Sohl-Dickstein, “Adversarial examples that fool both human and computer vision,” arXiv preprint arXiv:1802.08195, 2018.