Adversarially Robust Generalization Requires More Data

Advances in Neural Information Processing Systems (NeurIPS 2018), pages 5014–5026

Ludwig Schmidt (UC Berkeley, ludwig@berkeley.edu), Shibani Santurkar (MIT, shibani@mit.edu), Dimitris Tsipras (MIT, tsipras@mit.edu), Kunal Talwar (Google Brain, kunal@google.com), Aleksander Mądry (MIT, madry@mit.edu)

Abstract

Machine learning models are often susceptible to adversarial perturbations of their inputs. Even small perturbations can cause state-of-the-art classifiers with high "standard" accuracy to produce an incorrect prediction with high confidence. To better understand this phenomenon, we study adversarially robust learning from the viewpoint of generalization. We show that already in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning.
This gap is information theoretic and holds irrespective of the training algorithm or the model family. We complement our theoretical results with experiments on popular image classification datasets and show that a similar gap exists here as well. We postulate that the difficulty of training robust classifiers stems, at least partially, from this inherently larger sample complexity.

1 Introduction

Modern machine learning models achieve high accuracy on a broad range of datasets, yet can easily be misled by small perturbations of their input. While such perturbations are often simple noise to a human, or even imperceptible, they cause state-of-the-art models to misclassify their input with high confidence. This phenomenon was first studied in the context of secure machine learning for spam filters and malware classification [7, 16, 35]. More recently, researchers have demonstrated the phenomenon under the name of adversarial examples in image classification [21, 51], question answering [28], voice recognition [12, 13, 49, 62], and other domains (for instance, see [2, 4, 14, 22, 25, 26, 32, 60]). Overall, the existence of such adversarial examples raises concerns about the robustness of current classifiers. As we increasingly deploy machine learning systems in safety- and security-critical environments, it is crucial to understand the robustness properties of our models in more detail.

A growing body of work is exploring this robustness question from the security perspective by proposing attacks (methods for crafting adversarial examples) and defenses (methods for making classifiers robust to such perturbations). Often, the focus is on deep neural networks, e.g., see [11, 24, 36, 37, 41, 47, 53, 59].
While there has been success with robust classifiers on simple datasets [31, 36, 44, 48], more complicated datasets still exhibit a large gap between "standard" and robust accuracy [3, 11]. An implicit assumption underlying most of this work is that the same training dataset that enables good standard accuracy also suffices to train a robust model. However, it is unclear if this assumption is valid.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

So far, the generalization aspects of adversarially robust classification have not been thoroughly investigated. Since adversarial robustness is a learning problem, the statistical perspective is of integral importance. A key observation is that adversarial examples are not at odds with the standard notion of generalization as long as they occupy only a small total measure under the data distribution. So to achieve adversarial robustness, a classifier must generalize in a stronger sense. We currently do not have a good understanding of how such a stronger notion of generalization compares to standard "benign" generalization, i.e., generalization without an adversary.

In this work, we address this gap and explore the statistical foundations of adversarially robust generalization. We focus on sample complexity as a natural starting point since it underlies the core question of when it is possible to learn an adversarially robust classifier. Concretely, we pose the following question:

    How does the sample complexity of standard generalization compare to that of adversarially robust generalization?

Put differently, we ask if a dataset that allows for learning a good classifier also suffices for learning a robust one. To study this question, we analyze robust generalization in two distributional models.
By focusing on specific distributions, we can establish information-theoretic lower bounds and describe the exact sample complexity requirements for generalization. We find that even for a simple data distribution such as a mixture of two class-conditional Gaussians, the sample complexity of robust generalization is significantly larger than that of standard generalization. Our lower bound holds for any model and learning algorithm; hence no amount of algorithmic ingenuity is able to overcome this limitation.

In spite of this negative result, simple datasets such as MNIST have recently seen significant progress in terms of adversarial robustness [31, 36, 44, 48]. The most robust models achieve accuracy around 90% against large ℓ∞-perturbations. To better understand this discrepancy with our first theoretical result, we also study a second distributional model with binary features. This binary data model has the same standard generalization behavior as the previous Gaussian model. Moreover, it also suffers from a significantly increased sample complexity whenever one employs linear classifiers to achieve adversarially robust generalization. Nevertheless, a slightly non-linear classifier that utilizes thresholding turns out to recover the smaller sample complexity of standard generalization. Since MNIST is a mostly binary dataset, our result provides evidence that ℓ∞-robustness on MNIST is significantly easier than on other datasets. Moreover, our results show that distributions with similar sample complexity for standard generalization can still exhibit considerably different sample complexity for robust generalization.

To complement our theoretical results, we conduct a range of experiments on MNIST, CIFAR10, and SVHN. By subsampling the datasets at various rates, we study the impact of sample size on adversarial robustness.
When plotted as a function of training set size, our results show that the standard accuracy on SVHN indeed plateaus well before the adversarial accuracy reaches its maximum. On MNIST, explicitly adding thresholding to the model during training significantly reduces the sample complexity, similar to our upper bound in the binary data model. On CIFAR10, the situation is more nuanced because there are no known approaches that achieve more than 50% accuracy even against a mild adversary. But as we show below, there is clear evidence for overfitting in the current state-of-the-art methods.

Overall, our results suggest that current approaches may be unable to attain higher adversarial accuracy on datasets such as CIFAR10 for a fundamental reason: the dataset may not be large enough to train a standard convolutional network robustly. Moreover, our lower bounds illustrate that the existence of adversarial examples should not necessarily be seen as a shortcoming of specific classification methods. Already in a simple data model, adversarial examples provably occur for any learning approach, even when the classifier already achieves high standard accuracy. So while vulnerability to adversarial ℓ∞-perturbations might seem counter-intuitive at first, in some regimes it is an unavoidable consequence of working in a statistical setting.

1.1 A motivating example: Overfitting on CIFAR10

Before we describe our main results, we briefly highlight the importance of generalization for adversarial robustness via two experiments on MNIST and CIFAR10. In both cases, our goal is to learn a classifier that achieves good test accuracy even under ℓ∞-bounded perturbations. We follow

Figure 1: Classification accuracies for robust optimization on MNIST and CIFAR10. In both cases, we trained standard convolutional networks to be robust to ℓ∞-perturbations of the input.
On MNIST, the robust test error closely tracks the corresponding training error and the model achieves high robust accuracy. On CIFAR10, the model still achieves a good natural (non-adversarial) test error, but there is a significant generalization gap for the robust accuracy. This phenomenon motivates our study of adversarially robust generalization.

the standard robust optimization approach [6, 36, 54] (also known as adversarial training [21, 51]) and (approximately) solve the saddle point problem

$$\min_{\theta} \; \mathbb{E}_{x}\left[\,\max_{\|x' - x\|_\infty \le \varepsilon} \operatorname{loss}(\theta, x')\right]$$

via stochastic gradient descent over the model parameters θ. We utilize projected gradient descent for the inner maximization problem over allowed perturbations of magnitude ε (see [36] for details). Figure 1 displays the training curves for three quantities: (i) adversarial training error, (ii) adversarial test error, and (iii) standard test error.

The results show that on MNIST, robust optimization is able to learn a model with around 90% adversarial accuracy and a relatively small gap between training and test error. However, CIFAR10 offers a different picture. Here, the model (a wide residual network [61]) is still able to fully fit the training set even against an adversary, but the generalization gap is significantly larger. The model only achieves 47% adversarial test accuracy, which is about 50% lower than its training accuracy.¹ Moreover, the standard test accuracy is about 87%, so the failure of generalization indeed primarily occurs in the context of adversarial robustness. This failure may be surprising particularly since properly tuned convolutional networks rarely overfit much on standard vision datasets.

1.2 Outline of the paper

In the next section, we describe our main theoretical results at a high level.
Section 3 complements these results with experiments. We discuss related work in Section 4 and conclude in Section 5. Due to space constraints, a longer discussion of related work, several open questions, and all proofs are deferred to the appendix in the supplementary material.

2 Theoretical Results

Our theoretical results concern statistical aspects of adversarially robust classification. In order to understand how properties of data affect the number of samples needed for robust generalization, we study two concrete distributional models.

While our two data models are clearly much simpler than the image datasets currently being used in the experimental work on ℓ∞-robustness, we believe that the simplicity of our models is a strength in this context. The fact that we can establish a separation between standard and robust generalization already in our Gaussian data model is evidence that the existence of adversarial examples for neural networks should not come as a surprise. The same phenomenon (i.e., classifiers with just enough samples for high standard accuracy necessarily being vulnerable to ℓ∞-attacks) already occurs in much simpler settings such as a mixture of two Gaussians. Note that more complicated distributional setups that can "simulate" the Gaussian model directly inherit our lower bounds.

¹We remark that this accuracy is still currently the best published robust accuracy on CIFAR10 [3]. For instance, contemporary approaches to architecture tuning do not yield better robust accuracies [15].

In addition, conclusions from our simple models also transfer to real datasets.
As we describe in the subsection on the Bernoulli model, the benefits of the thresholding layer predicted by our theoretical analysis do indeed appear in experiments on MNIST as well. Since multiple defenses against adversarial examples have been primarily evaluated on MNIST [31, 44, 48], it is important to note that ℓ∞-robustness on MNIST is a particularly easy case: adding a simple thresholding layer directly yields nearly state-of-the-art robustness against moderately strong adversaries (ε = 0.1), without any further changes to the model architecture or training algorithm.

2.1 The Gaussian model

Our first data model is a mixture of two spherical Gaussians with one component per class.

Definition 1 (Gaussian model). Let θ* ∈ ℝᵈ be the per-class mean vector and let σ > 0 be the variance parameter. Then the (θ*, σ)-Gaussian model is defined by the following distribution over (x, y) ∈ ℝᵈ × {±1}: first, draw a label y ∈ {±1} uniformly at random; then sample the data point x ∈ ℝᵈ from N(y · θ*, σ²I).

While not explicitly specified in the definition, we will use the Gaussian model in the regime where the norm of the vector θ* is approximately √d. Hence the main free parameter for controlling the difficulty of the classification task is the variance σ², which controls the amount of overlap between the two classes.

To contrast the notions of "standard" and "robust" generalization, we briefly recap a standard definition of classification error.

Definition 2 (Classification error). Let P : ℝᵈ × {±1} → ℝ be a distribution.
Then the classification error β of a classifier f : ℝᵈ → {±1} is defined as β = P_{(x,y)∼P}[f(x) ≠ y].

Next, we define our main quantity of interest, which is an adversarially robust counterpart of the above classification error. Instead of counting misclassifications under the data distribution, we allow a bounded worst-case perturbation before passing the perturbed sample to the classifier.

Definition 3 (Robust classification error). Let P : ℝᵈ × {±1} → ℝ be a distribution and let B : ℝᵈ → 𝒫(ℝᵈ) be a perturbation set.² Then the B-robust classification error β of a classifier f : ℝᵈ → {±1} is defined as β = P_{(x,y)∼P}[∃ x' ∈ B(x) : f(x') ≠ y].

Since ℓ∞-perturbations have recently received a significant amount of attention, we focus on robustness to ℓ∞-bounded adversaries in our work. For this purpose, we define the perturbation set B^ε_∞(x) = {x' ∈ ℝᵈ | ‖x' − x‖∞ ≤ ε}. To simplify notation, we also refer to robustness with respect to this set as ℓ^ε_∞-robustness. As we remark in the discussion section, understanding generalization for other measures of robustness (ℓ₂, rotations, etc.) is an important direction for future work.

Standard generalization. The Gaussian model has one parameter for controlling the difficulty of learning a good classifier.
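Both error notions are straightforward to estimate numerically in the Gaussian model. The sketch below is our own illustration (the function names and parameter choices are ours, not the paper's); it uses the fact that for a linear classifier f_w, the inner maximization over B^ε_∞ has a closed form: the worst-case perturbation is x' = x − yε·sign(w), which lowers the margin y⟨w, x⟩ by exactly ε‖w‖₁, so no explicit attack search is needed.

```python
import numpy as np

def gaussian_model(theta, sigma, n, rng):
    """Sample n labeled points from the (theta, sigma)-Gaussian model (Definition 1)."""
    y = rng.choice([-1, 1], size=n)
    x = y[:, None] * theta + sigma * rng.standard_normal((n, theta.shape[0]))
    return x, y

def standard_error(w, x, y):
    """Definition 2: fraction of points with sgn(<w, x>) != y."""
    return np.mean(y * (x @ w) <= 0)

def robust_error(w, x, y, eps):
    """Definition 3 for a linear classifier and B = l_inf ball of radius eps.
    Worst case is x' = x - y*eps*sign(w): the margin drops by eps*||w||_1."""
    return np.mean(y * (x @ w) - eps * np.abs(w).sum() <= 0)

rng = np.random.default_rng(0)
d = 1000
theta = np.ones(d)               # ||theta*||_2 = sqrt(d)
sigma = 0.5 * d ** 0.25          # the regime of the next theorem (constant c = 0.5 is our pick)
x, y = gaussian_model(theta, sigma, 10_000, rng)

w = y[0] * x[0]                  # classifier learned from a single sample
print(standard_error(w, x, y))           # small: standard learning is easy here
print(robust_error(w, x, y, eps=0.25))   # noticeably larger for a single sample
```

Running this at larger d makes the gap between the two errors more pronounced, in line with the √d separation derived below.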
In order to simplify the following bounds, we study a regime where it is possible to achieve good standard classification error from a single sample.³ As we will see later, this also allows us to calibrate our two data models to have comparable standard sample complexity. Concretely, we prove the following theorem, which is a direct consequence of Gaussian concentration. Note that in this theorem we use a linear classifier: for a vector w, the linear classifier f_w : ℝᵈ → {±1} is defined as f_w(x) = sgn(⟨w, x⟩).

Theorem 4. Let (x, y) be drawn from a (θ*, σ)-Gaussian model with ‖θ*‖₂ = √d and σ ≤ c · d^{1/4}, where c is a universal constant. Let ŵ ∈ ℝᵈ be the vector ŵ = y · x. Then with high probability, the linear classifier f_ŵ has classification error at most 1%.

²We write 𝒫(ℝᵈ) to denote the power set of ℝᵈ, i.e., the set of subsets of ℝᵈ.
³We remark that it is also possible to study a more general setting where standard generalization requires a larger number of samples.

To minimize the number of parameters in our bounds, we have set the error probability to 1%. By tuning the model parameters appropriately, it is possible to achieve a vanishingly small error probability from a single sample (see Corollary 19 in Appendix D.1).

Robust generalization. As we just demonstrated, we can easily achieve standard generalization from only a single sample in our Gaussian model. We now show that achieving a low ℓ∞-robust classification error requires significantly more samples. To this end, we begin with a natural strengthening of Theorem 4 and prove that the (class-weighted) sample mean can also be a robust classifier (given sufficient data).

Theorem 5. Let (x₁, y₁), . .
. , (xₙ, yₙ) be drawn i.i.d. from a (θ*, σ)-Gaussian model with ‖θ*‖₂ = √d and σ ≤ c₁ · d^{1/4}. Let ŵ ∈ ℝᵈ be the weighted mean vector ŵ = (1/n) Σⁿᵢ₌₁ yᵢxᵢ. Then with high probability, the linear classifier f_ŵ has ℓ^ε_∞-robust classification error at most 1% if

$$n \;\ge\; \begin{cases} 1 & \text{for } \varepsilon \le \tfrac{1}{4} d^{-1/4} \\ c_2\, \varepsilon^2 \sqrt{d} & \text{for } \tfrac{1}{4} d^{-1/4} \le \varepsilon \le \tfrac{1}{4}. \end{cases}$$

We refer the reader to Corollary 22 in Appendix D.1 for the details. As before, c₁ and c₂ are two universal constants. Overall, the theorem shows that it is possible to learn an ℓ^ε_∞-robust classifier in the Gaussian model as long as ε is bounded by a small constant and we have a large number of samples.

Next, we show that this significantly increased sample complexity is necessary. Our main theorem establishes a lower bound for all learning algorithms, which we formalize as functions from data samples to binary classifiers. In particular, the lower bound is not restricted to linear classifiers.

Theorem 6. Let gₙ be any learning algorithm, i.e., a function from n samples to a binary classifier fₙ. Moreover, let σ = c₁ · d^{1/4}, let ε ≥ 0, and let θ ∈ ℝᵈ be drawn from N(0, I). We also draw n samples from the (θ, σ)-Gaussian model. Then the expected ℓ^ε_∞-robust classification error of fₙ is at least (1 − 1/d) · 1/2 if

$$n \;\le\; c_2\, \frac{\varepsilon^2 \sqrt{d}}{\log d}.$$

The proof of the theorem can be found in Corollary 23 (Appendix D.2). It is worth noting that the classification error 1/2 in the lower bound is tight.
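The weighted-mean classifier of Theorem 5 is easy to simulate at toy scale. The sketch below (our own illustration with our own parameter choices, not the paper's experiments) estimates its ℓ^ε_∞-robust error as n grows, again using the closed-form worst-case ℓ∞ perturbation for linear classifiers.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000
theta = np.ones(d)                  # ||theta*||_2 = sqrt(d)
sigma = 0.5 * d ** 0.25             # our choice of the constant in Theorem 5
eps = 0.25                          # within the regime eps <= 1/4

def sample(n):
    y = rng.choice([-1, 1], size=n)
    return y[:, None] * theta + sigma * rng.standard_normal((n, d)), y

def robust_err(w, n_test=20_000):
    # For linear classifiers, the worst l_inf perturbation of budget eps
    # shrinks the margin y*<w, x> by eps * ||w||_1.
    x, y = sample(n_test)
    return np.mean(y * (x @ w) - eps * np.abs(w).sum() <= 0)

errs = {}
for n in [1, 10, 100, 1000]:
    x, y = sample(n)
    w_hat = (y[:, None] * x).mean(axis=0)   # class-weighted sample mean
    errs[n] = robust_err(w_hat)
print(errs)   # the robust error shrinks as n grows
```

The single-sample classifier incurs a clearly nonzero robust error while the many-sample mean drives it toward zero, matching the qualitative message of Theorems 5 and 6 (at this toy scale the exact thresholds in the theorems are not visible).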
A classifier that always outputs a fixed prediction trivially achieves perfect robustness on one of the two classes and hence robust accuracy 1/2.

Comparing Theorems 5 and 6, we see that the sample complexity n required for robust generalization is bounded as

$$c\, \frac{\varepsilon^2 \sqrt{d}}{\log d} \;\le\; n \;\le\; c'\, \varepsilon^2 \sqrt{d}.$$

Hence the lower bound is nearly tight in our regime of interest. When the perturbation has constant ℓ∞-norm, the sample complexity of robust generalization is larger than that of standard generalization by √d, i.e., polynomial in the problem dimension. This shows that for high-dimensional problems, adversarial robustness can provably require a significantly larger number of samples.

Finally, we remark that our lower bound applies also to a more restricted adversary. Our proof uses only a single adversarial perturbation per class. As a result, the lower bound provides transferable adversarial examples and applies to worst-case distribution shifts without a classifier-adaptive adversary. We refer the reader to Section 5 for a more detailed discussion.

2.2 The Bernoulli model

As mentioned in the introduction, simpler datasets such as MNIST have recently seen significant progress in terms of ℓ∞-robustness. We now investigate a possible mechanism underlying these advances. To this end, we study a second distributional model that highlights how the data distribution can significantly affect the achievable robustness. The second data model is defined on the hypercube {±1}ᵈ, and the two classes are represented by opposite vertices of that hypercube.
When sampling a datapoint for a given class, we flip each bit of the corresponding class vertex with a certain probability. This data model is inspired by the MNIST dataset because MNIST images are close to binary (many pixels are almost fully black or white).

Definition 7 (Bernoulli model). Let θ* ∈ {±1}ᵈ be the per-class mean vector and let τ > 0 be the class bias parameter. Then the (θ*, τ)-Bernoulli model is defined by the following distribution over (x, y) ∈ {±1}ᵈ × {±1}: first, draw a label y ∈ {±1} uniformly at random from its domain; then sample the data point x ∈ {±1}ᵈ by sampling each coordinate xᵢ from the distribution

$$x_i \;=\; \begin{cases} \;\;\;y \cdot \theta^*_i & \text{with probability } 1/2 + \tau \\ -y \cdot \theta^*_i & \text{with probability } 1/2 - \tau. \end{cases}$$

As in the previous subsection, the model has one parameter for controlling the difficulty of learning. A small value of τ makes the samples less correlated with their respective class vectors and hence leads to a harder classification problem. Note that both the Gaussian and the Bernoulli model are defined by simple sub-Gaussian distributions. Nevertheless, we will see that they differ significantly in terms of robust sample complexity.

Standard generalization. As in the Gaussian model, we first calibrate the distribution so that we can learn a classifier with good standard accuracy from a single sample.⁴ The following theorem is a direct consequence of the fact that bounded random variables exhibit sub-Gaussian concentration.

Theorem 8.
Let (x, y) be drawn from a (θ*, τ)-Bernoulli model with τ ≥ c · d^{−1/4}, where c is a universal constant. Let ŵ ∈ ℝᵈ be the vector ŵ = y · x. Then with high probability, the linear classifier f_ŵ has classification error at most 1%.

To simplify the bound, we have set the error probability to 1% as in the Gaussian model. We refer the reader to Corollary 28 in Appendix F.1 for the proof.

Robust generalization. Next, we investigate the sample complexity of robust generalization in our Bernoulli model. For linear classifiers, a small robust classification error again requires a large number of samples:

Theorem 9. Let gₙ be a linear classifier learning algorithm, i.e., a function from n samples to a linear classifier fₙ. Suppose that we choose θ* uniformly at random from {±1}ᵈ and draw n samples from the (θ*, τ)-Bernoulli model with τ = c₁ · d^{−1/4}. Moreover, let ε < 3τ and 0 < γ < 1/2. Then the expected ℓ^ε_∞-robust classification error of fₙ is at least 1/2 − γ if

$$n \;\le\; c_2\, \frac{\varepsilon^2 \gamma^2 d}{\log(d/\gamma)}.$$

We defer the proof to Appendix F.2. At first, the lower bound for linear classifiers might suggest that ℓ∞-robustness requires an inherently larger sample complexity here as well. However, in contrast to the Gaussian model, non-linear classifiers can achieve significantly improved robustness. In particular, consider the following thresholding operation T : ℝᵈ → ℝᵈ, which is defined element-wise as

$$T(x)_i \;=\; \begin{cases} +1 & \text{if } x_i \ge 0 \\ -1 & \text{otherwise.} \end{cases}$$

It is easy to see that for ε < 1, the thresholding operator undoes the action of any ℓ∞-bounded adversary, i.e., we have T(B^ε_∞(x)) = {x} for any x ∈ {±1}ᵈ. Hence we can combine the thresholding operator with the classifier learned from a single sample to get the following upper bound.
Theorem 10. Let (x, y) be drawn from a (θ*, τ)-Bernoulli model with τ ≥ c · d^{−1/4}, where c is a universal constant. Let ŵ ∈ ℝᵈ be the vector ŵ = y · x. Then with high probability, the classifier f_ŵ ∘ T has ℓ^ε_∞-robust classification error at most 1% for any ε < 1.

This theorem shows a stark contrast to the Gaussian case. Although both models have similar sample complexity for standard generalization, there is a √d gap between the ℓ∞-robust sample complexity for the Bernoulli and Gaussian models. This discrepancy provides evidence that robust generalization requires a more nuanced understanding of the data distribution than standard generalization.

⁴To be precise, the two distributions have comparable sample complexity for standard generalization in the regime where σ ≈ τ⁻¹.

Figure 2: Adversarially robust generalization performance as a function of training data size for ℓ∞ adversaries on the MNIST, CIFAR-10, and SVHN datasets. For each choice of training set size and ε_test, we plot the best performance achieved over ε_train and network capacity. This clearly shows that achieving a certain level of adversarially robust generalization requires significantly more samples than achieving the same level of standard generalization.

In isolation, the thresholding step might seem specific to the Bernoulli model studied here. However, our experiments in Section 3 show that an explicit thresholding layer also significantly improves the sample complexity of training a robust neural network on MNIST. We conjecture that the effectiveness of thresholding is behind many of the successful defenses against adversarial examples on MNIST (for instance, see Appendix C in [36]).

3 Experiments

We complement our theoretical results by performing experiments on multiple common datasets.
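Before turning to the neural-network experiments, the key property behind Theorem 10, namely that T exactly undoes any ℓ∞-bounded adversary with ε < 1 on the hypercube, can be verified in a few lines. This is our own illustration with arbitrary parameter choices (d, τ, and the constant in τ are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 500
theta = rng.choice([-1, 1], size=d)        # theta* in {+-1}^d
tau = 2.0 * d ** -0.25                      # class bias (our choice of constant)

def bernoulli_sample(n):
    """Sample n points from the (theta, tau)-Bernoulli model (Definition 7)."""
    y = rng.choice([-1, 1], size=n)
    flip = rng.random((n, d)) < 0.5 - tau   # flip each bit w.p. 1/2 - tau
    return np.where(flip, -y[:, None] * theta, y[:, None] * theta), y

T = np.sign    # thresholding operator; inputs never hit exactly 0 below

x, y = bernoulli_sample(1000)
eps = 0.9
delta = eps * rng.choice([-1, 1], size=x.shape)   # an l_inf perturbation, eps < 1
assert np.array_equal(T(x + delta), x)            # T undoes the adversary exactly

x0, y0 = bernoulli_sample(1)
w = y0[0] * x0[0]                                 # single-sample classifier
std_err = np.mean(y * (x @ w) <= 0)
rob_err = np.mean(y * (T(x + delta) @ w) <= 0)
print(std_err, rob_err)                           # identical by construction
```

Because T(x + δ) = x coordinate-wise, the robust error of f_ŵ ∘ T collapses to the standard error of f_ŵ, which is exactly the content of Theorem 10.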
We consider standard convolutional neural networks and train models on datasets of varying complexity. Specifically, we study the MNIST [34], CIFAR-10 [33], and SVHN [40] datasets. We use a simple convolutional architecture for MNIST, a standard ResNet model [23] for CIFAR-10, and a wider ResNet [61] for SVHN. We perform robust optimization to train our classifiers on perturbations generated by projected gradient descent. Appendix G provides additional details for our experiments.

Empirical sample complexity evaluation. We study how the generalization performance of adversarially robust networks varies with the size of the training dataset. To do so, we train networks with a specific ℓ∞ adversary while reducing the size of the training set. The training subsets are produced by randomly sub-sampling the complete dataset in a class-balanced fashion. When increasing the number of samples, we ensure that each dataset is a superset of the previous one.

We evaluate the robustness of each trained network to perturbations of varying magnitude (ε_test). For each choice of training set size N and fixed attack ε_test, we select the best performance achieved across all hyperparameter settings (training perturbation ε_train and model size). On all three datasets, we observed that the best standard accuracy is usually achieved by the standard-trained network, while the best adversarial accuracy for almost all values of ε_test was achieved when training with the largest ε_train. We maximize over the hyperparameter settings since we are not interested in the performance of a specific model, but rather in the inherent generalization properties of the dataset independently of the classifier used. Figure 2 shows the results of these experiments.

The plots demonstrate the need for more data to achieve adversarially robust generalization.
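The nested, class-balanced subsampling protocol described above can be sketched as follows (a minimal illustration of the protocol; the function name and the toy labels are ours, not the paper's code):

```python
import numpy as np

def nested_balanced_subsets(labels, sizes, seed=0):
    """Return index arrays for class-balanced training subsets, each a
    superset of the previous one.

    `sizes` must be increasing; each size should be divisible by the
    number of classes for an exactly balanced split."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    # One shuffle per class; taking nested prefixes guarantees supersets.
    order = {c: rng.permutation(np.flatnonzero(labels == c)) for c in classes}
    subsets = []
    for n in sizes:
        per_class = n // len(classes)
        subsets.append(np.concatenate([order[c][:per_class] for c in classes]))
    return subsets

labels = np.repeat(np.arange(10), 100)     # toy labels: 10 classes x 100 samples
subs = nested_balanced_subsets(labels, [100, 500, 1000])
print([len(s) for s in subs])              # [100, 500, 1000]
```

Taking prefixes of a single per-class shuffle is what makes each subset a superset of the previous one, so the curves in Figure 2 vary only the amount of data, not its composition.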
For any fixed test set accuracy, the number of samples needed is significantly higher for robust generalization. In the SVHN experiments (where we have sufficient training samples to observe plateauing behavior), the standard accuracy reaches its maximum with significantly fewer samples than the adversarial accuracy. We report more details of our experiments in Section H of the supplementary material.

Thresholding experiments. Motivated by our theoretical study of the Bernoulli model, we investigate whether thresholding can also improve the sample complexity of robust generalization against an ℓ∞ adversary on MNIST.

Figure 3: Adversarial robustness to ℓ∞ attacks on the MNIST dataset for a simple convolutional network [36] with and without explicit thresholding filters. For each choice of training set size and ε_test, we report the best test set accuracy achieved over the choice of thresholding filters and ε_train. We observe that introducing thresholding filters significantly reduces the number of samples needed to achieve good adversarial generalization.

We repeat the above sample complexity experiments with networks where thresholding filters are explicitly encoded in the model. Here, we replace the first convolutional layer with a fixed thresholding layer consisting of two channels, ReLU(x − ε_filter) and ReLU(1 − x − ε_filter), where x is the input image. Figure 3 shows results for networks trained with such a thresholding layer.
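A minimal NumPy sketch of such a two-channel thresholding layer (our own illustration, assuming pixel intensities in [0, 1]; the paper realizes it as a fixed first layer of the network):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def thresholding_layer(x, eps_filter):
    """Fixed two-channel thresholding layer: the first channel responds to
    bright pixels, the second to dark pixels (inputs assumed in [0, 1])."""
    return np.stack([relu(x - eps_filter), relu(1.0 - x - eps_filter)], axis=0)

# A dark pixel perturbed by less than eps_filter cannot activate the
# "bright" channel, which is what suppresses small l_inf perturbations.
x = np.array([[0.0, 1.0], [1.0, 0.05]])    # toy 2x2 "image"
out = thresholding_layer(x, eps_filter=0.1)
print(out.shape)                            # (2, 2, 2): two channels
```

Setting ε_filter to the training perturbation budget (as done below for adversarially trained networks) makes the layer saturate exactly the perturbation range the adversary is allowed to use.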
For standard trained networks, we use a value of ε_filter = 0.1 for the thresholding filters, whereas for adversarially trained networks we set ε_filter = ε_train. For each data subset size and test perturbation ε_test, we plot the best test accuracy achieved over networks trained with different thresholding filters, i.e., different values of ε. We separately show the effect of explicit thresholding in such networks when they are trained adversarially using PGD.

As predicted by our theory, the networks achieve good adversarially robust generalization with significantly fewer samples when thresholding filters are added. Further, note that adding a simple thresholding layer directly yields nearly state-of-the-art robustness against moderately strong adversaries (ε = 0.1), without any other modifications to the model architecture or training algorithm. It is also worth noting that the thresholding filters could have been learned by the original network architecture, and that this modification only decreases the capacity of the model. Our findings emphasize network architecture as a crucial factor for learning adversarially robust networks from a limited number of samples.

We also experimented with thresholding filters on the CIFAR10 dataset, but did not observe any significant difference from the standard architecture. This agrees with our theoretical understanding that thresholding helps primarily in the case of (approximately) binary datasets.

4 Related Work

Due to the large body of work on adversarial robustness, we focus on related papers that also provide theoretical explanations for adversarial examples. We defer a detailed discussion of related work to Appendix A and discuss here the works most closely related to ours.

Wang et al. [55] study the adversarial robustness of nearest neighbor classifiers.
In contrast to our work, the authors give theoretical guarantees for a specific classification algorithm, and do not see a separation in sample complexity between robust and regular generalization. Recent work by Gilmer et al. [20] explores a specific distribution where robust learning is empirically difficult with overparametrized neural networks. The main phenomenon is that even a small natural error rate on their dataset translates to a large adversarial error rate. Our results give a more nuanced picture that involves the sample complexity required for generalization. In our data models, it is possible to achieve an error rate that is essentially zero by using a very small number of samples, whereas the adversarial error rate is still large unless we have seen a lot of samples.

The work of Xu et al. [58] establishes a connection between robust optimization and regularization for linear classification. In particular, they show that robustness to a specific perturbation set is exactly equivalent to the standard support vector machine. Subsequent work by Xu and Mannor [57] builds a deeper connection between robustness and generalization. They prove that for a certain notion of robustness, robust algorithms generalize. Moreover, they show that robustness is a necessary condition for generalization in an asymptotic sense. Bellet and Habrard [5] give similar results for metric learning. However, these results do not imply sample complexity bounds since they are asymptotic.
Our results stand in stark contrast: we show that generalization can, in simple models, be significantly easier than robustness when sample complexity enters the picture.

Fawzi et al. [18] relate the robustness of linear and non-linear classifiers to adversarial and (semi-)random perturbations. Their work studies the setting where the classifier is fixed and does not encompass the learning task. Fawzi et al. [19] give provable lower bounds for adversarial robustness in models where robust classifiers do not exist. In contrast, we are interested in a setting where robust classifiers exist, but need many samples to learn. Papernot et al. [43] discuss adversarial robustness at the population level. We defer a more detailed discussion of these works to Appendix A.

There is also a long line of work in machine learning on exploring the connection between various notions of margin and generalization, e.g., see [46] and references therein. In this setting, the ℓp margin, i.e., how robustly classifiable the data is for ℓp*-bounded classifiers, enables dimension-independent control of the sample complexity. However, the sample complexity in concrete distributional models can often be significantly smaller than what the margin implies.

5 Discussion and Conclusions

The vulnerability of neural networks to adversarial perturbations has recently been a source of much discussion and is still poorly understood. Different works have argued that this vulnerability stems from their discontinuous nature [51], their linear nature [21], or is a result of high-dimensional geometry and independent of the model class [20]. Our work gives a more nuanced picture. We show that for a natural data distribution (the Gaussian model), the model class we train does not matter and a standard linear classifier achieves optimal robustness.
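The gap between standard and adversarially robust generalization in the Gaussian model can be seen in a short simulation (a minimal sketch: samples x ~ N(y·θ*, σ²I) with label y ∈ {±1}, classified by the sample-mean linear classifier; the dimension, noise scale, and perturbation budget below are illustrative choices, not the constants from our theorems):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 400                      # dimension (illustrative)
sigma = d ** 0.25            # noise scale (illustrative)
theta = np.ones(d)           # unknown mean theta*, with norm sqrt(d)
eps = 0.75                   # l_inf perturbation budget (illustrative)

def sample(n):
    """Draw n points from the Gaussian model: x ~ N(y * theta*, sigma^2 I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * theta + sigma * rng.standard_normal((n, d))
    return x, y

# Sample-mean linear classifier from a small training set.
x_tr, y_tr = sample(20)
w = (y_tr[:, None] * x_tr).mean(axis=0)

# Standard vs. worst-case l_inf accuracy: for a linear classifier,
# the optimal l_inf attack shrinks the margin by eps * ||w||_1.
x_te, y_te = sample(10_000)
margin = y_te * (x_te @ w)
std_acc = (margin > 0).mean()
adv_acc = (margin > eps * np.abs(w).sum()).mean()
```

In runs of this sketch, a handful of samples already gives near-perfect standard accuracy while the worst-case ℓ∞ accuracy remains substantially lower; increasing the training set size narrows the adversarial gap, consistent with the sample complexity separation described above.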
However, robustness also strongly depends on properties of the underlying data distribution. For other data models (such as MNIST or the Bernoulli model), our results demonstrate that non-linearities are indispensable to learn from few samples. This dichotomy provides evidence that defenses against adversarial examples need to be tailored to the specific dataset (even for the same type of perturbations) and hence may be more complicated than a single, broad approach. Understanding the interactions between robustness, classifier model, and data distribution from the perspective of generalization is an important direction for future work. We refer the reader to Section B in the appendix for concrete questions in this direction.

The focus of our paper is on adversarial perturbations in a setting where the test distribution (before the adversary's action) is the same as the training distribution. While this is a natural scenario from a security point of view, other setups can be more relevant in different robustness contexts. For instance, we may want a classifier that is robust to small changes between the training and test distribution. This can be formalized as the classification accuracy on unperturbed examples coming from an adversarially modified distribution. Here, the power of the adversary is limited by how much the test distribution can be modified, and the adversary is not allowed to perturb individual samples coming from the modified test distribution. Interestingly, our lower bound for the Gaussian model also applies to such worst-case distributional shifts. In particular, if the adversary is allowed to shift the mean θ* by a vector in the ℓ∞-ball of radius ε, our proof sketched in Section C transfers to the distribution shift setting.
Since the lower bound relies only on a single universal perturbation, this perturbation can also be applied directly to the mean vector.

What do our results mean for robust classification of real images? Our Gaussian lower bound implies that if an algorithm works for all (or most) settings of the unknown parameter θ*, then achieving strong ℓ∞-robustness requires a sample complexity increase that is polynomial in the dimension. There are a few different ways this lower bound could be bypassed. It is conceivable that the noise scale σ is significantly smaller for real image datasets, making robust classification easier. Even if that was not the case, a good algorithm could work for the parameters θ* that correspond to real datasets while not working for most other parameters. To accomplish this, the algorithm would implicitly or explicitly have prior information about the correct θ*. While some prior information is already incorporated in the model architectures (e.g., convolutional and pooling layers), the conventional wisdom usually is not to bias the neural network with our priors. Our work suggests that there are trade-offs with robustness here and that adding more prior information could help to learn more robust classifiers.

Acknowledgements

During this research project, Ludwig Schmidt was supported by a Google PhD fellowship and a Microsoft Research fellowship at the Simons Institute for the Theory of Computing. Ludwig was also an intern in the Google Brain team. Shibani Santurkar is supported by the National Science Foundation (NSF) under grants IIS-1447786, IIS-1607189, and CCF-1563880, and the Intel Corporation. Dimitris Tsipras was supported in part by the NSF grant CCF-1553428 and the NSF Frontier grant CNS-1413920. Aleksander Mądry was supported in part by an Alfred P.
Sloan Research Fellowship, a Google Research Award, and the NSF grants CCF-1553428 and CNS-1815221.

References

[1] TensorFlow models repository. https://www.tensorflow.org/tutorials/layers, 2017.

[2] Anurag Arnab, Ondrej Miksik, and Philip H. S. Torr. On the robustness of semantic segmentation models to adversarial attacks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2018. URL http://arxiv.org/abs/1711.09856.

[3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), 2018. URL https://arxiv.org/abs/1802.00420.

[4] Vahid Behzadan and Arslan Munir. Vulnerability of deep reinforcement learning to policy induction attacks. In International Conference on Machine Learning and Data Mining (MLDM), 2017. URL https://arxiv.org/abs/1701.04143.

[5] Aurélien Bellet and Amaury Habrard. Robustness and generalization for metric learning. Neurocomputing, 2015. URL https://arxiv.org/abs/1209.1086.

[6] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust optimization. Princeton University Press, 2009.

[7] Battista Biggio and Fabio Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 2018. URL https://arxiv.org/abs/1712.03141.

[8] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

[9] Nader H. Bshouty, Nadav Eiron, and Eyal Kushilevitz. PAC learning with nasty noise. In Algorithmic Learning Theory (ALT), 1999. URL https://link.springer.com/chapter/10.1007/3-540-46769-6_17.

[10] Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv, 2016.

[11] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks.
In Symposium on Security and Privacy (SP), 2016. URL http://arxiv.org/abs/1608.04644.

[12] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In Security and Privacy Workshops (SPW), 2018. URL https://arxiv.org/abs/1801.01944.

[13] Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David Wagner, and Wenchao Zhou. Hidden voice commands. In USENIX Security Symposium, 2016. URL https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini.

[14] Moustapha M Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. Houdini: Fooling deep structured visual and speech recognition models with adversarial examples. In Neural Information Processing Systems (NIPS), 2017. URL https://arxiv.org/abs/1707.05373.

[15] Ekin D. Cubuk, Barret Zoph, Samuel S. Schoenholz, and Quoc V. Le. Intriguing properties of adversarial examples. arXiv, 2017. URL https://arxiv.org/abs/1711.02846.

[16] Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. Adversarial classification. In International Conference on Knowledge Discovery and Data Mining (KDD), 2004. URL http://doi.acm.org/10.1145/1014052.1014066.

[17] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A rotation and a translation suffice: Fooling CNNs with simple transformations. arXiv, 2017. URL https://arxiv.org/abs/1712.02779.

[18] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In Neural Information Processing Systems (NIPS), 2016. URL https://arxiv.org/abs/1608.08967.

[19] Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. arXiv, 2018. URL https://arxiv.org/abs/1802.08686.

[20] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S.
Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv, 2018. URL https://arxiv.org/abs/1801.02774.

[21] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv, 2014. URL http://arxiv.org/abs/1412.6572.

[22] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick D. McDaniel. Adversarial perturbations against deep neural networks for malware classification. In European Symposium on Research in Computer Security (ESORICS), 2016. URL http://arxiv.org/abs/1606.04435.

[23] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016. URL https://arxiv.org/abs/1512.03385.

[24] Warren He, James Wei, Xinyun Chen, Nicholas Carlini, and Dawn Song. Adversarial example defenses: Ensembles of weak defenses are not strong. In USENIX Workshop on Offensive Technologies, 2017. URL https://arxiv.org/abs/1706.04701.

[25] Alex Huang, Abdullah Al-Dujaili, Erik Hemberg, and Una-May O'Reilly. Adversarial deep learning for robust detection of binary encoded malware. In Security and Privacy Workshops (SPW), 2018. URL https://arxiv.org/abs/1801.02950.

[26] Sandy H. Huang, Nicolas Papernot, Ian J. Goodfellow, Yan Duan, and Pieter Abbeel. Adversarial attacks on neural network policies. In International Conference on Learning Representations (ICLR), 2017. URL https://arxiv.org/abs/1702.02284.

[27] Peter J. Huber. Robust Statistics. Wiley, 1981.

[28] Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017. URL https://arxiv.org/abs/1707.07328.

[29] Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 1993.
URL http://dx.doi.org/10.1137/0222052.

[30] Michael J. Kearns, Robert E. Schapire, and Linda M. Sellie. Toward efficient agnostic learning. Machine Learning, 1994. URL https://doi.org/10.1023/A:1022615600103.

[31] J Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Learning Representations (ICLR), 2018. URL https://arxiv.org/abs/1711.00851.

[32] Jernej Kos, Ian Fischer, and Dawn Song. Adversarial examples for generative models. In Security and Privacy Workshops (SPW), 2018. URL http://arxiv.org/abs/1702.06832.

[33] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009. URL https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf.

[34] Yann LeCun, Corinna Cortes, and Christopher J.C. Burges. The MNIST database of handwritten digits. Website, 1998. URL http://yann.lecun.com/exdb/mnist/.

[35] Daniel Lowd and Christopher Meek. Adversarial learning. In International Conference on Knowledge Discovery in Data Mining (KDD), 2005. URL http://doi.acm.org/10.1145/1081870.1081950.

[36] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018. URL https://arxiv.org/abs/1706.06083.

[37] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016. URL https://arxiv.org/abs/1511.04599.

[38] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
URL https://arxiv.org/abs/1610.08401.

[39] Nina Narodytska and Shiva Prasad Kasiviswanathan. Simple black-box adversarial perturbations for deep networks. In Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017. URL http://arxiv.org/abs/1612.06299.

[40] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. URL http://ufldl.stanford.edu/housenumbers/.

[41] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Symposium on Security and Privacy (SP), 2016. URL https://arxiv.org/abs/1511.04508.

[42] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In European Symposium on Security and Privacy (EuroS&P), 2016. URL https://arxiv.org/abs/1511.07528.

[43] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. In European Symposium on Security and Privacy (EuroS&P), 2018. URL https://arxiv.org/abs/1611.03814.

[44] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), 2018. URL https://arxiv.org/abs/1801.09344.

[45] Phillippe Rigollet and Jan-Christian Hütter. High-dimensional statistics. Lecture notes, 2017. URL http://www-math.mit.edu/~rigollet/PDFs/RigNotes17.pdf.

[46] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
URL http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/.

[47] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Conference on Computer and Communications Security (CCS), 2016. URL http://doi.acm.org/10.1145/2976749.2978392.

[48] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. In International Conference on Learning Representations (ICLR), 2018. URL https://arxiv.org/abs/1710.10571.

[49] Liwei Song and Prateek Mittal. Inaudible voice commands. In Conference on Computer and Communications Security (CCS), 2017. URL http://arxiv.org/abs/1708.07238.

[50] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. arXiv, 2017. URL http://arxiv.org/abs/1710.08864.

[51] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014. URL http://arxiv.org/abs/1312.6199.

[52] Florian Tramèr, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. The space of transferable adversarial examples. arXiv, 2017. URL http://arxiv.org/abs/1704.03453.

[53] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations (ICLR), 2018. URL http://arxiv.org/abs/1705.07204.

[54] Abraham Wald. Statistical decision functions which minimize the maximum risk. Annals of Mathematics, 1945.

[55] Yizhen Wang, Somesh Jha, and Kamalika Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples.
In International Conference on Machine Learning (ICML), 2018. URL http://proceedings.mlr.press/v80/wang18c/wang18c.pdf.

[56] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. In International Conference on Learning Representations (ICLR), 2018. URL https://arxiv.org/abs/1801.02612.

[57] Huan Xu and Shie Mannor. Robustness and generalization. Machine Learning, 2012. URL https://arxiv.org/abs/1005.2243.

[58] Huan Xu, Constantine Caramanis, and Shie Mannor. Robustness and regularization of support vector machines. Journal of Machine Learning Research (JMLR), 2009. URL http://www.jmlr.org/papers/v10/xu09b.html.

[59] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In Network and Distributed System Security Symposium (NDSS), 2017. URL https://arxiv.org/abs/1704.01155.

[60] Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darell, and Dawn Song. Can you fool AI with adversarial examples on a visual Turing test? arXiv, 2017. URL http://arxiv.org/abs/1709.08693.

[61] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In British Machine Vision Conference (BMVC), 2016. URL http://arxiv.org/abs/1605.07146.

[62] Guoming Zhang, Chen Yan, Xiaoyu Ji, Taimin Zhang, Tianchen Zhang, and Wenyuan Xu. DolphinAttack: Inaudible voice commands. In Conference on Computer and Communications Security (CCS), 2017. URL http://arxiv.org/abs/1708.09537.