{"title": "PAC-learning in the presence of adversaries", "book": "Advances in Neural Information Processing Systems", "page_first": 230, "page_last": 241, "abstract": "The existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding. These attacks can be carried out by adding imperceptible perturbations to inputs to generate adversarial examples and finding effective defenses and detectors has proven to be difficult. In this paper, we step away from the attack-defense arms race and seek to understand the limits of what can be learned in the presence of an evasion adversary. In particular, we extend the Probably Approximately Correct (PAC)-learning framework to account for the presence of an adversary. We first define corrupted hypothesis classes which arise from standard binary hypothesis classes in the presence of an evasion adversary and derive the Vapnik-Chervonenkis (VC)-dimension for these, denoted as the adversarial VC-dimension. We then show that sample complexity upper bounds from the Fundamental Theorem of Statistical learning can be extended to the case of evasion adversaries, where the sample complexity is controlled by the adversarial VC-dimension. We then explicitly derive the adversarial VC-dimension for halfspace classifiers in the presence of a sample-wise norm-constrained adversary of the type commonly studied for evasion attacks and show that it is the same as the standard VC-dimension, closing an open question. 
Finally, we prove that the adversarial VC-dimension can be either larger or smaller than the standard VC-dimension depending on the hypothesis class and adversary, making it an interesting object of study in its own right.", "full_text": "PAC-learning in the presence of evasion adversaries\n\nDaniel Cullina, Princeton University (dcullina@princeton.edu)\nArjun Nitin Bhagoji, Princeton University (abhagoji@princeton.edu)\nPrateek Mittal, Princeton University (pmittal@princeton.edu)\n\nAbstract\n\nThe existence of evasion attacks during the test phase of machine learning algorithms represents a significant challenge to both their deployment and understanding. These attacks can be carried out by adding imperceptible perturbations to inputs to generate adversarial examples, and finding effective defenses and detectors has proven difficult. In this paper, we step away from the attack-defense arms race and seek to understand the limits of what can be learned in the presence of an evasion adversary. In particular, we extend the Probably Approximately Correct (PAC)-learning framework to account for the presence of an adversary. We first define corrupted hypothesis classes, which arise from standard binary hypothesis classes in the presence of an evasion adversary, and derive the Vapnik-Chervonenkis (VC)-dimension for these, denoted the adversarial VC-dimension. We then show that sample complexity upper bounds from the Fundamental Theorem of Statistical Learning can be extended to the case of evasion adversaries, where the sample complexity is controlled by the adversarial VC-dimension. We then explicitly derive the adversarial VC-dimension for halfspace classifiers in the presence of a sample-wise norm-constrained adversary of the type commonly studied for evasion attacks and show that it is the same as the standard VC-dimension. 
Finally, we prove that the adversarial VC-dimension can be either larger or smaller than the standard VC-dimension depending on the hypothesis class and adversary, making it an interesting object of study in its own right.\n\n1 Introduction\n\nMachine learning (ML) has become ubiquitous due to its impressive performance in domains as varied as image recognition [48, 71], natural language and speech processing [22, 24, 39], game-playing [11, 57, 70] and aircraft collision avoidance [42]. However, its ubiquity provides adversaries with both opportunities and incentives to develop strategic approaches to fool machine learning systems during both the training (poisoning attacks) [8, 41, 58, 65] and test (evasion attacks) [7, 15, 34, 55, 56, 61, 75] phases. Our focus in this paper is on evasion attacks targeting the test phase, particularly those based on adversarial examples, which add imperceptible perturbations to the input in order to cause misclassification. A large number of adversarial example-based evasion attacks have been proposed against supervised ML algorithms used for image classification [7, 15, 18, 34, 61, 75], object detection [20, 52, 80], image segmentation [3, 31], speech recognition [16, 83] as well as other tasks [21, 36, 43, 82]; generative models for image data [46]; and even reinforcement learning algorithms [40, 47]. These attacks have been carried out in black-box [6, 10, 19, 51, 59, 60, 75] as well as in physical settings [27, 49, 69, 73].\nTo counter these attacks, defenses based on the ideas of adversarial training [34, 53, 76], input denoising through transformations [5, 23, 26, 67, 81], distillation [63], ensembling [1, 4, 74] and feature nullification [77] have been proposed. A number of detectors [30, 31, 33, 35, 54] for adversarial examples have also been proposed. 
However, recent work [12\u201314] has demonstrated that modifications to existing attacks are sufficient to generate adversarial examples that bypass both defenses and detectors. In light of this burgeoning arms race, defenses that come equipped with theoretical guarantees on robustness have recently been proposed [45, 64, 72]. These have been demonstrated for neural networks with up to four layers.\n\n[32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.]\n\nIn this paper, we take a more fundamental approach to understanding the robustness of supervised classification algorithms by extending well-understood results for supervised batch learning in statistical learning theory. In particular, we seek to understand the sample complexity of Probably Approximately Correct (PAC)-learning in the presence of adversaries. This was raised as an open question for halfspace classifiers by Schmidt et al. [66] in concurrent work, which focused on the sample complexity needed to learn specific distributions. We close this open question by showing that the sample complexity of PAC-learning when the hypothesis class is the set of halfspace classifiers does not increase in the presence of adversaries bounded by convex constraint sets. We note that the PAC-learning framework is distribution-agnostic, i.e., it is a statement about learning given independent, identically distributed samples from any distribution over the input space. We show this by first introducing the notion of corrupted hypothesis classes, which arise from standard hypothesis classes in the binary setting in the presence of an adversary. Now, in the standard PAC-learning setting, i.e., with no adversary present, the Vapnik-Chervonenkis (VC)-dimension is a way to characterize the \u2018size\u2019 of a hypothesis class which allows for the determination of which hypothesis classes are learnable and with how much data (i.e. 
sample complexity). In the adversarial setting, we introduce the notion of the adversarial VC-dimension, which is the VC-dimension of the corrupted hypothesis class. With these definitions in place, we can then prove sample complexity upper bounds from the Fundamental Theorem of Statistical Learning in the presence of adversaries that utilize the adversarial VC-dimension.\nIn this setting, we explicitly compute the adversarial VC-dimension for the hypothesis class comprising all halfspace classifiers, which then directly gives us the sample complexity of PAC-learning in the presence of adversaries. This hypothesis class has a VC-dimension of 1 more than the dimension of the input space when no adversaries are present. We prove that this does not increase in the presence of an adversary, i.e., the adversarial VC-dimension is equal to the VC-dimension for the hypothesis class comprising all halfspace classifiers. Our result then raises the question: is the adversarial VC-dimension always equal to the standard VC-dimension? We answer this question in the negative, by showing explicit constructions of hypothesis classes and adversarial constraints for which the adversarial VC-dimension can be arbitrarily larger or smaller than the standard VC-dimension.\nContributions: In this paper, we are the first to provide sample complexity bounds for the problem of PAC-learning in the presence of an evasion adversary. We show that an analog of the VC-dimension, which we term the adversarial VC-dimension, allows us to establish learnability and upper bound sample complexity for the case of binary hypothesis classes with the 0-1 loss in the presence of evasion adversaries. We explicitly compute the adversarial VC-dimension for halfspace classifiers with adversaries with standard \u2113p (p \u2265 1) distance constraints on adversarial perturbations, and show that it matches the standard VC-dimension. 
This implies that the sample complexity of PAC-learning does not increase in the presence of this type of adversary. We also show that this is not always the case by constructing hypothesis classes where the adversarial VC-dimension is arbitrarily larger or smaller than the standard one.\n\n2 Adversarial agnostic PAC-learning\n\nIn this section, we set up the problem of agnostic PAC-learning in the presence of an evasion adversary, which presents the learner with adversarial test examples but does not interfere with the training process. We also define the notation for the rest of the paper and briefly explain the connections between our setting and other work on adversarial examples.\nWe summarize the basic notation in Table 1. We extend the agnostic PAC-learning setting introduced by Haussler [37] to include an evasion adversary. In our extension, the learning problem is as follows. There is an unknown P \u2208 P(X \u00d7 C). (Footnote 1: Formally, we have a sigma algebra \u03a3 \u2286 2^(X\u00d7C) of events and P(X \u00d7 C) is the set of probability measures on (X \u00d7 C, \u03a3). All hypotheses must be measurable functions relative to \u03a3.) The learner receives labeled training data (x, c) = ((x0, c0), . . . , (xn\u22121, cn\u22121)) \u223c P^n and must select \u02c6h \u2208 H.\n\nTable 1: Basic notation used.\nSymbol | Usage\nX | Space of examples\nC = {\u22121, 1} | Set of classes\nH \u2286 (X \u2192 C) | Set of hypotheses (labelings of examples)\n\u2113(c, \u02c6c) = 1(c \u2260 \u02c6c) | 0-1 loss function\nR \u2286 X \u00d7 X | Binary nearness relation\nN (x) = {y \u2208 X : (x, y) \u2208 R} | Neighborhood of nearby adversarial examples\n\nThe evasion adversary receives a labeled natural example (xTest, cTest) \u223c P and selects y \u2208 N (xTest), the set of adversarial examples in the neighborhood of xTest. The adversary gives y to the learner and the learner must estimate cTest. 
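As a concrete illustration of this setup, the following minimal sketch (not from the paper; the distance, budget values, and example points are illustrative assumptions) builds a nearness relation R from a distance and a budget, and the induced neighborhood membership test N(x) = {y : (x, y) in R}:

```python
# Hypothetical sketch: a nearness relation R induced by an l_inf budget,
# and the induced neighborhood test N(x) = {y : (x, y) in R}.
# The budget values and points below are illustrative, not from the paper.

def linf_dist(x, y):
    # l_inf distance between two points given as tuples of coordinates.
    return max(abs(a - b) for a, b in zip(x, y))

def make_relation(dist, eps):
    """R = {(x, y) : dist(x, y) <= eps}; eps = 0 recovers the identity relation I_X."""
    return lambda x, y: dist(x, y) <= eps

def in_neighborhood(R, x, y):
    """Membership test for N(x) = {y : (x, y) in R}."""
    return R(x, y)

R1 = make_relation(linf_dist, 1.0)   # weaker adversary
R2 = make_relation(linf_dist, 2.0)   # stronger adversary: R1 is contained in R2

x = (0.0, 0.0)
assert in_neighborhood(R1, x, (1.0, -1.0))     # within budget 1
assert not in_neighborhood(R1, x, (1.5, 0.0))  # outside budget 1
assert in_neighborhood(R2, x, (1.5, 0.0))      # but within budget 2

# Identity relation: N(x) = {x}, i.e. the standard problem with no adversary.
I = make_relation(linf_dist, 0.0)
assert in_neighborhood(I, x, x) and not in_neighborhood(I, x, (0.1, 0.0))
```

Here the nesting R1 contained in R2 mirrors the ordered family of adversaries of varying strengths used throughout the paper.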
Their performance is measured by the 0-1 loss, \u2113(cTest, \u02c6h(y)).\nThe neighborhoods N (x) of possible adversarial examples are generated by the binary nearness relation R: N (x) = {y \u2208 X : (x, y) \u2208 R}. We require N (x) to be nonempty so that some choice of y is always available. When R is the identity relation, IX = {(x, x) : x \u2208 X}, the neighborhood is N (x) = {x} and y = xTest, giving us the standard problem of learning without an adversary. If R1, R2 are nearness relations and R1 \u2286 R2, then R2 represents a stronger adversary. One way to produce a relation R is from a distance d on X and an adversarial budget constraint \u03b5: R = {(x, y) : d(x, y) \u2264 \u03b5}. This provides an ordered family of adversaries of varying strengths and has been used extensively in previous work [17, 34, 66].\nNow, we define the Adversarial Expected and Empirical Risks to measure the learner\u2019s performance in the presence of an evasion adversary.\nDefinition 1 (Adversarial Expected Risk). The learner\u2019s risk under the true distribution in the presence of an adversary constrained by the relation R is\n\nLP (h, R) = E(x,c)\u223cP [max_{y\u2208N (x)} \u2113(h(y), c)].\n\nLet h\u2217 = argmin_{h\u2208H} LP (h, R). Then, learning is possible if there is an algorithm that, with high probability, gives us \u02c6hn such that LP (\u02c6hn, R) \u2212 LP (h\u2217, R) \u2192 0.\nSince the learner does not have access to the true distribution P , it is approximated with the distribution of the empirical random variable, which is equal to (xi, ci) with probability 1/n for each i \u2208 {0, . . . , n \u2212 1}.\nDefinition 2 (Adversarial Empirical Risk Minimization (AERM)). 
The adversarial empirical risk minimizer AERMH,R : (X \u00d7 C)^n \u2192 (X \u2192 C) is defined as\n\nAERMH,R(x, c) = argmin_{h\u2208H} L(x,c)(h, R),\n\nwhere L(x,c) is the expected loss under the empirical distribution.\nClearly, a stronger adversary leads to worse performance for the best possible classifier and, in turn, worse performance for the learner.\nLemma 1. Let A : (X \u00d7 C)^n \u2192 (X \u2192 C) be a learning algorithm for a hypothesis class H. Suppose R1, R2 are nearness relations and R1 \u2286 R2. For all P ,\n\ninf_{h\u2208H} LP (h, R1) \u2264 inf_{h\u2208H} LP (h, R2).\n\nFor all P and all (x, c),\n\nLP (A(x, c), R1) \u2264 LP (A(x, c), R2). (1)\n\nProof. For all h \u2208 H,\n\n{(x, c) : \u2203y \u2208 N1(x) . h(y) \u2260 c} \u2286 {(x, c) : \u2203y \u2208 N2(x) . h(y) \u2260 c}\n\nso LP (h, R1) \u2264 LP (h, R2).\n(Footnote 2: Additionally, for all y \u2208 X , {x \u2208 X : (x, y) \u2208 R} should be measurable.)\nIn other words, if we design a learning algorithm for an adversary constrained by R2, its performance against a weaker adversary is better. Crucially, note that the algorithm A on both sides of the inequality in Eq. 1 must be the same for the inequality to hold.\nWhile it is clear that the presence of an adversary leads to a decrease in the optimal performance for the learner, we are now interested in the effect of an adversary on sample complexity. If we add an adversary to the learning setting defined in Definition 3, what happens to the gap in performance between the optimal classifier and the learned classifier?\nDefinition 3 (Learnability and Sample Complexity). A hypothesis class H is learnable by empirical risk minimization in the presence of an evasion adversary constrained by R if there is a function mH,R : (0, 1)^2 \u2192 N (the sample complexity) with the following property. 
For all 0 < \u03b4 < 1 and 0 < \u03b5 < 1, all n \u2265 mH,R(\u03b4, \u03b5), and all P \u2208 P(X \u00d7 C),\n\nP^n[{(x, c) : LP (AERMH,R(x, c), R) \u2212 inf_{h\u2208H} LP (h, R) \u2264 \u03b5}] \u2265 1 \u2212 \u03b4.\n\n3 Adversarial VC-dimension and sample complexity\n\nIn this section, we first describe the notion of corrupted hypotheses, which arise from standard hypothesis classes with the addition of an adversary. We then compute the VC-dimension of these hypotheses, which we term the adversarial VC-dimension, and use it to prove sample complexity upper bounds for learning in the presence of an evasion adversary.\n\n3.1 Corrupted hypotheses\n\nThe presence of an evasion adversary forces us to learn using a corrupted set of hypotheses. Unlike ordinary hypotheses that always output some class, these also output the special value \u22a5 that means \u201calways wrong\u201d. This corresponds to the adversary being able to select whichever output does not match c. This is illustrated in Figure 1.\nLet \u02dcC = {\u22121, 1, \u22a5}, where \u22a5 is the special \u201calways wrong\u201d output. We can combine the information in H and R into a single set \u02dcH \u2286 (X \u2192 \u02dcC) by defining the following mapping, where \u03baR : (X \u2192 C) \u2192 (X \u2192 \u02dcC) and \u03baR(h) : X \u2192 \u02dcC:\n\n\u03baR(h) = x \u21a6 { \u22121 if \u2200y \u2208 N (x) : h(y) = \u22121; 1 if \u2200y \u2208 N (x) : h(y) = 1; \u22a5 if \u2203y0, y1 \u2208 N (x) : h(y0) = \u22121, h(y1) = 1. }\n\nThe corrupted set of hypotheses is then \u02dcH = {\u03baR(h) : h \u2208 H}.\nWe note that the equivalence between learning an ordinary hypothesis with an adversary and learning a corrupted hypothesis without an adversary allows us to use standard proof techniques to bound the sample complexity.\nLemma 2. 
For any nearness relation R and distribution P ,\n\nLP (h, R) = LP (\u03baR(h), IX ).\n\nProof. Let \u02dch = \u03baR(h). For all (x, c),\n\nmax_{y\u2208N (x)} \u2113(h(y), c) = 1(\u2203y \u2208 N (x) . h(y) \u2260 c) = 1(\u02dch(x) \u2260 c) = max_{y\u2208{x}} \u2113(\u02dch(y), c)\n\nso LP (h, R) = E[max_{y\u2208N (x)} \u2113(h(y), c)] = E[max_{y\u2208{x}} \u2113(\u02dch(y), c)] = LP (\u02dch, IX ).\n\n[Figure 1: Combining the family of hypotheses with the nearness relation R. Panel (a): optimal evasion attacks against halfspace classifiers, with circles representing the nearness relation R; this panel depicts some h \u2208 H. Panel (b): the corrupted halfspace classifier \u03baR(h) \u2208 \u02dcH.]\n\nNow, we define the loss class that arises from a hypothesis class. Each element of a loss class is a function produced from the combination of the loss function with a classifier function. The loss function derived from a classifier h is \u03bb(h) : X \u00d7 C \u2192 {0, 1}, \u03bb(h) = (y, c) \u21a6 \u2113(c, h(y)). Thus we have the higher-order function \u03bb : (X \u2192 \u02dcC) \u2192 (X \u00d7 C \u2192 {0, 1}). Define F, \u02dcF \u2286 (X \u00d7 C \u2192 {0, 1}) to be the loss classes derived from H and \u02dcH respectively: F = {\u03bb(h) : h \u2208 H} and \u02dcF = {\u03bb(\u02dch) : \u02dch \u2208 \u02dcH}.\nUsing this concept and notation, we can restate a standard result from the Rademacher complexity approach to proving sample complexity bounds.\nLemma 3 ([68], Theorem 26.5). Let \u02c6f = \u03bb(\u03baR(AERMH,R(x, c))). With probability 1 \u2212 \u03b4,\n\nEP (\u02c6f (x, c)) \u2212 inf_{f\u2208\u02dcF} EP (f (x, c)) \u2264 2R(\u02dcF(x, c)) + \u221a(32 log(4/\u03b4)/n)\n\nwhere\n\n\u02dcF(x, c) = {(\u02dcf (x0, c0), . . . , \u02dcf (xn\u22121, cn\u22121)) : \u02dcf \u2208 \u02dcF}\n\nand\n\nR(T ) = (1/(n 2^n)) \u2211_{s\u2208{\u22121,1}^n} sup_{t\u2208T} s^T t.\n\nIn the case of non-adversarial learning, i.e. R = IX , this gives a familiar upper bound on LP (\u02c6h) \u2212 inf_{h\u2208H} LP (h). To get the correct generalization for the adversarial case, we needed to work with the loss class rather than the hypothesis class.\n\n3.2 Adversarial VC-dimension: VC-dimension for corrupted hypothesis classes\n\nWe begin by providing two equivalent definitions of a shattering coefficient, which we use to determine the VC-dimension for standard binary hypothesis classes and the adversarial VC-dimension for their corrupted counterparts.\nDefinition 4 (Equivalent shattering coefficient definitions). The ith shattering coefficient of a family of binary classifiers H \u2286 (X \u2192 C) is\n\n\u03c3(H, i) = max_{y\u2208X^i} |{(h(y0), . . . , h(yi\u22121)) : h \u2208 H}|.\n\nThe alternate definition of shattering in terms of the loss class F \u2286 (X \u00d7 C \u2192 {0, 1}) is\n\n\u03c3\u2032(F, i) = max_{(y,c)\u2208X^i\u00d7C^i} |{(f (y0, c0), . . . , f (yi\u22121, ci\u22121)) : f \u2208 F}|.\n\nNote that these two definitions are indeed equivalent. If F achieves k error patterns on (y, c), then H achieves k classification patterns on y. If H achieves k classification patterns on y, then F achieves k error patterns on (y, c) for any choice of c. Thus, \u03c3(H, i) = \u03c3\u2032(F, i).\nThe ordinary VC-dimension is then VC(H) = sup{n \u2208 N : \u03c3(H, n) = 2^n} = sup{n \u2208 N : \u03c3\u2032(\u03bb(H), n) = 2^n}. The second definition naturally extends to our corrupted classifiers, \u02dcH \u2286 (X \u2192 \u02dcC), because \u03bb(\u02dcH) \u2286 (X \u00d7 C \u2192 {0, 1}).\nDefinition 5 (Adversarial VC-dimension). 
The adversarial VC-dimension is\n\nAVC(H, R) = sup{n \u2208 N : \u03c3\u2032(\u03bb(\u02dcH), n) = 2^n}.\n\nThese definitions and lemmas can now be combined to obtain a sample complexity upper bound for PAC-learning in the presence of an evasion adversary.\nTheorem 1 (Sample complexity upper bound with an evasion adversary). For a space X , a classifier family H, and an adversarial constraint R, there is a universal constant C such that\n\nmH,R(\u03b4, \u03b5) \u2264 C (d log(d/\u03b5) + log(1/\u03b4)) / \u03b5^2,\n\nwhere d = AVC(H, R).\nProof. This follows from Lemma 2, Lemma 3, the Massart lemma on the Rademacher complexity of a finite class [68], and the Shelah-Sauer lemma [68].\nNote that the upper bound on sample complexity can be improved via the chaining technique [25].\n\n4 The adversarial VC-dimension of halfspace classifiers\n\nIn this section, we consider an evasion adversary with a particular structure, motivated by the practical \u2113p norm-based constraints that are usually imposed on these adversaries in the literature [17, 34]. We then derive the adversarial VC-dimension for halfspace classifiers corrupted by this adversary and show that it remains equal to the standard VC-dimension.\nDefinition 6 (Convex constraint on a binary adversarial relation). Let B be a nonempty, closed, convex, origin-symmetric set. The seminorm derived from B is \u2016x\u2016_B = inf{\u03b5 \u2208 R\u22650 : x \u2208 \u03b5B} and the associated distance is dB(x, y) = \u2016x \u2212 y\u2016_B. Let VB be the largest linear subspace contained in B. The adversarial constraint derived from B is R = {(x, y) : y \u2212 x \u2208 B}, or equivalently N (x) = x + B.\nSince B is convex and contains the zero-dimensional subspace {0}, VB is well-defined. Note that this definition of R encompasses all \u2113p-bounded adversaries, as long as p \u2265 1.\nDefinition 7. 
Let H be a family of classifiers on X = R^d. For an example x \u2208 X and a classifier h \u2208 H, define the signed distance to the boundary to be\n\n\u03b4B(h, x, c) = c \u00b7 h(x) \u00b7 inf_{y\u2208X : h(y)\u2260h(x)} dB(x, y).\n\nFor a list of examples x = (x0, . . . , xn\u22121) \u2208 X^n, define the signed distance set to be\n\nDB(H, x, c) = {(\u03b4B(h, x0, c0), . . . , \u03b4B(h, xn\u22121, cn\u22121)) : h \u2208 H}.\n\nLet X = R^d and let H be the family of halfspace classifiers: {(x \u21a6 sgn(a^T x \u2212 b)) : a \u2208 R^d, b \u2208 R}. For simplicity of presentation, we define sgn(0) = \u22a5, i.e., we consider classifiers that do not give a useful value on the boundary. It is well known that the VC-dimension of this family is d + 1 [68]. Our result can be extended to other variants of halfspace classifiers. In Appendix A of our pre-print, we provide an alternative proof that applies to a more general definition.\nFor halfspace classifiers, the set DB(H, x, c) is easily characterized.\nTheorem 2. Let H be the family of halfspace classifiers of X = R^d. Let B be a nonempty, closed, convex, origin-symmetric set. Let R = {(x, y) : y \u2212 x \u2208 B}. Then AVC(H, R) = d + 1 \u2212 dim(VB). In particular, when B is a bounded \u2113p ball, dim(VB) = 0, giving AVC(H, R) = d + 1.\nProof. First, we show AVC(H, R) \u2264 d + 1 \u2212 dim(VB). Define \u2016w\u2016_B\u2217 = sup_{y\u2208B} w^T y, the dual seminorm associated with B.\nFor any halfspace classifier h, there are a \u2208 R^d and b \u2208 R such that h(x) = sgn(g(x)) where g(x) = a^T x \u2212 b. Suppose that a^T y = 0 for all y \u2208 VB. Let H\u2032 be the set of classifiers that are represented by such a. For a labeled example (x, c), the signed distance to the boundary is c(a^T x \u2212 b)/\u2016a\u2016_B\u2217. 
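This signed-distance formula can be checked numerically. The hypothetical sketch below (not from the paper; the specific a, b, x values are arbitrary assumptions) takes B to be the l_inf unit ball, for which the dual seminorm of a is the l_1 norm and sign(a) maximizes a^T z over B, and verifies that stepping from x by (a^T x - b)/l1(a) in the direction sign(a) lands exactly on the decision boundary:

```python
# Hypothetical sketch: distance to a halfspace boundary when B is the
# l_inf unit ball, so ||a||_{B*} = sup_{z in B} a^T z = ||a||_1 and
# z* = sign(a) attains the supremum. Numbers are arbitrary illustrations.

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def dual_norm_linf_ball(a):
    # ||a||_{B*} for B = l_inf unit ball is the l_1 norm of a.
    return sum(abs(ai) for ai in a)

def boundary_point(a, b, x):
    # x - ((a^T x - b) / ||a||_{B*}) z*  with z* = sign(a).
    eps = (dot(a, x) - b) / dual_norm_linf_ball(a)
    zstar = [1.0 if ai > 0 else (-1.0 if ai < 0 else 0.0) for ai in a]
    return [xi - eps * zi for xi, zi in zip(x, zstar)]

a, b, x = [3.0, -4.0], 1.0, [2.0, 2.0]
y = boundary_point(a, b, x)
assert abs(dot(a, y) - b) < 1e-9  # y lies on the hyperplane a^T y = b

# The l_inf distance actually moved equals |a^T x - b| / ||a||_1:
moved = max(abs(xi - yi) for xi, yi in zip(x, y))
assert abs(moved - abs(dot(a, x) - b) / dual_norm_linf_ball(a)) < 1e-9
```

The same check works for any l_p ball by swapping in the dual l_q norm; here l_inf is used only because its maximizer sign(a) has a simple closed form.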
Any point on the boundary can be written as x \u2212 \u03b5z for some \u03b5 \u2265 0 and z \u2208 B. We have a^T (x \u2212 \u03b5z) \u2212 b = 0, so\n\ndB(x, x \u2212 \u03b5z) \u2265 \u03b5 = (a^T x \u2212 b)/(a^T z) \u2265 (a^T x \u2212 b)/\u2016a\u2016_B\u2217. (2)\n\nBecause B is closed, there is some vector z\u2217 \u2208 B that maximizes a^T z. The point x \u2212 ((a^T x \u2212 b)/\u2016a\u2016_B\u2217) z\u2217 is on the boundary, so inequality (2) is tight.\nIf we add the restriction \u2016a\u2016_B\u2217 = 1, each h \u2208 H\u2032 has a unique representation as sgn \u25e6 g. For inputs from the set G = {(a, b) \u2208 R^(d+1) : \u2016a\u2016_B\u2217 = 1, \u2200y \u2208 VB . a^T y = 0}, the function (a, b) \u21a6 \u03b4B(h, x, c) is linear. Thus the function (a, b) \u21a6 (\u03b4B(h, x0, c0), . . . , \u03b4B(h, xn\u22121, cn\u22121)) is also linear. G is a subset of a vector space of dimension d + 1 \u2212 dim VB, so\n\ndim(span(DB(H\u2032, x, c))) \u2264 d + 1 \u2212 dim VB\n\nfor any choices of x and c.\nNow we consider the contribution of the classifiers in H \\ H\u2032. These are represented by a such that a^T y \u2260 0 for some y \u2208 VB, or equivalently \u2016a\u2016_B\u2217 = \u221e. In this case, for all (x, c) \u2208 R^d \u00d7 C, a^T (x + VB) = R. Thus x + VB intersects the classifier boundary, \u02dch(x) = \u22a5, and the distance from x to the classifier boundary is zero: \u03b4B(h, x, c) = 0. Thus DB(H \\ H\u2032, x, c) = {0}, which is already in span(DB(H\u2032, x, c)).\nLet U = span(DB(H, x, c)), let k = dim(U ), and let n > k. Suppose that there is a list of examples x \u2208 X^n and a corresponding list of labels c \u2208 C^n that are shattered by the corrupted classifiers \u02dcH. Let \u03b7 \u2208 {0, 1}^n be the error pattern achieved by the classifier h. Then \u03b7i = 1(\u03b4B(h, xi, ci) \u2264 1). 
In other words, the classification is correct when ci and h(xi) have the same sign and the distance from xi to the classification boundary is greater than 1.\nLet 1 be the vector of all ones. For each error pattern \u03b7, there is some h \u2208 H that achieves it if and only if there is some point in DB(H, x, c) \u2212 1 with the corresponding sign pattern.\nSince k < n, by the following standard argument we can find a sign pattern that is not achieved by any point in U \u2212 1. Let w \u2208 R^n satisfy w^T z = 0 for all z \u2208 U, w \u2260 0, and w^T 1 \u2265 0. There is a subspace of R^n of dimension n \u2212 k of vectors satisfying the first condition. Since n > k, this subspace contains a nonzero vector u. At least one of u and \u2212u satisfies the third condition.\nIf a point z \u2208 U \u2212 1 has the same sign pattern as w, then w^T z > 0. However, we have chosen w such that w^T z \u2264 0. Thus the classifier family H does not achieve all 2^n error patterns, which contradicts our assumption about (x, c).\nNow we show AVC(H, R) \u2265 d + 1 \u2212 dim(VB) by finding (x, c) that are shattered by \u02dcH.\nLet t = d \u2212 dim(VB), x0 = 0, and (x1, . . . , xt) be a basis for the subspace orthogonal to VB. For a subset S \u2286 {1, . . . , t}, consider the affine function gS(x) = aS^T x + bS such that (i \u21a6 gS(xi)) = 1(i \u2208 S) and aS^T x = 0 for all x \u2208 VB. We would like to use the hyperplane gS(x) = 1/2 to achieve the labeling associated with S. Let \u03b4S = max_{x\u2208B} aS^T x, which is some finite value because gS only varies along lines for which B is bounded. If \u03b4S \u2265 1/2 for some S, then our current configuration does not work and we must rescale the points to ensure that they can be shattered. Let \u03b4 = max(1, 3 max_{S\u2286{1,...,t}} \u03b4S). The example list \u03b4 \u00b7 (x0, . . . , xt) is shattered by \u02dcH for any choice of c.\n\n5 Adversarial VC-dimension can be larger\n\nWe have shown in the previous section that the adversarial VC-dimension can be smaller than or equal to the standard VC-dimension. Here, we provide explicit constructions for the counter-intuitive case in which the adversarial VC-dimension can be arbitrarily larger than the VC-dimension.\nTheorem 3. For any d \u2208 N, there is a space X , an adversarial constraint R \u2286 X \u00d7 X , and a hypothesis class H \u2286 (X \u2192 C) such that VC(H) = 1 and AVC(H, R) \u2265 d.\nProof. Let X = Z^d. Let H = {hx : x \u2208 X}, where\n\nhx(y) = 1 if y = x, and \u22121 if y \u2260 x.\n\nThe VC-dimension of this family is 1 because no classifier outputs the labeling (1, 1) for any pair of distinct examples.\nConsider the adversary with an \u2113\u221e budget of 1. No corrupted classifier will ever output 1, only \u22121 and \u22a5. Take x = (x0, . . . , xd\u22121) \u2208 (Z^d)^d:\n\n(xi)j = \u22121 if j = i, and 1 if j \u2260 i.\n\n[Figure 2: The examples x0 = (\u22121, 1) and x1 = (1, \u22121) are marked with crosses. The function h(0,1) \u2208 H maps the smaller square to 1 and everything else to \u22121. The corrupted function \u02dch(0,1) \u2208 \u02dcH maps the larger square to \u22a5 and everything else to \u22121. Observe that \u02dch(0,1)(x0) = \u22a5 and \u02dch(0,1)(x1) = \u22121.]\n\nNow consider the 2^d classifiers that are the indicators for y \u2208 {0, 1}^[d]: \u02dchy = \u03ba(hy). Observe that\n\n\u02dchy(xi) = \u22121 if yi = 1, and \u22a5 if yi = 0,\n\nbecause if yi = 1 then d\u221e(xi, y) = 2 but if yi = 0 then d\u221e(xi, y) = 1. Thus (\u02dchy(x0), . . . , \u02dchy(xd\u22121)) contains \u22a5 at each index at which y contains 0. If the examples are all labeled with \u22121, this subset of the corrupted classifier family achieves all 2^d possible error patterns. The adversarial VC-dimension is at least d.\nBecause \u2113(c, \u22121) \u2264 \u2113(c, \u22a5) for all c \u2208 {\u22121, 1}, in the hypothesis class constructed in the proof of Theorem 3, h1 is clearly the best hypothesis. For all (x, c) \u2208 Z^d \u00d7 {\u22121, 1} and all y \u2208 {0, 1}^d, we have \u2113(c, \u02dch1(x)) \u2264 \u2113(c, \u02dchy(x)). Thus h1 can be selected without examining any training data and the sample complexity is mH,R(\u03b4, \u03b5) = 0.\nTheorem 3 shows that the addition of an adversary can lead to a significantly weaker upper bound on sample complexity. It does not show that the addition of an adversary can increase the sample complexity, and whether this is possible remains an open question.\n\n6 Related work and concluding remarks\n\nIn this paper, we are the first to demonstrate sample complexity bounds on PAC-learning in the presence of an evasion adversary. We now compare with related work and conclude.\n\n6.1 Related work\n\nThe body of work on attacks and defenses for machine learning systems is extensive, as described in Section 1, and thus we only discuss the closest related work here. We refer interested readers to extensive recent surveys [9, 50, 62] for a broader overview.\nPAC-learning with poisoning adversaries: Kearns and Li [44] studied learning in the presence of a training-time adversary, extending the more benign framework of Angluin and Laird [2], which looked at noisy training data.\nClassifier-specific results: Wang et al. [78] analyze the robustness of nearest-neighbor classifiers while Fawzi et al. [28, 29] analyze the robustness of linear and quadratic classifiers under both adversarial and random noise. 
Hein and Andriushchenko [38] provide bounds on the robustness of neural networks of up to one layer, while Weng et al. [79] use extreme value theory to bound the robustness of neural networks of arbitrary depth. Both of these works assume Lipschitz-continuous functions. In contrast to our work, all of these results show how robust a given classifier is and do not address the issue of learnability and sample complexity.\nDistribution-specific results: Schmidt et al. [66] study the sample complexity of learning a mixture of Gaussians as well as Bernoulli-distributed data in the presence of \u2113\u221e-bounded adversaries. For the former, they show that for all classifiers, the sample complexity increases by an order of \u221ad, while it only increases for halfspace classifiers for the latter distribution. Gilmer et al. [32] analyze the robustness of classifiers for a distribution consisting of points distributed on two concentric spheres. In contrast to these papers, we prove our results in a distribution-agnostic setting.\nWasserstein distance-based constraint: Sinha et al. [72] consider a different adversarial constraint, based on the Wasserstein distance between the benign and adversarial distributions. They then study the sample complexity of Stochastic Gradient Descent for minimizing the relaxed Lagrangian formulation of the learning problem with this constraint. Their constraint allows different samples to be perturbed with different budgets, while we study a sample-wise constraint on the adversary.\nObjective functions for robust classifiers: Raghunathan et al. [64] and Kolter and Wong [45] take similar approaches to setting up a solvable optimization problem that approximates the worst-case adversary in order to carry out adversarial training. 
The focus of these works is not on the sample complexity needed for learning, but rather on the provable robustness achieved against ℓ∞-bounded adversaries by changing the training objective.

6.2 Concluding remarks

While our results provide a useful theoretical understanding of the problem of learning with adversaries, the nature of the 0-1 loss prevents the efficient implementation of Adversarial ERM to obtain robust classifiers. In practice, recent work on adversarial training [34, 53, 76] has sought to improve the robustness of classifiers by directly trying to find a classifier that minimizes the Adversarial Expected Risk, which leads to a saddle point problem [53]. A number of heuristics are used to enable the efficient solution of this problem, such as replacing the 0-1 loss with smooth surrogates like the logistic loss and approximating the inner maximum by a Projected Gradient Descent (PGD)-based adversary [53] or by an upper bound [64]. Our framework now allows for an analysis of the underlying PAC learning problem for these approaches. An interesting direction is thus to find the adversarial VC-dimension for more complex classifier families such as piecewise linear classifiers and neural networks. Another natural next step is to understand the behavior of convex learning problems in the presence of adversaries, in particular the Regularized Loss Minimization framework.

Acknowledgments

This work was supported by the National Science Foundation under grants CNS-1553437, CIF-1617286 and CNS-1409415, by Intel through the Intel Faculty Research Award and by the Office of Naval Research through the Young Investigator Program (YIP) Award.

References

[1] Mahdieh Abbasi and Christian Gagné. Robustness to adversarial examples through an ensemble of specialists. arXiv preprint arXiv:1702.06856, 2017.

[2] Dana Angluin and Philip Laird. Learning from noisy examples.
Machine Learning, 2(4):343–370, 1988.

[3] Anurag Arnab, Ondrej Miksik, and Philip H. S. Torr. On the robustness of semantic segmentation models to adversarial attacks. In CVPR, 2018.

[4] Alexander Bagnall, Razvan Bunescu, and Gordon Stewart. Training ensembles to detect adversarial examples. arXiv preprint arXiv:1712.04006, 2017.

[5] Arjun Nitin Bhagoji, Daniel Cullina, and Prateek Mittal. Dimensionality reduction as a defense against evasion attacks on machine learning classifiers. arXiv preprint arXiv:1704.02654, 2017.

[6] Arjun Nitin Bhagoji, Warren He, Bo Li, and Dawn Song. Black-box attacks on deep neural networks via gradient estimation. In ICLR Workshop, 2018.

[7] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.

[8] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), pages 1807–1814, 2012.

[9] Battista Biggio and Fabio Roli. Wild patterns: Ten years after the rise of adversarial machine learning. arXiv preprint arXiv:1712.03141, 2017.

[10] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.

[11] Noam Brown and Tuomas Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, page eaao1733, 2017.

[12] Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016.

[13] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods.
In AISec, 2017.

[14] Nicholas Carlini and David Wagner. MagNet and "efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478, 2017.

[15] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.

[16] Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. In DLS (IEEE SP), 2018.

[17] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016.

[18] Pin-Yu Chen, Yash Sharma, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In AAAI, 2018.

[19] Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 15–26. ACM, 2017.

[20] Shang-Tse Chen, Cory Cornelius, Jason Martin, and Duen Horng Chau. Robust physical adversarial attack on Faster R-CNN object detector. arXiv preprint arXiv:1804.05810, 2018.

[21] Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet. Houdini: Fooling deep structured prediction models. In NIPS, 2017.

[22] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12(Aug):2493–2537, 2011.

[23] Nilaksh Das, Madhuri Shanbhogue, Shang-Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E Kounavis, and Duen Horng Chau. Shield: Fast, practical defense and vaccination for deep learning using JPEG compression.
arXiv preprint arXiv:1802.06816, 2018.

[24] Li Deng, Geoffrey Hinton, and Brian Kingsbury. New types of deep neural network learning for speech recognition and related applications: An overview. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8599–8603. IEEE, 2013.

[25] R. Dudley. Sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal., 1:290–330, 1967.

[26] Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel M Roy. A study of the effect of JPG compression on adversarial images. arXiv preprint arXiv:1608.00853, 2016.

[27] Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on machine learning models. In CVPR, 2018.

[28] Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Analysis of classifiers' robustness to adversarial perturbations. Machine Learning, 107(3):481–508, 2018.

[29] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In NIPS, 2016.

[30] Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.

[31] Volker Fischer, Mummadi Chaithanya Kumar, Jan Hendrik Metzen, and Thomas Brox. Adversarial examples for semantic image segmentation. In ICLR Workshop, 2017.

[32] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In ICLR, 2018.

[33] Zhitao Gong, Wenlu Wang, and Wei-Shinn Ku. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.

[34] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy.
Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.

[35] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.

[36] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial examples for malware detection. In European Symposium on Research in Computer Security, pages 62–79. Springer, 2017.

[37] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1), 1992.

[38] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, pages 2263–2273, 2017.

[39] Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[40] Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. Adversarial attacks on neural network policies. In ICLR, 2017.

[41] Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In IEEE Security and Privacy, 2018.

[42] Kyle D Julian, Jessica Lopez, Jeffrey S Brush, Michael P Owen, and Mykel J Kochenderfer. Policy compression for aircraft collision avoidance systems. In Digital Avionics Systems Conference (DASC), 2016 IEEE/AIAA 35th, pages 1–10. IEEE, 2016.

[43] Alex Kantchelian, JD Tygar, and Anthony D Joseph.
Evasion and hardening of tree ensemble classifiers. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16), 2016.

[44] Michael Kearns and Ming Li. Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4):807–837, 1993.

[45] J Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, 2018.

[46] Jernej Kos, Ian Fischer, and Dawn Song. Adversarial examples for generative models. arXiv preprint arXiv:1702.06832, 2017.

[47] Jernej Kos and Dawn Song. Delving into adversarial attacks on deep policies. In ICLR Workshop, 2017.

[48] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS'12, pages 1097–1105, USA, 2012. Curran Associates Inc.

[49] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

[50] Qiang Liu, Pan Li, Wentao Zhao, Wei Cai, Shui Yu, and Victor CM Leung. A survey on security threats and defensive techniques of machine learning: A data driven view. IEEE Access, 6:12103–12117, 2018.

[51] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In ICLR, 2017.

[52] Jiajun Lu, Hussein Sibai, and Evan Fabry. Adversarial examples that fool detectors. arXiv preprint arXiv:1712.02494, 2017.

[53] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.

[54] Dongyu Meng and Hao Chen. MagNet: A two-pronged defense against adversarial examples.
In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 135–147. ACM, 2017.

[55] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In CVPR, 2017.

[56] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, 2016.

[57] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337):508–513, 2017.

[58] Mehran Mozaffari-Kermani, Susmita Sur-Kolay, Anand Raghunathan, and Niraj K Jha. Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE Journal of Biomedical and Health Informatics, 19(6):1893–1905, 2015.

[59] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.

[60] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, 2017.

[61] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.

[62] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning.
arXiv preprint arXiv:1611.03814, 2016.

[63] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.

[64] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In ICLR, 2018.

[65] Benjamin IP Rubinstein, Blaine Nelson, Ling Huang, Anthony D Joseph, Shing-hon Lau, Satish Rao, Nina Taft, and JD Tygar. Stealthy poisoning attacks on PCA-based anomaly detectors. ACM SIGMETRICS Performance Evaluation Review, 37(2):73–74, 2009.

[66] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.

[67] Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly Stanton, and Yuval Kluger. Defending against adversarial images using basis functions transformations. arXiv preprint arXiv:1803.10840, 2018.

[68] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

[69] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540. ACM, 2016.

[70] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354, 2017.

[71] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556, 2014.

[72] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In ICLR, 2018.

[73] Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Prateek Mittal, and Mung Chiang. Rogue signs: Deceiving traffic sign recognition with malicious ads and logos. In DLS (IEEE SP), 2018.

[74] Charles Smutz and Angelos Stavrou. When a tree falls: Using diversity in ensemble classifiers to identify evasion in malware detectors. In NDSS, 2016.

[75] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[76] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.

[77] Qinglong Wang, Wenbo Guo, Kaixuan Zhang, Alexander G Ororbia II, Xinyu Xing, Xue Liu, and C Lee Giles. Adversary resistant deep neural networks with an application to malware detection. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1145–1153. ACM, 2017.

[78] Yizhen Wang, Somesh Jha, and Kamalika Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. In ICML, 2018.

[79] Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In ICLR, 2018.

[80] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. In International Conference on Computer Vision. IEEE, 2017.

[81] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks.
In NDSS, 2018.

[82] Weilin Xu, Yanjun Qi, and David Evans. Automatically evading classifiers. In Proceedings of the 2016 Network and Distributed Systems Symposium, 2016.

[83] Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A Gunter. CommanderSong: A systematic approach for practical adversarial voice recognition. In USENIX Security, 2018.