{"title": "Thwarting Adversarial Examples: An $L_0$-Robust Sparse Fourier Transform", "book": "Advances in Neural Information Processing Systems", "page_first": 10075, "page_last": 10085, "abstract": "We give a new algorithm for approximating the Discrete Fourier transform of an approximately sparse signal that is robust to worst-case $L_0$ corruptions, namely that some coordinates of the signal can be corrupt arbitrarily. Our techniques generalize to a wide range of linear transformations that are used in data analysis such as the Discrete Cosine and Sine transforms, the Hadamard transform, and their high-dimensional analogs. We use our algorithm to successfully defend against worst-case $L_0$ adversaries in the setting of image classification. We give experimental results on the Jacobian-based Saliency Map Attack (JSMA) and the CW $L_0$ attack on the MNIST and Fashion-MNIST datasets as well as the Adversarial Patch on the ImageNet dataset.", "full_text": "Thwarting Adversarial Examples: An L0-Robust\n\nSparse Fourier Transform\n\nMitali Bafna \u2217\n\nHarvard University\nCambridge, MA USA\n\nmitalibafna@g.harvard.edu\n\nJack Murtagh \u2217\n\nHarvard University\nCambridge, MA USA\n\njmurtagh@g.harvard.edu\n\nSchool of Engineering & Applied Sciences\n\nSchool of Engineering & Applied Sciences\n\nDepartment of Electrical Engineering and Computer Science\n\nNikhil Vyas\u2217\n\nMIT\n\nCambridge, MA USA\nnikhilv@mit.edu\n\nAbstract\n\nWe give a new algorithm for approximating the Discrete Fourier transform of an\napproximately sparse signal that has been corrupted by worst-case L0 noise, namely\na bounded number of coordinates of the signal have been corrupted arbitrarily. Our\ntechniques generalize to a wide range of linear transformations that are used in data\nanalysis such as the Discrete Cosine and Sine transforms, the Hadamard transform,\nand their high-dimensional analogs. 
We use our algorithm to successfully defend against well known L0 adversaries in the setting of image classification. We give experimental results on the Jacobian-based Saliency Map Attack (JSMA) and the Carlini Wagner (CW) L0 attack on the MNIST and Fashion-MNIST datasets as well as the Adversarial Patch on the ImageNet dataset.

1 Introduction

In the last several years, neural networks have made unprecedented achievements on computational learning tasks like image classification. Despite their remarkable success, neural networks have been shown to be brittle in the presence of adversarial noise [SZS+13]. Many effective attacks have been proposed in the context of computer vision that reliably generate small perturbations to input images (sometimes imperceptible to humans) that drastically change the network's classification of the image [MFF15, GSS15, CW16]. As deep learning becomes more integrated into our everyday technology, the need for systems that are robust to adversarial noise grows, especially in applications to security.

A lot of work has been done to improve robustness and defend against adversarial attacks [PMW+16, TKP+17, MMS+17]. However, many approaches rely on knowing the attack strategy in advance, few of the proposed methods offer theoretical guarantees, and defenses are often broken by a new attack shortly after they are published. As such, recent deep learning literature has seen an arms race of back-and-forth attacks and defenses, reminiscent of cryptography before it was grounded in firm theoretical foundations.

In this work, we give a framework for improving the robustness of classifiers to adversaries with L0 noise budgets. That is, adversaries are restricted in the number of features they can corrupt, but may corrupt each arbitrarily. Our framework is based on a new Sparse Discrete Fourier transform (DFT) that is robust to worst-case L0 noise added in the time domain.
We call such transformations\n\n\u2217Authors ordered alphabetically.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fL0-robust. In particular, we show how to recover top coef\ufb01cients of an approximately sparse signal\nthat has been corrupted by worst-case L0 noise.\nOur theoretical results use techniques from compressed sensing [CRT06, BCDH10]. In fact we\nprovide a much more general framework for building L0-robust sparse transformations that applies\nto many transformations used in practice such as all discrete variants of the Fourier transform, the\nSine and Cosine Transforms, the Hadamard transform, and their higher-dimensional generalizations.\nOur approach can be used to develop algorithms with the following bene\ufb01ts:\n\n\u2022 Provable performance guarantees. Our approach leverages rigorous results in compressed\nsensing that allow us to prove theorems about L0-robust sparse transformations under mild\nassumptions that typically hold in practice.\n\u2022 Worst-case adversaries. The guarantees of our algorithms hold for all adversaries that stay\nwithin the noise budget, given that the input signals are sparse in the Fourier or related\ndomains. In particular, our defenses do not require prior knowledge of the adversary\u2019s attack\nstrategy. In Section 4.2, we relax this and show that one can design improved algorithms\nwithin our framework when given more knowledge of the adversary.\n\n\u2022 Generality. Our framework is general purpose and is compatible with a variety of basis\n\ntransformations commonly used in scienti\ufb01c computing.\n\nA notable feature of our framework is the focus on L0 noise. 
This threat model has been considered\nin previous works and L0 attacks and defenses have been developed [CW16, PMJ+15, PMW+16].\nWhile L2 attacks are more commonly studied, many of the most high-pro\ufb01le recent real-world\nattacks actually \ufb01t in the L0 model, such as the graf\ufb01ti-like road sign perturbations of [EEF+17],\nthe eyeglasses that fool facial recognition software [SBBR16], and the patch that can make almost\nany image get labeled as a \u2018toaster\u2019 by state-of-the-art classi\ufb01ers [BMR+17]. In general, physical\nobstructions in images or malicious splicing of audio or video \ufb01les are realistic threats that can be\nmodeled as L0 noise, whereas L2 attacks may be more dif\ufb01cult to carry out in the physical world.\nThe connection between our L0-robust transformations and adversarial attacks on images is as\nfollows. Many natural images are sparse in Fourier bases such as the basis used in the Discrete Cosine\ntransform (DCT). Indeed, this is a necessary feature for compression algorithms like JPEG to work.\nThrough this lens, corrupted images can be viewed as noisy signals that are sparse in some domain\nand our techniques allow us to reconstruct these sparse signals under worst-case/adversarial L0 noise.\nThis reconstruction allows us to correct the corruptions made by the adversary to produce something\nclose to the original image, which in turn improves the neural network accuracy.\nOur results have wide applicability in signal processing since it is well known that audio/video signals\nare sparse in the Fourier or wavelet domains. Signal processing is important in many areas of science\nand medicine including MRI, radio astronomy, and facial recognition. Errors are ubiquitous in the\nabove applications, whether due to natural artifacts, sensor failures, or malicious tampering.\nIn Section 4, we give experimental results that demonstrate the effectiveness of our approach against\nleading L0 attacks. 
For example, in one experiment the network accuracy drops from 88.5% on uncorrupted images to 24.8% on adversarial images with 30 pixels corrupted, but after our correction, network accuracy returns to 83.1%. On another attack, the adversary is free to choose its own budget and network accuracy drops from 87.8% all the way to 0% (the adversary succeeds on every image), but after running our correction algorithm, network accuracy returns to 85.7%.

In Section 2, we set up the problem and discuss related work. We give new theoretical results in Section 3. In Section 4, we evaluate our framework on three leading L0 attacks in the literature: the JSMA attack of Papernot et al [PMJ+15], the L0 attack from Carlini and Wagner (CW) [CW16], and the adversarial patch from Brown et al [BMR+17].

Notation  For a vector v, we let v_h(k) and v_t(k) denote the head(k) and tail(k) of v. That is, v_h(k) denotes the vector containing just the k largest coordinates of v in absolute value with all other coordinates set to 0, and v_t(k) = v − v_h(k). For example, if v = [−3, 2, 1] then v_h(2) = [−3, 2, 0] and v_t(2) = [0, 0, 1]. We refer to v_h(k) as the "top k coefficients of v". For a vector v, we let v̂ = F v, where F is the contextual linear transformation. We use the phrase "projection of v to its top-k F-coefficients" to mean the result of F⁻¹(F v)_h(k). That is, calculate the top-k coefficients of F v, set the remaining coefficients to 0, and then invert the result back to the original domain by applying F⁻¹. Unless specified, ‖·‖ denotes the L2 norm of a vector. For a scalar c, |c| denotes the absolute value when c ∈ R and denotes the modulus of c when c ∈ C. We say that a vector v is k-sparse if all but k of its entries are 0. We say that v is approximately (k, ε)-sparse if ‖v_t(k)‖ ≤ ε · ‖v‖. We define M_k to be the set of all k-sparse vectors. 0⃗ denotes the all-zeroes vector.

2 Overview

2.1 L0-Robust Sparse Fourier Transform

Problem Setup: The key property that we use is that natural images are approximately sparse in frequency bases like the 2D Discrete Fourier basis or the 2D Discrete Cosine basis. This sparsity is exploited in image and video compression algorithms like JPEG and MPEG. The DFT and DCT are just linear transformations (in fact changes of basis) from the space of images to a frequency domain. So given a d × d image, we model it as approximately sparse in one of these bases, which from now on we will just refer to as the 'Fourier basis'. Note that once the basis is fixed we can think of the image x ∈ R^n (n = d²) as an approximately sparse vector in the corresponding Fourier basis.

Our goal is to approximate the top-k Fourier coefficients of a vector x even after it has been corrupted with adversarial L0 noise. We do not know the locations or magnitudes of the corruptions, but we do assume that we know an upper bound on the number of corrupted coordinates. In other words, if F is the Fourier matrix (the matrix corresponding to the Discrete Fourier linear transformation), we want to approximate x̂_h(k) where x̂ = F x. This can be modeled as the following problem:

Problem 2.1 (Main Problem). Given a corrupted vector y = x + e where x ∈ R^n is approximately k-sparse in the Fourier basis and e is exactly t-sparse in the time domain (i.e. has L0 norm bounded by t), approximate x̂_h(k).

We will solve the above problem by splitting y into x′ + e′ + β, where x′ is exactly k-sparse in the Fourier domain, e′ is t-sparse in the time domain, and β is an error term bounded in L2 norm by the tail of x.
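To make the setup concrete, here is a small numerical sketch of Problem 2.1 that we add for illustration. It is not the paper's implementation: the parameter choices, the unitary-DFT convention, and the `head` helper are ours. A signal that is exactly k-sparse in the unitary DFT basis is corrupted in t time-domain coordinates, and alternating hard-thresholding projections, in the spirit of the iterative scheme developed below, recover the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, t, T = 4096, 4, 4, 50   # signal length, sparsity, corruption budget, iterations

def head(v, j):
    """v_h(j): keep the j largest-magnitude coordinates of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-j:]
    out[idx] = v[idx]
    return out

# A signal (complex-valued for simplicity) that is exactly k-sparse in the
# unitary DFT basis; entries of the unitary DFT matrix have modulus 1/sqrt(n).
xhat = np.zeros(n, dtype=complex)
xhat[rng.choice(n, k, replace=False)] = (1.0 + rng.random(k)) * np.exp(2j * np.pi * rng.random(k))
x = np.fft.ifft(xhat) * np.sqrt(n)            # x = F^{-1} xhat

# Worst-case L0 noise: t time-domain coordinates corrupted arbitrarily.
e = np.zeros(n, dtype=complex)
e[rng.choice(n, t, replace=False)] = rng.choice([-1.0, 1.0], t) * (5.0 + 10.0 * rng.random(t))
y = x + e

# Alternately re-estimate the k-sparse spectrum and the t-sparse error.
xhat_est = np.zeros(n, dtype=complex)
e_est = np.zeros(n, dtype=complex)
for _ in range(T):
    xhat_est = head(np.fft.fft(y - e_est) / np.sqrt(n), k)
    e_est = head(y - np.fft.ifft(xhat_est) * np.sqrt(n), t)

err = np.max(np.abs(xhat_est - xhat))         # L-infinity recovery error
```

With t well below n/k, as here, the recovery error shrinks geometrically; the choice of a unitary normalization matters because the theory below assumes an orthonormal F with entries of modulus O(1/√n).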
Our techniques are not limited to Fourier matrices and in fact extend naturally to other\ntransformations like wavelets, but for simplicity we will use the term Fourier throughout.\n\nRelated Work: Our setting is reminiscent of extensively studied dimensionality reduction tech-\nniques like Robust PCA [CLMW11] for recovery of low rank matrices from L0 corrupted data. These\nhave wide applicability in machine learning although, in that setting, they are not able to handle truly\nadversarial noise and make some assumptions on the error distribution. Our results on the other hand,\ncan protect against worst-case adversaries bounded in their L0 noise budget.\nVariants of the Sparse Fourier Transform have been studied [HIKP12a, HIKP12b, IKP14] but that\nwork is concerned with recovering \u02c6xh(k) given an approximately sparse vector x, using sublinear\nmeasurements. Our focus is on recovering \u02c6xh(k) when some of the measurements might be corrupted\nand we show a tight tradeoff between the number of measurements corrupted versus the quality of\nrecovery we can ensure.\n\nOur Techniques: Our main result uses techniques from the \ufb01eld of compressed sensing (CS)\n[CRT06, BCDH10] and properties of Fourier (and related) matrices. Using these we prove that\nAlgorithm 1 converges to a good solution to Problem 2.1, where by a good solution we mean that it is\nclose to the true solution in the L\u221e norm.\nIn iteration i of Algorithm 1, \u02c6x[i] is an estimate of \u02c6xh(k) and e[i] is an estimate of e. In iteration\ni + 1, the algorithm uses the previous estimates, e[i] and \u02c6x[i] to update its estimates by solving the\nlinear equation y = F \u22121 \u02c6x + e and projecting onto the top k Fourier coef\ufb01cients of y \u2212 e[i]. Note\nthat while this algorithm is intuitive it does not necessarily converge to the true solution for similar\nsettings. 
For example, if instead of the L0 norm, e was bounded in the L∞ norm, then information theoretically, there is no algorithm which can give a good solution and hence this algorithm would not be able to either. In our setting though, we can show that Algorithm 1 has an exponentially fast convergence towards a good solution to Problem 2.1 and moreover the guarantees we get are tight in the information theoretic sense. We state our result below for a general class of transformations which includes the DFT, DCT and their higher-dimensional versions.

Algorithm 1 Iterative Hard Thresholding (IHT) [BCDH10].
Input: Positive integers k, t, and T. y = x + e, where x ∈ R^n is approximately k-sparse in the Fourier basis and e ∈ R^n is exactly t-sparse in the time domain. Fourier matrix F.
Output: x̂_h(k), approximation of the top k Fourier coefficients of x.
1: function IHT(y = x + e, F, k, t, T)
2:   x̂[1] ← 0⃗
3:   e[1] ← 0⃗
4:   for i = 1 ··· T do
5:     x̂[i+1] ← (F(y − e[i]))_h(k)
6:     e[i+1] ← (y − F⁻¹x̂[i])_h(t)
7:   end for
8:   return x̂[T+1]
9: end function

Theorem 2.2 (Main Theorem). Let F ∈ C^{n×n} be an orthonormal matrix such that each of its entries F_ij satisfies |F_ij| = O(1/√n). Let x̂ = F x ∈ C^n be (k, ε)-sparse, let e ∈ R^n be t-sparse, and let y = F⁻¹x̂ + e. Let x̂[T] = IHT(y, F, k, t, T), for T = O(log(‖x‖ + ‖e‖)). Then

1. ‖x̂[T] − x̂_h(k)‖∞ = O(√(t/n) · ‖x̂_t(k)‖) = O(√(t/n) · ε‖x̂‖)
2. ‖x̂[T] − x̂_h(k)‖ = O(√(kt/n) · ‖x̂_t(k)‖) = O(√(kt/n) · ε‖x̂‖)

(In fact (1) implies (2).)

Note the strong L∞-L2 guarantee that Theorem 2.2 gives us, with a tight dependence between t, the L0 budget of the adversary, and (ε, k), the sparsity parameters of the inputs. Also, our choice of recovering just the top-k coordinates of x̂, i.e. x̂_h(k), instead of all of x̂ is important. In the latter case, no matter what t is, any solution we recover would incur an L2 error of Ω(‖x̂_t(k)‖), even when the adversary corrupts only one coordinate (t = 1), while in our case, with t = o(n/k), we get an L2 error that vanishes with n and is equal to o(‖x̂_t(k)‖), by Theorem 2.2.

2.2 Defending against L0 budgeted adversaries

We model images as approximately k-sparse vectors in the 2D-DCT domain. Using the results from Section 2.1, we can recover the top-k coefficients in the face of a worst-case adversary with an L0 budget. To apply this to image classifiers, we want to build a neural network to recognize images projected to their top-k 2D-DCT coefficients. This motivates the following framework for building classifiers that are robust to L0 adversaries:

1. Train a neural network on images projected to their top-k 2D-DCT coefficients. We refer to such projected images as "compressed images".²

2. On adversarial input images we run our L0-robust DCT algorithm to recover the top-k coefficients. Then transform the sparse image back to the original domain.

3.
Run the recovered/corrected image through the network.\n\nIn Section 2.1, we saw that recovering the top-k projection of an image gives better theoretical\nbounds than recovering the whole image. Hence it is important that the neural network is also trained\nto recognize compressed images. Training only on compressed images could possibly reduce the\naccuracy of neural networks, but as has been observed and used in practice (e.g. the JPEG and MPEG\ncompression algorithms), images contain most of their information in relatively few coef\ufb01cients.\nThis is validated on our datasets, where we incur a < 1% loss in accuracy on MNIST and < 2.5%\nfor Fashion-MNIST when training on compressed rather than original images. Note that one still\nneeds our correction algorithm for L0-corrupted images, since a naive compression of an adversarial\nexample (by taking its top-k projection) will not get classi\ufb01ed correctly by a neural network in general.\nFor example, if 1 pixel of the image is corrupted to have an extremely high magnitude, this would\npropagate into the top-k coef\ufb01cients of the DCT of the image too and the resulting compressed image\nwill be nowhere close to the original uncorrupted image. Our correction algorithm does not depend\n\n2Indeed the JPEG lossy-compression algorithm essentially does such a top-k projection!\n\n4\n\n\fon the magnitude of the corruptions, only their number (t). Hence both the training of the neural\nnetwork on compressed images and the correction algorithm are essential to our framework.\n\n2.3 Reverse Engineering Attacks\n\nIn step 2 of our framework, we use Theorem 2.2 to get strong guarantees on the distance \u03b4, between\nthe original compressed image x and the recovered image. Ideally, \u03b4 will be so small that no\nadversarial examples exist in the \u03b4-ball around x. 
This may not always be achieved in practice\nthough and there might exist a small number of adversarial examples that are in the \u03b4-ball from the\noriginal image. This leaves open the possibility, that an attacker could reverse engineer our algorithm\nand design an adversarial example that, when corrected, yields a (potentially different) adversarial\nexample inside the \u03b4-ball centered at x (although it is unclear how one would achieve this, as our\ndefense is non-differentiable). Such an attack can be prevented by initializing the IHT algorithm\nwith random vectors \u02c6x[1], e[1] (instead of all-zeros vectors) so that the resulting recovered image is\nnot deterministic. Since there are only a small number of adversarial examples in the \u03b4-ball, this\nrandomization would ensure that a reverse engineering attack would fail to hit an adversarial example,\nwith high probability. The guarantees of the IHT algorithm (Theorem 2.2) are independent of the\nstarting vectors and continue to hold with the randomized initialization. The IHT algorithm used for\nthe experiments reported in this work is not randomized, because current attacks were not designed\nto reverse engineer our defense, and the deterministic IHT itself gives good results.\n\n3 Proof of Main Result\n\nIn this section we prove Theorem 2.2, which says that Algorithm 1 converges to a good solution\n(one that is close to the true vector in the L\u221e norm) to Problem 2.1. Our proof uses techniques from\ncompressed sensing. The main problem studied in compressed sensing is reconstructing a signal x\nfrom few linear measurements. For arbitrary signals, this task is impossible, however the main idea\nof compressed sensing is that signals that are approximately sparse can be recovered using fewer than\nn linear measurements. This is modeled as,\nProblem 3.1. 
Given observations y = M x where x ∈ C^n is an approximately sparse signal, and M is an m × n matrix with m < n, recover the vector x.

A main success in compressed sensing (CS) is that there are efficient algorithms [BD08, NT08] for Problem 3.1 when the matrix M satisfies a property called the RIP.

Definition 3.2 (Restricted Isometry Property (RIP)). An m × n matrix M has the (k, δ)-restricted isometry property ((k, δ)-RIP) if for all k-sparse vectors v we have,

(1 − δ) · ‖v‖ ≤ ‖Mv‖ ≤ (1 + δ) · ‖v‖.

Recall that in our main problem (Problem 2.1), we want to recover the top-k coefficients of x̂ = F x, where x̂ is approximately k-sparse, given a corrupted vector y = x + e. The key idea is to notice that we can write y as [F⁻¹ I] · [x̂; e], where x̂ is approximately k-sparse and e is t-sparse. This is almost the same setup as Problem 3.1. In fact, we have more knowledge about the structure of sparsity of the vector [x̂; e] ∈ C^{2n} that we want to recover.

The problem of recovery with structured sparsity has been studied under the heading of Model-Based CS [BCDH10, HIS15, HIS14, BIS17] for structured sparsity models. In our setting we want to model vectors of the form [x̂; e], which have sparsity k in x̂ and t in e. This motivates the following definition.

Definition 3.3. Let M_{k,t} ⊆ C^{2n} be the set of all vectors where the first n coordinates are k-sparse and the last n coordinates are t-sparse.³ Formally,

M_{k,t} := {v = [x; e] ∈ C^{2n} | x is k-sparse, e is t-sparse}.

³Recall that M_k was the set of all k-sparse vectors. Note that M_{k,t} is different from M_{k+t}, which is the set of all vectors in C^{2n} that are (k + t)-sparse.

We say that a matrix M has the ((k, t), δ)-RIP if for all vectors v ∈ M_{k,t},

(1 − δ) · ‖v‖ ≤ ‖Mv‖ ≤ (1 + δ) · ‖v‖.

Model-Based CS was first introduced in [BCDH10] for general sparsity models, and they proved therein that Iterative Hard Thresholding (IHT) [BD08] converges to an approximately correct solution to Problem 3.1, given that the measurement matrix M satisfies the RIP for the model at hand. We use this Model-Based IHT approach to argue that Algorithm 1 finds a good solution to Problem 2.1. For us their result translates to the following theorem.

Theorem 3.4 ([BCDH10]). Let v ∈ M_{k,t} and let y = Mv + β, where M is a full-rank matrix and β is a noise vector. Let v[T] = IHT(y, M⁻¹, k, t, T). If M is ((3k, 3t), δ)-RIP, with δ ≤ 0.1, then

‖v[T] − v‖ ≤ 2^{−T} · ‖v‖ + 4 · ‖β‖.

We use the above theorem to prove that Algorithm 1 also converges to a good solution. Another key technique we use in our proofs is an uncertainty principle for specific structured matrices.

Lemma 3.5 (General Uncertainty Principle). Let F be a matrix in C^{n×n} such that each entry F_ij has |F_ij| ≤ α. Let x be a k-sparse vector in C^n and y = F x. Then ‖y‖∞ ≤ α · √k · ‖x‖.

Note that when F is the normalized Fourier matrix, this is the same as the folklore Fourier uncertainty principle with α = 1/√n. The proof of the above is indeed very similar to the Fourier case and is included in the full version of the paper⁴. One can check that for transformation matrices corresponding to the Discrete Cosine and Sine Transforms and their 2D analogs we have α = O(√(1/n)).

Finally, to prove Theorem 2.2, we first prove that the matrix M = [F⁻¹ I] has the RIP (Lemma 3.6 below), which then combined with Theorem 3.4 and the uncertainty principle (Lemma 3.5) finishes the proof of Theorem 2.2. These proofs can be found in the full version of the paper.

Lemma 3.6. Let F be an orthonormal matrix such that each entry F_ij has |F_ij| = O(1/√n). Then the matrix M = [F⁻¹ I] ∈ C^{n×2n} satisfies ((3k, 3t), δ)-RIP with δ ≤ 0.1, when t = O(n/k). Equivalently, for all vectors v = [x̂; e] such that x̂ is 3k-sparse and e is at most 3t-sparse with t = O(n/k),

(1 − δ) · ‖v‖ ≤ ‖Mv‖ ≤ (1 + δ) · ‖v‖.

4 Experiments

4.1 Worst case adversaries

We evaluated our framework on three leading L0 attacks in the literature: the JSMA attack of Papernot et al [PMJ+15], the L0 attack from Carlini and Wagner (CW) [CW16], and the adversarial patch from Brown et al [BMR+17]. We evaluated Algorithm 1 on the JSMA and CW attacks and present these results in this section. We discuss experiments on the adversarial patch in Section 4.2. We tested both JSMA and CW on two datasets: the MNIST handwritten digits [LeC98] and the Fashion-MNIST [XRV17] dataset of clothing images. For each attack, we used randomly selected targets.
For both datasets we used a neural network composed of a convolutional layer (32 kernels of\n3x3), max pooling layer (2x2), convolutional layer (64 kernels of 3x3), max pooling layer (2x2), fully\nconnected layer (128 neurons) with dropout (rate = .25) and an output softmax layer (10 neurons).\nWe used the Adam optimizer with cross-entropy loss and ran it for 10 epochs over the training\ndatasets.\nFor each dataset, we trained our neural network only on images that were projected onto their\ntop-k 2D-DCT coef\ufb01cients. Here k is a parameter we tuned depending on the dataset (for MNIST\nk = 40 and for Fashion-MNIST k = 35). For each dataset, we \ufb01xed its corresponding k across all\nexperiments reported here.\n\n4https://arxiv.org/pdf/1812.05013.pdf\n\n6\n\n\fIn all of our evaluations there were three experimental conditions: \ufb01rst we ran uncorrupted images\nthrough the network to establish a baseline accuracy. Then we ran the L0 adversarial examples\nthrough the network. Finally, we ran our correction algorithm on the adversarial examples and ran\nthe results through the network. Example images of these conditions can be seen in Figure 1.\n\nFigure 1: Example experimental conditions. The left 3 images depict an original MNIST image,\nthe image corrupted by JSMA, and our corrected image. The right three images show an original\nFashion-MNIST image, the image corrupted by CW, and our corrected image.\n\nFor the JSMA, we ran an experiment for several different adversary noise budgets. For each budget,\nwe evaluated the network on the three experimental conditions. The accuracy vs L0 budget and loss\nvs L0 budget graphs can be seen in Figure 3 on the MNIST and Fashion-MNIST datasets. Exact\nvalues can be found in the full version of the paper. The results demonstrate that our correction\nalgorithm successfully defends against the JSMA attack. 
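The top-k 2D-DCT projection used to produce the "compressed images" the networks are trained on (step 1 of the framework in Section 2.2, with k = 40 for MNIST and k = 35 for Fashion-MNIST) can be sketched as follows. This is our own minimal illustration, not the paper's code; the name `compress_topk` and the synthetic sample image are ours.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_topk(img, k):
    """Project an image onto its k largest-magnitude 2D-DCT coefficients."""
    coeffs = dctn(img, norm="ortho")            # orthonormal 2D-DCT
    flat = coeffs.ravel()
    keep = np.argsort(np.abs(flat))[-k:]        # indices of the top-k coefficients
    sparse = np.zeros_like(flat)
    sparse[keep] = flat[keep]
    return idctn(sparse.reshape(img.shape), norm="ortho")

# A 28x28 image that is exactly 40-sparse in the 2D-DCT basis is
# reproduced exactly by its top-40 projection.
rng = np.random.default_rng(1)
coeffs = np.zeros(28 * 28)
coeffs[rng.choice(28 * 28, 40, replace=False)] = rng.normal(size=40)
img = idctn(coeffs.reshape(28, 28), norm="ortho")
recovered = compress_topk(img, 40)
```

For real images the projection is lossy rather than exact, which is why training on compressed images costs the small accuracy loss reported above; the orthonormal normalization keeps the transform consistent with the theory of Section 3.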
For example, when the adversary corrupts 30 pixels, it is able to drop the accuracy of our network on the Fashion-MNIST dataset from 88.5% to 24.8%, but after running our recovery algorithm we get back up to 83.1%.

The CW attack works by finding a minimal set of pixels that can be corrupted to fool the network. This means that the adversary's budget will depend on the particular image being corrupted rather than being fixed in advance. For this reason, we let the CW adversary choose how many pixels to corrupt and allow ourselves to know its budget for each image. Note that the locations and magnitudes of the noise are unknown to us. Since the budget varies across images, a plot like Figure 3 does not make sense and we instead report the overall accuracy and loss of our correction algorithm in Figure 2. Again our correction algorithm is effective against CW. For example, on the Fashion-MNIST dataset the network's test accuracy on original images was 87.8%. The CW attack was successful and the network mislabeled every adversarial example. After running our correction algorithm, the accuracy returns to 85.7%.

          MNIST   Adversarial  Corrected | F-MNIST  Adversarial  Corrected
Accuracy  99.0    0.0          72.8      | 87.8     0.0          85.7
Loss      0.002   0.115        0.095     | 0.035    0.140        0.040

Figure 2: Experimental results for our algorithm on the CW attack for the MNIST and Fashion-MNIST datasets. Columns 2-4 show results for MNIST data and columns 5-7 show Fashion-MNIST.

As images grow larger they become less sparse in Fourier bases, but natural images are still block-wise sparse. In such cases our algorithm could be modified to correct images block by block, in which case the network would need to be trained on images compressed block by block (e.g. as in JPEG). This would work under the mild assumption that the corrupted locations are well-distributed across blocks, because then our recovery result could be applied to each block separately.
Within each block\nthe corrupted locations could still be anywhere and of any magnitude. For images that are too large to\nbe sparse in Fourier bases, the block-wise approach may fail in the case where most of the L0 noise\nresides in few blocks because in these blocks there will be too many corrupted coordinates to recover.\nIn the next section we study the extreme case where all of the error is concentrated contiguously. We\nshow that even in this extreme case our framework for L0-robust sparse transformations can be used\nto guard against contiguous noise attacks even in large images.\n\n7\n\n\fFigure 3: Classi\ufb01cation accuracy and loss for JSMA on MNIST (left) and Fashion-MNIST (right).\nBlue lines show the performance of the network on original images (and hence does not change with\nthe number of coordinates corrupted). Red lines show the performance of the network on uncorrected\nadversarial examples and green lines show the performance of the network on images that were\ncorrected by Algorithm 1\n\n4.2 Adversarial patch\n\nIn [BMR+17], the authors introduce a method for generating adversarial patches. These are targeted\nattacks in the form of circular images that get overlayed on input images. They showed that their\npatch effectively fools leading image classi\ufb01ers into mislabeling patched images.\nNotice that the adversarial patch is an example of L0 noise and so \ufb01ts within our framework. The\npatch attack is only successful when the patch is suf\ufb01ciently large (~ 80 pixels in diameter for\n224 \u00d7 224 images), which is larger than our algorithm can tolerate. Also images of this size are less\nsparse in the frequency domain and as discussed above, our approach may not be able to correct\ncontiguous noise on such images. Similarly we cannot train the neural networks on compressed\nimages as that would lead to non trivial loss as the images are less sparse. 
So in this section we use a network that was pretrained on original ImageNet images.
We exploit the contiguity of the noise, together with the mild sparsity of large images, by using the Patchwise IHT algorithm to defend against the patch attack. Since image recovery is not possible in this setting, our algorithm instead focuses on detecting the location of the contiguous noise. We detect the noise by searching over contiguous blocks in the image and running Algorithm 1 on each block, where we project e onto the block rather than onto the top-t coordinates. Finally, we select the block for which the remaining image (y − e) is sparsest in the Fourier domain. We call this Patchwise IHT; a formal description of the algorithm is given in the full version of the paper. Note that for this particular set of adversarial examples there may be other ways to detect the patch with pre-processing. We do not apply any such adversary-specific optimizations; Patchwise IHT relies only on the mild sparsity of the original images.
We took 700 random images from ImageNet, and for classification we used a pretrained ResNet-50 network [HZRS15]. We ran each image through the network in our three experimental conditions, depicted in Figure 4.
Figure 5 shows the results of our experiment. The patch was a successful attack (Top-5 accuracy dropped from 92.3% to 63.9% and Top-1 from 76.4% to 12.0%). After correcting, Top-5 accuracy jumped to 80.4% (Top-1: 59.7%). Only 1.0% of the original images were labeled as 'toaster' (none in the Top-1), but 'toaster' was in the Top-5 in 99.0% of the patched images, with 85.7% being the most confident label. Notably, very few corrected images were labeled as 'toaster' (Top-5: 7.4%, Top-1: 4.7%).

Figure 4: Example of the three image conditions in the patch experiment. Left is the original image, classified as 'banana' with probability .94. The middle, with the adversarial patch overlaid, is classified as 'toaster' with probability .93. The right is the image after our Patchwise IHT algorithm, which gets classified as 'banana' with probability .94.

Figure 5: Experimental results for the Patchwise IHT algorithm. The left plot depicts the accuracy of the network in our three experimental conditions. The right plot shows the percentage of images labeled as 'toaster' under the same three conditions.

5 Acknowledgements

Mitali Bafna was supported by NSF Grant CCF-1715187. Jack Murtagh was supported by NSF Grant CNS-1565387. Nikhil Vyas was supported by an Akamai Presidential Fellowship and NSF Grant CCF-1552651. We would like to thank Yaron Singer and Adam Breuer for helpful feedback and encouragement in the early stages of this work. We also want to thank Thibaut Horel for valuable comments on the manuscript. Thanks also to the reviewers for helpful remarks.

References

[BCDH10] Richard G. Baraniuk, Volkan Cevher, Marco F. Duarte, and Chinmay Hegde. Model-based compressive sensing. IEEE Trans. Information Theory, 56(4):1982–2001, 2010.

[BD08] Thomas Blumensath and Mike E. Davies. Iterative hard thresholding for compressed sensing. CoRR, abs/0805.0510, 2008.

[BIS17] Arturs Backurs, Piotr Indyk, and Ludwig Schmidt. Better approximations for tree sparsity in nearly-linear time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, January 16-19, 2017, pages 2215–2229, 2017.

[BMR+17] Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. CoRR, abs/1712.09665, 2017.

[CLMW11] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? J. ACM, 58(3):11:1–11:37, 2011.

[CRT06] Emmanuel J. Candès, Justin K. 
Romberg, and Terence Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Information Theory, 52(2):489–509, 2006.

[CW16] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016.

[EEF+17] Ivan Evtimov, Kevin Eykholt, Earlence Fernandes, Tadayoshi Kohno, Bo Li, Atul Prakash, Amir Rahmati, and Dawn Song. Robust physical-world attacks on machine learning models. CoRR, abs/1707.08945, 2017.

[GSS15] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.

[HIKP12a] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price. Nearly optimal sparse Fourier transform. In Proceedings of the 44th Symposium on Theory of Computing Conference, STOC 2012, New York, NY, USA, May 19-22, 2012, pages 563–578, 2012.

[HIKP12b] Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price. Simple and practical algorithm for sparse Fourier transform. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1183–1194, 2012.

[HIS14] Chinmay Hegde, Piotr Indyk, and Ludwig Schmidt. Nearly linear-time model-based compressive sensing. In Automata, Languages, and Programming - 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I, pages 588–599, 2014.

[HIS15] Chinmay Hegde, Piotr Indyk, and Ludwig Schmidt. Approximation algorithms for model-based compressive sensing. IEEE Trans. Information Theory, 61(9):5129–5147, 2015.

[HZRS15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[IKP14] Piotr Indyk, Michael Kapralov, and Eric Price. 
(Nearly) sample-optimal sparse Fourier transform. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 480–499, 2014.

[LeC98] Yann LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.

[MFF15] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015.

[MMS+17] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. CoRR, abs/1706.06083, 2017.

[NT08] Deanna Needell and Joel A. Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. arXiv preprint arXiv:0803.2392, 2008.

[PMJ+15] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. CoRR, abs/1511.07528, 2015.

[PMW+16] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016, pages 582–597, 2016.

[SBBR16] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540. ACM, 2016.

[SZS+13] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. 
CoRR, abs/1312.6199, 2013.

[TKP+17] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

[XRV17] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017.
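
Supplementary illustration. To make the block search of Patchwise IHT (Section 4.2) concrete, the detection step can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name, the stride, and the scoring rule (Fourier mass outside the top-t coefficients as a sparsity proxy, with the candidate block simply zeroed in place of running Algorithm 1 on it) are all our assumptions.

```python
import numpy as np

def patchwise_detect(y, block, step=4, t=50):
    """Hypothetical sketch of the block-search step of Patchwise IHT.

    Slides a block x block window over the image y.  For each position,
    the window is treated as the support of the noise e and zeroed out,
    and the remainder (y - e) is scored by how sparse it is in the
    Fourier domain.  Returns the window position whose removal leaves
    the most Fourier-sparse remainder.
    """
    n, m = y.shape
    best_pos, best_score = None, np.inf
    for i in range(0, n - block + 1, step):
        for j in range(0, m - block + 1, step):
            rest = y.copy()
            # project the noise estimate e onto this block only
            rest[i:i + block, j:j + block] = 0.0
            coeffs = np.abs(np.fft.fft2(rest))
            # sparsity proxy: total magnitude outside the top-t coefficients
            tail = np.sort(coeffs, axis=None)[:-t]
            score = tail.sum()
            if score < best_score:
                best_pos, best_score = (i, j), score
    return best_pos
```

On a synthetic image that is exactly sparse in the Fourier basis, zeroing the window that covers the corrupted block removes all of the noise energy, while any misaligned window leaves residual noise that spreads across the spectrum, so the search recovers the patch location.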