{"title": "Semidefinite relaxations for certifying robustness to adversarial examples", "book": "Advances in Neural Information Processing Systems", "page_first": 10877, "page_last": 10887, "abstract": "Despite their impressive performance on diverse tasks, neural networks fail catastrophically in the presence of adversarial inputs\u2014imperceptibly but adversarially perturbed versions of natural inputs. We have witnessed an arms race between defenders who attempt to train robust networks and attackers who try to construct adversarial examples. One promise of ending the arms race is developing certified defenses, ones which are provably robust against all attackers in some family. These certified defenses are based on convex relaxations which construct an upper bound on the worst case loss over all attackers in the family. Previous relaxations are loose on networks that are not trained against the respective relaxation. In this paper, we propose a new semidefinite relaxation for certifying robustness that applies to arbitrary ReLU networks. We show that our proposed relaxation is tighter than previous relaxations and produces meaningful robustness guarantees on three different foreign networks whose training objectives are agnostic to our proposed relaxation.", "full_text": "Semide\ufb01nite relaxations\n\nfor certifying robustness to adversarial examples\n\nAditi Raghunathan, Jacob Steinhardt and Percy Liang\n\nStanford University\n\n{aditir, jsteinhardt, pliang}@cs.stanford.edu\n\nAbstract\n\nDespite their impressive performance on diverse tasks, neural networks fail catas-\ntrophically in the presence of adversarial inputs\u2014imperceptibly but adversarially\nperturbed versions of natural inputs. We have witnessed an arms race between\ndefenders who attempt to train robust networks and attackers who try to construct\nadversarial examples. 
One promise of ending the arms race is developing certi\ufb01ed\ndefenses, ones which are provably robust against all attackers in some family. These\ncerti\ufb01ed defenses are based on convex relaxations which construct an upper bound\non the worst case loss over all attackers in the family. Previous relaxations are loose\non networks that are not trained against the respective relaxation. In this paper,\nwe propose a new semide\ufb01nite relaxation for certifying robustness that applies to\narbitrary ReLU networks. We show that our proposed relaxation is tighter than pre-\nvious relaxations and produces meaningful robustness guarantees on three different\nforeign networks whose training objectives are agnostic to our proposed relaxation.\n\n1\n\nIntroduction\n\nMany state-of-the-art classi\ufb01ers have been shown to fail catastrophically in the presence of small\nimperceptible but adversarial perturbations. Since the discovery of such adversarial examples [25],\nnumerous defenses have been proposed in attempt to build classi\ufb01ers that are robust to adversarial\nexamples. However, defenses are routinely broken by new attackers who adapt to the proposed defense,\nleading to an arms race. For example, distillation was proposed [22] but shown to be ineffective [5].\nA proposed defense based on transformations of test inputs [20] was broken in only \ufb01ve days [2].\nRecently, seven defenses published at ICLR 2018 fell to the attacks of Athalye et al. [3].\nA recent body of work aims to break this arms race by training classi\ufb01ers that are certi\ufb01ably robust to\nall attacks within a \ufb01xed attack model [13, 23, 29, 8]. These approaches construct a convex relaxation\nfor computing an upper bound on the worst-case loss over all valid attacks\u2014this upper bound serves\nas a certi\ufb01cate of robustness. 
In this work, we propose a new convex relaxation based on semide\ufb01nite\nprogramming (SDP) that is signi\ufb01cantly tighter than previous relaxations based on linear programming\n(LP) [29, 8, 9] and handles arbitrary number of layers (unlike the formulation in [23], which was\nrestricted to two). We summarize the properties of our relaxation as follows:\n\n1. Our new SDP relaxation reasons jointly about intermediate activations and captures interactions\nthat the LP relaxation cannot. Theoretically, we prove that there is a square root dimension gap between\nthe LP relaxation and our proposed SDP relaxation for neural networks with random weights.\n\n2. Empirically, the tightness of our proposed relaxation allows us to obtain tight certi\ufb01cates for\nforeign networks\u2014networks that were not speci\ufb01cally trained towards the certi\ufb01cation procedure.\nFor instance, adversarial training against the Projected Gradient Descent (PGD) attack [21] has led\nto networks that are \u201cempirically\u201d robust against known attacks, but which have only been certi\ufb01ed\nagainst small perturbations (e.g. \u0001 = 0.05 in the (cid:96)\u221e-norm for the MNIST dataset [9]). We use our SDP\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fto provide the \ufb01rst non-trivial certi\ufb01cate of robustness for a moderate-size adversarially-trained model\non MNIST at \u0001 = 0.1.\n\n3. Furthermore, training a network to minimize the optimum of particular relaxation produces\nnetworks for which the respective relaxation provides good robustness certi\ufb01cates [23]. Notably and\nsurprisingly, on such networks, our relaxation provides tighter certi\ufb01cates than even the relaxation\nthat was optimized for during training.\nRelated work. Certi\ufb01cation methods which evaluate the performance of a given network against all\npossible attacks roughly fall into two categories. 
The \ufb01rst category leverages convex optimization\nand our work adds to this family. Convex relaxations are useful for various reasons. Wong and\nKolter [29], Raghunathan et al. [23] exploited the theory of duality to train certi\ufb01ably robust networks\non MNIST. In recent work, Dvijotham et al. [8], Wong et al. [30] extended this approach to train\nbigger networks with improved certi\ufb01ed error and on larger datasets. Solving a convex relaxation for\ncerti\ufb01cation typically involves standard techniques from convex optimization. This enables scalable\ncerti\ufb01cation by providing valid upper bounds at every step in the optimization [9].\nThe second category draws techniques from formal veri\ufb01cation such as SMT [16, 17, 7, 14], which\naim to provide tight certi\ufb01cates for any network using discrete optimization. These techniques, while\nproviding tight certi\ufb01cates on arbitrary networks, are often very slow and worst-case exponential in\nnetwork size. In prior work, certi\ufb01cation would take up to several hours or longer for a single example\neven for a small network with around 100 hidden units [7, 16]. However, in concurrent work, Tjeng\nand Tedrake [26] impressively scaled up exact veri\ufb01cation through careful preprocessing and ef\ufb01cient\npruning that dramatically reduces the search space. In particular, they concurrently obtain non-trivial\ncerti\ufb01cates of robustness on a moderately-sized network trained using the adversarial training objective\nof [21] on MNIST at perturbation level \u0001 = 0.1.\n\n2 Setup\n\nOur main contribution is a semide\ufb01nite relaxation of an optimization objective that arises in certi\ufb01cation\nof neural networks against adversarial examples. In this section, we set up relevant notation and present\nthe optimization objective that will be the focus of the rest of the paper.\nNotation. For a vector z \u2208 Rn, we use zi to denote the ith coordinate of z. 
For a matrix Z \u2208 Rm\u00d7n,\nZi\u2208Rn denotes the ith row. For any function f :R\u2192R and a vector z\u2208Rn, f (z) is a vector in Rn with\n(f (z))i = f (zi), e.g., z2\u2208Rn represents the function that squares each component. For z,y\u2208Rn, z(cid:23) y\ndenotes that zi\u2265 yi for i = 1,2,...,n. We use z1(cid:12)z2 to represent the elementwise product of the vectors\nz1 and z2. We use B\u0001(\u00afx) def= {x|(cid:107)x\u2212 \u00afx(cid:107)\u221e\u2264 \u0001} to denote the (cid:96)\u221e ball around \u00afx. When it is necessary to\ndistinguish vectors from scalars (in Section 4.1), we use (cid:126)x to represent a vector in Rn that is semantically\nassociated with the scalar x. Finally, we denote the vector of all zeros by 0 and the vector of all ones by 1.\nMulti-layer ReLU networks for classi\ufb01cation. We focus on multi-layer neural networks with ReLU\nactivations. A network f with L hidden layers is de\ufb01ned as follows: let x0 \u2208 Rd denote the input\nand x1, ... , xL denote the activation vectors at the intermediate layers. Suppose the network has\nmi units in layer i. xi is related to xi\u22121 as xi = ReLU(W i\u22121xi\u22121) = max(W i\u22121xi\u22121,0), where\nW i\u22121\u2208Rmi\u00d7mi\u22121 are the weights of the network. For simplicity of exposition, we omit the bias terms\nassociated with the activations (but consider them in the experiments). We are interested in neural\nnetworks for classi\ufb01cation where we classify an input into one of k classes. The output of the network\nis f (x0)\u2208 Rk such that f (x0)j = c(cid:62)\nj xL represents the score of class j. The class label y assigned to\nthe input x0 is the class with the highest score: y = argmaxj=1,...,kf (x0)j.\nAttack model and certi\ufb01cate of robustness. We study classi\ufb01cation in the presence of an attacker A\nthat takes a clean test input \u00afx\u2208Rd and returns an adversarially perturbed input A(\u00afx). 
In this work, we focus on attackers that are bounded in the ℓ∞ norm: A(x̄) ∈ B_ϵ(x̄) for some fixed ϵ > 0. The attacker is successful on a clean input-label pair (x̄, ȳ) if f(A(x̄)) ≠ ȳ, or equivalently if f(A(x̄))_y > f(A(x̄))_ȳ for some y ≠ ȳ.
We are interested in bounding the error against the worst-case attack (we assume the attacker has full knowledge of the neural network). Let ℓ*_y(x̄, ȳ) denote the worst-case margin of an incorrect class y that can be achieved in the attack model:

  ℓ*_y(x̄, ȳ) := max_{A(x̄) ∈ B_ϵ(x̄)}  ( f(A(x̄))_y − f(A(x̄))_ȳ ).    (1)

A network is certifiably robust on (x̄, ȳ) if ℓ*_y(x̄, ȳ) < 0 for all y ≠ ȳ. Computing ℓ*_y(x̄, ȳ) for a neural network involves solving a non-convex optimization problem, which is intractable in general. In this work, we study convex relaxations to efficiently compute an upper bound L_y(x̄, ȳ) ≥ ℓ*_y(x̄, ȳ). When L_y(x̄, ȳ) < 0 for all y ≠ ȳ, we have a certificate of robustness of the network on input (x̄, ȳ).
Optimization objective. For a fixed class y, the worst-case margin ℓ*_y(x̄, ȳ) of a neural network f with weights W can be expressed as the following optimization problem. The decision variable is the input A(x̄), which we denote here by x^0 for notational convenience. The quantity we are interested in maximizing is f(x^0)_y − f(x^0)_ȳ = (c_y − c_ȳ)⊤ x^L, where x^L is the final-layer activation. We set up the optimization problem by jointly optimizing over all the activations x^0, x^1, x^2, ..., x^L, imposing consistency constraints dictated by the neural network, and restricting the input x^0 to be within the attack model.
Formally,

  ℓ*_y(x̄, ȳ) = max_{x^0, ..., x^L}  (c_y − c_ȳ)⊤ x^L
  subject to  x^i = ReLU(W^{i−1} x^{i−1}) for i = 1, 2, ..., L    (Neural network constraints)    (2)
              |x^0_j − x̄_j| ≤ ϵ for j = 1, 2, ..., d              (Attack model constraints)

Computing ℓ*_y is computationally hard in general. In the following sections, we present how to relax this objective to a convex semidefinite program and discuss some properties of this relaxation.

3 Semidefinite relaxations

In this section, we present our approach to obtaining a computationally tractable upper bound on the solution of the optimization problem described in (2).
Key insight. The source of the non-convexity in (2) is the ReLU constraints. Consider a ReLU constraint of the form z = max(x, 0). The key observation is that this constraint can be expressed equivalently as the following three linear and quadratic constraints between z and x: (i) z(z − x) = 0, (ii) z ≥ x, and (iii) z ≥ 0. Constraint (i) ensures that z is equal to either x or 0, and constraints (ii) and (iii) together then ensure that z is at least as large as both. This reformulation allows us to replace the non-linear ReLU constraints of the optimization problem in (2) with linear and quadratic constraints, turning it into a quadratically constrained quadratic program (QCQP). We first show how this QCQP can be relaxed to a semidefinite program (SDP) for networks with one hidden layer. The relaxation for multiple layers is a straightforward extension and is presented in Section 5.

3.1 Relaxation for one hidden layer

Consider a neural network with one hidden layer containing m nodes. Let the input be denoted by x ∈ R^d.
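As a quick sanity check of the key insight above, a few lines of Python (our own illustrative sketch, not code from the paper) confirm that z = max(x, 0) holds exactly when z(z − x) = 0, z ≥ x, and z ≥ 0:

```python
def relu_constraints_hold(x, z, tol=1e-9):
    """Check the three linear/quadratic constraints equivalent to z = ReLU(x)."""
    return abs(z * (z - x)) <= tol and z >= x - tol and z >= -tol

def relu(x):
    return max(x, 0.0)

# z = ReLU(x) satisfies the constraints for any x ...
for x in [-2.0, -0.5, 0.0, 0.7, 3.0]:
    assert relu_constraints_hold(x, relu(x))

# ... and conversely: z(z - x) = 0 forces z into {0, x}, while z >= x and
# z >= 0 rule out whichever of the two is smaller.
assert not relu_constraints_hold(-2.0, -2.0)  # z = x < 0 violates z >= 0
assert not relu_constraints_hold(2.0, 0.0)    # z = 0 < x violates z >= x
```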
The hidden layer activations are denoted by z ∈ R^m and are related to the input x as z = ReLU(W x) for weights W ∈ R^{m×d}.
Suppose that we have lower and upper bounds l, u ∈ R^d on the inputs such that l_j ≤ x_j ≤ u_j. For example, in the ℓ∞ attack model we have l = x̄ − ϵ1 and u = x̄ + ϵ1, where x̄ is the clean input. For the multi-layer case, we discuss how to obtain these bounds for the intermediate activations in Section 5.2.
We are interested in optimizing a linear function of the hidden layer: f(x) = c⊤z, where c ∈ R^m. For instance, while computing the worst-case margin of an incorrect label y over the true label ȳ, c = c_y − c_ȳ.
We use the key insight that the ReLU constraints can be written as linear and quadratic constraints, allowing us to embed these constraints into a QCQP. We can also express the input constraint l_j ≤ x_j ≤ u_j as a quadratic constraint, which will be useful later. In particular, l_j ≤ x_j ≤ u_j if and only if (x_j − l_j)(x_j − u_j) ≤ 0, thereby yielding the quadratic constraint x_j² ≤ (l_j + u_j) x_j − l_j u_j. This gives us the final QCQP below:

  ℓ*_y(x̄, ȳ) = f_QCQP = max_{x, z}  c⊤z
  s.t.  z ≥ 0,  z ≥ W x,  z² = z ⊙ (W x)     (ReLU constraints)    (3)
        x² ≤ (l + u) ⊙ x − l ⊙ u             (Input constraints)

We now relax the non-convex QCQP (3) to a convex SDP. The basic idea is to introduce a new set of variables representing all linear and quadratic monomials in x and z; the constraints in (3) can then be written as linear functions of these new variables.

Figure 1: (a) Plot showing the feasible regions for the vectors x⃗ (green) and z⃗ (red). The input constraints restrict x⃗ to lie within the green circle.
The ReLU constraint z⃗ ⊥ (z⃗ − x⃗) forces z⃗ to lie on the dashed red circle, and the constraint z⃗·e⃗ ≥ x⃗·e⃗ restricts it to the solid arc. (b) For a fixed value of the input x⃗·e⃗, when the angle made by x⃗ with e⃗ increases, the arc spanned by z⃗ has a larger projection on e⃗, leading to a looser relaxation. Secondly, for a fixed value of x⃗·e⃗, as θ increases, the norm ‖x⃗‖ increases, and vice versa.

In particular, let v := [1; x; z]. We define a matrix P := vv⊤ and use symbolic indexing P[·] to index the elements of P, i.e.,

  P = [ P[1]   P[x⊤]   P[z⊤]  ]
      [ P[x]   P[xx⊤]  P[xz⊤] ]
      [ P[z]   P[zx⊤]  P[zz⊤] ]

The SDP relaxation of (3) can be written in terms of the matrix P as follows:

  f_SDP = max_P  c⊤P[z]
  s.t.  P[z] ≥ 0,  P[z] ≥ W P[x],  diag(P[zz⊤]) = diag(W P[xz⊤])    (ReLU constraints)    (4)
        diag(P[xx⊤]) ≤ (l + u) ⊙ P[x] − l ⊙ u                       (Input constraints)
        P[1] = 1,  P ⪰ 0                                             (Matrix constraints)

When the matrix P admits a rank-one factorization vv⊤, the entries of the matrix P exactly correspond to linear and quadratic monomials in x and z. In this case, the ReLU and input constraints of the SDP are identical to the constraints of the QCQP. However, this rank-one constraint on P would make the feasible set non-convex. We instead consider the relaxed constraint on P that allows factorizations of the form P = V V⊤, where V can be full rank. Equivalently, we consider the set of matrices P such that P ⪰ 0. This set is convex and is a superset of the original non-convex set.
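To make the relaxation concrete, the following numpy sketch (the block names such as P_xx are our own, mirroring the symbolic indexing P[·]; this is an illustration, not the authors' code) checks that any point satisfying the QCQP constraints yields a feasible rank-one P for the SDP above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 4
W = rng.standard_normal((m, d))
x_bar, eps = rng.standard_normal(d), 0.1
l, u = x_bar - eps, x_bar + eps

# Any input in the box, with z = ReLU(W x), gives a rank-one P = v v^T.
x = rng.uniform(l, u)
z = np.maximum(W @ x, 0.0)
v = np.concatenate(([1.0], x, z))
P = np.outer(v, v)

# Symbolic blocks of P, following the indexing used in (4).
P_x, P_z = P[1:1 + d, 0], P[1 + d:, 0]
P_xx, P_xz, P_zz = P[1:1 + d, 1:1 + d], P[1:1 + d, 1 + d:], P[1 + d:, 1 + d:]

tol = 1e-9
assert np.all(P_z >= -tol) and np.all(P_z >= W @ P_x - tol)    # linear ReLU constraints
assert np.allclose(np.diag(P_zz), np.diag(W @ P_xz))           # z^2 = z .* (W x)
assert np.all(np.diag(P_xx) <= (l + u) * P_x - l * u + tol)    # input constraints
assert np.isclose(P[0, 0], 1.0)                                # P[1] = 1
assert np.all(np.linalg.eigvalsh(P) >= -1e-8)                  # P is PSD
```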
Therefore, the above SDP is a relaxation of the QCQP in (3), with f_SDP ≥ f_QCQP, providing an upper bound on ℓ*_y(x̄, ȳ) that can serve as a certificate of robustness. We note that this SDP relaxation is different from the one proposed in [23], which applies only to neural networks with one hidden layer. In contrast, the construction presented here naturally generalizes to multiple layers, as we show in Section 5. Moreover, we will see in Section 6 that our new relaxation often yields substantially tighter bounds than the approach of [23].

4 Analysis of the relaxation

Before extending the SDP relaxation defined in (4) to multiple layers, we provide some geometric intuition for the SDP relaxation.

4.1 Geometric interpretation

First consider the simple case where m = d = 1 and W = c = 1, so that the problem is to maximize z subject to z = ReLU(x) and l ≤ x ≤ u. In this case, the SDP relaxation of (4) is as follows:

  f_SDP = max_P  P[z]
  s.t.  P[z] ≥ 0,  P[z] ≥ P[x],  P[z²] = P[xz]    (ReLU constraints)    (5)
        P[x²] ≤ (l + u) P[x] − l u                (Input constraints)
        P[1] = 1,  P ⪰ 0                          (Matrix constraints)

Figure 2: (a) Visualization of the LP and SDP for a single ReLU unit with input x and output z. The LP is bounded by the line joining the extreme points. (b) Let z1 = ReLU(x1 + x2) and z2 = ReLU(x1 − x2). On fixing the inputs x1 and x2 (both equal to 0.5ϵ), we plot the feasible activations of the LP and SDP relaxations. The LP feasible set is a simple product over the independent sets, while the SDP enforces joint constraints to obtain a more complex convex set. (c) We plot the set (z1, z2) across all feasible inputs (x1, x2) for the same setup as (b), with the objective of maximizing z1 + z2.
We see that f_SDP < f_LP.

The SDP operates on a PSD matrix P and imposes linear constraints on the entries of the matrix. Since a feasible P can be written as V V⊤, the entries of P can be thought of as dot products between vectors, and the constraints as operating on these dot products. For the simple example above,

  V = [ ←e⃗→
        ←x⃗→
        ←z⃗→ ]

for some vectors e⃗, x⃗, z⃗ ∈ R³. The constraint P[1] = 1, for example, imposes e⃗·e⃗ = 1, i.e., e⃗ is a unit vector. The linear monomials P[x], P[z] correspond to projections onto this unit vector, x⃗·e⃗ and z⃗·e⃗. Finally, the quadratic monomials P[xz], P[x²] and P[z²] correspond to x⃗·z⃗, ‖x⃗‖² and ‖z⃗‖², respectively. We now reason about the input and ReLU constraints and visualize the geometry (see Figure 1a).
Input constraints. The input constraint P[x²] ≤ (l + u)P[x] − lu equivalently imposes ‖x⃗‖² ≤ (l + u)(x⃗·e⃗) − lu. Geometrically, this constrains the vector x⃗ to a sphere with center at (1/2)(l + u)e⃗ and radius (1/2)(u − l). Notice that this implicitly bounds the norm of x⃗. This is illustrated in Figure 1a, where the green circle represents the space of feasible vectors x⃗, projected onto the plane containing e⃗ and x⃗.
ReLU constraints. The constraint on the quadratic terms (P[z²] = P[zx]) is the core of the SDP. It says that the vector z⃗ is perpendicular to z⃗ − x⃗. We can visualize z⃗ on the plane containing x⃗ and e⃗ in Figure 1a; the component of z⃗ perpendicular to this plane is not relevant to the SDP, because it is neither constrained nor appears in the objective.
The feasible z⃗ traces out a circle with (1/2)x⃗ as the center (because the angle inscribed in a semicircle is a right angle). The linear constraints restrict z⃗ to the arc that has a larger projection on e⃗ than x⃗, and is positive.
Remarks. This geometric picture allows us to make the following important observation about the objective value max(z⃗·e⃗) of the SDP relaxation. The largest value that z⃗·e⃗ can take depends on the angle θ that x⃗ makes with e⃗. In particular, as θ decreases, the relaxation becomes tighter, and as the vector deviates from e⃗, the relaxation gets looser. Figure 1b provides an illustration. For large θ, the radius of the circle that z⃗ traces increases, allowing z⃗·e⃗ to take large values.
This leads to the natural question: for a fixed input value x⃗·e⃗ (corresponding to x), what controls θ? Since x⃗·e⃗ = ‖x⃗‖ cos θ, as the norm of x⃗ increases, θ increases. Hence a constraint that forces ‖x⃗‖ to be close to x⃗·e⃗ will cause the output z⃗·e⃗ to take smaller values. Porting this intuition into the matrix interpretation, this suggests that constraints forcing P[x²] = ‖x⃗‖² to be small lead to tighter relaxations.

4.2 Comparison with linear programming relaxation

In contrast to the SDP, another approach is to relax the objective and constraints in (2) to a linear program (LP) [18, 10, 9]. As we will see below, a crucial difference from the LP is that our SDP can "reason jointly" about different activations of the network in a stronger way than the LP can. We briefly review the LP approach and then elaborate on this difference.
Review of the LP relaxation.
We present the LP relaxation for a neural network with one hidden layer, where the hidden layer activations z ∈ R^m are related to the input x ∈ R^d as z = ReLU(W x). As before, we have bounds l, u ∈ R^d such that l ≤ x ≤ u.
In the LP relaxation, we replace the ReLU constraints at hidden node j with a convex outer envelope, as illustrated in Figure 2a. The envelope is lower bounded by the linear constraints z ≥ W x and z ≥ 0. In order to construct the upper bounding linear constraints, we compute the extreme points s = min_{l ≤ x ≤ u} W x and t = max_{l ≤ x ≤ u} W x, and construct lines that connect (s, ReLU(s)) and (t, ReLU(t)). The final LP for the neural network is then written by constructing the convex envelopes for each ReLU unit and optimizing over this set as follows:

  f_LP = max  c⊤z
  s.t.  z ≥ 0,  z ≥ W x                                                  (Lower bound lines)
        z ≤ ((ReLU(t) − ReLU(s)) / (t − s)) ⊙ (W x − s) + ReLU(s)        (Upper bound lines)    (6)
        l ≤ x ≤ u                                                        (Input constraints)

The extreme points s and t are the optima of a linear transformation (by W) over a box in R^d and can be computed using interval arithmetic. In the ℓ∞ attack model, where l = x̄ − ϵ1 and u = x̄ + ϵ1, we have s_j = W_j x̄ − ϵ‖W_j‖₁ and t_j = W_j x̄ + ϵ‖W_j‖₁ for j = 1, 2, ..., m.
From Figure 2a, we see that for a single ReLU unit taken in isolation, the LP is tighter than the SDP. However, when we have multiple units, the SDP is tighter than the LP. We illustrate this with a simple example in 2 dimensions with 2 hidden nodes (see Figure 2b).
Simple example to compare the LP and SDP. Consider a two-dimensional example with input x = [x1, x2] and lower and upper bounds l = [−ϵ, −ϵ] and u = [ϵ, ϵ], respectively.
The hidden layer activations z1 and z2 are related to the input as z1 = ReLU(x1 + x2) and z2 = ReLU(x1 − x2). The objective is to maximize z1 + z2.
The LP constrains z1 and z2 independently. To see this, let us set the input x to a fixed value and look at the feasible values of z1 and z2. In the LP, the convex outer envelope that bounds z1 only depends on the input x and the bounds l and u, and is independent of the value of z2. Similarly, the outer envelope of z2 does not depend on the value of z1, and the feasible set for (z1, z2) is simply the product of the individual feasible sets.
In contrast, the SDP has constraints that couple z1 and z2. As a result, the feasible set of (z1, z2) is a strict subset of the product of the individual feasible sets. Figure 2b plots the LP and SDP feasible sets [z1, z2] for x = [ϵ/2, ϵ/2]. Recall from the geometric observations (Section 4.1) that the arc of z⃗1 depends on the configuration of x⃗1 + x⃗2, while that of z⃗2 depends on x⃗1 − x⃗2. Since the vectors x⃗1 + x⃗2 and x⃗1 − x⃗2 are dependent, the feasible sets of z⃗1 and z⃗2 are also dependent on each other. An alternative way to see this is from the matrix constraint P ⪰ 0 in (4). This matrix constraint does not factor into terms that decouple the entries P[z1] and P[z2], hence z1 and z2 cannot vary independently.
When we reason about the relaxation over all feasible points x, the joint reasoning of the SDP allows it to achieve a better objective value. Figure 2c plots the feasible sets [z1, z2] over all valid x; here the optimal value of the SDP, f_SDP, is less than that of the LP, f_LP.
We can extend the preceding example to exhibit a dimension-dependent gap between the LP and the SDP for random weight matrices.
In particular, for a random network with m hidden nodes and input dimension d, with high probability, f_LP = Θ(md) while f_SDP = Θ(m√d + d√m). More formally:

Proposition 1. Suppose that the weight matrix W ∈ R^{m×d} is generated randomly by sampling each element W_ij uniformly and independently from {−1, +1}. Also let the output vector c be the all-1s vector, 1. Take x̄ = 0 and ϵ = 1. Then, for some universal constant γ,

  f_LP ≥ (1/√2) md almost surely, while
  f_SDP ≤ γ·(m√d + d√m) with probability 1 − exp(−(d + m)).

We defer the proof of this to Section A.

5 Multi-layer networks

The SDP relaxation to evaluate robustness for multi-layer networks is a straightforward generalization of the relaxation presented for one hidden layer in Section 3.1.

                        Grad-NN [23]   LP-NN [29]   PGD-NN
  PGD-attack                15%            18%         9%
  SDP-cert (this work)      20%            20%        18%
  LP-cert                   97%            22%       100%
  Grad-cert                 35%            93%        n/a

Table 1: Fraction of non-certified examples on MNIST. Different certification techniques (rows) on different networks (columns). SDP-cert is consistently better than the other certificates. All numbers are reported for ℓ∞ attacks at ϵ = 0.1.

5.1 General SDP

The interactions between x^{i−1} and x^i in (2) (via the ReLU constraint) are analogous to the interaction between the input and hidden layer in the one-layer case. Suppose we have bounds l^{i−1}, u^{i−1} ∈ R^{m_{i−1}} on the inputs to the ReLU units at layer i such that l^{i−1} ≤ x^{i−1} ≤ u^{i−1}. We discuss how to obtain these bounds and their significance in Section 5.2. Writing the constraints for each layer iteratively gives us the following SDP:

  f^SDP_y(x̄, ȳ) = max_P  (c_y − c_ȳ)⊤ P[x^L]
  s.t. for i = 1, ..., L:
       P[x^i] ≥ 0,  P[x^i] ≥ W^{i−1} P[x^{i−1}],                                             (ReLU constraints for layer i)    (7)
       diag(P[x^i (x^i)⊤]) = diag(W^{i−1} P[x^{i−1} (x^i)⊤]),
       diag(P[x^{i−1} (x^{i−1})⊤]) ≤ (l^{i−1} + u^{i−1}) ⊙ P[x^{i−1}] − l^{i−1} ⊙ u^{i−1},   (Input constraints for layer i)
       P[1] = 1,  P ⪰ 0                                                                      (Matrix constraints)

5.2 Bounds on intermediate activations

From the geometric interpretation of Section 4.1, we made the important observation that adding constraints that keep P[x²] small aids in obtaining tighter relaxations. For the multi-layer case, since the activations at layer i − 1 act as input to the next layer i, adding constraints that restrict P[(x^i_j)²] will lead to a tighter relaxation for the overall objective. The SDP automatically obtains some bound on P[(x^i_j)²] from the bounds on the input, hence the SDP solution is well-defined and finite even without these bounds. However, we can tighten the bound on P[(x^i_j)²] by relating it to the linear monomial P[x^i_j] via bounds on the value of the activation x^i_j. One simple way to obtain bounds on the activations x^i_j is to treat each hidden unit separately, using simple interval arithmetic to obtain

  l⁰ = x̄ − ϵ1,  u⁰ = x̄ + ϵ1    (attack model),
  l^i = [W^{i−1}]⁺ l^{i−1} + [W^{i−1}]⁻ u^{i−1},   u^i = [W^{i−1}]⁺ u^{i−1} + [W^{i−1}]⁻ l^{i−1},    (8)

where ([M]⁺)_ij = max(M_ij, 0) and ([M]⁻)_ij = min(M_ij, 0).
In our experiments on real networks (Section 6), we observe that these simple bounds are sufficient to obtain good certificates.
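The propagation in (8) can be sketched in a few lines of numpy. This is our own illustrative helper, not the authors' code; as an extra step beyond (8), we clamp the bounds at zero, which is valid because each activation is the output of a ReLU and hence non-negative:

```python
import numpy as np

def interval_bounds(weights, x_bar, eps):
    """Layer-wise interval-arithmetic bounds on ReLU activations, as in (8).

    weights: list of weight matrices [W^0, ..., W^{L-1}] (biases omitted, as
    in the exposition above). Returns a list of per-layer (l, u) pairs, one
    for the input and one for each hidden layer.
    """
    l, u = x_bar - eps, x_bar + eps                # attack model: l^0, u^0
    bounds = [(l, u)]
    for W in weights:
        Wp, Wm = np.maximum(W, 0.0), np.minimum(W, 0.0)   # [W]^+ and [W]^-
        pre_l = Wp @ l + Wm @ u                    # lower bound on W x
        pre_u = Wp @ u + Wm @ l                    # upper bound on W x
        l, u = np.maximum(pre_l, 0.0), np.maximum(pre_u, 0.0)  # ReLU(x) >= 0
        bounds.append((l, u))
    return bounds

# Example: W = [[1, -1], [2, 1]], x_bar = 0, eps = 1 gives l^1 = [0, 0], u^1 = [2, 3].
W = np.array([[1.0, -1.0], [2.0, 1.0]])
l1, u1 = interval_bounds([W], np.zeros(2), 1.0)[1]
```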
However, tighter bounds could potentially lead to tighter certificates.

6 Experiments

In this section, we evaluate the performance of our certificate (7) on neural networks trained using different robust training procedures, and compare against other certificates in the literature.
Networks. We consider feedforward networks that are trained on the MNIST dataset of handwritten digits using three different robust training procedures.

1. Grad-NN. We use the two-layer network with 500 hidden nodes from [23], obtained by using an SDP-based bound on the gradient of the network (different from the SDP presented here) as a regularizer. We obtained the weights of this network from the authors of [23].

2. LP-NN. We use a two-layer network with 500 hidden nodes (matching that of Grad-NN) trained via the LP-based robust training procedure of [29]. The authors of [29] provided the weights.

Figure 3: Histogram of PGD margins for (a) points that are certified by the SDP and (b) points that are not certified by the SDP.

3. PGD-NN. We consider a fully-connected network with four layers containing 200, 100 and 50 hidden nodes (i.e., the architecture is 784-200-100-50-10). We train this network using adversarial training [12] against the strong PGD attack [21]. We train to minimize a weighted combination of the regular cross-entropy loss and the adversarial loss. We tuned the hyperparameters based on the performance of the PGD attack on a holdout set. The step size of the PGD attack was set to 0.1, the number of iterations to 40, the perturbation size to ϵ = 0.3, and the weight on the adversarial loss to 1/3.
The training procedures for Grad-NN and LP-NN yield certificates of robustness (described in their corresponding papers), but the training procedure of PGD-NN does not.
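The PGD attack referenced above is simple to sketch. Below is a minimal, hypothetical ℓ∞ PGD implementation in numpy using the hyperparameters quoted for PGD-NN (step size 0.1, 40 iterations, ϵ = 0.3); grad_fn stands in for the gradient of the attacker's objective (e.g., the margin of an incorrect class) and is an assumption of this sketch, not code from the paper:

```python
import numpy as np

def pgd_linf(x_bar, grad_fn, eps=0.3, step=0.1, iters=40, seed=0):
    """Projected gradient *ascent* on the attacker's objective over B_eps(x_bar).

    grad_fn(x) returns the gradient of the objective being maximized at x.
    Each iterate is projected back onto the l_inf ball around x_bar.
    """
    rng = np.random.default_rng(seed)
    x = x_bar + rng.uniform(-eps, eps, size=x_bar.shape)  # random start in the ball
    for _ in range(iters):
        x = x + step * np.sign(grad_fn(x))                # signed-gradient ascent step
        x = np.clip(x, x_bar - eps, x_bar + eps)          # project onto B_eps(x_bar)
    return x

# Toy check: maximizing w^T x over the ball drives x to x_bar + eps * sign(w).
w = np.array([1.0, -2.0, 0.5])
x_bar = np.zeros(3)
x_adv = pgd_linf(x_bar, grad_fn=lambda x: w)
```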
Note that all the networks are “foreign networks” to our SDP, as their training procedures do not incorporate the SDP relaxation.

Certification procedures. Recall from Section 2 that an upper bound on the maximum incorrect margin can be used to obtain certificates. We consider certificates from three different upper bounds.

1. SDP-cert. This is the certificate we propose in this work. It uses the SDP upper bound defined in Section 5; the exact optimization problem is presented in (7), and the bounds on intermediate activations are obtained using the interval arithmetic procedure presented in (8).

2. LP-cert. This uses the upper bound based on the LP relaxation discussed in Section 4.2, which forms the basis for several existing works on scalable certification [9, 10, 28, 29]. The LP uses layer-wise bounds for intermediate nodes, similar to $l^i, u^i$ in our SDP formulation (7). For Grad-NN and LP-NN, which have a single hidden layer, the layer-wise bounds can be computed exactly using interval arithmetic. For the four-layer PGD-NN, in order to have a fair comparison with SDP-cert, we use the same interval arithmetic procedure (8).

3. Grad-cert. We use the upper bound proposed in [23]. This upper bound is based on the maximum norm of the gradient of the network predictions and only holds for two-layer networks.

Table 1 presents the performance of the three different certification procedures on the three networks. For each certification method and network, we evaluate the associated upper bound on the same 1000 random test points and report the fraction of points that were not certified. Computing the exact worst-case adversarial error is not computationally tractable; therefore, to provide a comparison, we also compute a lower bound on the adversarial error: the error obtained by the PGD attack.

Performance of proposed SDP-cert.
SDP-cert provides non-vacuous certificates for all networks considered. In particular, we can certify that the four-layer PGD-NN has an error of at most 18% at $\epsilon = 0.1$. To compare, a lower bound on the robust error (the PGD attack error) is 9%. On the two-layer networks, SDP-cert improves the previously known bounds. For example, it certifies that Grad-NN has an error of at most 20%, compared to the previously known 35%. Similarly, SDP-cert improves the bound for LP-NN from 22% to 20%.

The gap between the lower bound (PGD) and upper bound (SDP) is due to points that cannot be misclassified by PGD but are also not certified by the SDP. In order to further investigate these points, we look at the margins obtained by the PGD attack to estimate the robustness of different points. Formally, let $x_{\text{PGD}}$ be the adversarial example generated by the PGD attack on clean input $\bar{x}$ with true label $\bar{y}$. We compute $\min_{y \neq \bar{y}} \left[ f(x_{\text{PGD}})_{\bar{y}} - f(x_{\text{PGD}})_y \right]$, the margin of the closest incorrect class; a small value indicates that $x_{\text{PGD}}$ was close to being misclassified. Figure 3 shows the histograms of this PGD margin. The examples which are not certified by the SDP have much smaller margins than those that are certified: the average PGD margin is 1.2 on points that are not certified and 4.5 on points that are certified. From Figure 3, we see that a large number of the SDP-uncertified points have very small margin, suggesting that these points might be misclassified by stronger attacks.

Remark.
As discussed in Section 5, we could consider a version of the SDP that does not include the constraints relating linear and quadratic terms at the intermediate layers of the network. Empirically, such an SDP produces vacuous certificates (> 90% error). Therefore, these constraints at intermediate layers play a significant role in improving the empirical performance of the SDP relaxation.

Comparison with other certification approaches. From Table 1, we observe that SDP-cert consistently performs better than both LP-cert and Grad-cert for all three networks.

Grad-cert and LP-cert provide vacuous (> 90% error) certificates on networks that are not trained to minimize these certificates. This is because these certificates are tight only in some special cases that can be enforced by training. For example, LP-cert is tight when the ReLU units do not switch linear regions [29]. While a typical input causes only 20% of the hidden units of LP-NN to switch regions, 75% of the hidden units of Grad-NN switch on a typical input. Grad-cert bounds the gradient uniformly across the entire input space; this makes the bound loose on arbitrary networks that could have a small gradient only on the data distribution of interest.

Comparison to concurrent work [26]. A variety of robust MNIST networks are certified by Tjeng and Tedrake [26]. On Grad-NN, their certified error is 30%, which is looser than our SDP certified error (20%). They also consider the CNN counterparts of LP-NN and PGD-NN, trained using the procedures of [29] and [21]. The certified errors are 4.4% and 7.2%, respectively. This reduction in the errors is due to the CNN architecture. Further discussion on applying our SDP to CNNs appears in Section 7.

Optimization setup. We use the YALMIP toolbox [19] with MOSEK as a backend to solve the different convex programs that arise in these certification procedures.
On a 4-core CPU, the average SDP computation took around 25 minutes and the LP around 5 minutes per example.

7 Discussion

In this work, we focused on fully connected feedforward networks for computational efficiency. In principle, our proposed SDP can be directly used to certify convolutional neural networks (CNNs); unrolling the convolution would result in a (large) feedforward network. Naively, current off-the-shelf solvers cannot handle the SDP formulation of such large networks. Robust training on CNNs leads to better error rates: for example, adversarial training against the PGD adversary on a four-layer feedforward network has error 9% against the PGD attack, while a four-layer CNN trained using a similar procedure has error less than 3% [21]. An immediate open question is whether the network in [21], which has so far withstood many different attacks, is truly robust on MNIST. We are hopeful that we can scale up our SDP to answer this question, perhaps borrowing ideas from work on highly scalable SDPs [1] and explicitly exploiting the sparsity and structure induced by the CNN architecture.

Current work on certification of neural networks against adversarial examples has focused on perturbations bounded in some norm ball. In our work, we focused on the common $\ell_\infty$ attack because the problem of securing multi-layer ReLU networks remains unsolved even in this well-studied attack model. Different attack models lead to different constraints only at the input layer; our SDP framework can be applied to any attack model where these input constraints can be written as linear and quadratic constraints.
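As an example of such input constraints, the $\ell_\infty$ ball $x \in [\bar{x} - \epsilon, \bar{x} + \epsilon]$ is captured elementwise by $(x_j - l_j)(x_j - u_j) \le 0$, equivalently $x_j^2 \le (l_j + u_j) x_j - l_j u_j$, which is exactly the input constraint appearing in (7). A quick numpy sanity check (the toy values are ours):

```python
import numpy as np

# Hypothetical toy instance: clean input x_bar and perturbation budget eps.
rng = np.random.default_rng(0)
x_bar, eps = rng.normal(size=5), 0.3
l, u = x_bar - eps, x_bar + eps

# Any x in the l_inf ball satisfies (x_j - l_j)(x_j - u_j) <= 0, i.e.
# x_j^2 <= (l_j + u_j) x_j - l_j u_j  -- the input constraint of (7).
x_in = x_bar + rng.uniform(-eps, eps, size=5)
assert np.all(x_in**2 <= (l + u) * x_in - l * u + 1e-9)

# A point outside the ball violates the constraint (here in every coordinate,
# since x_out - l = 3*eps > 0 and x_out - u = eps > 0).
x_out = x_bar + 2 * eps
assert np.all(x_out**2 > (l + u) * x_out - l * u)
```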
In particular, it can also be used to certify robustness against attacks bounded in $\ell_2$ norm; Hein and Andriushchenko [13] provide alternative bounds for $\ell_2$ norm attacks based on the local gradient.

Guarantees for the bounded norm attack model in general are sufficient but not necessary for robustness against adversaries in the real world. Many successful attacks involve inconspicuous but clearly visible perturbations [11, 24, 6, 4], or large but semantics-preserving perturbations in the case of natural language [15]. These perturbations do not currently have well-defined mathematical models and present yet another layer of challenge. However, we believe that the mathematical ideas we develop for the bounded norm will be useful building blocks in the broader adversarial game.

Reproducibility. All code, data and experiments for this paper are available on the Codalab platform at https://worksheets.codalab.org/worksheets/0x6933b8cdbbfd424584062cdf40865f30/.

Acknowledgements. This work was partially supported by a Future of Life Institute Research Award and an Open Philanthropy Project Award. JS was supported by a Fannie & John Hertz Foundation Fellowship and an NSF Graduate Research Fellowship. We thank Eric Wong for providing relevant experimental results. We are also grateful to Moses Charikar, Zico Kolter and Eric Wong for several helpful discussions, and to the anonymous reviewers for useful feedback.

References

[1] A. A. Ahmadi and A. Majumdar. DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization. arXiv preprint arXiv:1706.02586, 2017.

[2] A. Athalye and I. Sutskever. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397, 2017.

[3] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.

[4] T. B.
Brown, D. Mané, A. Roy, M. Abadi, and J. Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.

[5] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, pages 39–57, 2017.

[6] N. Carlini, P. Mishra, T. Vaidya, Y. Zhang, M. Sherr, C. Shields, D. Wagner, and W. Zhou. Hidden voice commands. In USENIX Security, 2016.

[7] N. Carlini, G. Katz, C. Barrett, and D. L. Dill. Ground-truth adversarial examples. arXiv, 2017.

[8] K. Dvijotham, S. Gowal, R. Stanforth, R. Arandjelovic, B. O’Donoghue, J. Uesato, and P. Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.

[9] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli. A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567, 2018.

[10] R. Ehlers. Formal verification of piece-wise linear feed-forward neural networks. In International Symposium on Automated Technology for Verification and Analysis (ATVA), pages 269–286, 2017.

[11] I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song. Robust physical-world attacks on machine learning models. arXiv, 2017.

[12] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

[13] M. Hein and M. Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems (NIPS), pages 2263–2273, 2017.

[14] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel. Adversarial attacks on neural network policies. arXiv, 2017.

[15] R. Jia and P. Liang. Adversarial examples for evaluating reading comprehension systems. In Empirical Methods in Natural Language Processing (EMNLP), 2017.

[16] G.
Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. arXiv preprint arXiv:1702.01135, 2017.

[17] G. Katz, C. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer. Towards proving the adversarial robustness of deep neural networks. arXiv, 2017.

[18] J. Z. Kolter and E. Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope (published at ICML 2018). arXiv preprint arXiv:1711.00851, 2017.

[19] J. Löfberg. YALMIP: A toolbox for modeling and optimization in MATLAB. In CACSD, 2004.

[20] J. Lu, H. Sibai, E. Fabry, and D. Forsyth. No need to worry about adversarial examples in object detection in autonomous vehicles. arXiv preprint arXiv:1707.03501, 2017.

[21] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.

[22] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, pages 582–597, 2016.

[23] A. Raghunathan, J. Steinhardt, and P. Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations (ICLR), 2018.

[24] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540, 2016.

[25] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

[26] V. Tjeng and R. Tedrake. Verifying neural networks with mixed integer programming.
arXiv preprint arXiv:1711.07356, 2017.

[27] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. arXiv, 2010.

[28] T. Weng, H. Zhang, H. Chen, Z. Song, C. Hsieh, D. Boning, I. S. Dhillon, and L. Daniel. Towards fast computation of certified robustness for ReLU networks. arXiv preprint arXiv:1804.09699, 2018.

[29] E. Wong and J. Z. Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), 2018.

[30] E. Wong, F. Schmidt, J. H. Metzen, and J. Z. Kolter. Scaling provable adversarial defenses. arXiv preprint arXiv:1805.12514, 2018.