{"title": "Certifying Geometric Robustness of Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 15313, "page_last": 15323, "abstract": "The use of neural networks in safety-critical computer vision systems calls for their\nrobustness certification against natural geometric transformations (e.g., rotation,\nscaling). However, current certification methods target mostly norm-based pixel\nperturbations and cannot certify robustness against geometric transformations. In\nthis work, we propose a new method to compute sound and asymptotically optimal\nlinear relaxations for any composition of transformations. Our method is based on\na novel combination of sampling and optimization. We implemented the method\nin a system called DeepG and demonstrated that it certifies significantly more\ncomplex geometric transformations than existing methods on both defended and\nundefended networks while scaling to large architectures.", "full_text": "Certifying Geometric Robustness of Neural Networks\n\nMislav Balunovi\u00b4c, Maximilian Baader, Gagandeep Singh, Timon Gehr, Martin Vechev\n\nDepartment of Computer Science\n\nETH Zurich\n\n{mislav.balunovic, mbaader, gsingh, timon.gehr, martin.vechev}@inf.ethz.ch\n\nAbstract\n\nThe use of neural networks in safety-critical computer vision systems calls for their\nrobustness certi\ufb01cation against natural geometric transformations (e.g., rotation,\nscaling). However, current certi\ufb01cation methods target mostly norm-based pixel\nperturbations and cannot certify robustness against geometric transformations. In\nthis work, we propose a new method to compute sound and asymptotically optimal\nlinear relaxations for any composition of transformations. Our method is based on\na novel combination of sampling and optimization. 
We implemented the method\nin a system called DEEPG and demonstrated that it certi\ufb01es signi\ufb01cantly more\ncomplex geometric transformations than existing methods on both defended and\nundefended networks while scaling to large architectures.\n\n1\n\nIntroduction\n\nRobustness against geometric transformations is a critical property that neural networks deployed\nin computer vision systems should satisfy. However, recent work [1, 2, 3] has shown that by using\nnatural transformations (e.g., rotations), one can generate adversarial examples [4, 5] that cause the\nnetwork to misclassify the image, posing a safety threat to the entire system. To address this issue, one\nwould ideally like to prove that a given network is free of such geometric adversarial examples. While\nthere has been substantial work on certifying robustness to changes in pixel intensity (e.g., [6, 7, 8]),\nonly the recent work of [9] proposed a method to certify robustness to geometric transformations. Its\nkey idea is summarized in Fig. 1: Here, the goal is to prove that any image obtained by translating\nthe original image by some \u03b4x, \u03b4y \u2208 [\u22124, 4] is classi\ufb01ed to label 3. To accomplish this task, [9]\npropagates the image and the parameters \u03b4x, \u03b4y through every component of the transformation using\ninterval bound propagation. The output region I is a convex shape capturing all images that can be\nobtained by translating the original image between \u22124 and 4 pixels. Finally, I is fed to a standard\nneural network veri\ufb01er which tries to prove that all images in I classify to 3. This method can also be\nimproved using tighter relaxation based on Polyhedra [10]. Unfortunately, as we show later, bound\npropagation is not satisfactory. 
The core issue is that any approach based on bound propagation inherently accumulates loss for every intermediate result, often producing regions that are too coarse to allow the neural network verifier to succeed. Instead, we propose a new method based on sampling and optimization which computes a convex relaxation for the entire composition of transformations.

Figure 1: End-to-end certification of geometric robustness using different convex relaxations. The figure shows convex relaxations of translation with δx, δy ∈ [−4, 4] (Interval [9] producing region I, Polyhedra producing region P, and DEEPG producing region G), which are then passed to a neural network verifier that reports "Robust" or "Not robust".

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

The key idea of our method is to sample the parameters of the transformation (e.g., δx, δy), obtaining sampled points at the output (red dots in Fig. 1), and to then compute sound and asymptotically optimal linear constraints around these points (shape G). We implemented our method in a system called DEEPG and showed that it is significantly more precise than bound propagation (using the Interval or Polyhedra relaxation) on a wide range of geometric transformations. To the best of our knowledge, DEEPG is currently the state-of-the-art system for certifying geometric robustness of neural networks.

Main contributions Our main contributions are:

• A novel method, combining sampling and optimization, to compute asymptotically optimal linear constraints bounding the set of geometrically transformed images. We demonstrate that these constraints enable significantly more precise certification compared to prior work.
• A complete implementation of our certification in a system called DEEPG. Our results show substantial benefits over the state-of-the-art across a range of geometric transformations. 
We make DEEPG publicly available at https://github.com/eth-sri/deepg/.

2 Related work

We now discuss the work most closely related to certification of neural networks and their robustness to geometric transformations.

Certification of neural networks Recently, a wide range of methods have been proposed to certify robustness of neural networks against adversarial examples. These methods are typically based on abstract interpretation [6, 7], linear relaxation [8, 11, 12], duality [13], SMT solving [14, 15, 16], mixed integer programming [17], symbolic intervals [18], Lipschitz optimization [19], semi-definite relaxations [20], and combinations of approximations with solvers [21, 22]. Certification procedures can also be extended into end-to-end training of neural networks to be provably robust against adversarial examples [23, 24]. A recent line of work [25, 26, 27] proposes to construct a classifier based on a smoothed neural network, which comes with probabilistic guarantees on robustness against L2 perturbations. None of these works except [9] consider geometric transformations, and [9] only verifies robustness against rotation. The work of [28] also generates linear relaxations of non-linear specifications, but it does not handle geometric transformations. We remark that [1] considers a much more restricted (discrete) setting leading to a finite number of images. This means that certification can be performed by brute-force enumeration of this finite set of transformed images. In our setting, as we will see, this is not possible, as we are dealing with an uncountable set of transformed images.

Neural networks and geometric transformations There has been considerable research on empirical quantification of the geometric robustness of neural networks [2, 3, 29, 30, 31, 32]. 
Another line of work focuses on the design of architectures which possess an inherent ability to learn to be more robust against such transformations [33, 34]. However, all of these approaches offer only empirical evidence of robustness. Instead, our focus is to provide formal guarantees.

Certification of geometric transformations Prior work [9] introduced a method for analyzing rotations using interval propagation and performed certification using the state-of-the-art verifier DEEPPOLY. It is straightforward to generalize their interval approach to handle more complex transformations beyond rotation (we provide details in Appendix A.4). However, as we show experimentally, interval propagation loses precision, which is why certification often does not succeed. To capture relationships between pixel values and transformations, one would ideally use the Polyhedra relaxation [10] instead of intervals. While Polyhedra offers higher precision, its worst-case running time is exponential in the number of variables [35]. Hence, it does not scale to geometric transformations, where every pixel introduces a new variable. Thus, we extended the recent DeepPoly relaxation [9] (a restricted Polyhedra) with custom approximations for the operations used in several geometric transformations (e.g., translation, scaling). Our experimental results show that even though this approach significantly improves over intervals, it is not precise enough to certify robustness of most images in our dataset. In turn, this motivates the method introduced in this paper.

Figure 2: Image rotated by an angle of −π/4. Here, (a) shows the original image, (b) shows part of (a) with a focus on the relevant interpolation regions, and (c) shows the resulting rotated image.

3 Background

Our goal is to certify the robustness of a neural network against adversarial examples generated using parameterized geometric transformations. In this section we formulate this problem statement, introduce the notation of transformations and provide a running example which we use throughout the paper to illustrate key concepts.

Geometric image transformations A geometric image transformation consists of a parameterized spatial transformation T_µ, an interpolation I which ensures the result can be represented on a discrete pixel grid, and parameterized changes in brightness and contrast P_{α,β}. We assume T_µ is a composition of bijective transformations such as rotation, translation, shearing and scaling (full descriptions of all transformations are in Appendix A.1). While our approach also applies to other interpolation methods, in this work we focus on the case where I is the commonly-used bilinear interpolation.

To ease presentation, we assume the image (with integer coordinates) consists of an even number of rows and columns, is centered around (0, 0), and its pixel coordinates are odd integers. We note that all results hold in the general case (without this assumption).

Interpolation The bilinear interpolation I : R² → [0, 1] evaluated on a coordinate (x, y) ∈ R² is a polynomial of degree 2 given by

    I(x, y) := (1/4) Σ_{δi, δj ∈ {0, 2}} p_{i+δi, j+δj} (2 − |i + δi − x|)(2 − |j + δj − y|).    (1)

Here, (i, j) is the lower-left corner of the interpolation region A_{i,j} := [i, i + 2] × [j, j + 2] which contains the coordinate (x, y), and the matrix p consists of the pixel values at the corresponding coordinates of the original image. 
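To make Eq. (1) concrete, the following is a minimal sketch of bilinear interpolation under the odd-integer pixel-coordinate convention above; the helper name `bilinear` and the dictionary representation of p are our own illustration, not DeepG's implementation.

```python
import math

def bilinear(p, x, y):
    # Lower-left corner (i, j) of the interpolation region A_{i,j} = [i, i+2] x [j, j+2];
    # pixel coordinates are odd integers, per the paper's convention.
    i = 2 * math.floor((x - 1) / 2) + 1
    j = 2 * math.floor((y - 1) / 2) + 1
    val = 0.0
    for di in (0, 2):
        for dj in (0, 2):
            # Each corner pixel is weighted by its distance to (x, y), as in Eq. (1).
            val += p[(i + di, j + dj)] * (2 - abs(i + di - x)) * (2 - abs(j + dj - y))
    return val / 4.0

# Hypothetical 2x2 pixel grid at odd coordinates.
p = {(1, 1): 0.0, (3, 1): 1.0, (1, 3): 0.0, (3, 3): 1.0}
print(bilinear(p, 2, 2))  # 0.5, the average of the four corners
print(bilinear(p, 1, 1))  # 0.0, exactly the pixel value at (1, 1)
```

Evaluating at a pixel coordinate recovers the pixel value itself, as expected from the weights in Eq. (1).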
The function I is continuous on R² and smooth on the interior of every interpolation region. These interpolation regions are depicted with the blue horizontal and vertical dotted lines in Fig. 2b.

The pixel value p̃_{x,y} of the transformed image can be obtained by (i) calculating the preimage of the coordinate (x, y) under T_µ, (ii) interpolating the resulting coordinate using I to obtain a value ξ, and (iii) applying the changes in contrast and brightness via P_{α,β}(ξ) = αξ + β, to obtain the final pixel value p̃_{x,y} = I_{α,β,µ}(x, y). These three steps are captured by

    I_{α,β,µ}(x, y) := P_{α,β} ∘ I ∘ T_µ⁻¹(x, y).    (2)

Running example To illustrate key concepts introduced throughout the paper, we use the running example of an MNIST image [36] shown in Fig. 2. On this image, we will apply a rotation R_φ with an angle φ. For our running example, we set P_{α,β} to be the identity.

Consider the pixel p̃_{5,1} in the transformed image shown in Fig. 2c (the pixel is marked with a red dot). The transformed image is obtained by rotating the original image in Fig. 2a by an angle φ = −π/4. This results in the pixel value

    p̃_{5,1} = I ∘ R⁻¹_{−π/4}(5, 1) = I(2√2, 3√2) ≈ 0.30.

Here, the preimage of the point (5, 1) under R_{−π/4} is the point (2√2, 3√2) with non-integer coordinates. This point belongs to the interpolation region A_{1,3}, and by applying I(2√2, 3√2) to the original image in Fig. 2a, we obtain the final pixel value ≈ 0.30 for pixel (5, 1) in the rotated image.

Neural network certification To certify robustness of a neural network with respect to a geometric transformation, we rely on the state-of-the-art verifier DeepPoly [9]. 
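The preimage computation in the running example above can be checked numerically: the inverse of a rotation by −π/4 is a rotation by +π/4. This small sketch (our own illustration) confirms that (5, 1) maps to (2√2, 3√2).

```python
import math

def rotate(x, y, phi):
    # Planar rotation R_phi of the point (x, y) about the origin.
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi))

# Preimage of pixel (5, 1) under rotation by -pi/4 = rotation by +pi/4.
px, py = rotate(5, 1, math.pi / 4)
print(px, py)  # approximately (2.828, 4.243), i.e., (2*sqrt(2), 3*sqrt(2))
```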
For complex properties such as\ngeometric transformations, the veri\ufb01er needs to receive a convex relaxation of all possible inputs to\nthe network. If this relaxation is too coarse, the veri\ufb01er will not be able to certify the property.\n\nProblem statement To guarantee robustness, our goal is to compute a convex relaxation of all\npossible images obtainable via the transformation I\u03b1,\u03b2,\u00b5. This relaxation can then be provided as\nan input to a neural network veri\ufb01er (e.g., DeepPoly). If the veri\ufb01er proves that the neural network\nclassi\ufb01cation is correct for all images in this relaxation, then geometric robustness is proven.\n\n4 Asymptotically optimal linear constraints via optimization and sampling\n\nWe now present our method for computing the optimal linear approximation (in terms of volume).\n\nMotivation As mentioned earlier, designing custom transformers for every operation incurs preci-\nsion loss at every step in the sequence of transformations. Our key insight is to de\ufb01ne an optimization\nproblem in a way where its solution is the optimal (in terms of volume) lower and upper linear\nconstraint for the entire sequence. To solve this optimization problem, we propose a method based on\nsampling and linear programming. Our method produces, for every pixel, asymptotically optimal\nlower and upper linear constraints for the entire composition of transformations (including inter-\npolation). Such an optimization problem is generally dif\ufb01cult to solve, however, we \ufb01nd that with\ngeometric transformations, our approach is scalable and contributes only a small portion to the entire\nend-to-end certi\ufb01cation running time.\n\nOptimization problem To compute linear constraints for every pixel value, we split the hyper-\nrectangle h representing the set of possible parameters into s splits {hk}k\u2208[s]. 
Our goal will be to compute sound lower and upper linear constraints for the pixel value I_κ(x, y) for a given pixel (x, y). Both of these constraints will be linear in the parameters κ = (α, β, µ) ∈ h_k. We define optimal and sound linear (lower and upper) constraints for I_κ(x, y) to be a pair of hyperplanes fulfilling

    w_l^T κ + b_l ≤ I_κ(x, y)   ∀κ ∈ h_k,    (3)
    w_u^T κ + b_u ≥ I_κ(x, y)   ∀κ ∈ h_k,    (4)

while minimizing

    L(w_l, b_l) = (1/V) ∫_{κ ∈ h_k} ( I_κ(x, y) − (b_l + w_l^T κ) ) dκ,    (5)
    U(w_u, b_u) = (1/V) ∫_{κ ∈ h_k} ( (b_u + w_u^T κ) − I_κ(x, y) ) dκ.    (6)

Here V denotes the normalization constant equal to the volume of h_k. Intuitively, the optimal constraints should result in a convex relaxation of minimum volume. This formulation also allows independent computation for every pixel, facilitating parallelization across pixels. Next, we describe how we obtain lower constraints (upper constraints are computed analogously).

Step 1: Compute a potentially unsound constraint To generate a reasonable but potentially unsound linear constraint, we sample parameters κ_1, ..., κ_N from h_k, approximate the integral in Eq. (5) by its Monte Carlo estimate L_N, and enforce the constraints only at the sampled points:

    min_{(w_l, b_l) ∈ W} L_N(w_l, b_l) = min_{(w_l, b_l) ∈ W} (1/N) Σ_{i=1}^N ( I_{κ_i}(x, y) − (b_l + w_l^T κ_i) ),
    subject to b_l + w_l^T κ_i ≤ I_{κ_i}(x, y)   ∀i ∈ [N].    (7)

This problem can be solved exactly using linear programming (LP). The solution is a potentially unsound constraint b′_l + w′_l^T κ (it may violate the constraint at non-sampled points). 
For our running example, the region bounded by these potentially unsound lower and upper linear constraints is shown in orange in Fig. 3.

Figure 3: Unsound (Step 1) and sound (Step 3) enclosures for I ∘ R_φ⁻¹(5, 1), with respect to random sampling from φ ∈ [0, π/4], in comparison to the interval bounds from prior work [9]. Note that I ∘ R_φ⁻¹(5, 1) is not piecewise linear, because bilinear interpolation is a polynomial of degree 2.

Step 2: Bounding the maximum violation Our next step is to compute an upper bound on the violation of Eq. (3) induced by our potentially unsound constraint from Step 1. This violation is equal to the maximum of the function f(κ) = b′_l + w′_l^T κ − I_κ(x, y) over the hyperrectangle h_k. It can be shown that the function f is Lipschitz continuous, which enables the application of standard global optimization techniques with guarantees [37]. We remark that such methods have already been applied for optimization over inputs to neural networks [38, 19].

We show a high-level description of this optimization procedure in Algorithm 1. Throughout the optimization, we maintain a partition of the domain of the function f into hyperrectangles. The hyperrectangles are stored in a priority queue q sorted by an upper bound f_h^bound on the maximum value the function can take inside the hyperrectangle. At every step, shown in Line 6, the hyperrectangle with the highest upper bound is further refined into smaller hyperrectangles h′_1, ..., h′_k and their upper bounds are recomputed. This procedure finishes when the difference between every upper bound and the maximal value at one of the hyperrectangle centers is at most ε. 
Finally, the maximum upper bound of the elements in the queue is returned as the result of the optimization. We provide more details on the optimization algorithm in Appendix A.5.

Algorithm 1 Lipschitz Optimization with Bound Refinement
 1: Input: f, h, k ≥ 2
 2: fmax := f(center(h))
 3: f_h^bound := f^bound(h, ∇f)
 4: q := [(h, f_h^bound)]
 5: repeat
 6:   pop (h′, f_{h′}^bound) from q with maximum f_{h′}^bound
 7:   h′_1, ..., h′_k := partition(h′, ∇f)
 8:   for i := 1 to k do
 9:     fmax := max(f(center(h′_i)), fmax)
10:     f_{h′_i}^bound := f^bound(h′_i, ∇f)
11:     if f_{h′_i}^bound > fmax + ε then
12:       add (h′_i, f_{h′_i}^bound) to q
13:     end if
14:   end for
15: until a maximal f_{h′}^bound in q is lower than fmax + ε
16: return fmax + ε

The two most important aspects of the algorithm, which determine the speed of convergence, are (i) the computation of an upper bound, and (ii) choosing an edge along which to refine the hyperrectangle. To compute an upper bound inside a hyperrectangle spanned between the points h_l and h_u, we use:

    f(κ) ≤ f((h_u + h_l)/2) + (1/2) |∇_h f|^T (h_u − h_l).

Here, |∇_h f| can be any upper bound on the true gradient which satisfies |∂_i f(κ)| ≤ |∇_h f|_i for every dimension i. To compute such a bound, we perform reverse-mode automatic differentiation of the function f using interval propagation (this is explained in more detail in Appendix A.2). As an added benefit, the results of our analysis can be used for pruning of hyperrectangles. 
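The branch-and-bound loop of Algorithm 1 can be sketched for the one-dimensional case (k = 2) using the midpoint-plus-gradient-bound upper bound above. The objective and gradient bound below are hypothetical stand-ins for f and |∇_h f|, not DeepG's actual functions.

```python
import heapq

def lipschitz_max(f, grad_bound, lo, hi, eps=1e-3):
    # Branch-and-bound maximization of f over [lo, hi] (Algorithm 1, 1-D, k = 2).
    # grad_bound(l, u) must upper-bound |f'| on [l, u]; a box's upper bound is
    # f(center) + 0.5 * grad_bound * width, as in the inequality above.
    def box(l, u):
        c = 0.5 * (l + u)
        fc = f(c)
        return fc, fc + 0.5 * grad_bound(l, u) * (u - l)

    fmax, bound = box(lo, hi)
    q = [(-bound, lo, hi)]                  # max-heap via negated upper bounds
    while q and -q[0][0] > fmax + eps:
        _, l, u = heapq.heappop(q)
        m = 0.5 * (l + u)
        for a, b in ((l, m), (m, u)):       # refine into k = 2 sub-boxes
            fc, bound = box(a, b)
            fmax = max(fmax, fc)
            if bound > fmax + eps:          # keep only boxes that may improve
                heapq.heappush(q, (-bound, a, b))
    return fmax + eps                       # lies in [max f, max f + eps]

# Hypothetical example: f(k) = 1 - (k - 0.3)^2 on [0, 1]; the true maximum is 1.0.
f = lambda k: 1.0 - (k - 0.3) ** 2
grad = lambda l, u: 2.0 * max(abs(l - 0.3), abs(u - 0.3))  # bound on |f'| over [l, u]
ub = lipschitz_max(f, grad, 0.0, 1.0, eps=1e-3)
```

At termination every discarded or remaining box has an upper bound at most fmax + ε, so the returned value over-approximates the true maximum by at most ε.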
We reduce a hyperrectangle to one of its lower-dimensional faces along dimensions for which the analysis of gradients proves that the respective partial derivative has a constant sign within the entire hyperrectangle. We also improve on standard refinement heuristics: instead of refining along the largest edge, we additionally weight the edge length by an upper bound on the partial derivative along that dimension. In our experiments, we find that these insights speed up convergence compared to simply applying the method out of the box.

Let v_l be the result of the above Lipschitz optimization algorithm. It is then guaranteed that

    v_l ≤ max_{κ ∈ h_k} ( b′_l + w′_l^T κ − I_κ(x, y) ) ≤ v_l + ε.    (8)

Step 3: Compute a sound linear constraint In the previous step we obtained a bound on the maximum violation of Eq. (3). Using this bound, in this step, we update our linear constraint to b_l = b′_l − v_l − ε and w_l = w′_l to obtain a sound lower linear constraint (it satisfies Eq. (3)). The region bounded by the sound lower and upper linear constraints is shown in green in Fig. 3. It is easy to check that our constraint is sound:

    b_l + w_l^T κ = b′_l − v_l − ε + w′_l^T κ ≤ I_κ(x, y)   ∀κ ∈ h_k.

Running example As in Section 3, we focus on the pixel (5, 1), choose s = 2 splits for [0, π/2], and focus our attention on the split [0, π/4]. In Step 1, we sample the random points {0.1, 0.2, 0.4, 0.5, 0.7} for our parameter φ ∈ [0, π/4] and evaluate I ∘ R_φ⁻¹(5, 1) on these points, obtaining {0.98, 0.97, 0.92, 0.79, 0.44}. These points correspond to the blue dots in Fig. 3. Solving the LP in Eq. (7) yields b′_l = 1.07 and w′_l = −0.90. 
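The LP of Step 1 for the running example is small enough to solve by enumerating candidate vertices, i.e., lines through pairs of sampled points. This standalone check (our own illustration, not DeepG's LP solver) recovers the values above.

```python
from itertools import combinations

# Sampled angles and the corresponding values of the pixel (5, 1) under rotation.
phis = [0.1, 0.2, 0.4, 0.5, 0.7]
vals = [0.98, 0.97, 0.92, 0.79, 0.44]
mean_phi = sum(phis) / len(phis)

# Eq. (7) asks to maximize b + w * mean(phi) subject to b + w * phi_i <= vals_i.
# The optimum of this 2-variable LP is attained on a line through two samples.
best = None
for (p1, v1), (p2, v2) in combinations(zip(phis, vals), 2):
    w = (v2 - v1) / (p2 - p1)
    b = v1 - w * p1
    if all(b + w * p <= v + 1e-9 for p, v in zip(phis, vals)):  # feasibility
        obj = b + w * mean_phi
        if best is None or obj > best[0]:
            best = (obj, b, w)

_, b_l, w_l = best
print(round(b_l, 2), round(w_l, 2))  # 1.07 -0.9
```

The optimal lower line passes through the samples at φ = 0.1 and φ = 0.7 and matches the paper's b′_l = 1.07, w′_l = −0.90.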
Similarly, we compute a potentially unsound upper constraint. Together, these two constraints form the orange enclosure shown in Fig. 3. This enclosure is in fact unsound, as some points on the blue line (those not sampled above) are not included in the region.

In Step 2, using Lipschitz optimization with ε = 0.02 for the function 1.07 − 0.9φ − I ∘ R_φ⁻¹(5, 1) over φ ∈ [0, π/4], we obtain v_l = 0.08, resulting in the sound linear lower constraint 0.97 − 0.9φ. This, together with the similarly obtained sound upper constraint, forms the sound (green) enclosure in Fig. 3. In the figure we also show the black dashed lines corresponding to the interval bounds from prior work [9], which enclose a much larger volume than our linear constraints.

Asymptotically optimal constraints While our constraints may not be optimal, one can show that they are asymptotically optimal as we increase the number of samples:

Theorem 1. Let N be the number of points sampled in our algorithm and ε the tolerance used in the Lipschitz optimization. Let (w_l, b_l) be our lower constraint and let (w*, b*) be the minimum of L. For every δ > 0 there exists N_δ such that |L(w_l, b_l) − L(w*, b*)| < δ + ε for every N > N_δ, with high probability. An analogous result holds for the upper constraint (w_u, b_u) and the function U.

We provide a proof of Theorem 1 in Appendix A.3. Essentially, as we increase the number of sampled points in our linear program, we approach the optimal constraints. The ε tolerance in the Lipschitz optimization can be decreased towards 0 to obtain an offset as small as desired. In our experiments, we also show empirically that close-to-optimal bounds can be obtained with a relatively small number of samples.

Table 1: Comparison of DEEPG, which uses linear constraints, with the baseline based on interval bound propagation. 
Here, R(φ) corresponds to rotations with angles between ±φ; T(x, y), to translations between ±x pixels horizontally and ±y pixels vertically; Sc(p), to scaling the image between ±p%; Sh(m), to shearing with a shearing factor between ±m%; and B(α, β), to changes in contrast between ±α% and brightness between ±β.

                                                Accuracy (%)  Attacked (%)  Certified (%)
                                                                            Interval [9]  DEEPG
MNIST          R(30)                            99.1          0.0            7.1          87.8
               T(2, 2)                          99.1          1.0            0.0          77.0
               Sc(5), R(5), B(5, 0.01)          99.3          0.0            0.0          34.0
               Sh(2), R(2), Sc(2), B(2, 0.001)  99.2          0.0            1.0          72.0
Fashion-MNIST  Sc(20)                           91.4          11.2          19.1          70.8
               R(10), B(2, 0.01)                87.7          3.6            0.0          71.4
               Sc(3), R(3), Sh(2)               87.2          3.5            3.5          56.6
CIFAR-10       R(10)                            71.2          10.8          28.4          87.8
               R(2), Sh(2)                      68.5          5.6            0.0          54.2
               Sc(1), R(1), B(1, 0.001)         73.2          3.8            0.0          54.4

5 Experimental evaluation

We implemented our certification method in a system called DEEPG. First, we demonstrate that DEEPG can certify robustness to significantly more complex transformations than both prior work and traditional bound propagation approaches based on relational abstractions. Second, we experimentally show that our method requires a relatively small number of samples to converge to the optimal linear constraints. Third, we investigate the effectiveness of a variety of training methods to train a network provably robust to geometric transformations. Finally, we demonstrate that DEEPG is scalable and can certify geometric robustness for large networks. We provide networks and code to reproduce the experiments in this paper at https://github.com/eth-sri/deepg/.

Experimental setup We evaluate on the image recognition datasets MNIST [36], Fashion-MNIST [39] and CIFAR-10 [40]. 
For each dataset, we randomly select 100 images from the test set to certify. Among these 100 images, we discard all images that are misclassified even without any transformation. In all experiments on MNIST and Fashion-MNIST we evaluate a 3-layer convolutional neural network with 9 618 neurons, while for the more challenging CIFAR-10 dataset we consider a 4-layer convolutional network with 45 216 neurons. Details of these architectures are provided in Appendix B.2. We certify robustness to compositions of transformations such as rotation, translation, scaling, shearing and changes in brightness and contrast. These transformations are formally defined in Appendix A.1. All experiments except those with large networks were performed on a desktop PC with 2 GeForce RTX 2080 Ti GPUs and a 16-core Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz.

Comparison with prior work In the first set of experiments we certify robustness to geometric transformations and compare our results to prior work [9]. While that work considered only rotations, we implemented its approach for other transformations and their compositions. This generalization is described in detail in Appendix A.4 and shown as Interval in Table 1. For each dataset and geometric transformation, we train a neural network using data augmentation with the transformation that we are certifying. Additionally, we use PGD adversarial training to obtain a network robust to noise, which, as we show later, significantly increases the verifiability of the network. We provide a runtime analysis of the experiments and all hyperparameters used for certification in Appendix B.2.

We first measure the success of a randomized attack which samples 100 transformed images uniformly at random [2]. 
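The randomized attack can be sketched as follows; `model` and `transform` are hypothetical stand-ins (a toy classifier and a brightness shift), not part of DeepG or the evaluated networks.

```python
import random

def randomized_attack(model, transform, image, param_ranges, n_samples=100, seed=0):
    """Sketch of the randomized attack of [2]: sample transformation parameters
    uniformly at random and report any sample that changes the predicted label."""
    rng = random.Random(seed)
    label = model(image)
    for _ in range(n_samples):
        params = [rng.uniform(lo, hi) for lo, hi in param_ranges]
        if model(transform(image, params)) != label:
            return params  # an adversarial transformation was found
    return None  # attack failed; this alone does not certify robustness

# Toy stand-ins: a thresholding "classifier" and a brightness-shift "transform".
model = lambda img: int(sum(img) > 1.0)
transform = lambda img, p: [v + p[0] for v in img]
found = randomized_attack(model, transform, [0.1, 0.2], [(0.0, 1.0)])
```

A failed attack is only empirical evidence; the certification below, in contrast, covers the entire uncountable parameter range.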
Then, we generate linear constraints using DEEPG, as described in Section 4. Constraints produced by both our method and the interval baseline are given as input to the state-of-the-art neural network verifier DeepPoly [9]. We invoke both methods for every split separately, with the same set of splits. In order to make the results fully comparable, both methods are parallelized over pixels in the same way and the refinement parameter k of interval propagation is chosen such that its runtime is roughly equal to that of DEEPG. Table 1 shows the results of our evaluation.

Figure 4: Translation transformation approximated using interval propagation, Polyhedra and DEEPG, for a representative pixel.

Table 2: Certification success rates of interval propagation [9], Polyhedra and DeepG (for translation, scaling and shearing).

           Images certified (%)
           Interval  Polyhedra  DeepG
T(0.25)    0         14         90
Sc(4)      0         23         75
Sh(10)     0         12         38

While the interval propagation used in prior work can prove robustness for simpler transformations, it fails for more complex geometric transformations. For example, already for translation, which has two parameters, it does not succeed at certifying a single image. This shows that, in order to certify complex transformations, one has to capture the relationships between pixel values and transformation parameters using a relational abstraction. The linear constraints computed by DEEPG provide a significant increase in certification precision, justifying the more involved method to compute the constraints.

Comparison with custom transformers To understand the benefits of our method further, we decided to construct a more advanced baseline than the interval propagation. 
In particular, we crafted specialized transformers for DeepPoly [9], which is a restriction of Polyhedra, to handle the operations used in geometric transformations. These kinds of transformers have brought significant benefits over interval propagation in certifying robustness to noise perturbations, and thus we wanted to see what the benefits would be in our setting of geometric transformations. Concretely, we designed Polyhedra transformers for addition and multiplication, which enables handling of geometric operations. These transformers are non-trivial and are listed in Appendix B.1. Fig. 4 shows that the relaxation with these transformers is significantly tighter than intervals. This also translates to higher certification rates compared to intervals, as shown in Table 2. However, this method still fails to certify many images on which DEEPG succeeds. This experiment shows that generating constraints for the entire composition of transformations, as in DEEPG, is (expectedly) more effective than crafting transformers for the individual operations of the transformation.

Convergence towards optimal bounds While Theorem 1 shows that DEEPG obtains optimal linear constraints in the limit, we also experimentally check how quickly our method converges in practice. For this experiment, we consider rotation between −2° and 2°, composed with scaling between −5% and 5%. We run DEEPG while varying the number of samples used for the LP solver (n) and the tolerance in the Lipschitz optimization (ε). In Table 3 we show the approximation error (average distance between the lower and upper linear constraints), the certification rate and the time taken to compute the constraints. For instance, even with only 100 samples and ε = 0.01, DEEPG can certify almost every image in 1.2 seconds. 
While a higher number of samples and a smaller tolerance are necessary to obtain more precise bounds, they do not bring a significant increase in certification rates.

Table 3: Speed of convergence of DEEPG towards optimal linear bounds.

n      ε        Approximation error  Certified (%)  Runtime (s)
100    0.1      0.032                54.8           1.1
100    0.01     0.010                96.5           1.2
1000   0.001    0.006                97.8           4.9
10000  0.00001  0.005                98.2           46.1

Table 4: Certification using DEEPG for neural networks trained using different training techniques.

                                   Accuracy (%)  Attack success (%)  Certified (%)
                                                                     Interval [9]  DEEPG
MNIST  Standard                    98.7          52.0                0.0           12.0
       Augmented                   99.0          4.0                 0.0           46.5
       L∞-PGD                      98.9          45.5                0.0           20.2
       L∞-DIFFAI                   98.4          51.0                1.0           17.0
       L∞-PGD + Augmented          99.1          1.0                 0.0           77.0
       L∞-DIFFAI + Augmented       98.0          6.0                 42.0          66.0

Comparison of different training methods Naturally, we would like to know how to train a neural network which is certifiably robust against geometric transformations. In this experiment, we evaluate the effectiveness of a wide range of training methods for training a network certifiably robust to translations of up to 2 pixels in both the x and y directions. While [2] train with adversarial examples and show that this leads to a lower empirical success rate of an adversary, we are interested in a different question: can we train neural networks to be provably robust against geometric transformations?

We first train the same MNIST network as before, in a standard fashion, without any kind of defense. As expected, the resulting network, shown in the first row of Table 4, is not robust at all: a random attack can find many translations which cause the network to misclassify. To alleviate this problem, we incorporate data augmentation into training by randomly translating every image in a batch between −4 and 4 pixels before passing it through the network. 
As a result, the network is significantly more robust against the attack; however, there are still many images that we fail to certify. To make the network more amenable to certification, we consider two additional techniques. Both are based on the observation that a convex relaxation of geometric transformations can be viewed as noise in the pixel values. To train a network robust to such noise, we consider adversarial training with projected gradient descent (PGD) [41] and the provable defense DIFFAI [23]. We also consider combinations of these techniques with data augmentation.

Based on the results shown in Table 4, we conclude that training with PGD coupled with data augmentation achieves both the highest accuracy and the highest number of certified images. Training with DIFFAI significantly increases the certification rate for interval bound propagation, but has the drawback of significantly lower accuracy than the other methods.

Evaluation on large networks  We evaluated whether DeepG can certify robustness of large CIFAR-10 networks with residual connections. We certified ResNet-Tiny and ResNet-18 from [42], with 312k and 558k neurons, respectively. As certifying these networks is challenging, we consider a relatively small rotation between −2 and 2 degrees. As before, we generated constraints using both DeepG and interval bound propagation. This experiment was performed on a Tesla V100 GPU. ResNet-Tiny was trained using PGD adversarial training and has a standard accuracy of 83.8%. Using the constraints from DeepG, we certify 91.1% of images, while interval constraints allow us to certify only 1.3%. The average time for the verifier to certify or report failure is 528 seconds per image. ResNet-18 was trained using DIFFAI [42] and has a standard accuracy of 40.2%. In this case, using constraints from DeepG, the verifier certifies 82.2% of images.
As before (see Table 4), training the network with DIFFAI also enables a high certification rate of 77.8%, even with interval constraints. However, the drawback is that this network has a low accuracy of only 40.2%, compared to the 83.8% accuracy of ResNet-Tiny trained with PGD. On average, the verifier takes 1652 seconds per image. Generating the constraints took 25 seconds on average for both networks.

6 Conclusion

We introduced a new method for computing linear constraints on geometric image transformations, optimal in the limit, by combining Lipschitz optimization and linear programming. We implemented the method in a system called DeepG and showed that it leads to significantly better certification precision in proving robustness against geometric perturbations than prior work, on both defended and undefended networks.

Acknowledgments

We would like to thank the anonymous reviewers for their feedback and Christoph Müller for his help with enabling certification of large residual networks using DeepPoly.

References

[1] K. Pei, Y. Cao, J. Yang, and S. Jana, "Towards practical verification of machine learning: The case of computer vision systems," arXiv preprint arXiv:1712.01785, 2017.

[2] L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry, "Exploring the landscape of spatial robustness," in International Conference on Machine Learning (ICML), 2019.

[3] C. Kanbak, S. M. Moosavi Dezfooli, and P. Frossard, "Geometric robustness of deep networks: analysis and improvement," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[4] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," in International Conference on Learning Representations (ICLR), 2014.

[5] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G.
Giacinto, and F. Roli, "Evasion attacks against machine learning at test time," in European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), 2013.

[6] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev, "AI2: Safety and robustness certification of neural networks with abstract interpretation," in IEEE Symposium on Security and Privacy (SP), 2018.

[7] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. T. Vechev, "Fast and effective robustness certification," in Advances in Neural Information Processing Systems (NeurIPS), 2018.

[8] H. Zhang, T. Weng, P. Chen, C. Hsieh, and L. Daniel, "Efficient neural network robustness certification with general activation functions," in Advances in Neural Information Processing Systems (NeurIPS), 2018.

[9] G. Singh, T. Gehr, M. Püschel, and M. T. Vechev, "An abstract domain for certifying neural networks," in Symposium on Principles of Programming Languages (POPL), 2019.

[10] P. Cousot and N. Halbwachs, "Automatic discovery of linear restraints among variables of a program," in Symposium on Principles of Programming Languages (POPL), 1978.

[11] T. Weng, H. Zhang, H. Chen, Z. Song, C. Hsieh, L. Daniel, D. S. Boning, and I. S. Dhillon, "Towards fast computation of certified robustness for ReLU networks," in International Conference on Machine Learning (ICML), 2018.

[12] H. Salman, G. Yang, H. Zhang, C. Hsieh, and P. Zhang, "A convex relaxation barrier to tight robustness verification of neural networks," in Advances in Neural Information Processing Systems (NeurIPS), 2019.

[13] K. Dvijotham, R. Stanforth, S. Gowal, T. A. Mann, and P. Kohli, "A dual approach to scalable verification of deep networks," in Uncertainty in Artificial Intelligence (UAI), 2018.

[14] R. Bunel, I. Turkaslan, P. H.
S. Torr, P. Kohli, and P. K. Mudigonda, "A unified view of piecewise linear neural network verification," in Advances in Neural Information Processing Systems (NeurIPS), 2018.

[15] R. Ehlers, "Formal verification of piece-wise linear feed-forward neural networks," in Automated Technology for Verification and Analysis (ATVA), 2017.

[16] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, "Reluplex: An efficient SMT solver for verifying deep neural networks," in Computer Aided Verification (CAV), 2017.

[17] V. Tjeng, K. Y. Xiao, and R. Tedrake, "Evaluating robustness of neural networks with mixed integer programming," in International Conference on Learning Representations (ICLR), 2019.

[18] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana, "Formal security analysis of neural networks using symbolic intervals," in USENIX Security Symposium, 2018.

[19] W. Ruan, X. Huang, and M. Kwiatkowska, "Reachability analysis of deep neural networks with provable guarantees," in International Joint Conference on Artificial Intelligence (IJCAI), 2018.

[20] A. Raghunathan, J. Steinhardt, and P. S. Liang, "Semidefinite relaxations for certifying robustness to adversarial examples," in Advances in Neural Information Processing Systems (NeurIPS), 2018.

[21] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana, "Efficient formal safety analysis of neural networks," in Advances in Neural Information Processing Systems (NeurIPS), 2018.

[22] G. Singh, T. Gehr, M. Püschel, and M. Vechev, "Boosting robustness certification of neural networks," in International Conference on Learning Representations (ICLR), 2019.

[23] M. Mirman, T. Gehr, and M. T.
Vechev, "Differentiable abstract interpretation for provably robust neural networks," in International Conference on Machine Learning (ICML), 2018.

[24] E. Wong and J. Z. Kolter, "Provable defenses against adversarial examples via the convex outer adversarial polytope," in International Conference on Machine Learning (ICML), 2018.

[25] M. Lécuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana, "Certified robustness to adversarial examples with differential privacy," in IEEE Symposium on Security and Privacy (SP), 2019.

[26] B. Li, C. Chen, W. Wang, and L. Carin, "Second-order adversarial attack and certifiable robustness," arXiv preprint arXiv:1809.03113, 2018.

[27] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter, "Certified adversarial robustness via randomized smoothing," in International Conference on Machine Learning (ICML), 2019.

[28] C. Qin, K. D. Dvijotham, B. O'Donoghue, R. Bunel, R. Stanforth, S. Gowal, J. Uesato, G. Swirszcz, and P. Kohli, "Verification of non-linear specifications for neural networks," in International Conference on Learning Representations (ICLR), 2019.

[29] I. Goodfellow, H. Lee, Q. V. Le, A. Saxe, and A. Y. Ng, "Measuring invariances in deep networks," in Advances in Neural Information Processing Systems (NeurIPS), 2009.

[30] A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, "The robustness of deep networks: A geometrical perspective," IEEE Signal Processing Magazine, 2017.

[31] R. Alaifari, G. S. Alberti, and T. Gauksson, "ADef: an iterative algorithm to construct adversarial deformations," in International Conference on Learning Representations (ICLR), 2019.

[32] C. Xiao, J.-Y. Zhu, B. Li, W. He, M. Liu, and D. Song, "Spatially transformed adversarial examples," in International Conference on Learning Representations (ICLR), 2018.

[33] G. E.
Hinton, A. Krizhevsky, and S. D. Wang, "Transforming auto-encoders," in Artificial Neural Networks and Machine Learning (ICANN), 2011.

[34] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, "Spatial transformer networks," in Advances in Neural Information Processing Systems (NIPS), 2015.

[35] G. Singh, M. Püschel, and M. T. Vechev, "Fast polyhedra abstract domain," in Symposium on Principles of Programming Languages (POPL), 2017.

[36] Y. LeCun, C. Cortes, and C. Burges, "MNIST handwritten digit database," AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2010.

[37] P. Hansen and B. Jaumard, "Lipschitz optimization," in Handbook of Global Optimization, 1995.

[38] E. de Weerdt, Q. Chu, and J. A. Mulder, "Neural network output optimization using interval analysis," IEEE Transactions on Neural Networks, 2009.

[39] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," 2017.

[40] A. Krizhevsky, "Learning multiple layers of features from tiny images," 2009.

[41] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," in International Conference on Learning Representations (ICLR), 2018.

[42] M. Mirman, G. Singh, and M.
Vechev, "A provable defense for deep residual networks," arXiv preprint arXiv:1903.12519, 2019.