{"title": "SafetyNets: Verifiable Execution of Deep Neural Networks on an Untrusted Cloud", "book": "Advances in Neural Information Processing Systems", "page_first": 4672, "page_last": 4681, "abstract": "Inference using deep neural networks is often outsourced to the cloud since it is a computationally demanding task.\u00a0 However, this raises a fundamental issue of trust. How can a client be sure that the cloud has performed inference correctly? A lazy cloud provider might use a simpler but less accurate model to reduce its own computational load, or worse, maliciously modify the inference results sent to the client. We propose SafetyNets, a framework that enables an untrusted server (the cloud) to provide a client with a short mathematical proof of the correctness of inference tasks that they perform on behalf of the client. Specifically, SafetyNets develops and implements a specialized interactive proof (IP) protocol for verifiable execution of a class of deep neural networks, i.e., those that can be represented as arithmetic circuits. Our empirical results on three- and four-layer deep neural networks demonstrate the run-time costs of SafetyNets for both the client and server are low. SafetyNets detects any incorrect computations of the neural network by the untrusted server with high probability, while achieving state-of-the-art accuracy on the MNIST digit recognition (99.4%) and TIMIT speech recognition tasks (75.22%).", "full_text": "SafetyNets: Veri\ufb01able Execution of Deep Neural\n\nNetworks on an Untrusted Cloud\n\nZahra Ghodsi, Tianyu Gu, Siddharth Garg\n\nNew York University\n\n{zg451, tg1553, sg175}@nyu.edu\n\nAbstract\n\nInference using deep neural networks is often outsourced to the cloud since it is\na computationally demanding task. However, this raises a fundamental issue of\ntrust. 
How can a client be sure that the cloud has performed inference correctly?\nA lazy cloud provider might use a simpler but less accurate model to reduce its\nown computational load, or worse, maliciously modify the inference results sent to\nthe client. We propose SafetyNets, a framework that enables an untrusted server\n(the cloud) to provide a client with a short mathematical proof of the correctness of\ninference tasks that they perform on behalf of the client. Speci\ufb01cally, SafetyNets\ndevelops and implements a specialized interactive proof (IP) protocol for veri\ufb01able\nexecution of a class of deep neural networks, i.e., those that can be represented\nas arithmetic circuits. Our empirical results on three- and four-layer deep neural\nnetworks demonstrate the run-time costs of SafetyNets for both the client and server\nare low. SafetyNets detects any incorrect computations of the neural network by\nthe untrusted server with high probability, while achieving state-of-the-art accuracy\non the MNIST digit recognition (99.4%) and TIMIT speech recognition tasks\n(75.22%).\n\n1\n\nIntroduction\n\nRecent advances in deep learning have shown that multi-layer neural networks can achieve state-of-\nthe-art performance on a wide range of machine learning tasks. However, training and performing\ninference (using a trained neural network for predictions) can be computationally expensive. For this\nreason, several commercial vendors have begun offering \u201cmachine learning as a service\" (MLaaS)\nsolutions that allow clients to outsource machine learning computations, both training and inference,\nto the cloud.\nWhile promising, the MLaaS model (and outsourced computing, in general) raises immediate security\nconcerns, speci\ufb01cally relating to the integrity (or correctness) of computations performed by the\ncloud and the privacy of the client\u2019s data [16]. This paper focuses on the former, i.e., the question\nof integrity. 
Speci\ufb01cally, how can a client perform inference using a deep neural network on an\nuntrusted cloud, while obtaining strong assurance that the cloud has performed inference correctly?\nIndeed, there are compelling reasons for a client to be wary of a third-party cloud\u2019s computations. For\none, the cloud has a \ufb01nancial incentive to be \u201clazy.\" A lazy cloud might use a simpler but less accurate\nmodel, for instance, a single-layer instead of a multi-layer neural network, to reduce its computational\ncosts. Further the cloud could be compromised by malware that modi\ufb01es the results sent back to\nthe client with malicious intent. For instance, the cloud might always mis-classify a certain digit in\na digit recognition task, or allow unauthorized access to certain users in a face recognition based\nauthentication system.\nThe security risks posed by cloud computing have spurred theoretical advances in the area of veri\ufb01able\ncomputing (VC) [21]. The idea is to enable a client to provably (and cheaply) verify that an untrusted\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fserver has performed computations correctly. To do so, the server provides to the client (in addition\nto the result of computation) a mathematical proof of the correctness of the result. The client rejects,\nwith high probability, any incorrectly computed results (or proofs) provided by the server, while\nalways accepting correct results (and corresponding proofs) 1. VC techniques aim for the following\ndesirable properties: the size of the proof should be small, the client\u2019s veri\ufb01cation effort must be\nlower than performing the computation locally, and the server\u2019s effort in generating proofs should not\nbe too high.\nThe advantage of proof-based VC is that it provides unconditional, mathematical guarantees on the\nintegrity of computation performed by the server. 
Alternative solutions for verifiable execution require the client to make trust assumptions that are hard for the client to independently verify. Trusted platform modules [7], for instance, require the client to place trust in the hardware manufacturer, and assume that the hardware is tamper-proof. Audits based on the server's execution time [15] require precise knowledge of the server's hardware configuration and assume, for instance, that the server is not over-clocked.

The work in this paper leverages powerful VC techniques referred to as interactive proof (IP) systems [5, 9, 18, 19]. An IP system consists of two entities, a prover (P), i.e., the untrusted server, and a verifier (V), i.e., the client. The framework is illustrated in Figure 1. The verifier sends the prover an input x, say a batch of test images, and asks the prover to compute a function y = f(x). In our setting, f(.) is a trained multi-layer neural network that is known to both the verifier and prover, and y is the neural network's classification output for each image in the batch. The prover performs the computation and sends the verifier a purported result y' (which is not equal to y if the prover cheats). The verifier and prover then engage in n rounds of interaction. In each round, the verifier sends the prover a randomly picked challenge, and the prover provides a response based on the IP protocol.

Figure 1: High-level overview of the SafetyNets IP protocol. In this example, an untrusted server intentionally changes the classification output from 4 to 5.
The verifier accepts that y' is indeed equal to f(x) if it is satisfied with the prover's response in each round, and rejects otherwise.

A major criticism of IP systems (and, indeed, all existing VC techniques) when used for verifying general-purpose computations is that the prover's overheads are large, often orders of magnitude more than just computing f(x) [21]. Recently, however, Thaler [18] showed that certain types of computations admit IP protocols with highly efficient verifiers and provers, which lays the foundations for the specialized IP protocols for deep neural networks that we develop in this paper.

Paper Contributions. This paper introduces SafetyNets, a new (and, to the best of our knowledge, the first) approach for verifiable execution of deep neural networks on untrusted clouds. Specifically, SafetyNets composes a new, specialized IP protocol for the neural network's activation layers with Thaler's IP protocol for matrix multiplication to achieve end-to-end verifiability, dramatically reducing the bandwidth costs versus a naive solution that verifies the execution of each layer of the neural network separately.

SafetyNets applies to a certain class of neural networks that can be represented as arithmetic circuits that perform computations over finite fields (i.e., integers modulo a large prime p). Our implementation of SafetyNets addresses several practical challenges in this context, including the choice of the prime p, its relationship to the accuracy of the neural network, and to the verifier and prover run-times. Empirical evaluations on the MNIST digit recognition and TIMIT speech recognition tasks illustrate that SafetyNets enables practical, low-cost verifiable outsourcing of deep neural network execution without compromising classification accuracy. Specifically, the client's execution time is 8x-80x
Speci\ufb01cally, the client\u2019s execution time is 8\u00d7-80\u00d7\nlower than executing the network locally, the server\u2019s overhead in generating proofs is less than 5%,\nand the client/server exchange less than 8 KBytes of data during the IP protocol. SafetyNets\u2019 security\n\n1Note that the SafetyNets is not intended to and cannot catch any inherent mis-classi\ufb01cations due to the\n\nmodel itself, only those that result from incorrect computations of the model by the server.\n\n2\n\nClient(verifier)Untrusted Server(prover)digit=4 5challenge 1response 1Random ChallengeVerifyExecute Neural NetworkCompute Responsechallenge nresponse n...Random ChallengeRejectCompute ResponseInput ImageReject\fguarantees ensure that a client can detect any incorrect computations performed by a malicious\nserver with probability vanishingly close to 1. At the same time, SafetyNets achieves state-of-the-art\nclassi\ufb01cation accuracies of 99.4% and 75.22% on the MNIST and TIMIT datasets, respectively.\n\n2 Background\n\nIn this section, we begin by reviewing necessary background on IP systems, and then describe the\nrestricted class of neural networks (those that can be represented as arithmetic circuits) that SafetyNets\nhandles.\n\n2.1\n\nInteractive Proof Systems\n\nExisting IP systems proposed in literature [5, 9, 18, 19] use, at their heart, a protocol referred to as\nthe sum-check protocol [13] that we describe here in some detail, and then discuss its applicability in\nverifying general-purpose computations expressed as arithmetic circuits.\n\nSum-check Protocol Consider a d-degree n-variate polynomial g(x1, x2, . . . , xn), where each\nvariable xi \u2208 Fp (Fp is the set of all natural numbers between zero and p \u2212 1, for a given prime p)\nand g : Fn\n\np \u2192 Fp. The prover P seeks to prove the following claim:\n\ny =\n\n. . .\n\ng(x1, x2, . . . 
, xn)\n\n(1)\n\n(cid:88)\n\n(cid:88)\n\n(cid:88)\n\nx1\u2208{0,1}\n\nx2\u2208{0,1}\n\nxn\u2208{0,1}\n\nthat is, the sum of g evaluated at 2n points is y. P and V now engage in a sum-check protocol to\nverify this claim. In the \ufb01rst round of the protocol, P sends the following unidimensional polynomial\n(2)\n\n(cid:88)\n\n(cid:88)\n\n(cid:88)\n\ng(x1, x2, . . . , xn)\n\nh(x1) =\n\n. . .\n\nx2\u2208{0,1}\n\nx3\u2208{0,1}\n\nxn\u2208{0,1}\n\nto V in the form of its coef\ufb01cients. V checks if h(0) + h(1) = y. If yes, it proceeds, otherwise\nit rejects P\u2019s claim. Next, V picks a random value q1 \u2208 Fp and evaluates h(q1) which, based on\nEquation 2, yields a new claim:\n\nh(q1) =\n\n. . .\n\ng(q1, x2, . . . , xn).\n\n(3)\n\n(cid:88)\n\n(cid:88)\n\n(cid:88)\n\nx2\u2208{0,1}\n\nx3\u2208{0,1}\n\nxn\u2208{0,1}\n\nV now recursively calls the sum-check protocol to verify this new claim. By the \ufb01nal round of the\nsum-check protocol, P returns the value g(q1, q2, . . . , qn) and the V checks if this value is correct by\nevaluating the polynomial by itself. If so, V accepts the original claim in Equation 1, otherwise it\nrejects the claim.\n[2] V rejects an incorrect claim by P with probability greater than (1 \u2212 \u0001) where\nLemma 2.1.\n\u0001 = nd\n\np is referred to as the soundness error.\n\nIPs for Verifying Arithmetic Circuits\nIn their seminal work, Goldwasser et al. [9] demonstrated\nhow sum-check can be used to verify the execution of arithmetic circuits using an IP protocol now\nreferred to as GKR. An arithmetic circuit is a directed acyclic graph of computation over elements of\na \ufb01nite \ufb01eld Fp in which each node can perform either addition or multiplication operations (modulo\np). 
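As an illustration of the sum-check recursion described above, the following is a minimal sketch with an honest prover for a toy polynomial. It is our own example, not the paper's implementation; the helper names and the polynomial are illustrative.

```python
# Sketch of the sum-check protocol over F_p: the prover convinces the
# verifier that y equals the sum of g over the Boolean hypercube, in n rounds.
import random

P = 2**61 - 1  # Mersenne prime, as used by SafetyNets

def lagrange_eval(points, q):
    """Evaluate at q the unique polynomial through (i, points[i]), mod P."""
    total = 0
    for i, yi in enumerate(points):
        num, den = 1, 1
        for j in range(len(points)):
            if j != i:
                num = num * (q - j) % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def partial_sum(g, n, prefix, d):
    """Round polynomial h(X) = sum over the remaining Boolean variables of
    g(prefix, X, rest), returned as its evaluations at X = 0..d."""
    k = len(prefix)
    evals = []
    for x in range(d + 1):
        s = 0
        for rest in range(2 ** (n - k - 1)):
            bits = [(rest >> b) & 1 for b in range(n - k - 1)]
            s = (s + g(prefix + [x] + bits)) % P
        evals.append(s)
    return evals

def sumcheck(g, n, d, claim):
    """Return True iff the verifier accepts the claimed sum."""
    prefix = []
    for _ in range(n):
        h = partial_sum(g, n, prefix, d)          # prover's message
        if (h[0] + h[1]) % P != claim:            # verifier's round check
            return False
        q = random.randrange(P)                   # random challenge
        claim = lagrange_eval(h, q)               # reduced claim h(q)
        prefix.append(q)
    return g(prefix) % P == claim                 # final evaluation by V

# Toy claim: g(x1, x2, x3) = x1*x2 + 2*x3, summed over {0,1}^3 (sum = 10)
g = lambda x: (x[0] * x[1] + 2 * x[2]) % P
true_sum = sum(g([a, b, c]) for a in (0, 1) for b in (0, 1) for c in (0, 1)) % P
assert sumcheck(g, 3, 2, true_sum)          # honest claim accepted
assert not sumcheck(g, 3, 2, true_sum + 1)  # false claim rejected
```

Note that a dishonest prover could also send incorrect round polynomials; Lemma 2.1 bounds the probability that such a strategy survives all n random challenges.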
While we refer the reader to [9] for further details of GKR, one important aspect of the protocol bears mention. GKR organizes the nodes of an arithmetic circuit into layers; starting with the circuit inputs, the outputs of one layer feed the inputs of the next. The GKR proof protocol operates backwards from the circuit outputs to its inputs. Specifically, GKR uses sum-check to reduce the prover's assertion about the circuit output into an assertion about the inputs of the output layer. This assertion is then reduced to an assertion about the inputs of the penultimate layer, and so on. The protocol continues iteratively until the verifier is left with an assertion about the circuit inputs, which it checks on its own. The layered nature of GKR's prover aligns almost perfectly with the structure of a multi-layer neural network and motivates the use of an IP system based on GKR for SafetyNets.

2.2 Neural Networks as Arithmetic Circuits

As mentioned before, SafetyNets applies to neural networks that can be expressed as arithmetic circuits. This requirement places the following restrictions on the neural network layers.

Quadratic Activations. The activation functions in SafetyNets must be polynomials with integer coefficients (or, more precisely, coefficients in the field Fp). The simplest of these is the element-wise quadratic activation function, whose output is simply the square of its input. Other commonly used activation functions such as ReLU, sigmoid or softmax activations are precluded, except in the final output layer. Prior work has shown that neural networks with quadratic activations have the same representational power as networks with threshold activations and can be efficiently trained [6, 12].

Sum Pooling. Pooling layers are commonly used to reduce the network size, to prevent overfitting and provide translation invariance.
SafetyNets uses sum pooling, wherein the output of the pooling layer is the sum of activations in each local region. However, techniques such as max pooling [10] and stochastic pooling [22] are not supported, since max and division operations are not easily represented as arithmetic circuits.

Finite Field Computations. SafetyNets supports computations over elements of the field Fp, that is, integers in the range {−(p−1)/2, . . . , 0, . . . , (p−1)/2}. The inputs, weights and all intermediate values computed in the network must lie in this range. Note that due to the use of quadratic activations and sum pooling, the values in the network can become quite large. In practice, we will pick large primes to support these large values. We note that this restriction applies to the inference phase only; the network can be trained with floating point inputs and weights. The inputs and weights are then re-scaled and quantized, as explained in Section 3.3, to finite field elements.

We note that the restrictions above are shared by a recently proposed technique, CryptoNets [8], that seeks to perform neural network based inference on encrypted inputs so as to guarantee data privacy. However, CryptoNets does not guarantee integrity and, compared to SafetyNets, incurs high costs for both the client and server (see Section 4.3 for a comparison). Conversely, SafetyNets is targeted towards applications where integrity is critical, but does not provide privacy.

2.3 Mathematical Model

An L layer neural network with the constraints discussed above can be modeled, without loss of generality, as follows. The input to the network is x ∈ Fp^{n0×b}, where n0 is the dimension of each input and b is the batch size. Layer i ∈ [1, L] has ni output neurons2, and is specified using a weight matrix wi−1 ∈ Fp^{ni×ni−1} and biases bi−1 ∈ Fp^{ni}. The output of Layer i ∈ [1, L], yi ∈ Fp^{ni×b}, is:

    yi = σquad(wi−1.yi−1 + bi−1 1^T)  ∀i ∈ [1, L − 1];    yL = σout(wL−1.yL−1 + bL−1 1^T),    (4)

where σquad(.) is the quadratic activation function, σout(.) is the activation function of the output layer, and 1 ∈ Fp^b is the vector of all ones. We will typically use softmax activations in the output layer. We will also find it convenient to introduce the variable zi ∈ Fp^{ni+1×b} defined as

    zi = wi.yi + bi 1^T  ∀i ∈ [0, L − 1].    (5)

The model captures both fully connected and convolutional layers; in the latter case the weight matrix is sparse. Further, without loss of generality, all successive linear transformations in a layer, for instance sum pooling followed by convolutions, are represented using a single weight matrix.

With this model in place, the goal of SafetyNets is to enable the client to verify that yL was correctly computed by the server. We note that, as in prior work [19], SafetyNets amortizes the prover and verifier costs over batches of inputs. If the server incorrectly computes the output corresponding to any input in a batch, the verifier rejects the entire batch of computations.

2 The 0th layer is defined to be the input layer and thus y0 = x.

3 SafetyNets

We now describe the design and implementation of our end-to-end IP protocol for verifying execution of deep networks. The SafetyNets protocol is a specialized form of the IP protocols developed by Thaler [18] for verifying "regular" arithmetic circuits, which themselves specialize and refine prior work [5].
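Before turning to the protocol, the inference computation being verified (Equations 4 and 5) can be sketched as follows. This is an illustrative toy: the dimensions, weights and batch size are arbitrary and do not correspond to the networks evaluated in the paper.

```python
# Minimal sketch of the arithmetic-circuit network model: each hidden layer
# is a matrix multiply plus bias followed by an element-wise quadratic
# activation, all modulo p; the final linear output z_{L-1} is what the
# server sends back (the client applies sigma_out itself).
import random

P = 2**61 - 1

def matmul_mod(w, y):
    """(w . y) mod P for list-of-list matrices."""
    return [[sum(w[i][k] * y[k][j] for k in range(len(y))) % P
             for j in range(len(y[0]))] for i in range(len(w))]

def layer(w, b, y, quad=True):
    """z = w.y + b 1^T; return sigma_quad(z), or z itself for the last layer."""
    z = matmul_mod(w, y)
    z = [[(z[i][j] + b[i]) % P for j in range(len(z[0]))] for i in range(len(z))]
    return [[v * v % P for v in row] for row in z] if quad else z

random.seed(0)
n0, n1, n2, batch = 4, 3, 2, 5               # toy layer sizes and batch
x  = [[random.randrange(100) for _ in range(batch)] for _ in range(n0)]
w0 = [[random.randrange(10) for _ in range(n0)] for _ in range(n1)]
w1 = [[random.randrange(10) for _ in range(n1)] for _ in range(n2)]
b0 = [random.randrange(10) for _ in range(n1)]
b1 = [random.randrange(10) for _ in range(n2)]

y1     = layer(w0, b0, x)               # hidden layer with quadratic activation
z_last = layer(w1, b1, y1, quad=False)  # z_{L-1}, one column per batch input
assert len(z_last) == n2 and len(z_last[0]) == batch
```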
The starting point for the protocol is a polynomial representation of the network's inputs and parameters, referred to as a multilinear extension.

Multilinear Extensions. Consider a matrix w ∈ Fp^{n×n}. Each row and column of w can be referenced using m = log2(n) bits, and consequently one can represent w as a function W : {0, 1}^m × {0, 1}^m → Fp. That is, given Boolean vectors t, u ∈ {0, 1}^m, the function W(t, u) returns the element of w at the row and column specified by the Boolean vectors t and u, respectively. A multilinear extension of W is a polynomial function W̃ : Fp^m × Fp^m → Fp that has the following two properties: (1) W̃(t, u) = W(t, u) for all points on the unit hyper-cube, that is, for all t, u ∈ {0, 1}^m; and (2) W̃ has degree 1 in each of its variables. In the remainder of this discussion, we will use X̃, Ỹi, Z̃i and W̃i to refer to the multilinear extensions of x, yi, zi, and wi, respectively, for i ∈ [1, L]. We will also assume, for clarity of exposition, that the biases bi are zero for all layers. The supplementary draft describes how biases are incorporated. Consistent with the IP literature, the description of our protocol refers to the client as the verifier and the server as the prover.

Protocol Overview. The verifier seeks to check the result yL provided by the prover corresponding to input x. Note that yL is the output of the final activation layer which, as discussed in Section 2.2, is the only layer that does not use quadratic activations, and is hence not amenable to an IP. Instead, in SafetyNets, the prover computes and sends zL−1 (the input of the final activation layer) as a result to the verifier. zL−1 has the same dimensions as yL and therefore this refinement has no impact on the server-to-client bandwidth.
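The two defining properties of a multilinear extension can be illustrated on a tiny matrix. This is a sketch of our own (a 2×2 matrix, so m = 1 bit per index); the paper's matrices use m = log2(n) bits per dimension.

```python
# Multilinear extension of a small matrix over F_p: it agrees with the
# matrix on the Boolean hypercube and is degree 1 in each variable.
P = 2**61 - 1

def chi(bits, r):
    """Multilinear Lagrange basis: product of (r_i if b_i else 1 - r_i)."""
    out = 1
    for b, ri in zip(bits, r):
        out = out * (ri if b else (1 - ri)) % P
    return out

def mle(w, m, t, u):
    """Evaluate the multilinear extension of an n x n matrix w (n = 2^m)."""
    total = 0
    for row in range(2 ** m):
        for col in range(2 ** m):
            rb = [(row >> i) & 1 for i in range(m)]
            cb = [(col >> i) & 1 for i in range(m)]
            total = (total + w[row][col] * chi(rb, t) * chi(cb, u)) % P
    return total

w = [[3, 1], [4, 7]]  # n = 2, so m = 1
# Property (1): agrees with w at every point of the unit hyper-cube
for r in (0, 1):
    for c in (0, 1):
        assert mle(w, 1, [r], [c]) == w[r][c]
# Off the hyper-cube it interpolates: at t = 2, u = 0 we get
# (1-2)*w[0][0] + 2*w[1][0] = -3 + 8 = 5 (mod P)
assert mle(w, 1, [2], [0]) == 5
```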
Furthermore, the verifier can easily compute yL = σout(zL−1) locally.

Now, the verifier needs to check whether the prover computed zL−1 correctly. As noted by Vu et al. [19], this check can be replaced by a check on whether the multilinear extension of zL−1 is correctly computed at a randomly picked point in the field, with minimal impact on the soundness error. That is, the verifier picks two vectors, qL−1 ∈ Fp^{log(nL)} and rL−1 ∈ Fp^{log(b)}, at random, evaluates Z̃L−1(qL−1, rL−1), and checks whether it was correctly computed using a specialized sum-check protocol for matrix multiplication due to Thaler [18] (described in Section 3.1). Since zL−1 depends on wL−1 and yL−1, sum-check yields assertions on the values of W̃L−1(qL−1, sL−1) and ỸL−1(sL−1, rL−1), where sL−1 ∈ Fp^{log(nL−1)} is another random vector picked by the verifier during sum-check.

W̃L−1(qL−1, sL−1) is an assertion about the weights of the final layer. This is checked by the verifier locally, since the weights are known to both the prover and verifier. Finally, the verifier uses our specialized sum-check protocol for activation layers (described in Section 3.2) to reduce the assertion on ỸL−1(sL−1, rL−1) to an assertion on Z̃L−2(qL−2, sL−2). The protocol repeats until it reaches the input layer and produces an assertion on X̃(s0, r0), the multilinear extension of the input x. The verifier checks this locally. If at any point in the protocol the verifier's checks fail, it rejects the prover's computation.
Next, we describe the sum-check protocols for matrix multiplication and activation that SafetyNets uses.

3.1 Sum-check for Matrix Multiplication

Since zi = wi.yi (recall that we assumed zero biases for clarity), we can check an assertion about the multilinear extension of zi evaluated at randomly picked points qi and ri by expressing Z̃i(qi, ri) as [18]:

    Z̃i(qi, ri) = Σ_{j∈{0,1}^{log(ni)}} W̃i(qi, j) . Ỹi(j, ri)    (6)

Note that Equation 6 has the same form as the sum-check problem in Equation 1. Consequently, the sum-check protocol described in Section 2.1 can be used to verify this assertion. At the end of the sum-check rounds, the verifier will have assertions on W̃i, which it checks locally, and on Ỹi, which is checked using the sum-check protocol for quadratic activations described in Section 3.2. The prover run-time for running the sum-check protocol in layer i is O(ni(ni−1 + b)), the verifier's run-time is O(ni ni−1), and the prover/verifier exchange 4 log(ni) field elements.

3.2 Sum-check for Quadratic Activation

In this step, we check an assertion about the output of quadratic activation layer i, Ỹi(si, ri), by writing it in terms of the input of the activation layer as follows:

    Ỹi(si, ri) = Σ_{j∈{0,1}^{log(ni)}, k∈{0,1}^{log(b)}} Ĩ(si, j) Ĩ(ri, k) Z̃i−1(j, k)^2,    (7)

where Ĩ(., .) is the multilinear extension of the identity matrix.
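As a sanity check of our own (not part of the paper's implementation), both reductions can be verified numerically on a toy example with 2×2 matrices and a single Boolean index bit per dimension:

```python
# Numerical check of Equations 6 and 7 on a toy example over F_p.
import random

P = 2**61 - 1

def mle(mat, t, u):
    """Multilinear extension of a 2x2 matrix evaluated at scalars (t, u)."""
    s = 0
    for a in (0, 1):
        for b in (0, 1):
            s += mat[a][b] * (t if a else 1 - t) * (u if b else 1 - u)
    return s % P

random.seed(1)
w = [[random.randrange(P) for _ in range(2)] for _ in range(2)]
y = [[random.randrange(P) for _ in range(2)] for _ in range(2)]

# Equation 6: the MLE of z = w.y at a random (q, r) equals the Boolean-cube
# sum of W~(q, j) * Y~(j, r).
z = [[sum(w[i][k] * y[k][j] for k in range(2)) % P for j in range(2)]
     for i in range(2)]
q, r = random.randrange(P), random.randrange(P)
assert mle(z, q, r) == sum(mle(w, q, j) * mle(y, j, r) for j in (0, 1)) % P

# Equation 7: the MLE of the element-wise square y' = z^2 at a random (s, t)
# equals the cube sum of I~(s, j) I~(t, k) Z~(j, k)^2, with I~ the MLE of
# the identity matrix.
ident = [[1, 0], [0, 1]]
y_act = [[v * v % P for v in row] for row in z]
s, t = random.randrange(P), random.randrange(P)
rhs = sum(mle(ident, s, j) * mle(ident, t, k) * pow(mle(z, j, k), 2, P)
          for j in (0, 1) for k in (0, 1)) % P
assert mle(y_act, s, t) == rhs
```

Both identities hold because each side is multilinear in the random evaluation points and the two sides agree on the Boolean hypercube.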
Equation 7 can also be verified using the sum-check protocol, and yields an assertion about Z̃i−1, i.e., the inputs to the activation layer. This assertion is in turn checked using the protocol described in Section 3.1. The prover run-time for running the sum-check protocol in layer i is O(b ni), the verifier's run-time is O(log(b ni)), and the prover/verifier exchange 5 log(b ni) field elements. This completes the theoretical description of the SafetyNets specialized IP protocol.

Lemma 3.1. The SafetyNets verifier rejects incorrect computations with probability greater than (1 − ε), where ε = 3b Σ_{i=0}^{L} ni / p is the soundness error.

In practice, with p = 2^61 − 1 the soundness error is less than 1/2^30 for practical network parameters and batch sizes.

3.3 Implementation

The fact that SafetyNets operates only on elements in a finite field Fp during inference imposes a practical challenge. That is, how do we convert floating point inputs and weights from training into field elements, and how do we select the size of the field p?

Let w'i ∈ R^{ni−1×ni} and b'i ∈ R^{ni} be the floating point parameters obtained from training for each layer i ∈ [1, L]. We convert the weights to integers by multiplying with a constant β > 1 and rounding, i.e., wi = ⌊βw'i⌉. We do the same for inputs with a scaling factor α, i.e., x = ⌊αx'⌉. Then, to ensure that all values in the network scale isotropically, we must set bi = ⌊α^{2^{i−1}} β^{(2^{i−1}+1)} b'i⌉. While larger α and β values imply lower quantization errors, they also result in large values in the network, especially in the layers closer to the output. Similar empirical observations were made by the CryptoNets work [8].
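The scale-and-round conversion above can be sketched for a single zero-bias layer. The α, β values, weights and error tolerance below are illustrative only; the paper selects α and β on validation data.

```python
# Sketch of Section 3.3's quantization: float weights/inputs are scaled,
# rounded, and embedded in F_p (negatives wrap around); the field result
# of w.x then approximates the float result at implied scale ALPHA*BETA.
P = 2**61 - 1
ALPHA, BETA = 16, 16          # input / weight scale factors (illustrative)

def to_field(v, scale):
    """Round scale*v to the nearest integer and embed it in F_p."""
    return round(scale * v) % P

def from_field(v, scale):
    """Map a field element back to a signed rational."""
    signed = v - P if v > (P - 1) // 2 else v
    return signed / scale

w_f = [[0.31, -0.12], [0.05, 0.44]]   # trained float weights (made up)
x_f = [0.8, -1.5]                     # one float input (made up)

w = [[to_field(v, BETA) for v in row] for row in w_f]
x = [to_field(v, ALPHA) for v in x_f]

# z = w.x in the field; float reference computed alongside
z = [sum(w[i][k] * x[k] for k in range(2)) % P for i in range(2)]
z_float = [sum(w_f[i][k] * x_f[k] for k in range(2)) for i in range(2)]

for zi, zf in zip(z, z_float):
    # quantization error stays small for these scale factors
    assert abs(from_field(zi, ALPHA * BETA) - zf) < 0.05
```

Note how a quadratic activation would square the implied scale, which is why the values (and the bias scale factors) grow so quickly with depth.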
To ensure accuracy, the values in the network must lie in the range [−(p−1)/2, (p−1)/2]; this influences the choice of the prime p. On the other hand, we note that large primes increase the verifier and prover run-times because of the higher cost of performing modular additions and multiplications. As in prior works [5, 18, 19], we restrict our choice of p to Mersenne primes, since they afford efficient modular arithmetic implementations, and specifically to the primes p = 2^61 − 1 and p = 2^127 − 1. For a given p, we explore different values of α and β and use the validation dataset to pick the ones that maximize accuracy while ensuring that the values in the network lie within [−(p−1)/2, (p−1)/2].

4 Empirical Evaluation

In this section, we present empirical evidence to support our claim that SafetyNets enables low-cost verifiable execution of deep neural networks on untrusted clouds without compromising classification accuracy.

Figure 2: Evolution of training and test error for the (a) MNIST, (b) MNIST-Back-Rand and (c) TIMIT tasks.

4.1 Setup

Datasets. We evaluated SafetyNets on three classification tasks. (1) Handwritten digit recognition on the MNIST dataset, using 50,000 training, 10,000 validation and 10,000 test images. (2) A more challenging version of digit recognition, MNIST-Back-Rand, an artificial dataset generated by inserting a random background into MNIST images [1]. The dataset has 10,000 training, 2,000 validation and 50,000 test images. ZCA whitening is applied to the raw dataset before training and testing [4]. (3) Speech recognition on the TIMIT dataset, split into a training set with 462 speakers, a validation set with 144 speakers and a testing set with 24 speakers. The raw audio samples are pre-processed as described by [3].
Each example includes its preceding and succeeding 7 frames, resulting in a 1845-dimensional input in total. During testing, all labels are mapped to 39 classes [11] for evaluation.

Neural Networks. For the two MNIST tasks, we used the same convolutional neural network as [23], with 2 convolutional layers with 5 × 5 filters, a stride of 1, and a mapcount of 16 and 32 for the first and second layer, respectively. Each convolutional layer is followed by quadratic activations and 2 × 2 sum pooling with a stride of 2. The fully connected layer uses softmax activation. We refer to this network as CNN-2-Quad. For TIMIT, we use a four layer network described by [3], with 3 hidden, fully connected layers with 2000 neurons and quadratic activations. The output layer is fully connected with 183 output neurons and softmax activation. We refer to this network as FcNN-3-Quad. Since quadratic activations are not commonly used, we compare the performance of CNN-2-Quad and FcNN-3-Quad with baseline versions in which the quadratic activations are replaced by ReLUs. The baseline networks are CNN-2-ReLU and FcNN-3-ReLU.

The hyper-parameters for training are selected based on the validation datasets. The Adam optimizer is used for the CNNs with learning rate 0.001, exponential decay and dropout probability 0.75. The AdaGrad optimizer is used for the FcNNs with a learning rate of 0.01 and dropout probability 0.5. We found that norm gradient clipping was required for training the CNN-2-Quad and FcNN-3-Quad networks, since the gradient values for quadratic activations can become large. Our implementation of SafetyNets uses Thaler's code for the IP protocol for matrix multiplication [18] and our own implementation of the IP for quadratic activations.
We use an Intel Core i7-4600U CPU running at 2.10 GHz for benchmarking.

4.2 Classification Accuracy of SafetyNets

SafetyNets places certain restrictions on the activation function (quadratic) and requires weights and inputs to be integers (in the field Fp). We begin by analyzing how (and if) these restrictions impact classification accuracy/error. Figure 2 compares the training and test error of CNN-2-Quad/FcNN-3-Quad versus CNN-2-ReLU/FcNN-3-ReLU. For all three tasks, the networks with quadratic activations are competitive with the networks that use ReLU activations. Further, we observe that the networks with quadratic activations appear to converge faster during training, possibly because their gradients are larger despite gradient clipping.

Next, we used the scaling and rounding strategy proposed in Section 3.3 to convert weights and inputs to integers. Table 1 shows the impact of the scaling factors α and β on the classification error and the maximum values observed in the network during inference for MNIST-Back-Rand. The validation

Table 1: Validation error and maximum value observed in the network for MNIST-Back-Rand and different values of the scaling parameters α and β.
Marked with an asterisk (bold red in the original) are values of α and β that are infeasible because the maximum value exceeds that allowed by the prime p = 2^61 − 1.

          α = 4                α = 8                α = 16               α = 32               α = 64
β     Max         Err     Max         Err     Max         Err     Max         Err     Max         Err
4     4.0×10^8    0.188   4.0×10^10   0.073   5.5×10^12   0.042   6.6×10^14   0.040   8.8×10^16   0.039
8     6.1×10^9    0.194   6.9×10^11   0.072   8.3×10^13   0.039   1.0×10^16   0.037   1.3×10^18   0.038
16    9.4×10^10   0.188   1.1×10^13   0.072   1.3×10^15   0.036   1.6×10^17   0.035   2.1×10^19*  0.037
32    1.5×10^12   0.186   1.7×10^14   0.073   2.1×10^16   0.038   2.6×10^18*  0.036   3.5×10^20*  0.037
64    2.5×10^13   0.185   2.8×10^15   0.073   3.4×10^17   0.038   4.2×10^19*  0.036   5.6×10^21*  0.037

error drops as α and β are increased. On the other hand, for p = 2^61 − 1, the largest value allowed is 1.35 × 10^18; this rules out the largest combinations of α and β, as shown in the table. For MNIST-Back-Rand, we pick α = β = 16 based on the validation data, and obtain a test error of 4.67%. Following a similar methodology, we obtain a test error of 0.63% for MNIST (p = 2^61 − 1) and 25.7% for TIMIT (p = 2^127 − 1). We note that SafetyNets does not support techniques such as Maxout [10] that have demonstrated lower error on MNIST (0.45%). Ba et al.
[3] report an error of 18.5% for TIMIT using an ensemble of nine deep neural networks, which SafetyNets might be able to support by verifying each network individually and performing ensemble averaging at the client side.

4.3 Verifier and Prover Run-times

The relevant performance metrics for SafetyNets are (1) the client's (or verifier's) run-time, (2) the server's run-time, which includes the baseline time to execute the neural network and the overhead of generating proofs, and (3) the bandwidth required by the IP protocol. Ideally, these quantities should be small and, importantly, the client's run-time should be smaller than in the case in which it executes the network by itself. Figure 3 plots run-time data over input batch sizes ranging from 256 to 2048 for FcNN-Quad-3.

[Figure 3: Run-time of the verifier, the prover, and baseline execution time for the arithmetic circuit representation of FcNN-Quad-3 versus input batch size.]

For FcNN-Quad-3, the client's time for verifying proofs is 8× to 82× faster than the baseline in which it executes FcNN-Quad-3 itself, and decreases with batch size. The increase in the server's execution time due to the overhead of generating proofs is only 5% over the baseline unverified execution of FcNN-Quad-3. The prover and verifier exchange less than 8 KBytes of data during the IP protocol for a batch size of 2048, which is negligible (less than 2%) compared to the bandwidth required to communicate inputs and outputs back and forth. In all settings, the soundness error ε, i.e., the chance that the verifier fails to detect incorrect computations by the server, is less than 1/2^30, a negligible value.
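The asymmetry the IP protocol exploits, namely that checking a result is far cheaper than computing it, can be illustrated with Freivalds' classic randomized check for matrix products. This is a simpler, non-interactive relative of the matrix-multiplication protocol of [18] used by SafetyNets, shown here only to convey the idea; it is not the SafetyNets protocol itself.

```python
import random

P = 2**61 - 1  # a large prime; all arithmetic is modulo P

def matmul_mod(a, b, p=P):
    """Exact matrix product modulo p (the server's O(n^3) work)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) % p
             for j in range(len(b[0]))] for i in range(len(a))]

def freivalds_verify(a, b, c, rounds=30, p=P):
    """Check a @ b == c (mod p) in O(n^2) time per round by comparing
    a @ (b @ r) against c @ r for a random vector r. An incorrect c
    survives each round with probability at most 1/p."""
    k = len(c[0])
    for _ in range(rounds):
        r = [random.randrange(p) for _ in range(k)]
        br = [sum(row[j] * r[j] for j in range(k)) % p for row in b]
        abr = [sum(row[t] * br[t] for t in range(len(br))) % p for row in a]
        cr = [sum(row[j] * r[j] for j in range(k)) % p for row in c]
        if abr != cr:
            return False  # caught an incorrect product
    return True

random.seed(0)
n = 8
a = [[random.randrange(P) for _ in range(n)] for _ in range(n)]
b = [[random.randrange(P) for _ in range(n)] for _ in range(n)]
c = matmul_mod(a, b)
honest = freivalds_verify(a, b, c)     # correct product is accepted
c[0][0] = (c[0][0] + 1) % P            # a lazy/malicious server tampers one entry
cheating = freivalds_verify(a, b, c)   # tampering is caught (w.h.p.)
```

As with the IP protocol, the verifier's work is quadratic while the computation itself is cubic, so verification becomes relatively cheaper as the problem grows.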
We note that SafetyNets has significantly lower bandwidth costs compared to an approach that separately verifies the execution of each layer using only the IP protocol for matrix multiplication.

A closely related technique, CryptoNets [8], uses homomorphic encryption to provide privacy, but not integrity, for neural networks executing in the cloud. Since SafetyNets and CryptoNets target different security goals, a direct comparison is not entirely meaningful. However, from the data presented in the CryptoNets paper, we note that the client's run-time for MNIST using a CNN similar to ours and an input batch size b = 4096 is about 600 seconds, primarily because of the high cost of encryptions. For the same batch size, the client-side run-time of SafetyNets is less than 10 seconds. Recent work has also looked at how neural networks can be trained in the cloud without compromising the user's training data [14], but the proposed techniques do not guarantee integrity. We expect that SafetyNets can be extended to address the verifiable neural network training problem as well.

5 Conclusion

In this paper, we have presented SafetyNets, a new framework that allows a client to provably verify the correctness of deep neural network based inference running on an untrusted cloud. Building upon the rich literature on interactive proof systems for verifying general-purpose and specialized computations, we designed and implemented a specialized IP protocol tailored for a certain class of deep neural networks, i.e., those that can be represented as arithmetic circuits. We showed that placing these restrictions did not impact the accuracy of the networks on real-world classification tasks like digit and speech recognition, while enabling a client to verifiably outsource inference to the cloud at low cost.
For our future work, we will apply SafetyNets to deeper networks and extend it to address both integrity and privacy. There are VC techniques [17] that guarantee both, but they typically come at higher costs. Further, building on prior work on the use of IPs to build verifiable hardware [20], we intend to deploy the SafetyNets protocol in the design of a verifiable hardware accelerator for neural network inference.

References

[1] Variations on the MNIST digits. http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations.

[2] S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.

[3] J. Ba and R. Caruana. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems, pages 2654–2662, 2014.

[4] A. Coates, A. Ng, and H. Lee. An analysis of single-layer networks in unsupervised feature learning. In International Conference on Artificial Intelligence and Statistics, pages 215–223, 2011.

[5] G. Cormode, J. Thaler, and K. Yi. Verifying computations with streaming interactive proofs. Proceedings of the Very Large Database Endowment, pages 25–36, 2011.

[6] A. Gautier, Q. N. Nguyen, and M. Hein. Globally optimal training of generalized polynomial neural networks with nonlinear spectral methods. In Advances in Neural Information Processing Systems, pages 1687–1695, 2016.

[7] R. Gennaro, C. Gentry, and B. Parno. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In Annual Cryptology Conference, pages 465–482, 2010.

[8] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210, 2016.

[9] S. Goldwasser, Y. T. Kalai, and G. N. Rothblum. Delegating computation: Interactive proofs for muggles.
In Symposium on Theory of Computing, pages 113–122, 2008.

[10] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. arXiv preprint arXiv:1302.4389, 2013.

[11] K. Lee and H. Hon. Speaker-independent phone recognition using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing, pages 1641–1648, 1989.

[12] R. Livni, S. Shalev-Shwartz, and O. Shamir. On the computational efficiency of training neural networks. In Advances in Neural Information Processing Systems, pages 855–863, 2014.

[13] C. Lund, L. Fortnow, H. Karloff, and N. Nisan. Algebraic methods for interactive proof systems. Journal of the ACM, pages 859–868, 1992.

[14] P. Mohassel and Y. Zhang. SecureML: A system for scalable privacy-preserving machine learning. IACR Cryptology ePrint Archive, 2017.

[15] F. Monrose, P. Wyckoff, and A. D. Rubin. Distributed execution with remote audit. In Network and Distributed System Security Symposium, pages 3–5, 1999.

[16] N. Papernot, P. McDaniel, A. Sinha, and M. Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.

[17] B. Parno, J. Howell, C. Gentry, and M. Raykova. Pinocchio: Nearly practical verifiable computation. In Symposium on Security and Privacy, pages 238–252, 2013.

[18] J. Thaler. Time-optimal interactive proofs for circuit evaluation. In International Cryptology Conference, pages 71–89, 2013.

[19] V. Vu, S. Setty, A. J. Blumberg, and M. Walfish. A hybrid architecture for interactive verifiable computation. In Symposium on Security and Privacy, pages 223–237, 2013.

[20] R. S. Wahby, M. Howald, S. Garg, A. Shelat, and M. Walfish. Verifiable ASICs. In Symposium on Security and Privacy, pages 759–778, 2016.

[21] M. Walfish and A. J. Blumberg. Verifying computations without reexecuting them.
Communications of the ACM, pages 74–84, 2015.

[22] M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv:1301.3557, 2013.

[23] Y. Zhang, P. Liang, and M. J. Wainwright. Convexified convolutional neural networks. arXiv preprint arXiv:1609.01000, 2016.

Proof of Lemma 3.1

Lemma 3.1 The SafetyNets verifier rejects incorrect computations with probability greater than $(1 - \epsilon)$, where $\epsilon = \frac{3b \sum_{i=0}^{L} n_i}{p}$ is the soundness error.

Proof. Verifying a multilinear extension of the output sampled at a random point, instead of each value, adds a soundness error of $\epsilon = \frac{b n_L}{p}$. Each instance of the sum-check protocol adds to the soundness error [19]. The IP protocol for matrix multiplication adds a soundness error of $\epsilon = \frac{2 n_{i-1}}{p}$ in layer $i$ [18]. Finally, the IP protocol for quadratic activations adds a soundness error of $\epsilon = \frac{3 b n_i}{p}$ in layer $i$ [18]. Summing together, we get a total soundness error of $\frac{2\sum_{i=0}^{L-1} n_i + 3\sum_{i=1}^{L-1} b n_i + b n_L}{p}$. The final result is an upper bound on this value.

Handling Bias Variables

We assumed that the bias variables were zero, allowing us to write $z_i = w_i \cdot y_i$, while it should be $z_i = w_i \cdot y_i + b_i 1^T$. Let $z'_i = w_i \cdot y_i$. We seek to convert an assertion on $\tilde{Z}_i(q_i, r_i)$ to an assertion on $\tilde{Z}'_i$.
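As a quick sanity check on the bound in Lemma 3.1, one can plug in concrete sizes. The batch size and prime below match the experiments in Section 4.3, but the layer widths are illustrative assumptions, not the exact networks of the paper.

```python
from fractions import Fraction

P = 2**61 - 1                  # field prime used for the MNIST experiments
b = 2048                       # largest input batch size from Section 4.3
layers = [784, 128, 128, 10]   # assumed layer sizes n_0 .. n_L (illustrative)

# Lemma 3.1 upper bound on the soundness error: eps = 3 * b * sum(n_i) / p
eps = Fraction(3 * b * sum(layers), P)

# Consistent with Section 4.3's claim that eps is below 1/2^30
negligible = eps < Fraction(1, 2**30)
```

Using exact rational arithmetic avoids any floating-point rounding when comparing the bound against 1/2^30.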
We can do so by noting that:

$$\tilde{Z}_i(q_i, r_i) = \sum_{j \in \{0,1\}^{\log(n_i)}} \tilde{I}(j, q_i) \left( \tilde{Z}'_i(j, r_i) + \tilde{B}_i(j) \right) \qquad (8)$$

which can be reduced to sum-check, and thus yields an assertion on $\tilde{B}_i$, which the verifier checks locally, and on $\tilde{Z}'_i$, which is passed to the IP protocol for matrix multiplication.
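The reduction above works because the multilinear extension is linear in the underlying values, so the bias term separates cleanly from $\tilde{Z}'_i$. A minimal numerical check of this linearity over F_p, evaluating at a single random point (vector-valued for simplicity; all names are hypothetical):

```python
import random

P = 2**61 - 1  # field prime

def mle_eval(v, point, p=P):
    """Evaluate the multilinear extension of vector v (length 2^k) at a
    point in F_p^k: v~(x) = sum_j prod_l (x_l if j_l = 1 else 1 - x_l) * v[j]."""
    k = len(point)
    assert len(v) == 2**k
    total = 0
    for j, vj in enumerate(v):
        term = vj % p
        for l in range(k):
            x = point[l] % p
            term = term * (x if (j >> l) & 1 else (1 - x) % p) % p
        total = (total + term) % p
    return total

random.seed(1)
k = 3
z_prime = [random.randrange(P) for _ in range(2**k)]  # plays the role of w.y (no bias)
bias    = [random.randrange(P) for _ in range(2**k)]  # plays the role of the bias column
z = [(zp + bb) % P for zp, bb in zip(z_prime, bias)]  # z = z' + b
q = [random.randrange(P) for _ in range(k)]           # random evaluation point

lhs = mle_eval(z, q)
rhs = (mle_eval(z_prime, q) + mle_eval(bias, q)) % P  # equals lhs exactly, by linearity
```

Because the extension is linear, the equality holds at every point, not just with high probability; randomness enters the protocol only through where the verifier chooses to evaluate.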