{"title": "Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks", "book": "Advances in Neural Information Processing Systems", "page_first": 4714, "page_last": 4723, "abstract": "With substantial amount of time, resources and human (team) efforts invested to explore and develop successful deep neural networks (DNN), there emerges an urgent need to protect these inventions from being illegally copied, redistributed, or abused without respecting the intellectual properties of legitimate owners. Following recent progresses along this line, we investigate a number of watermark-based DNN ownership verification methods in the face of ambiguity attacks, which aim to cast doubts on the ownership verification by forging counterfeit watermarks. It is shown that ambiguity attacks pose serious threats to existing DNN watermarking methods. As remedies to the above-mentioned loophole, this paper proposes novel passport-based DNN ownership verification schemes which are both robust to network modifications and resilient to ambiguity attacks. The gist of embedding digital passports is to design and train DNN models in a way such that, the DNN inference performance of an original task will be significantly deteriorated due to forged passports. In other words, genuine passports are not only verified by looking for the predefined signatures, but also reasserted by the unyielding DNN model inference performances. Extensive experimental results justify the effectiveness of the proposed passport-based DNN ownership verification schemes. Code and models are available at https://github.com/kamwoh/DeepIPR", "full_text": "Rethinking Deep Neural Network\n\nOwnership Veri\ufb01cation: Embedding Passports to\n\nDefeat Ambiguity Attacks\n\nLixin Fan1 Kam Woh Ng2 Chee Seng Chan2\n\n1WeBank AI Lab, Shenzhen, China\n\n2Center of Image and Signal Processing, Faculty of Comp. Sci. 
and Info. Tech.,\nUniversity of Malaya, Kuala Lumpur, Malaysia\n\n{lixinfan@webank.com;kamwoh@siswa.um.edu.my;cs.chan@um.edu.my}\n\nAbstract\n\nWith substantial amounts of time, resources and human (team) effort invested to explore and develop successful deep neural networks (DNN), there emerges an urgent need to protect these inventions from being illegally copied, redistributed, or abused without respecting the intellectual property of legitimate owners. Following recent progress along this line, we investigate a number of watermark-based DNN ownership verification methods in the face of ambiguity attacks, which aim to cast doubt on the ownership verification by forging counterfeit watermarks. It is shown that ambiguity attacks pose serious threats to existing DNN watermarking methods. As a remedy to this loophole, this paper proposes novel passport-based DNN ownership verification schemes which are both robust to network modifications and resilient to ambiguity attacks. The gist of embedding digital passports is to design and train DNN models in such a way that the DNN inference performance on the original task is significantly deteriorated by forged passports. In other words, genuine passports are not only verified by looking for the predefined signatures, but also reasserted by the unyielding DNN model inference performance. Extensive experimental results justify the effectiveness of the proposed passport-based DNN ownership verification schemes. Code and models are available at https://github.com/kamwoh/DeepIPR\n\n1 Introduction\n\nWith the rapid development of deep neural networks (DNN), Machine Learning as a Service (MLaaS) has emerged as a viable and lucrative business model. 
However, building a successful DNN is not a trivial task; it usually requires substantial investment in expertise, time and resources. As a result, there is an urgent need to protect invented DNN models from being illegally copied, redistributed or abused (i.e. intellectual property infringement). Recently, for instance, digital watermarking techniques have been adopted to provide such protection, by embedding watermarks into DNN models during the training stage. Subsequently, ownership of these inventions is verified by the detection of the embedded watermarks, which are supposed to be robust to multiple types of modifications such as model fine-tuning, model pruning and watermark overwriting [1-4].\nIn terms of deep learning methods to embed watermarks, existing approaches can be broadly categorized into two schools: a) feature-based methods that embed designated watermarks into the DNN weights by imposing additional regularization terms [1, 3, 5]; and b) trigger-set based methods that rely on adversarial training samples with specific labels (i.e. backdoor trigger sets) [2, 4]. Watermarks embedded with either of these methods have successfully demonstrated robustness against removal attacks, which involve modifications of the DNN weights such as fine-tuning or pruning. However, our studies disclose the existence and effectiveness of ambiguity attacks, which aim to cast doubt on the ownership verification by forging additional watermarks for the DNN models in question (see Fig. 1). We also show that it is always possible to reverse-engineer forged watermarks at minor computational cost, without needing the original training dataset (Sect. 2).\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n(a) Application Scenario. (b) Present Solution. (c) Proposed Solution.\nFigure 1: DNN model ownership verification in the face of ambiguity attacks. (a): Owner Alice uses an embedding process E to train a DNN model with watermarks (T, s) and releases the model publicly; Attacker Bob forges counterfeit watermarks (T′, s′) with an invert process I; (b): The ownership is in doubt since both the original and forged watermarks are detected by the verification process V (Sect. 2.2); (c): The ambiguity is resolved when our proposed passports are embedded and the network performance is evaluated in favour of the original passport by the fidelity evaluation process F (see Definition 1 and Sect. 3.3).\n\nAs a remedy to the above-mentioned loophole, this paper proposes a novel passport-based approach. There is a unique advantage of the proposed passports over traditional watermarks: the inference performance of a pre-trained DNN model will either remain intact given the presence of valid passports, or be significantly deteriorated by modified or forged passports. In other words, we propose to modulate the inference performance of the DNN model depending on the presented passports, and by doing so, one can develop ownership verification schemes that are both robust to removal attacks and resilient to ambiguity attacks at once (Sect. 3).\nThe contributions of our work are threefold: i) we put forth a general formulation of DNN ownership verification schemes and show empirically that existing DNN watermarking methods are vulnerable to ambiguity attacks; ii) we propose novel passport-based verification schemes and demonstrate with extensive experimental results that these schemes successfully defeat ambiguity attacks; iii) methodology-wise, the proposed modulation of DNN inference performance based on the presented passports (Eq. 
4) plays an indispensable role in bringing the DNN model behaviour under control against adversarial attacks.\n\n1.1 Related work\n\nUchida et al. [1] were probably the first to propose embedding watermarks into DNN models by imposing an additional regularization term on the weight parameters. [2, 6] proposed to embed watermarks in the classification labels of adversarial examples in a trigger set, so that the watermarks can be extracted remotely through a service API without the need to access the network weights (i.e. black-box setting). In both black-box and white-box settings, [3, 5, 7] demonstrated how to embed watermarks (or fingerprints) that are robust to various types of attacks. In particular, it was shown that embedded watermarks are in general robust to removal attacks that modify network weights via fine-tuning or pruning. Watermark overwriting, on the other hand, is more problematic since it aims to simultaneously embed a new watermark and destroy the existing one. Although [5] demonstrated robustness against the overwriting attack, it did not resolve the ambiguity resulting from the counterfeit watermark. Adi et al. [2] also discussed how to deal with an adversary who fine-tunes an already watermarked network with new trigger set images. Nevertheless, [2] required the new set of images to be distinguishable from the true trigger set images. This requirement, however, is often unfulfilled in practice, and our experimental results show that none of the existing watermarking methods are able to deal with the ambiguity attacks explored in this paper (see Sect. 
2).\nIn the context of digital image watermarking, [8, 9] have studied ambiguity attacks that aim to create an ambiguous situation in which a watermark is reverse-engineered from an already watermarked image, by taking advantage of the invertibility of forged watermarks [10]. It was argued that robust watermarks do not necessarily imply the ability to establish ownership, unless non-invertible watermarking schemes are employed (see Proposition 2 for our proposed solution).\n\n2 Rethinking Deep Neural Network Ownership Verification\n\nThis section analyses and generalizes existing DNN watermarking methods in the face of ambiguity attacks. We must emphasize that the analysis focuses on three aspects, i.e. fidelity, robustness and invertibility of the ownership verification schemes; we refer readers to representative previous work [1-4] for formulations and other desired features of entire watermark-based intellectual property (IP) protection schemes, which are outside the scope of this paper.\n\n2.1 Reformulation of DNN ownership verification schemes\n\nFigure 1 summarizes the application scenarios of DNN model ownership verification provided by watermark-based schemes. Inspired by [10], we also illustrate an ambiguous situation in which rightful ownership cannot be uniquely resolved by the current watermarking schemes alone. This loophole is largely due to an intrinsic weakness of the watermark-based methods, i.e. invertibility. Formally, the definition of DNN model ownership verification schemes is generalized as follows.\nDefinition 1. 
A DNN model ownership verification scheme is a tuple V = (E, F, V, I) of processes:\n\nI) An embedding process E(D_r, T, s, N[·], L) = N[W, T, s] is a DNN learning process that takes training data D_r = {X_r, y_r} as inputs, and optionally, either trigger set data T = {X_T, y_T} or a signature s, and outputs the model N[W, T, s] by minimizing a given loss L.\nRemark: the DNN architecture is pre-determined by N[·] and, after the DNN weights W are learned, either the trigger set T or the signature s will be embedded and can be verified by the verification process defined next^1.\n\nII) A fidelity evaluation process F(N[W, ·, ·], D_t, M_t, ϵ_f) = {True, False} evaluates whether or not the discrepancy is less than a predefined threshold, i.e. |M(N[W, ·, ·], D_t) − M_t| ≤ ϵ_f, in which M(N[W, ·, ·], D_t) is the DNN inference performance tested against a set of test data D_t and M_t is the target inference performance.\nRemark: it is often expected that a well-behaved embedding process will not introduce an inference performance change greater than the predefined threshold ϵ_f. Nevertheless, this fidelity condition remains to be verified for DNN models under either removal attacks or ambiguity attacks.\n\nIII) A verification process V(N[W, ·, ·], T, s, ϵ_s) = {True, False} checks whether or not the expected signature s or trigger set T is successfully verified for a given DNN model N[W, ·, ·].\nRemark: for feature-based schemes, V involves the detection of embedded signatures s = {P, B} with a false detection rate lower than a predefined threshold ϵ_s. 
Specifically, the detection boils down to measuring the distance D_f(f_e(W, P), B) between the target feature B and the features extracted by a transformation function f_e(W, P) parameterized by P.\nRemark: for trigger-set based schemes, V first invokes a DNN inference process that takes trigger set samples X_T as inputs, and then checks whether the predictions f(W, X_T) produce the designated labels y_T with a false detection rate lower than the threshold ϵ_s.\n\nIV) An invert process I(N[W, T, s]) = N[W, T′, s′] exists and constitutes a successful ambiguity attack if:\n(a) a new trigger set T′ and/or signature s′ can be reverse-engineered for a given DNN model;\n(b) the forged T′, s′ can be successfully verified with respect to the given DNN weights W, i.e. V(I(N[W, T, s]), T′, s′, ϵ_s) = True;\n(c) the fidelity evaluation outcome F(N[W, ·, ·], D_t, M_t, ϵ_f) defined in Definition 1.II remains True.\nRemark: this condition plays an indispensable role in designing the non-invertible verification schemes that defeat ambiguity attacks (see Sect. 3.3).\n\n^1 Learning hyper-parameters such as the learning rate and the type of optimization method are considered irrelevant to ownership verification, and thus are not included in the formulation.\n\n
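To make the invert process of Definition 1.IV concrete, the toy numpy sketch below is illustrative rather than the paper's exact attack: it assumes a feature-based scheme whose verification reduces to checking a signature B against sign(P · w) (all variable names are ours), and shows that an attacker holding only the frozen weights w can match any counterfeit signature; since W is never modified, the fidelity evaluation of Definition 1.II still passes.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(256)              # frozen, flattened DNN weights W
b_forged = rng.choice([-1, 1], size=32)   # arbitrary counterfeit signature B'

# Invert process I: start from a random projection and flip every row
# whose extracted bit sign(P'_i . w) disagrees with the forged bit.
P_forged = rng.standard_normal((32, 256))
mismatch = np.sign(P_forged @ w) != b_forged
P_forged[mismatch] *= -1.0

# Verification V now reports a perfect detection rate for the forged pair,
# while the weights (and hence the inference performance) are untouched.
assert np.array_equal(np.sign(P_forged @ w), b_forged)
```

The construction costs a single matrix product per check, consistent with the minor computational cost of the ambiguity attacks reported in Sect. 2.2.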
Accuracy outside bracket is the transferred task, while\nin-bracket is the original task. WM Det. denotes the detection accuracies of real and fake watermarks.\n\n65.20 (91.03)\n75.06 (91.03)\n\n27.80 (100)\n46.80 (100)\n\n100 (100)\n100 (100)\n\nReal WM Det.\n25.00 (100)\n43.60 (100)\n\nV) If at least one invert process exists for a DNN veri\ufb01cation scheme V, then the scheme is called\nan invertible scheme and denoted by V I = (E, F, V, I (cid:54)= \u2205); otherwise, the scheme is called\nnon-invertible and denoted by V\u2205 = (E, F, V,\u2205).\n\nThe de\ufb01nition as such is abstract and can be instantiated by concrete implementations of processes\nand functions. For instance, the following combined loss function (Eq. 1) generalizes loss functions\nadopted by both the feature-based and trigger-set based watermarking methods:\n\n(cid:0)f (W, Xr), yr\n\n(cid:1) + \u03bbtLc\n\n(cid:0)f (W, XT ), yT\n\n(cid:1) + \u03bbrR(W, s),\n\nL = Lc\n\n(1)\n\nin which \u03bbt, \u03bbr are the relative weight hyper-parameters, f (W, X\u2212) are the network predictions\nwith inputs Xr or XT . Lc is the loss function like cross-entropy that penalizes discrepancies between\nthe predictions and the target labels yr or yT . Signature s = {P, B} consists of passports P and\nsignature string B. The regularization terms could be either R = Lc(\u03c3(W, P), B) as in [1] or\nR = M SE(B \u2212 PW) as in [3].\nIt must be noted that, for those DNN models that will be used for classi\ufb01cation tasks, their inference\nperformance M(N[W,\u00b7,\u00b7], Dt) = Lc\nindependent of either the embedded signature s or trigger set T. It is this independence that induces\nan invertible process for existing watermark-based methods as described next.\nProposition 1 (Invertible process). 
For a DNN ownership veri\ufb01cation scheme V as in De\ufb01nition 1,\nif the \ufb01delity process F () is independent of either the signature s or trigger set T, then there always\nexists an invertible process I() i.e. the scheme is invertible V I = (E, F, V, I (cid:54)= \u2205)).\n\n(cid:1) tested against a dataset Dt = {Xt, yt} is\n\n(cid:0)f (W, Xt), yt\n\n2.2 Watermarking in the face of ambiguity attacks\n\nAs proved by Proposition 1, one is able to construct forged watermarks for any already watermarked\nnetworks. We tested the performances of two representative DNN watermarking methods [1, 2],\nand Table 1 shows that counterfeit watermarks can be forged for the given DNN models with 100%\ndetection rate, and 100% fake trigger set images can be reconstructed as well in the original task.\nGiven that the detection accuracies for the forged trigger set is slightly better than the original trigger\nset after \ufb01ne-tuning, the claim of the ownership is ambiguous and cannot be resolved by neither\nfeature-based nor trigger-set based watermarking methods. Shockingly, the computational cost to\nforge counterfeit watermarks is quite minor where the forging required no more than 100 epochs to\noptimize, and worst still this is achieved without the need of original training data.\nIn summary, the ambiguity attacks against DNN watermarking methods are effective with minor\ncomputational and without the need of original training datasets. We ascribe this loophole to the crux\nthat the loss of the original task, i.e. Lc\nWe refer readers to our extended version [11] for an elaboration on the ambiguity attack method we\nadopted and more detailed experiment results. 
In the next section, we shall illustrate a solution to\ndefeat the ambiguity attacks.\n\n(cid:1) is independent of the forged watermarks.\n\n(cid:0)f ( \u02c6W, Xr), yr\n\n3 Embedding passports for DNN ownership veri\ufb01cation\n\nThe main motivation of embedding digital passports is to design and train DNN models in a way such\nthat, their inference performances of the original task (i.e. classi\ufb01cation accuracy) will be signi\ufb01cantly\ndeteriorated due to the forged signatures. We shall illustrate next \ufb01rst how to implement the desired\nproperty by incorporating the so called passport layers, followed by different ownership protection\nschemes that exploit the embedded passports to effectively defeat ambiguity attacks.\n\n4\n\n\f\u03b3, pl\n\n(a) An example in the ResNet layer that consists of\nthe proposed passporting layers. pl = {pl\n\u03b2} is the\np \u2217\nproposed digital passports where F = Avg(Wl\n\u03b3,\u03b2) is a passport function to compute the hidden\nPl\nparameters (i.e. \u03b3 and \u03b2) given in Eq. (2).\nFigure 2: (a) Passport layers in ResNet architecture and (b) Classi\ufb01cation accuracies modulated by\ndifferent passports in CIFAR10, e.g. given counterfeit passports, the DNN models performance will\nbe deteriorated instantaneously to fend off illegal usage.\n\n(b) A comparison of CIFAR10 classi\ufb01cation accuracies\ngiven the original DNN, proposed DNN with valid\npassports, proposed DNN with randomly generated\npassports (f ake1), and proposed DNN with reverse-\nengineered passports (f ake2).\n\n3.1 Passport layers\n\nIn order to control the DNN model functionalities by the embedded digital signatures i.e. 
passports, we propose to append after a convolution layer a passport layer, whose scale factor γ and bias shift term β depend on both the convolution kernels W_p and the designated passport P as follows:\n\nO^l(X_p) = γ^l X^l_p + β^l = γ^l (W^l_p ∗ X^l_c) + β^l,    (2)\nγ^l = Avg(W^l_p ∗ P^l_γ),    β^l = Avg(W^l_p ∗ P^l_β),    (3)\n\nin which ∗ denotes the convolution operation, l is the layer number, X_p is the input to the passport layer and X_c is the input to the convolution layer. O() is the corresponding linear transformation of the outputs, while P^l_γ and P^l_β are the passports used to derive the scale factor and the bias term, respectively. Fig. 2a delineates the architecture of the digital passport layers used in a ResNet layer.\nRemark: for DNN models trained with passports s_e = {P^l_γ, P^l_β}_l, the inference performance M(N[W, s_e], D_t, s_t) depends on the run-time passports s_t, i.e.\n\nM(N[W, s_e], D_t, s_t) = { M_{s_e}, if s_t = s_e;  M̄_{s_e}, otherwise },    (4)\n\nwhere M̄_{s_e} denotes the deteriorated performance. If the genuine passport is not presented, i.e. s_t ≠ s_e, the run-time performance M̄_{s_e} is significantly deteriorated because the corresponding scale factors γ and bias terms β are calculated from the wrong passports. For instance, as shown in Fig. 2b, the proposed DNN model presented with valid passports (green) demonstrates accuracies almost identical to those of the original DNN model (red). In contrast, when the same proposed DNN model is presented with counterfeit passports (blue), the accuracy deteriorates to merely about 10%.\nRemark: the gist of the proposed passport layer is to enforce dependence between the scale factors, bias terms and network weights. As shown by Proposition 2, it is this dependence that provides the non-invertibility required to defeat ambiguity attacks.\nProposition 2 (Non-invertible process). 
A DNN ownership verification scheme V as in Definition 1 is non-invertible if:\n\nI) the fidelity process outcome F(N[W, T, s], D_t, M_t, ϵ_f) depends on either the presented signature s or the trigger set T;\n\nII) with a forged passport s_t ≠ s_e, the DNN inference performance M(N[W, s_e], D_t, s_t) in Eq. (4) deteriorates such that the discrepancy is larger than the threshold, i.e. |M_{s_e} − M̄_{s_e}| > ϵ_f.\n\n3.2 Sign of scale factors as signature\n\nWhen learning the DNN, to further protect the model ownership from insider threats (e.g. a former staff member who establishes a new start-up business with resources stolen from the originator), one can enforce the scale factors γ to take positive or negative signs (+/−) as designated, so that they form a unique signature string (like a fingerprint). This is done by adding the following sign loss regularization term to the combined loss (Eq. 1):\n\nR(γ, P, B) = Σ_{i=1}^{C} max(γ_0 − γ_i b_i, 0),    (5)\n\nin which B = {b_1, ..., b_C} ∈ {−1, 1}^C consists of the designated binary bits for the C convolution kernels, and γ_0 is a positive control parameter (0.1 by default unless stated otherwise) that encourages the scale factors to have magnitudes greater than γ_0.\nIt must be highlighted that the inclusion of the sign loss (Eq. 5) forces the scale factors γ to take either positive or negative values as designated, and the signs enforced in this way remain rather persistent under various adversarial attacks. This feature explains the superior robustness of embedded passports against the reverse-engineering ambiguity attacks shown in Sect. 4.2.\n\n3.3 Ownership verification with passports\n\nTaking advantage of the proposed passport-based approach, we design three new ownership verification schemes V, which are summarized next; we refer readers to Sect. 
4 for the experiment results.\nV1: Passport is distributed with the trained DNN model\nHere, the learning process aims to minimize the combined loss function (Eq. 1), in which λ_t = 0 since trigger set images are not used in this scheme, and the sign loss (Eq. 5) is added as the regularization term. The trained DNN model, together with the passport, is then distributed to legitimate users, who perform network inference with the given passport fed to the passport layers as shown in Fig. 2a. The network ownership is automatically verified by the distributed passports. As shown in Table 2 and Fig. 3, this ownership verification is robust to DNN model modifications. Also, as shown in Fig. 4, ambiguity attacks are not able to forge a passport-signature pair that maintains the DNN inference performance.\nThe downside of this scheme is the requirement to use passports during inference, which adds about 10% extra computational cost (see Sect. 4.3). Also, the distribution of passports to end-users is intrusive and imposes the additional responsibility of guarding the passports safely.\nV2: Private passport is embedded but not distributed\nHere, the learning process aims to achieve two goals simultaneously: the first is to minimize the original task loss (e.g. classification accuracy discrepancy) when no passport layers are included, and the second is to minimize the combined loss function (Eq. 1) with the passport regularization included. Algorithm-wise, this multi-task learning is achieved by alternating between the minimization of these two goals. The successfully trained DNN model is then distributed to end-users, who may perform network inference without the need for passports. Note that this is possible since passport layers are not included in the distributed networks. 
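The passport branch that all three schemes rely on follows Eqs. (2)-(3). Below is a minimal numpy sketch of that branch (naive valid convolution; all shapes and names are illustrative assumptions, not the released implementation):

```python
import numpy as np

def conv2d(x, W):
    """Naive 'valid' convolution: x is (C_in, H, W), W is (C_out, C_in, k, k)."""
    c_out, _, k, _ = W.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, w))
    for o in range(c_out):
        for i in range(h):
            for j in range(w):
                out[o, i, j] = np.sum(W[o] * x[:, i:i + k, j:j + k])
    return out

def passport_layer(x_c, W_p, P_gamma, P_beta):
    """Eqs. (2)-(3): O(X_p) = gamma * X_p + beta with X_p = W_p * x_c,
    gamma = Avg(W_p * P_gamma) and beta = Avg(W_p * P_beta) per channel."""
    gamma = conv2d(P_gamma, W_p).mean(axis=(1, 2))  # one scalar per filter
    beta = conv2d(P_beta, W_p).mean(axis=(1, 2))
    x_p = conv2d(x_c, W_p)
    return gamma[:, None, None] * x_p + beta[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 6))                    # input feature map X_c
W = rng.standard_normal((4, 2, 3, 3))                 # convolution kernels W_p
P_gamma, P_beta = rng.standard_normal((2, 2, 5, 5))   # genuine passports

out_valid = passport_layer(x, W, P_gamma, P_beta)
# A forged passport yields different gamma/beta, corrupting every channel:
out_fake = passport_layer(x, W, rng.standard_normal((2, 5, 5)), P_beta)
```

Because γ and β are recomputed from W_p and the presented passport at run time, a forged passport rescales and shifts every channel, which is what degrades the inference performance in Eq. (4).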
The ownership verification will be carried out only upon request by law enforcement, by adding the passport layers to the network in question and detecting the embedded sign signatures, with the original network inference performance remaining unyielding.\nCompared with scheme V1, this scheme is easy to use for end-users since no passport is needed and no extra computational cost is incurred. In the meantime, this ownership verification is robust to removal attacks as well as ambiguity attacks. The downside, however, is the requirement to access the DNN weights and to append the passport layers for ownership verification, i.e. the disadvantages of the white-box protection mode as discussed in [2]. Therefore, we propose to combine it with the trigger-set based verification described next.\nV3: Both the private passport and trigger set are embedded but not distributed\nThis scheme only differs from scheme V2 in that a set of trigger images is embedded in addition to the embedded passports. The advantage of this, as discussed in [2], is to probe and claim ownership of the suspect DNN model through remote calls to service APIs. This capability allows one first to claim ownership in a black-box mode, followed by a reassertion of ownership with passport verification in a white-box mode. Algorithm-wise, the embedding of trigger set images is jointly achieved in the same minimization process that embeds passports in scheme V2. Finally, it must be noted that the embedding of passports in both the V2 and V3 schemes is implemented through multi-task learning, where we adopted group normalisation [12] instead of batch normalisation [13], which is not applicable due to its dependency on the running average of batch-wise training samples.\n\nTable 2: Removal attack (fine-tuning): detection/classification accuracy (in %) of different passport networks, where BN = batch normalisation and GN = group normalisation. Left: trained on CIFAR10 and fine-tuned for CIFAR100/Caltech-101. Right: trained on CIFAR100 and fine-tuned for CIFAR10/Caltech-101. The accuracy outside the bracket is the signature detection rate, while the in-bracket accuracy is the classification rate.\nTrained on CIFAR10:\n| AlexNetp | CIFAR10 | → CIFAR100 | → Caltech-101 |\n| Baseline (BN) | - (91.12) | - (65.53) | - (76.33) |\n| Scheme V1 | 100 (90.91) | 100 (64.64) | 100 (73.03) |\n| Baseline (GN) | - (90.88) | - (62.17) | - (73.28) |\n| Scheme V2 | 100 (89.44) | 99.91 (59.31) | 100 (70.87) |\n| Scheme V3 | 100 (89.15) | 99.96 (59.41) | 100 (71.37) |\n| ResNetp-18 | CIFAR10 | → CIFAR100 | → Caltech-101 |\n| Baseline (BN) | - (94.85) | - (72.62) | - (82.88) |\n| Scheme V1 | 100 (94.62) | 100 (69.63) | 99.99 (79.27) |\n| Baseline (GN) | - (93.65) | - (69.40) | - (79.15) |\n| Scheme V2 | 100 (93.41) | 100 (63.84) | 100 (77.34) |\n| Scheme V3 | 100 (93.26) | 99.98 (63.61) | 100 (77.46) |\nTrained on CIFAR100:\n| AlexNetp | CIFAR100 | → CIFAR10 | → Caltech-101 |\n| Baseline (BN) | - (68.26) | - (89.46) | - (79.66) |\n| Scheme V1 | 100 (68.31) | 100 (89.07) | 100 (78.83) |\n| Baseline (GN) | - (65.09) | - (88.30) | - (78.08) |\n| Scheme V2 | 100 (64.09) | 100 (87.47) | 100 (76.31) |\n| Scheme V3 | 100 (63.67) | 100 (87.46) | 100 (75.89) |\n| ResNetp-18 | CIFAR100 | → CIFAR10 | → Caltech-101 |\n| Baseline (BN) | - (76.25) | - (93.22) | - (78.98) |\n| Scheme V1 | 100 (75.52) | 100 (95.28) | 100 (72.13) |\n| Baseline (GN) | - (72.06) | - (91.83) | - (75.08) |\n| Scheme V2 | 100 (72.15) | 100 (90.94) | 100 (71.07) |\n| Scheme V3 | 100 (72.10) | 100 (91.30) | 99.99 (72.00) |\n\n4 Experiment results\n\nThis section illustrates the experimental results of the passport-based DNN models, where the inference performances of the various schemes are compared in terms of robustness to both removal attacks and ambiguity attacks. The network architectures we investigated include the well-known AlexNet and ResNet-18, which are tested on the typical CIFAR10 and CIFAR100 classification tasks. 
These medium-sized public datasets allow us to perform extensive tests of the DNN model performances. Unless stated otherwise, all experiments are repeated 5 times and tested against 50 fake passports to obtain the mean inference performance. Also, to avoid confusion with the original AlexNet and ResNet models, we denote our proposed passport-based DNN models as AlexNetp and ResNetp-18.\n\n4.1 Robustness against removal attacks\n\nFine-tuning\nTable 2 shows that the signatures are detected at nearly 100% accuracy for all the ownership verification schemes on the original task. Even after fine-tuning the proposed DNN models for a new task (e.g. from CIFAR10 to Caltech-101), almost 100% detection accuracy is still maintained. Note that a signature is claimed as detected only if all the binary bits are exactly matched. We ascribe this superior robustness to the unique controlling nature of the scale factors: if a scale factor value is reduced to near zero, the channel output will be virtually zero; thus its gradient will vanish and lose the momentum to move towards the opposite sign. Empirically we have not observed counter-examples to this explanation^2.\nModel pruning\nThe aim of model pruning is to remove redundant parameters without compromising the performance. Here, we adopt the class-blind pruning scheme of [14] and test our proposed DNN models with different pruning rates. Figure 3 shows that, in general, our proposed DNN models still maintain nearly 100% signature detection accuracy even when 60% of the parameters are pruned, while the testing accuracy drops by around 5%-25%. Even when 90% of the parameters are pruned, the signature detection accuracy of our proposed DNN models is still much higher than the testing accuracy. As said, we ascribe the robustness against model pruning to the superior persistence of the signatures embedded in the scale factor signs (see Sect. 
3.2).\n\n^2 A rigorous proof of this argument is under investigation and will be reported elsewhere.\n\nFigure 3: Removal attack (model pruning): classification accuracy of our passport-based DNN models on both CIFAR10/CIFAR100 and signature detection accuracy against different pruning rates. (a) AlexNetp. (b) ResNetp-18.\n\nFigure 4: Ambiguity attack: classification accuracy of our passport networks with a valid passport, a random attack (fake1) and a reverse-engineering attack (fake2) on CIFAR10 and CIFAR100. (a) AlexNetp: (left) CIFAR10, (right) CIFAR100. (b) ResNetp-18: (left) CIFAR10, (right) CIFAR100.\n\nTable 3: Summary of overall passport network performances under schemes V1, V2 and V3, respectively, under three different ambiguity attack modes (fake1-fake3).\n- fake1: attackers have access to W. Construction: random passport P_r. Invertibility (see Def. 1.V): F(P_r) fails, by a large margin. Schemes V1/V2/V3: large accuracy drop.\n- fake2: attackers have access to W and {D_r; D_t}. Construction: reverse-engineered passport P_e. Invertibility: F(P_e) fails, by a moderate margin. Schemes V1/V2/V3: moderate accuracy drop.\n- fake3: attackers have access to W, {D_r; D_t} and {P, S}. Construction: reverse-engineered passport {P_e; S_e} exploiting the original passport P and sign string S. Invertibility: if S_e = S, F(P_e) passes with a negligible margin; if S_e ≠ S, F(P_e) fails by a moderate to huge margin. Schemes V1/V2/V3: refer to Fig. 5.\n\n4.2 Resilience against ambiguity attacks\n\nAs shown in Fig. 
4, the accuracy of our proposed DNN models trained on the CIFAR10/100 classification task depends significantly on the presence of either valid or counterfeit passports: the proposed DNN models presented with valid passports demonstrate almost identical accuracies to the original DNN model. On the contrary, the same proposed DNN model presented with invalid passports (in this case fake1, the random attack) achieves only 10% accuracy, which is merely equivalent to random guessing. In the case of fake2, we assume that the adversaries have access to the original training dataset, and attempt to reverse-engineer the scale factor and bias term by freezing the trained DNN weights. Fig. 4 shows that reverse-engineering attacks are only able to achieve, on CIFAR10, at best 84% accuracy on AlexNetp and 70% accuracy on ResNetp-18. On CIFAR100, in the fake1 case, the attack on both of our proposed DNN models achieves only 1% accuracy; in the fake2 case, the attack is only able to achieve 44% accuracy on AlexNetp and 35% accuracy on ResNetp-18.
Table 3 summarizes the accuracy of the proposed methods under the three ambiguity attack modes, fake, depending on the attackers' knowledge of the protection mechanism. It shows that the accuracies of all the corresponding passport-based DNN models are deteriorated to various extents. The ambiguity attacks are therefore defeated according to the fidelity evaluation process, F(). We would like to highlight that even under the most adversarial condition, i.e. freezing the weights, maximizing the distance from the original passport P, and minimizing the accuracy loss (in layman's terms, both the original passports and scale signs are exploited due to an insider threat, which we class as fake3), attackers are still unable to use new (modified) scale signs without compromising the network accuracies. As shown in Fig.
5, with 10% and 50% of the original scale signs modified, the CIFAR100 classification accuracy drops by about 5% and 50%, respectively. If the original scale signs remain unchanged, the DNN model ownership can be easily verified by the pre-defined string of signs. Also, Table 3 shows that attackers are unable to exploit Dt to forge ambiguous passports. Based on these empirical studies, we set the threshold ϵf in Definition 1 to 3% for AlexNetp and 20% for ResNetp-18, respectively. By this fidelity evaluation process, any potential ambiguity

(a) Verification scheme V1

(b) Verification scheme V2

(c) Verification scheme V3

Figure 5: Ambiguity Attack: Classification accuracy on CIFAR100 under insider threat (fake3) on the three verification schemes. It is shown that when a correct signature is used, the classification accuracy is intact, while for a partially correct signature (scale signs modified by around 10%), the performance immediately drops by around 5%, and a totally wrong signature obtains a meaningless accuracy (1%-10%).
Based on the threshold ϵf = 3% for AlexNetp and by the fidelity evaluation process F, any potential ambiguity attacks (even with a partially correct signature) are effectively defeated.

Scheme V1:
- Training: passport layers added; passports needed; 15%-30% more training time
- Inferencing: passport layers & passports needed; 10% more inferencing time
- Verification: NO separate verification needed

Scheme V2:
- Training: passport layers added; passports needed; 100%-125% more training time
- Inferencing: passport layers & passports NOT needed; NO extra time incurred
- Verification: passport layers & passports needed

Scheme V3:
- Training: passport layers added; passports & trigger set needed; 100%-150% more training time
- Inferencing: passport layers & passports NOT needed; NO extra time incurred
- Verification: trigger set needed (black-box verification); passport layers & passports needed (white-box verification)

Table 4: Summary of our proposed passport network complexity for the V1, V2 and V3 schemes.

attacks are effectively defeated. In summary, extensive empirical studies have shown that it is impossible for adversaries to maintain the original DNN model accuracies with counterfeit passports, regardless of whether they are randomly generated or reverse-engineered using the original training datasets. This passport-dependent performance plays an indispensable role in designing the secure ownership verification schemes illustrated in Sect. 3.3.

4.3 Network Complexity

Table 4 summarizes the complexity of the passport networks in the various schemes. We believe it is the computational cost at the inference stage that must be minimized, since network inference will be performed frequently by the end users. Extra costs at the training and verification stages, on the other hand, are not prohibitive, since they are incurred by the network owners, who are motivated to protect the DNN model ownerships.
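The fidelity evaluation process F() invoked throughout this section amounts to a thresholded accuracy comparison. A minimal sketch, assuming a single accuracy figure per passport (the function name and the toy accuracy values below are ours, for illustration only, not measured results):

```python
def fidelity_check(acc_original, acc_with_passport, epsilon_f):
    """F(): a passport passes fidelity evaluation only if the model accuracy
    obtained with it stays within epsilon_f of the original accuracy."""
    return (acc_original - acc_with_passport) <= epsilon_f

# epsilon_f = 3% is the threshold adopted for AlexNetp.
EPSILON_F = 0.03
print(fidelity_check(0.91, 0.90, EPSILON_F))  # valid passport: accuracy intact
print(fidelity_check(0.91, 0.84, EPSILON_F))  # fake2, reverse-engineered passport
print(fidelity_check(0.91, 0.10, EPSILON_F))  # fake1, random passport
```

Under such a check, even a forged passport that recovers a large fraction of the accuracy (the fake2 case above) is rejected, which is how the ambiguity attacks in this section are defeated.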
Nonetheless, we tested a larger network (i.e. ResNetp-50), and its training time increases by 10%, 182% and 191% for the V1, V2 and V3 schemes, respectively. These increases are consistent with those of the smaller models, i.e. AlexNetp and ResNetp-18.

5 Discussions and conclusions

Considering the billions of dollars that giant and start-up companies have invested to explore new DNN models virtually every second, we believe it is imperative to protect these inventions from being stolen. While the ownership of DNN models might be resolved by registering the models with a centralized authority, it has been recognized that such regulations are inadequate, and technical solutions are urgently needed to support law enforcement and judicial protection. It is this motivation that highlights the unique contribution of the proposed method in the unambiguous verification of DNN model ownership.
Methodology-wise, our empirical studies re-assert that over-parameterized DNN models can successfully learn multiple tasks with arbitrarily assigned labels and/or constraints. While this
While this\nassertion has been theoretically proved [15] and empirically investigated from the perspective of\nnetwork generalization [16], its implications to network security in general remain to be explored.\nWe believe the proposed modulation of DNN performance based on the presented passports will play\nan indispensable role in bringing DNN behaviours under control against adversarial attacks, as it has\nbeen demonstrated for DNN ownership veri\ufb01cations.\n\n9\n\n0255075100Dissimilaritybetweenvalidandfakesignature(%)010203040506070ClassificationAccuracy(%)FakePassportValidPassport0255075100Dissimilaritybetweenvalidandfakesignature(%)010203040506070ClassificationAccuracy(%)FakePassportValidPassport0255075100Dissimilaritybetweenvalidandfakesignature(%)010203040506070ClassificationAccuracy(%)FakePassportValidPassport\fAcknowledgement\n\nThis research is partly supported by the Fundamental Research Grant Scheme (FRGS) MoHE Grant\nFP021-2018A, from the Ministry of Education Malaysia. Also, we gratefully acknowledge the\nsupport of NVIDIA Corporation with the donation of the Titan V GPU used for this research.\n\nReferences\n[1] Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin\u2019ichi Satoh. Embedding watermarks\n\ninto deep neural networks. In ICMR, pages 269\u2013277, 2017.\n\n[2] Y Adi, C Baum, M Cisse, B Pinkas, and J Keshet. Turning your weakness into a strength:\n\nWatermarking deep neural networks by backdooring. In USENIX, pages 1615\u20131631, 2018.\n\n[3] Huili Chen, Bita Darvish Rohani, and Farinaz Koushanfar. DeepMarks: A Digital Fingerprinting\n\nFramework for Deep Neural Networks. arXiv preprint arXiv:1804.03648, 2018.\n\n[4] Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph Stoecklin, Heqing Huang, and\nIan Molloy. Protecting intellectual property of deep neural networks with watermarking. In\nASIACCS, pages 159\u2013172, 2018.\n\n[5] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. 
DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models. arXiv preprint arXiv:1804.00750, 2018.

[6] Erwan Le Merrer, Patrick Perez, and Gilles Trédan. Adversarial Frontier Stitching for Remote Neural Network Watermarking. arXiv preprint arXiv:1711.01894, 2017.

[7] Jia Guo and Miodrag Potkonjak. Watermarking deep neural networks for embedded systems. In ICCAD, pages 1–8, 2018.

[8] Qiming Li and Ee-Chien Chang. Zero-knowledge watermark detection resistant to ambiguity attacks. In Proceedings of the 8th Workshop on Multimedia and Security, pages 158–163, 2006.

[9] Husrev T. Sencar and Nasir D. Memon. Combatting ambiguity attacks via selective detection of embedded watermarks. IEEE Trans. Information Forensics and Security, 2(4):664–682, 2007.

[10] S. Craver, N. Memon, B.-L. Yeo, and M. M. Yeung. Resolving rightful ownerships with invisible watermarking techniques: limitations, attacks, and implications. IEEE Journal on Selected Areas in Communications, 16(4):573–586, 1998.

[11] Lixin Fan, Kam Woh Ng, and Chee Seng Chan. [Extended version] Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks. arXiv preprint arXiv:1909.07830, 2019.

[12] Yuxin Wu and Kaiming He. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018.

[13] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, pages 448–456, 2015.

[14] Abigail See, Minh-Thang Luong, and Christopher D. Manning. Compression of neural machine translation models via pruning. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 291–301, 2016.

[15] Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization.
In ICML, pages 242–252, 2019.

[16] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.