{"title": "Towards Robust Detection of Adversarial Examples", "book": "Advances in Neural Information Processing Systems", "page_first": 4579, "page_last": 4589, "abstract": "Although the recent progress is substantial, deep learning methods can be vulnerable to the maliciously generated adversarial examples. In this paper, we present a novel training procedure and a thresholding test strategy, towards robust detection of adversarial examples. In training, we propose to minimize the reverse cross-entropy (RCE), which encourages a deep network to learn latent representations that better distinguish adversarial examples from normal ones. In testing, we propose to use a thresholding strategy as the detector to filter out adversarial examples for reliable predictions. Our method is simple to implement using standard algorithms, with little extra training cost compared to the common cross-entropy minimization. We apply our method to defend various attacking methods on the widely used MNIST and CIFAR-10 datasets, and achieve significant improvements on robust predictions under all the threat models in the adversarial setting.", "full_text": "Towards Robust Detection of Adversarial Examples\n\nTianyu Pang, Chao Du, Yinpeng Dong, Jun Zhu\u2217\n\nDept. of Comp. Sci. & Tech., State Key Lab for Intell. Tech. & Systems\n\nBNRist Center, THBI Lab, Tsinghua University, Beijing, China\n\n{pty17, du-c14, dyp17}@mails.tsinghua.edu.cn, dcszj@mail.tsinghua.edu.cn\n\nAbstract\n\nAlthough the recent progress is substantial, deep learning methods can be vulnera-\nble to the maliciously generated adversarial examples. In this paper, we present a\nnovel training procedure and a thresholding test strategy, towards robust detection\nof adversarial examples. In training, we propose to minimize the reverse cross-\nentropy (RCE), which encourages a deep network to learn latent representations\nthat better distinguish adversarial examples from normal ones. 
In testing, we propose to use a thresholding strategy as the detector to filter out adversarial examples for reliable predictions. Our method is simple to implement using standard algorithms, with little extra training cost compared to common cross-entropy minimization. We apply our method to defend against various attacking methods on the widely used MNIST and CIFAR-10 datasets, and achieve significant improvements in robust prediction under all the threat models in the adversarial setting.\n\n1 Introduction\n\nDeep learning (DL) has made unprecedented progress in various tasks, including image classification, speech recognition, and natural language processing [11]. However, a high-accuracy DL model can be vulnerable in the adversarial setting [12, 33], where adversarial examples are maliciously generated to mislead the model into outputting wrong predictions. Several attacking methods have been developed to craft such adversarial examples [2, 4, 12, 18, 22, 29, 30]. As DL becomes ever more prevalent, it is imperative to improve its robustness, especially in safety-critical applications.\nTherefore, various defenses have been proposed that attempt to correctly classify adversarial examples [14, 25, 31, 32, 33, 36]. However, most of these defenses are not effective enough and can be successfully evaded by more powerful adversaries [2, 3]. There is also recent work on verification and on training provably robust networks [5, 6, 35], but these methods can only provide pointwise guarantees, and they require large extra computational cost. Overall, since adversarial examples exist even for simple classification tasks [9] and for human eyes [7], it is unlikely that such methods can solve the problem by preventing adversaries from generating adversarial examples.\nDue to this difficulty, detection-based defenses have recently attracted a lot of attention as alternative solutions. Grosse et al. 
[13] introduce an extra class in classifiers solely for adversarial examples, and similarly Gong et al. [10] train an additional binary classifier to decide whether an instance is adversarial or not. Metzen et al. [26] detect adversarial examples by training a detection neural network that takes input from intermediate layers of the classification network. Bhagoji et al. [1] reduce the dimensionality of the input image fed to the classification network and train a fully-connected neural network on the smaller input. Li and Li [21] build a cascade classifier where each classifier is implemented as a linear SVM acting on the PCA of inner convolutional layers of the classification network. However, these methods all require a large amount of extra computational cost, and some of them also lose accuracy on normal examples. In contrast, Feinman et al. [8] propose\n\n*Corresponding author.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.\n\na kernel density estimate method to detect the points lying far from the data manifolds in the final-layer hidden space, which does not change the structure of the classification network and incurs little computational cost. However, Carlini and Wagner [3] show that each of these defense methods can be evaded by an adversary targeting that specific defense, i.e., by a white-box adversary.\nIn this paper, we propose a defense method which consists of a novel training procedure and a thresholding test strategy. The thresholding test strategy is implemented by the kernel density (K-density) detector introduced in [8]. In training, our contribution is a novel training objective function, named reverse cross-entropy (RCE), that substitutes for the common cross-entropy (CE) loss [11]. 
By minimizing RCE, our training procedure encourages the classifiers to return a high confidence on the true class and a uniform distribution over the false classes for each data point, and further makes the classifiers map the normal examples to the neighborhood of low-dimensional manifolds in the final-layer hidden space. Compared to CE, the RCE training procedure learns representations that are more discriminative for filtering out adversarial examples when using the K-density detector or other dimension-based detectors [23]. The minimization of RCE is simple to implement using stochastic gradient descent methods, with little extra training cost compared to CE. Therefore, it can be easily applied to any deep network and is as scalable as the CE training procedure.\nWe apply our method to defend against various attacking methods on the widely used MNIST [20] and CIFAR-10 [17] datasets. We test the performance of our method under different threat models, i.e., oblivious adversaries, white-box adversaries and black-box adversaries. We choose the K-density estimate method as our strong baseline, since it has shown superior robustness and versatility compared to other detection-based defenses [3]. The results demonstrate that, compared to the baseline, the proposed method improves the robustness against adversarial attacks under all the threat models, while maintaining state-of-the-art accuracy on normal examples. 
Specifically, we demonstrate that white-box adversaries have to craft adversarial examples with macroscopic noise to successfully evade our defense, which means human observers can easily filter out the crafted adversarial examples.\n\n2 Preliminaries\n\nThis section provides the notations and introduces the threat models and attacking methods.\n\n2.1 Notations\n\nA deep neural network (DNN) classifier can be generally expressed as a mapping function F(X, θ): R^d → R^L, where X ∈ R^d is the input variable, θ denotes all the parameters and L is the number of classes (hereafter we omit θ where there is no ambiguity). Here, we focus on DNNs with softmax output layers. For notation clarity, we define the softmax function S(z): R^L → R^L as S(z)_i = exp(z_i) / Σ_{j=1}^{L} exp(z_j), i ∈ [L], where [L] := {1, ..., L}. Let Z be the output vector of the penultimate layer, i.e., the final hidden layer. This defines a mapping function X ↦ Z that extracts data representations. Then, the classifier can be expressed as F(X) = S(W_s Z + b_s), where W_s and b_s are the weight matrix and bias vector of the softmax layer, respectively. We denote the pre-softmax output W_s Z + b_s as Z_pre, termed the logits. Given an input x (i.e., an instance of X), the predicted label for x is denoted as ŷ = argmax_{i∈[L]} F(x)_i. The probability value F(x)_ŷ is often used as the confidence score of this prediction [11]. One common training objective is to minimize the cross-entropy (CE) loss, defined as\n\nL_CE(x, y) = -1_y^T log F(x) = -log F(x)_y,\n\nfor a single input-label pair (x, y). Here, 1_y is the one-hot encoding of y, and the logarithm of a vector is taken element-wise. 
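For concreteness, the softmax and CE loss above can be written out as a short sketch (our own NumPy illustration; the logits and label are arbitrary stand-ins, not values from the paper):

```python
import numpy as np

def softmax(z):
    # S(z)_i = exp(z_i) / sum_j exp(z_j); shift by max(z) for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(z_pre, y):
    # L_CE(x, y) = -log F(x)_y, with F(x) = S(W_s z + b_s) = S(z_pre)
    return float(-np.log(softmax(z_pre)[y]))

z_pre = np.array([2.0, 1.0, 0.1])   # logits for L = 3 classes
y = 0                               # true label
print(ce_loss(z_pre, y))            # small loss: class 0 already has the largest logit
```

Averaging `ce_loss` over the training set gives the usual CE objective discussed next.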
The CE training procedure minimizes the average CE loss (under proper regularization) on the training data to obtain the optimal parameters.\n\n2.2 Threat models\n\nIn the adversarial setting, an elaborate taxonomy of threat models is introduced in [3]:\n\n• Oblivious adversaries are not aware of the existence of the detector D and generate adversarial examples based on the unsecured classification model F.\n• White-box adversaries know the scheme and parameters of D, and can design special methods to attack both the model F and the detector D simultaneously.\n• Black-box adversaries know the existence of the detector D and its scheme, but have no access to the parameters of the detector D or the model F.\n\n2.3 Attacking methods\n\nAlthough DNNs have made substantial progress, adversarial examples can easily be crafted to fool a network, even when its accuracy is high [28]. Several attacking methods for generating adversarial examples have been introduced in recent years. Most of them can craft adversarial examples that are visually indistinguishable from the corresponding normal ones, and yet are misclassified by the target model F. Here we introduce some well-known and commonly used attacking methods.\n\nFast Gradient Sign Method (FGSM): Goodfellow et al. [12] introduce a one-step attacking method, which crafts an adversarial example x* as x* = x + ε · sign(∇_x L(x, y)), with the perturbation ε and the training loss L(x, y).\n\nBasic Iterative Method (BIM): Kurakin et al. [18] propose an iterative version of FGSM, with the update formula x*_i = clip_{x,ε}(x*_{i-1} + (ε/r) · sign(∇_{x*_{i-1}} L(x*_{i-1}, y))), where x*_0 = x, r is the number of iteration steps, and clip_{x,ε}(·) is a clipping function that keeps x*_i in its domain.\n\nIterative Least-likely Class Method (ILCM): Kurakin et al. [18] also propose a targeted version of BIM as x*_i = clip_{x,ε}(x*_{i-1} - (ε/r) · sign(∇_{x*_{i-1}} L(x*_{i-1}, y_ll))), where x*_0 = x and y_ll = argmin_i F(x)_i. ILCM can avoid label leaking [19], since it does not exploit information about the true label y.\n\nJacobian-based Saliency Map Attack (JSMA): Papernot et al. [30] propose another iterative method for targeted attacks, which in each iteration step perturbs, by a constant offset ε, the one feature x_i that maximizes the saliency map\n\nS(x, y)[i] = 0, if ∂F(x)_y/∂x_i < 0 or Σ_{j≠y} ∂F(x)_j/∂x_i > 0; and S(x, y)[i] = (∂F(x)_y/∂x_i) · |Σ_{j≠y} ∂F(x)_j/∂x_i|, otherwise.\n\nCompared to other methods, JSMA perturbs fewer pixels.\n\nCarlini & Wagner (C&W): Carlini and Wagner [2] introduce an optimization-based method, which is one of the most powerful attacks. They define x* = (1/2)(tanh(ω) + 1) in terms of an auxiliary variable ω, and solve the problem min_ω ||(1/2)(tanh(ω) + 1) - x||₂² + c · f((1/2)(tanh(ω) + 1)), where c is a constant chosen by modified binary search. Here f(·) is an objective function defined as f(x) = max(max{Z_pre(x)_i : i ≠ y} - Z_pre(x)_y, -κ), where κ controls the confidence.\n\n3 Methodology\n\nIn this section, we present a new method to improve the robustness of classifiers in the adversarial setting. 
We first construct a new metric and analyze its properties, which guide us to the new method.\n\n3.1 Non-maximal entropy\n\nDue to the difficulty of correctly classifying adversarial examples [2, 3] and the generality of their existence [7, 9], we design a method to detect them instead, which could help in real-world applications. For example, in semi-autonomous systems, the detection of adversarial examples would allow disabling autonomous operation and requesting human intervention [26].\nA detection method relies on some metric to decide whether an input x is adversarial or not for a given classifier F(X). A potential candidate is the confidence F(x)_ŷ on the predicted label ŷ, which inherently conveys the degree of certainty of a prediction and is widely used [11]. However, previous work shows that the confidence score is unreliable in the adversarial setting [12, 28]. Therefore, we construct another metric which is more pertinent and helpful to our goal. Namely, we define the metric non-ME, the entropy of the normalized non-maximal elements in F(x), as\n\nnon-ME(x) = -Σ_{i≠ŷ} F̂(x)_i log F̂(x)_i,   (1)\n\nwhere F̂(x)_i = F(x)_i / Σ_{j≠ŷ} F(x)_j are the normalized non-maximal elements of F(x). Hereafter we consider the final hidden vector z of F given x, and use the notation F(z) with the same meaning as F(x) without ambiguity. To illustrate the ideas intuitively, Fig. 1a presents an example of a classifier F in the hidden space, where z ∈ R² and L = 3. Let Z_pre,i, i ∈ [L], be the i-th element of the logits Z_pre. 
Then the decision boundary between each pair of classes i and j is the hyperplane db_ij = {z : Z_pre,i = Z_pre,j}, and let DB_ij = {Z_pre,i = Z_pre,j + C, C ∈ R} be the set of all hyperplanes parallel to db_ij.\n\nFigure 1: a, The three black solid lines are the decision boundary of the classifier, and each black line (both solid and dashed parts) is the decision boundary between two classes. The blue dot-dashed lines are the isolines of non-ME = t. b, t-SNE visualization of the final hidden vectors on CIFAR-10. The model is Resnet-32. The training procedure is CE. c, The training procedure is RCE. d, Practical attacks on the trained networks. Blue regions are of the original classes for normal examples, and red regions are of the target classes for adversarial ones.\n\nIn Fig. 1a, each db_ij corresponds to one of the three black lines. We denote the half-space Z_pre,i ≥ Z_pre,j as db+_ij. Then, we can formally represent the decision region of class ŷ as dd_ŷ = ∩_{i≠ŷ} db+_ŷi, whose boundary is the corresponding decision boundary of this region. Note that the output F(z) has L - 1 equal non-maximal elements for any point on the low-dimensional manifold S_ŷ = (∩_{i,j≠ŷ} db_ij) ∩ dd_ŷ. With the above notations, we have Lemma 1 as below:\n\nLemma 1. (Proof in Appendix A) In the decision region dd_ŷ of class ŷ, for all i, j ≠ ŷ and all db′_ij ∈ DB_ij, the value of non-ME is constant on the low-dimensional manifold ∩_{i,j≠ŷ} db′_ij. In particular, non-ME attains its global maximum log(L - 1) on and only on S_ŷ.\n\nLemma 1 tells us that in the decision region of class ŷ, if one moves a normal input along the low-dimensional manifold ∩_{i,j≠ŷ} db′_ij, its value of non-ME will not change, and vice versa.\n\nTheorem 1. (Proof in Appendix A) In the decision region dd_ŷ of class ŷ, for any z0 ∈ dd_ŷ and all i, j ≠ ŷ, there exists a unique db⁰_ij ∈ DB_ij such that z0 ∈ Q0, where Q0 = ∩_{i,j≠ŷ} db⁰_ij. Let Q^ŷ_0 = Q0 ∩ dd_ŷ; then the solution set of the problem\n\nargmin_{z0} ( max_{z ∈ Q^ŷ_0} F(z)_ŷ )\n\nis S_ŷ. Furthermore, for all z0 ∈ S_ŷ we have Q0 = S_ŷ, and for all z ∈ S_ŷ ∩ dd_ŷ, F(z)_ŷ = 1/L.\n\nLet z0 be the representation of a normal example with the predicted class ŷ. When crafting adversarial examples based on z0, adversaries need to perturb z0 across the decision boundary of dd_ŷ. Theorem 1 says that there exists a unique low-dimensional manifold Q0 that z0 lies on in the decision region of class ŷ. If we can somehow prevent adversaries from changing the value of non-ME when they perturb z0, then by Lemma 1 the adversaries can only perturb z0 along the manifold Q0. In this case, the nearest adversarial counterpart z* of z0 must be in the set Q^ŷ_0 [27]. Then the value of max_{z ∈ Q^ŷ_0} F(z)_ŷ is an upper bound on the prediction confidence F(z*)_ŷ. This bound is a function of z0. Theorem 1 further tells us that if z0 ∈ S_ŷ, the corresponding upper bound attains its minimum 1/L, which leads to F(z*)_ŷ = 1/L. 
This makes z* easily distinguishable by its low confidence score.\nIn practice, the restriction can be implemented by a detector with the non-ME metric. In the case of Fig. 1a, any point that lies on the set S_ŷ (black dashed lines) has the highest value of non-ME = log 2. Assume that the learned representation transformation X ↦ Z maps all the normal examples to the neighborhood of S_ŷ, where the neighborhood boundary consists of the isolines of non-ME = t (blue dot-dashed lines). This means that all the normal examples have values of non-ME > t. When there is no detector, the nearest adversarial example based on z0 is z1, which lies on the nearest decision boundary w.r.t. z0. In contrast, when non-ME is used as the detection metric, z1 will easily be filtered out by the detector because non-ME(z1) < t, and the nearest adversarial example becomes z2 in this case, which lies on the junction manifold of the neighborhood boundary and the decision boundary. It is easy to conclude in general that ||z0 - z2|| > ||z0 - z1|| almost everywhere. This means that, due to the existence of the detector, adversaries have to impose larger minimal perturbations to successfully generate adversarial examples that can fool the detector. 
Furthermore, according to Theorem 1, the confidence at z2 is also lower than that at z1, which makes z2 still likely to be distinguished by its low confidence score.\n\n3.2 The reverse cross-entropy training procedure\n\nBased on the above analysis, we now design a new training objective to improve the robustness of DNN classifiers. The key is to enforce a DNN classifier to map all the normal instances to the neighborhood of the low-dimensional manifolds S_ŷ in the final-layer hidden space. According to Lemma 1, this can be achieved by making the non-maximal elements of F(x) as equal as possible, thus yielding a high non-ME value for every normal input. Specifically, for a training pair (x, y), we let R_y denote its reverse label vector, whose y-th element is zero and whose other elements equal 1/(L-1). One obvious way to encourage uniformity among the non-maximal elements of F(x) is to apply the model regularization method termed label smoothing [34], which can be done by introducing a cross-entropy term between R_y and F(x) in the CE objective:\n\nL^λ_CE(x, y) = L_CE(x, y) - λ · R_y^T log F(x),   (2)\n\nwhere λ is a trade-off parameter. However, it is easy to show that minimizing L^λ_CE is equivalent to minimizing the cross-entropy between F(x) and the L-dimensional vector P^λ:\n\nP^λ_i = 1/(λ+1), if i = y;  λ/((L-1)(λ+1)), if i ≠ y.   (3)\n\nNote that 1_y = P^0 and R_y = P^∞. When λ > 0, let θ*_λ = argmin_θ L^λ_CE; then the prediction F(x, θ*_λ) will tend to P^λ rather than to the ground-truth 1_y. This makes the output predictions biased. 
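The target vectors involved here can be constructed explicitly (a minimal sketch with hypothetical helper names of our own, not code from the paper):

```python
import numpy as np

def smoothed_target(y, L, lam):
    # P^lam from Eq. (3): 1/(lam+1) on the true class,
    # lam/((L-1)(lam+1)) on every other class
    p = np.full(L, lam / ((L - 1) * (lam + 1)))
    p[y] = 1.0 / (lam + 1)
    return p

def reverse_label(y, L):
    # R_y: zero on the true class, uniform 1/(L-1) on the others
    r = np.full(L, 1.0 / (L - 1))
    r[y] = 0.0
    return r

L, y = 3, 0
print(smoothed_target(y, L, 0.0))   # lam = 0 recovers the one-hot vector 1_y
print(reverse_label(y, L))
```

Setting `lam = 0` recovers the one-hot target (1_y = P^0), while letting `lam` grow moves the target toward the reverse label vector (R_y = P^∞), matching the identities noted above.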
In order to have unbiased predictions that make the output vector F(x) tend to 1_y, while simultaneously encouraging uniformity among the probabilities on the untrue classes, we define another objective function based on what we call the reverse cross-entropy (RCE):\n\nL^R_CE(x, y) = -R_y^T log F(x).   (4)\n\nMinimizing RCE is equivalent to minimizing L^∞_CE. Note that by directly minimizing L^R_CE, i.e., θ*_R = argmin_θ L^R_CE, one obtains a reverse classifier F(X, θ*_R): given an input x, the reverse classifier will not only tend to assign the lowest probability to the true class but also tend to output a uniform distribution on the other classes. This simple insight leads to our entire RCE training procedure, which consists of two parts, as outlined below:\n\nReverse training: Given the training set D := {(x_i, y_i)}_{i∈[N]}, train the DNN F(X, θ) to be a reverse classifier by minimizing the average RCE loss: θ*_R = argmin_θ (1/N) Σ_{i=1}^{N} L^R_CE(x_i, y_i).\n\nReverse logits: Negate the final logits fed to the softmax layer: F_R(X, θ*_R) = S(-Z_pre(X, θ*_R)).\n\nWe then obtain the network F_R(X, θ*_R), which returns ordinary predictions on classes; F_R(X, θ*_R) is referred to as the network trained via the RCE training procedure.\n\nTheorem 2. (Proof in Appendix A) Let (x, y) be a given training pair. Under the L∞-norm, if there is a training error α ≪ 1/L such that ||S(Z_pre(x, θ*_R)) - R_y||_∞ ≤ α, then we have the bounds\n\n||S(-Z_pre(x, θ*_R)) - 1_y||_∞ ≤ α(L-1)²,\n\nand, for all j, k ≠ y,\n\n|S(-Z_pre(x, θ*_R))_j - S(-Z_pre(x, θ*_R))_k| ≤ 2α²(L-1)².\n\nTheorem 2 demonstrates two important properties of the RCE training procedure. First, it is consistent and unbiased: as the training error α → 0, the output F_R(x, θ*_R) converges to the one-hot label vector 1_y. Second, the upper bound on the difference between any two non-maximal elements of the output decreases as O(α²) w.r.t. α for RCE, much faster than the O(α) for CE and label smoothing. These two properties make the RCE training procedure meet our requirements as described above.\n\n3.3 The thresholding test strategy\n\nGiven a trained classifier F(X), we implement a thresholding test strategy by a detector for robust prediction. After presetting a metric, the detector classifies the input as normal and decides to return the predicted label if the value of the metric is larger than a threshold T; otherwise, it classifies the input as adversarial and returns NOT SURE. In our method, we adopt the kernel density (K-density) metric introduced in [8], because applying the K-density metric with CE training has already shown better robustness and versatility than other defenses [3]. 
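Ahead of its formal definition below, the detector's decision rule can be sketched end-to-end (a minimal illustration; the function names, synthetic hidden vectors, and stand-in threshold are our own, while the bandwidth 0.26 mirrors the CIFAR-10 value reported with Table 2):

```python
import numpy as np

def non_me(probs):
    # Eq. (1): entropy of the normalized non-maximal elements of F(x)
    rest = np.delete(probs, np.argmax(probs))
    rest = rest / rest.sum()
    return float(-(rest * np.log(rest)).sum())

def k_density(z, class_train_z, sigma2=0.26):
    # KD(x) = (1/|X_yhat|) * sum_i exp(-||z_i - z||^2 / sigma^2), where
    # class_train_z holds final-layer hidden vectors of training points
    # sharing the predicted label yhat
    d2 = ((class_train_z - z) ** 2).sum(axis=1)
    return float(np.exp(-d2 / sigma2).mean())

def thresholding_test(label, kd_value, T):
    # return the predicted label if K-density clears the threshold T,
    # otherwise flag the input as adversarial
    return label if kd_value > T else "NOT SURE"

rng = np.random.default_rng(0)
train_z = rng.normal(scale=0.1, size=(200, 8))  # stand-in hidden vectors of one class
z_normal = train_z.mean(axis=0)                 # near the class manifold -> high KD
z_far = z_normal + 3.0                          # far from it -> low KD
print(k_density(z_normal, train_z) > k_density(z_far, train_z))
```

A point with L - 1 nearly equal non-maximal probabilities drives `non_me` toward its maximum log(L - 1), which is the behavior the RCE procedure encourages for normal inputs.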
K-density can be regarded as a combination of the confidence and non-ME metrics, since it simultaneously conveys information about both.\n\nKernel density: The K-density is calculated in the final-layer hidden space. Given the predicted label ŷ, K-density is defined as KD(x) = (1/|X_ŷ|) Σ_{x_i ∈ X_ŷ} k(z_i, z), where X_ŷ denotes the set of training points with label ŷ, z_i and z are the corresponding final-layer hidden vectors, and k(z_i, z) = exp(-||z_i - z||²/σ²) is the Gaussian kernel with the bandwidth σ treated as a hyperparameter.\n\nCarlini and Wagner [3] show that previous methods for detecting adversarial examples can be evaded by white-box adversaries. However, our method (RCE training + K-density detector) can defend against white-box attacks effectively. This is because the RCE training procedure conceals normal examples on low-dimensional manifolds in the final-layer hidden space, as shown in Fig. 1b and Fig. 1c. The detector allowable regions can then also be set low-dimensional, as long as they contain all normal examples. Therefore, white-box adversaries who intend to fool our detector have to generate adversarial examples with more precise calculations and larger noise. This is intuitively illustrated in Fig. 1d, where the adversarial examples crafted on the networks trained by CE locate in the detector allowable regions more easily than those crafted on the networks trained by RCE. This illustration is experimentally verified in Section 4.4.\n\n4 Experiments\n\nWe now present experimental results to demonstrate the effectiveness of our method in improving the robustness of DNN classifiers in the adversarial setting.\n\n4.1 Setup\n\nWe use the two widely studied datasets, MNIST [20] and CIFAR-10 [17]. MNIST is a collection of handwritten digits with a training set of 60,000 images and a test set of 10,000 images. 
CIFAR-10 consists of 60,000 color images in 10 classes, with 6,000 images per class; there are 50,000 training images and 10,000 test images. The pixel values of images in both datasets are scaled to the interval [-0.5, 0.5]. The normal examples in our experiments refer to all the examples in the training and test sets. In the adversarial setting, the strong baseline we use is the K-density estimate method (CE training + K-density detector) [8], which has shown superior robustness and versatility compared to other detection-based defense methods [1, 10, 13, 21, 26] in [3].\n\n4.2 Classification on normal examples\n\nWe first evaluate in the normal setting, where we implement Resnet-32 and Resnet-56 [15] on both datasets. For each network, we use both CE and RCE as the training objective, trained with the same settings as He et al. [16]. The number of training steps for both objectives is set to 20,000 on MNIST and 90,000 on CIFAR-10. Hereafter, for notational simplicity, we indicate the training procedure after the model name of a trained network, e.g., Resnet-32 (CE). Similarly, we indicate the training procedure and omit the name of the target network after an attacking method, e.g., FGSM (CE). Table 1 shows the test error rates, where the thresholding test strategy is disabled and all the points receive their predicted labels. We can see that the performance of the networks trained by RCE is as good as, and sometimes even better than, that of the networks trained by the traditional CE procedure.\n\nTable 1: Classification error rates (%) on test sets.\nMethod | MNIST | CIFAR-10\nResnet-32 (CE) | 0.38 | 7.13\nResnet-32 (RCE) | 0.29 | 7.02\nResnet-56 (CE) | 0.36 | 6.49\nResnet-56 (RCE) | 0.32 | 6.60\n\nNote that we apply the same training hyperparameters (e.g., learning rates and decay factors) for both the CE and RCE procedures, which suggests that RCE is easy to optimize and does not require much extra effort in tuning hyperparameters.\nTo verify that the RCE procedure tends to map all the normal inputs to the neighborhood of S_ŷ in the hidden space, we apply the t-SNE technique [24] to visualize the distribution of the final hidden vector z on the test set. Fig. 1b and Fig. 1c give the 2-D visualization results on 1,000 test examples of CIFAR-10. We can see that the networks trained by RCE successfully map the test examples to the neighborhood of low-dimensional manifolds in the final-layer hidden space.\n\nTable 2: AUC-scores (10⁻²) of adversarial examples. The model of the target networks is Resnet-32. Values are calculated on the examples which are correctly classified as normal examples and then misclassified as adversarial counterparts. Bandwidths used when calculating K-density are σ²_CE = 1/0.26 and σ²_RCE = 0.1/0.26. Here (-) indicates the strong baseline, and (*) indicates our defense method.\nAttack | Obj. | MNIST Confidence | MNIST non-ME | MNIST K-density | CIFAR-10 Confidence | CIFAR-10 non-ME | CIFAR-10 K-density\nFGSM | CE | 79.7 | 66.8 | 98.8 (-) | 71.5 | 66.9 | 99.7 (-)\nFGSM | RCE | 98.8 | 98.6 | 99.4 (*) | 92.6 | 91.4 | 98.0 (*)\nBIM | CE | 88.9 | 70.5 | 90.0 (-) | 0.0 | 64.6 | 100.0 (-)\nBIM | RCE | 91.7 | 90.6 | 91.8 (*) | 0.7 | 70.2 | 100.0 (*)\nILCM | CE | 98.4 | 50.4 | 96.2 (-) | 16.4 | 37.1 | 84.2 (-)\nILCM | RCE | 100.0 | 97.0 | 98.6 (*) | 64.1 | 77.8 | 93.9 (*)\nJSMA | CE | 98.6 | 60.1 | 97.7 (-) | 99.2 | 27.3 | 85.8 (-)\nJSMA | RCE | 100.0 | 99.4 | 99.0 (*) | 99.5 | 91.9 | 95.4 (*)\nC&W | CE | 98.6 | 64.1 | 99.4 (-) | 99.5 | 50.2 | 95.3 (-)\nC&W | RCE | 100.0 | 99.5 | 99.8 (*) | 99.6 | 94.7 | 98.2 (*)\nC&W-hc | CE | 0.0 | 40.0 | 91.1 (-) | 0.0 | 28.8 | 75.4 (-)\nC&W-hc | RCE | 0.1 | 93.4 | 99.6 (*) | 0.2 | 53.6 | 91.8 (*)\n\nFigure 2: Robustness with the thresholding test strategy disabled. The model of the target networks is Resnet-32. (a) Classification accuracy under iteration-based attacks. (b) Average minimal distortions.\n\n4.3 Performance under the oblivious attack\n\nWe test the performance of the trained Resnet-32 networks on MNIST and CIFAR-10 under the oblivious attack, where we investigate the attacking methods described in Sec. 2.3. We first disable the thresholding test strategy and make the classifiers return all predictions, in order to study the networks' ability to correctly classify adversarial examples. We use the iteration-based attacking methods FGSM, BIM, ILCM and JSMA, and calculate the classification accuracy of the networks on crafted adversarial examples w.r.t. the perturbation ε. Fig. 2a shows the results. We can see that Resnet-32 (RCE) has higher accuracy than Resnet-32 (CE) under all four attacks on both datasets.\nAs for optimization-based methods like the C&W attack and its variants, we report robustness in the same way as [3]. 
Specifically, we do a binary search for the parameter c in order to find the minimal distortion that successfully attacks the classifier. The distortion is defined as in [33]: ||x - x*||₂/√d, where x* is the generated adversarial example and each pixel feature is rescaled to the interval [0, 255]. We set the step size in the C&W attacks to 0.01, and use 9 binary search rounds for c with a maximum of 10,000 iteration steps in each round. Moreover, to make our investigation more convincing, we introduce the high-confidence version of the C&W attack (abbr. C&W-hc), which sets the parameter κ in the C&W attack to 10 in our experiments. The C&W-hc attack generates adversarial examples with confidence higher than 0.99, and previous work has shown that the adversarial examples crafted by C&W-hc are stronger and more difficult to defend against than those crafted by C&W [2, 3]. The results are shown in Fig. 2b. We can see that the C&W and C&W-hc attacks need much larger minimal distortions to successfully attack the networks trained by RCE than those trained by CE. A similar phenomenon is also observed under the white-box attack.\nWe further activate the thresholding test strategy with the K-density metric, and also test the performance of confidence and non-ME as the metric for a more complete analysis. We construct simple binary classifiers that decide whether an example is adversarial by thresholding on the metrics, and then calculate the AUC-scores of the ROC curves of these binary classifiers. Table 2 shows the AUC-scores calculated under different combinations of training procedures and thresholding metrics on both datasets. From Table 2, we can see that our method (RCE training + K-density detector) performs best in almost all cases, and that non-ME itself is also a fairly reliable metric, although not as good as K-density. 
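The AUC-scores of such thresholding binary classifiers can be computed with a rank-based (Mann-Whitney) estimate; the sketch below is our own, and the metric values are synthetic stand-ins rather than measurements from the paper:

```python
import numpy as np

def auc_score(metric_normal, metric_adv):
    # AUC of the detector "flag as adversarial if metric < T": the probability
    # that a randomly chosen normal example scores higher than a randomly
    # chosen adversarial one, counting ties as 1/2 (Mann-Whitney U statistic)
    n = np.asarray(metric_normal, dtype=float)
    a = np.asarray(metric_adv, dtype=float)
    wins = (n[:, None] > a[None, :]).sum() + 0.5 * (n[:, None] == a[None, :]).sum()
    return wins / (len(n) * len(a))

rng = np.random.default_rng(0)
kd_normal = rng.normal(1.0, 0.2, size=500)  # stand-in K-density values, normal inputs
kd_adv = rng.normal(0.4, 0.2, size=500)     # stand-in values, adversarial inputs
print(auc_score(kd_normal, kd_adv))         # close to 1: the metric separates well
```

For instance, `auc_score([0.9, 0.8], [0.1, 0.85])` evaluates to 0.75, the fraction of normal/adversarial pairs ranked correctly.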
The classifiers trained by RCE also return more reliable confidence scores, which verifies the conclusion in Theorem 1. Furthermore, our method can better distinguish between noisy examples and adversarial examples, as demonstrated in Appendix B.3.

[Figure 2 plots: (a) classification accuracy vs. perturbation magnitude (0 to 0.2) for FGSM, BIM, ILCM and JSMA under CE and RCE training, on MNIST and CIFAR-10; (b) average minimal distortions. MNIST: C&W 8.63 (CE) / 13.7 (RCE), C&W-hc 11.04 (CE) / 23.2 (RCE). CIFAR-10: C&W 0.65 (CE) / 0.93 (RCE), C&W-hc 0.77 (CE) / 1.81 (RCE).]

Figure 3: The normal test images are termed Normal, and adversarial examples generated on Resnet-32 (CE) and Resnet-32 (RCE) are separately termed CE / RCE. Adversarial examples are generated by C&W-wb with minimal distortions.

Table 3: The ratios of f2(x*) > 0 and minimal distortions of the adversarial examples crafted by C&W-wb. Model is Resnet-32.

              MNIST                 CIFAR-10
Obj.    Ratio   Distortion    Ratio   Distortion
CE      0.01      17.12       0.00      1.26
RCE     0.77      31.59       0.12      3.89

Table 4: AUC-scores (10^-2) on CIFAR-10. Resnet-32 is the substitute model and Resnet-56 is the target model.

                 Res.-32 (CE)   Res.-32 (RCE)
Res.-56 (CE)         75.0           90.8
Res.-56 (RCE)        89.1           84.9

4.4 Performance under the white-box attack
We test our method under the white-box attack, which is the most difficult threat model, and no effective defense exists yet. We apply the white-box version of the C&W attack (abbr. C&W-wb) introduced in [3], which is constructed specifically to fool the K-density detectors.
C&W-wb is also a white-box attack on our method, since it does not exploit information about the training objective. C&W-wb introduces a new loss term f2(x*) = max(−log(KD(x*)) − η, 0) that penalizes the adversarial example x* for being detected by the K-density detectors, where η is set to the median of −log(KD(·)) on the training set. Table 3 shows the average minimal distortions and the ratios of f2(x*) > 0 on the adversarial examples crafted by C&W-wb, where a higher ratio indicates that the detector is more robust and harder to fool. We find that nearly all the adversarial examples generated on Resnet-32 (CE) have f2(x*) ≤ 0, which means that their K-density values are no smaller than the median value on the training data. This result is consistent with previous work [3]. However, applying C&W-wb to our method yields a much higher ratio and much larger minimal distortions. Fig. 3 shows some adversarial examples crafted by C&W-wb alongside the corresponding normal ones. We find that the adversarial examples crafted on Resnet-32 (CE) are indistinguishable from the normal ones by human eyes. In contrast, those crafted on Resnet-32 (RCE) carry macroscopic noise, so they are not strictly adversarial examples, since they are visually distinguishable from normal ones. The inefficacy of the most aggressive attack, C&W-wb, under our defense verifies our illustration in Fig. 1d. More details on the limitations of C&W-wb are in Appendix B.4. We have also designed white-box attacks that exploit the training loss information of RCE, but they turn out to be less effective than C&W-wb. This is because, given an input, there is no explicit relationship between its RCE value and its K-density score.
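As a concrete illustration of this loss term, the sketch below computes f2 from a Gaussian-kernel density estimate over training features. The feature dimension, bandwidth, and random features are illustrative stand-ins for the final-hidden-layer features that the K-density detector actually scores:

```python
import numpy as np

def kernel_density(z, train_feats, bandwidth=1.0):
    """Gaussian kernel density estimate KD(z) over training features
    (the K-density detector scores an input by such an estimate in the
    feature space of its predicted class)."""
    sq = np.sum((train_feats - z) ** 2, axis=1)
    return float(np.mean(np.exp(-sq / (2.0 * bandwidth ** 2))))

def f2(z, train_feats, eta):
    """C&W-wb loss term f2 = max(-log KD(z) - eta, 0): it vanishes once z
    looks at least as dense as the median training example, i.e. once the
    detector no longer flags it."""
    return max(-np.log(kernel_density(z, train_feats)) - eta, 0.0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))    # illustrative training features
# eta is the median of -log KD(.) over the training set:
eta = float(np.median([-np.log(kernel_density(f, feats)) for f in feats]))

z_typical = feats[0]                  # on-manifold: high density, f2 ~ 0
z_outlier = feats[0] + 2.0            # far from the data: low density, f2 > 0
print(f2(z_typical, feats, eta), f2(z_outlier, feats, eta))
```

Minimizing this term pushes the adversarial example back into a high-density region, which is why the attacker must trade it off against the distortion.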
Thus it is more efficient to directly attack the K-density detectors, as C&W-wb does.
4.5 Performance under the black-box attack
For a complete analysis, we investigate the robustness under the black-box attack. The success of the black-box attack relies on the transferability of adversarial examples among different models [12]. We set the trained Resnet-56 networks as the target models. Adversaries intend to attack them but do not have access to their parameters. Thus we set the trained Resnet-32 networks as the substitute models that adversaries actually attack, and then feed the crafted adversarial examples into the target models. Since adversaries know of the existence of the K-density detectors, we apply the C&W-wb attack. We find that the adversarial examples crafted by the C&W-wb attack have poor transferability: less than 50% of them make the target model misclassify on MNIST, and less than 15% on CIFAR-10. Table 4 shows the AUC-scores in four different cases of the black-box attack on CIFAR-10; the AUC-scores in the same cases on MNIST are all higher than 95%. Note that in our experiments the target models and the substitute models have very similar structures, and the C&W-wb attack becomes ineffective even under this quite 'white' black-box attack.
5 Conclusions
We present a novel method to improve the robustness of deep learning models by reliably detecting and filtering out adversarial examples, which can be implemented using standard algorithms with little extra training cost. Our method performs well on both the MNIST and CIFAR-10 datasets under all threat models and various attacking methods, while maintaining accuracy on normal examples.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2017YFA0700904), NSFC Projects (Nos. 61620106010, 61621136008, 61332007), Beijing NSF Project (No.
L172037), Tiangong Institute for Intelligent Computing, NVIDIA NVAIL Program, and the projects from Siemens and Intel.

References

[1] Arjun Nitin Bhagoji, Daniel Cullina, and Prateek Mittal. Dimensionality reduction as a defense against evasion attacks on machine learning classifiers. arXiv preprint arXiv:1704.02654, 2017.

[2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. IEEE Symposium on Security and Privacy, 2017.

[3] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. ACM Workshop on Artificial Intelligence and Security, 2017.

[4] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[5] Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, and Pushmeet Kohli. Training verified learners with learned verifiers. arXiv preprint arXiv:1805.10265, 2018.

[6] Krishnamurthy Dvijotham, Robert Stanforth, Sven Gowal, Timothy Mann, and Pushmeet Kohli. A dual approach to scalable verification of deep networks. arXiv preprint arXiv:1803.06567, 2018.

[7] Gamaleldin F Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, and Jascha Sohl-Dickstein. Adversarial examples that fool both human and computer vision. arXiv preprint arXiv:1802.08195, 2018.

[8] Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.

[9] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018.

[10] Zhitao Gong, Wenlu Wang, and Wei-Shinn Ku.
Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.

[11] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[12] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

[13] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.

[14] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. Conference on Neural Information Processing Systems (NIPS) Workshops, 2014.

[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision (ECCV), pages 630–645. Springer, 2016.

[17] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.

[18] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. The International Conference on Learning Representations (ICLR) Workshops, 2017.

[19] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In International Conference on Learning Representations (ICLR), 2017.

[20] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[21] Xin Li and Fuxin Li.
Adversarial examples detection in deep networks with convolutional filter statistics. arXiv preprint arXiv:1612.07767, 2016.

[22] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations (ICLR), 2017.

[23] Xingjun Ma, Bo Li, Yisen Wang, Sarah M Erfani, Sudanthi Wijewickrema, Michael E Houle, Grant Schoenebeck, Dawn Song, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. arXiv preprint arXiv:1801.02613, 2018.

[24] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research (JMLR), 9(Nov):2579–2605, 2008.

[25] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[26] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In International Conference on Learning Representations (ICLR), 2017.

[27] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574–2582, 2016.

[28] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015.

[29] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697, 2016.

[30] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami.
The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.

[31] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.

[32] Andras Rozsa, Ethan M Rudd, and Terrance E Boult. Adversarial diversity and hard positive generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 25–32, 2016.

[33] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.

[34] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2818–2826, 2016.

[35] Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), pages 5283–5292, 2018.

[36] Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. Improving the robustness of deep neural networks via stability training. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.