{"title": "Attribution-Based Confidence Metric For Deep Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 11826, "page_last": 11837, "abstract": "We propose a novel confidence metric, namely, attribution-based confidence (ABC), for deep neural networks (DNNs). The ABC metric characterizes whether the output of a DNN on an input can be trusted. DNNs are known to be brittle on inputs outside the training distribution and are, hence, susceptible to adversarial attacks. This fragility is compounded by a lack of effectively computable measures of model confidence that correlate well with the accuracy of DNNs. These factors have impeded the adoption of DNNs in high-assurance systems. The proposed ABC metric addresses these challenges. It does not require access to the training data, the use of ensembles, or the need to train a calibration model on a held-out validation set. Hence, the new metric is usable even when only a trained model is available for inference. We mathematically motivate the proposed metric and evaluate its effectiveness with two sets of experiments. First, we study the change in accuracy and the associated confidence over out-of-distribution inputs. Second, we consider several digital and physically realizable attacks such as FGSM, CW, DeepFool, PGD, and adversarial patch generation methods. The ABC metric is low on out-of-distribution data and adversarial examples, where the accuracy of the model is also low. 
These experiments demonstrate the effectiveness of the ABC metric to make DNNs more trustworthy and resilient.", "full_text": "Attribution-Based Confidence Metric For Deep Neural Networks\n\nSusmit Jha\nComputer Science Laboratory, SRI International\n\nSunny Raj, Steven Lawrence Fernandes, Sumit Kumar Jha\nComputer Science Department, University of Central Florida, Orlando\n\nSomesh Jha\nUniversity of Wisconsin-Madison and Xaipient\n\nBrian Jalaian, Gunjan Verma, Ananthram Swami\nUS Army Research Laboratory, Adelphi\n\nAbstract\n\nWe propose a novel confidence metric, namely, attribution-based confidence (ABC), for deep neural networks (DNNs). The ABC metric characterizes whether the output of a DNN on an input can be trusted. DNNs are known to be brittle on inputs outside the training distribution and are, hence, susceptible to adversarial attacks. This fragility is compounded by a lack of effectively computable measures of model confidence that correlate well with the accuracy of DNNs. These factors have impeded the adoption of DNNs in high-assurance systems. The proposed ABC metric addresses these challenges. It does not require access to the training data, the use of ensembles, or the need to train a calibration model on a held-out validation set. Hence, the new metric is usable even when only a trained model is available for inference. We mathematically motivate the proposed metric and evaluate its effectiveness with two sets of experiments. First, we study the change in accuracy and the associated confidence over out-of-distribution inputs. Second, we consider several digital and physically realizable attacks such as FGSM, CW, DeepFool, PGD, and adversarial patch generation methods. The ABC metric is low on out-of-distribution data and adversarial examples, where the accuracy of the model is also low. 
These experiments demonstrate the effectiveness of the ABC metric towards creating more trustworthy and resilient DNNs.\n\n1 Introduction\n\nDeep neural network (DNN) models have been wildly successful in applications such as computer vision [1], natural language processing [2], and speech recognition [3, 4]. These models have reached human-level performance on several benchmarks [5, 3]. But the adoption of these models in safety-critical or high-assurance applications is inhibited due to two major concerns: their brittleness [6, 7] to adversarial attack methods or out-of-distribution inputs, and the lack of easily computable confidence measures that correlate well with the accuracy of the model. While improving the accuracy and robustness of DNNs has received significant attention, there is an urgent need to quantitatively characterize the limitations of these models and improve the transparency of their failure modes. This paper focuses on one such challenge: defining a confidence metric on predictions of a DNN that closely reflects its accuracy.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nA confidence metric will be helpful in integrating DNN decision-making components in applications such as medical diagnosis [8] or autonomous systems [9], where such scores can be used to override the DNN decision using compositional architectures such as simplex [10]. A low confidence score can also be used to detect distribution shifts or adversarial attacks. A straightforward approach to computing such a score is to use the logit values prior to the softmax layer. But this raw confidence score is poorly calibrated [11, 12] and does not correlate well with prediction accuracy. The logits reflect high confidence even on wrong predictions over adversarial examples [13, 14, 6]. 
In fact, the improved accuracy of deep learning models over the last decade has been accompanied by worsening calibration of logit output to network accuracy [11] compared to earlier models [15]. This indicates overfitting in the negative log likelihood space even though DNNs avoid overfitting in the output accuracy [16]. This has motivated the development of confidence measures that use a held-out validation set for training a separate calibration model on the output of the DNNs [17, 18, 11]. But additional training or validation data may not be available with the trained machine learning (ML) model due to privacy, security or other practical constraints. Another class of confidence measures uses sampling over model ensembles or training data to estimate conformance and data density [19-21]. In contrast, the focus of our work is to compute confidence for a DNN model on a given input without access to training data or the possibility of retraining a new model or model ensembles.\n\nFigure 1: (From left to right) The original image with a label of yawl; masking its top 0.2% of attribution; masking its top 4% of attribution; the image with a banana adversarial patch [22]; masking its top 0.2% of attribution; (rightmost) masking its top 4% of attribution. The classification result for the original image is robust and conforming in the neighborhood obtained by removing top positive attributions. The model predicts the original label (yawl) in the neighborhood. But the prediction changes from banana to yawl for many samples in the attribution neighborhood of the adversarial image. We observe a similar difference in conformance for the out-of-distribution examples and the adversarial examples generated by state-of-the-art digital attack methods such as DeepFool [23], PGD [24], FGSM [25] and Carlini-Wagner (CW) [26]. 
This observation motivates the use of conformance in the attribution-neighborhood as a confidence metric.\n\nThe confidence of a model on a given input can be measured by sampling the neighborhood of the input and observing whether the model's output changes or conforms to the original output. But accurately estimating the conformance by sampling in the neighborhood of the input becomes exponentially difficult with an increase in the dimension of the input [27]. We propose a novel approach which samples the high-dimensional neighborhood effectively using attributions provided by methods [28-30] developed recently to improve the interpretability of DNNs. In particular, we adopt the use of Shapley values, with roots in cooperative game theory, for computing attribution. Approaches using Shapley values [30, 31] satisfy intuitively expected axiomatic properties over attributions [32]. Our attribution-based confidence metric is theoretically motivated by these axiomatic properties. The key idea is to use attributions for importance sampling in the neighborhood of an input. This importance sampling is more likely to select neighbors obtained by changing features with high attribution. The conformance of the machine learning model over these samples is computed as the fraction of neighbors for which the output of the model does not change. Figure 1 illustrates how an input that triggers incorrect responses in DNNs, such as an out-of-distribution sample or an adversarial example, does not conform in its neighborhood. 
Thus, the conformance of the model's prediction in the input's neighborhood, sampled using feature attributions, can be used as an effective measure of the confidence of the model on that input.\n\nWe make the following new contributions in the paper.\n\n\u2022 We propose a novel attribution-based confidence (ABC) metric computed by importance sampling in the neighborhood of a high-dimensional input using relative feature attributions, and estimating the conformance of the model. It does not require access to training data or additional calibration. We mathematically motivate the proposed ABC metric using axioms on Shapley values.\n\n\u2022 We empirically evaluate the ABC metric over the MNIST and ImageNet datasets using (a) out-of-distribution data, (b) adversarial inputs generated using digital attacks such as FGSM, PGD, CW and DeepFool, and (c) physically-realizable adversarial patch and LaVAN attacks.\n\n2 Attribution-based Confidence (ABC) Metric\n\nFigure 2: ABC complements the bottom-up inference of the original DNN model with the top-down sample generation and conformance estimation.\n\nMotivation: The proposed ABC metric is motivated by Dual Process Theory [33, 34] and Kahneman's decomposition [35] of cognition into System 1, or the intuitive system, and System 2, or the deliberate system. The original DNN model represents the bottom-up System 1, and the ABC metric computation is a deliberative top-down System 2 that uses the attributions in System 1 to generate new samples in the neighborhood of the original input. Kilbertus et al. [36] argue that causal mechanisms are typically continuous but most learning problems such as classification are anti-causal. The lack of resilience to adversarial examples is hypothesized to be the result of learning in an anti-causal direction. 
Combining both the anti-causal System 1 DNN model and the attribution-driven System 2 that computes the ABC metric creates a relatively more resilient cognition model. The ABC metric uses the attribution over features for the decision of a machine learning model on a given input to construct a generator that can sample the attribution-neighborhood of the input and observe the conformance of the model in this neighborhood. While learning is still in the anti-causal direction, ABC adds a causal deliberative System 2 that reasons in the forward generative direction to evaluate the conformance of the model.\n\nComputational challenge: The computation of the ABC metric of an ML model on an input requires accurately determining conformance by sampling in the neighborhood of high-dimensional inputs. This can be addressed by sampling over lower-dimensional intermediate or output layers of a DNN [19, 17], or relying on topological and manifold-based data analysis [20]. But these methods require training data that may not always be available at inference time. ABC addresses this challenge by biasing our sampling using the quantitative attribution obtained via Shapley values [30]. Deep learning models demonstrate a concentration of features, that is, few features have relatively very high attributions for any decision. Figure 3 illustrates feature concentration for ImageNet, where the attribution is computed via Integrated Gradients [30]. Sampling over low attribution features will likely lead to no change in the label. 
Low attribution indicates that the model is equivariant along these features. By focusing on high attribution features during sampling, our method can efficiently sample even high-dimensional input spaces to obtain a conservative estimate of the confidence score.\n\nFigure 3: Attributions concentrate over few features in ImageNet. (Axes: feature percentile vs. attribution magnitude, with the 80 to 100 percentile range magnified.)\n\nGiven an input x for a model F, where F^i denotes the i-th logit output of the model, we can compute the attribution of feature xj of x for label i as A^i_j(x). We can then compute the ABC metric in two steps:\n\n\u2022 Sample the neighborhood: Select feature xj with probability |A^i_j(x)/xj| / \u03a3_k |A^i_k(x)/xk| and change it to flip the label away from i, that is, change the decision of the model (for example, by changing the feature's value to a baseline used in computing the attribution).\n\n\u2022 Measure conformance: Report the fraction of samples in the neighborhood for which the decision of the model does not change, that is, conforms to the original decision, as the conservatively estimated confidence measure.\n\nABC uses feature attributions for dimensionality reduction followed by importance sampling in the reduced-dimensional neighborhood of the input to estimate the DNN model's conformance. But unlike typical principal component analysis techniques that search for globally important features, we identify features that are locally relevant for the given input. 
This enables our approach to conservatively approximate the conformance measure of a model even in a high-dimensional input's neighborhood and, thus, efficiently compute the ABC confidence metric of the model on the input.\n\n3 ABC Algorithm\n\nIn this section, we theoretically motivate the proposed ABC metric and present an algorithm for its computation. Attribution methods using Shapley values often employ the notion of a baseline input xb; for example, the all dark image can be the baseline for images. The baseline can also be a set of random inputs where attribution is computed as an expected value. Let the attribution for the j-th feature and output label i be A^i_j(x). The attribution for the j-th input feature depends on the complete input x and not just xj. The treatment for each logit is similar, and so, we drop the logit/class and denote the network output simply as F(.) and the attribution as Aj(x). For simplicity, we use the baseline input xb = 0 for computing attributions. We make the following two assumptions on the DNN model and the attributions, which reflect the fact that the model is well-trained and the attribution method is well-founded:\n\n\u2022 The attribution is dominated by the linear term. This is also an assumption made by attribution methods based on Shapley values such as Integrated Gradients [30], which define attribution as the path integral of the gradients of the DNN output with respect to that feature along the path from the baseline xb to the input x, that is,\n\nA^i_j(x) = (xj \u2212 xb_j) \u00d7 \u222b_{\u03b1=0}^{1} \u2202jF^i(xb + \u03b1(x \u2212 xb)) d\u03b1,   (1)\n\nwhere the gradient of the i-th logit output of the model along the j-th feature is denoted by \u2202jF^i(.).\n\n\u2022 Attributions are complete, i.e., the following is true for any input x and the baseline input xb:\n\nF(x) \u2212 F(xb) = \u03a3_{k=1}^{n} Ak(x), where x has n features.   (2)\n\nShapley value methods such as Integrated Gradients and DeepSHAP [30, 31] satisfy this axiom too.\n\nWe first establish a relationship between attributions and the sensitivity of the model's output to change in an input feature. This is useful in attribution-based dimensionality reduction of a high-dimensional x and defining its attribution-neighborhood.\n\nTheorem 1. The sensitivity of the output F(x) with respect to an input feature xj in the neighborhood of x is approximately the ratio of the attribution Aj(x) to the value of that feature xj, that is, Aj(x)/xj.\n\nProof. Given an input x and its neighbor x' = x + \u03b4x, we can use a Taylor series expansion to express the output F(x') as:\n\nF(x') = F(x) + \u03a3_{k=1}^{n} (\u2202F(x)/\u2202xk) \u03b4xk + max_{k=1,...,n} O(\u03b4xk^2).   (3)\n\nAssuming the completeness of attribution over the features of the input x', we obtain the following:\n\nF(x') \u2212 F(xb) = \u03a3_{k=1}^{n} Ak(x').   (4)\n\nSubtracting Equation 2 from 4, we can eliminate the baseline input xb and use the Taylor series to obtain:\n\nF(x') \u2212 F(x) = \u03a3_{k=1}^{n} (Ak(x') \u2212 Ak(x)) = \u03a3_{k=1}^{n} (\u2202Ak(x)/\u2202xk) \u03b4xk + max_{k=1,...,n} O(\u03b4xk^2).   (5)\n\nWhile computing conformance, we sample only neighbors of the input and so we can drop the higher order terms in both Equations 5 and 3. 
These equations hold for all neighbors including those which differ in only one of the features xj, and so, we can conclude that the sensitivity of the model with respect to a feature xj is \u2202Aj(x)/\u2202xj.\n\nTaking the derivative of Equation 1 on both sides with respect to feature xj and ignoring the non-linear attribution terms, we obtain the following:\n\n\u2202Aj(x)/\u2202xj = \u222b_{\u03b1=0}^{1} \u2202F(xb + \u03b1(x \u2212 xb))/\u2202xj d\u03b1 + xj \u00b7 \u2202/\u2202xj ( \u222b_{\u03b1=0}^{1} \u2202F(xb + \u03b1(x \u2212 xb))/\u2202xj d\u03b1 ) = \u222b_{\u03b1=0}^{1} \u2202F(xb + \u03b1(x \u2212 xb))/\u2202xj d\u03b1 + xj ( \u222b_{\u03b1=0}^{1} \u2202\u00b2F(xb + \u03b1(x \u2212 xb))/\u2202xj\u00b2 d\u03b1 ) \u2248 \u222b_{\u03b1=0}^{1} \u2202F(xb + \u03b1(x \u2212 xb))/\u2202xj d\u03b1 = Aj(x)/xj, from Eqn. 1 with baseline feature xb_j = 0.   (6)\n\nThus, we have shown that the sensitivity of the model's output with respect to an input feature in the neighborhood of the input is approximately given by the ratio of the attribution for that input feature to the value of that feature. Note that Aj(x)/xj does not vanish even when the traditional sensitivity given by \u2202F(x)/\u2202xj vanishes, exploiting the non-saturating nature of Shapley value attributions.\n\nThe model is almost equivariant with respect to features with low |Aj(x)/xj| and, thus, the attribution-neighborhood is constructed by mutating features with high |Aj(x)/xj|. The overall algorithm for computing the ABC metric of a DNN model on an input is as follows.\n\nAlgorithm 1 Evaluate ABC confidence metric c(F, x) of machine learning model F on input x\nInput: Model F, input x with features x1, x2, ..., xn, sample size S\nOutput: ABC metric c(F, x)\n1: A1, ..., An \u2190 attributions of features x1, x2, ..., xn from input x\n2: i \u2190 F(x) {Obtain model prediction}\n3: for j = 1 to n do\n4:   P(xj) \u2190 |Aj/xj| / \u03a3_{k=1}^{n} |Ak/xk|\n5: end for\n6: Generate S samples by mutating feature xj of input x to baseline xb_j with probability P(xj)\n7: Obtain the output of the model on the S samples.\n8: c(F, x) \u2190 S_conform / S, where the model's output on S_conform samples is i\n9: return c(F, x) as the confidence metric (ABC) of the prediction by the model F on the input x\n\n4 Related Work: Confidence, Robustness and Interpretability\n\nOur proposed confidence metric (ABC) is closely related to the literature on confidence metrics, attribution methods and techniques for adversarial attacks and defenses for DNNs.\n\nConfidence metrics: The need for a confidence metric to reflect the uncertainty in the output of machine learning models was recognized very early in the literature [18, 17]. The high accuracy but brittleness of deep learning models has revived interest in defining confidence metrics that reflect the accuracy of the model. DNNs are not well-calibrated [11, 19, 20] and so, the straightforward use of the logit layer before softmax as a confidence measure is not reliable. Several post-processing based confidence metrics have been proposed in the literature which can be grouped into three classes:\n\n\u2022 Calibration models trained using a held-out validation set: Platt [18] proposed a parametric approach where the logits are used as features to learn a calibration model from a held-out validation set. An example calibration model is qi = max_k \u03c3(W zi + b)_k where zi are the logits, \u03c3 is the standard sigmoid function, k denotes a class, (\u00b7)_k denotes the k-th element of a vector, and W, b are parameters [15]. 
Temperature scaling is a special case of Platt scaling with a single parameter. Nonparametric learning of calibration models from held-out validation data has also been proposed in the form of histogram binning [37], isotonic regression [38] and Bayesian binning [39].\n\n\u2022 Model ensemble approaches: Lakshminarayanan et al. [40] use ensembles of networks to obtain uncertainty estimates. Bayesian neural networks [17, 41] return a probability distribution over outputs as an alternative way to represent model uncertainty. Sampling models by dropping nodes in a DNN has been shown to estimate a probability distribution over all models [42].\n\n\u2022 Training-set-based uncertainty estimation: Jiang et al. [20] compute the trust score on deeper layers of a DNN than the input to avoid the high-dimensionality of inputs. They propose a trust score that measures the conformance between the classifier and a modified nearest-neighbor classifier on the testing example. Papernot and McDaniel [19] use k-nearest neighbors regression over the training set on the intermediate representations of the network, which showed enhanced robustness to adversarial attacks and leads to better calibrated uncertainty estimates.\n\nIn contrast to these approaches, the ABC metric only needs the trained model at inference time, and does not require training data, separate held-out validation data or training a model ensemble.\n\nDNN robustness, adversarial inputs and defense: Szegedy et al. [6] used the L-BFGS method to generate adversarial examples. Goodfellow [25] proposed a fast gradient sign method (FGSM) to generate adversarial images faster than the L-BFGS method; this method performs only a one-step gradient update at each pixel along the direction of the gradient's sign. Rozsa et al. [43] replaced the sign of the gradients with raw gradients. Dong et al. [44] applied momentum to FGSM, and Kurakin et al. [45] further extended FGSM to a targeted attack. 
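The one-step FGSM update described above can be sketched in a few lines; the logistic "model", its weights, and the epsilon value below are hypothetical stand-ins for illustration, not artifacts of this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One-step FGSM: move each feature by eps along the sign of the
    gradient of the cross-entropy loss with respect to the input."""
    p = sigmoid(np.dot(w, x) + b)      # predicted probability of class 1
    grad_x = (p - y) * w               # dL/dx for a logistic model
    return x + eps * np.sign(grad_x)   # adversarial example

# Hypothetical logistic model and input (illustrative values only).
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, 0.2, 0.1])
y = 1.0                                # true label

x_adv = fgsm(x, y, w, b, eps=0.1)
print(x_adv)                           # each feature shifted by +/- 0.1
```

On this toy model the loss at x_adv is strictly larger than at x, which is the intended effect of the single gradient-sign step.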
A number of adversarial example generation techniques [23, 24, 26] are now available in tools such as Cleverhans [7], which we use in our experiments. In addition to digital attacks, we also study how the proposed ABC metric measures uncertainty on physically realizable attacks in the form of patches or stickers [22, 46]. Further, adversarial examples have been shown to transfer [47, 48] across models, making them agnostic to the availability of model parameters and effective against even ensemble approaches. Efforts over the last few years to defend against adversarial attacks have met with limited success. While approaches such as logit pairing [49], defensive distillation [50], manifold-based defense [51, 52], and adversarial training methods that exploit knowledge of the specific attack [53] have shown effectiveness against particular attack methods, more principled techniques such as robust optimization [54, 55, 24] and formal methods [56] are limited to perturbations with bounded Lp norm. Schott et al. [57] present an insightful study of the state of the art on attacks and defenses.\n\nOur approach to addressing the challenge of DNNs' susceptibility to adversarial examples using the ABC metric differs from the majority of prior work in that it measures the confidence of a model's prediction to characterize the credibility of a DNN on a given input instead of attempting to classify all legitimate and malicious inputs correctly or make particular adversarial strategies fail.\n\nModel interpretability and attribution methods: A number of explanation techniques [31, 30, 58-61] have been recently proposed in the literature that either find a complete logical explanation or just the relevant features or assign quantitative importance (attributions) to input features for a given model decision. Many of these methods are based on the gradient of the predictor function with respect to the input [28-30]. 
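The path integral of Equation 1 (Integrated Gradients) is typically approximated by a Riemann sum. The sketch below does so for an assumed toy differentiable function with an analytic gradient, and checks the completeness axiom of Equation 2; the function and step count are illustrative assumptions, not part of the paper.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=100):
    """Approximate Eq. 1: A_j(x) = (x_j - xb_j) * integral over alpha in [0, 1]
    of dF/dx_j(xb + alpha * (x - xb)), using a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = sum(grad_f(baseline + a * (x - baseline)) for a in alphas) / steps
    return (x - baseline) * grads

# Assumed toy "logit" F(x) = sum(x^2) + 3*x[0], with its analytic gradient.
f = lambda x: np.sum(x ** 2) + 3.0 * x[0]
grad_f = lambda x: 2.0 * x + np.array([3.0, 0.0, 0.0])

x = np.array([1.0, -2.0, 0.5])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness (Eq. 2): attributions sum to F(x) - F(baseline).
print(attr.sum(), f(x) - f(baseline))
```

For this quadratic toy function the midpoint rule is exact, so the attributions satisfy completeness up to floating-point error.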
Different attribution methods are compared in Adebayo et al. [62]. The sensitivity of these attributions to perturbations in the input is studied in Ghorbani et al. [63], and indicates the potential of adversarial attacks on the proposed and other existing confidence metrics. But attacking the model and its confidence measure is more difficult than just attacking the model. This motivates the use of the ABC metric to measure confidence. The brittleness of learning has been related to the anti-causal direction of learning [36]. We observe that the wrong decisions for out-of-distribution and adversarial inputs often hinge on a relatively small concentrated set of high attribution features, and thus, mutating these features and generating samples in the neighborhood of the original input is an effective top-down inference that is robust to adversarial attacks. In recent work [64], we have demonstrated the use of attributions for detecting adversarial examples.\n\n5 Experimental Evaluation\n\nWe evaluate the attribution-based confidence metric on out-of-distribution data and adversarial attacks. All experiments were conducted on an 8-core Intel\u00ae Core\u2122 i9-9900K 3.60GHz CPU with NVIDIA Titan RTX graphics and 32 GB RAM.\n\nOut-of-distribution data: MNIST [65] with rotation and background, notMNIST [66] and FashionMNIST [67]. We compare the predicted accuracy vs. confidence below to evaluate how well the attribution-based confidence metric reflects the reduced accuracy on the rotated MNIST dataset with a background image [68]. The dataset has MNIST images randomly rotated by 0 to 2\u03c0, and with a randomly selected black/white background image. The accuracy and confidence of the model both drop as the rotation angle increases (from 0 to 50 degrees), as illustrated in Figure 4. 
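The ABC computation of Algorithm 1 used throughout these experiments can be sketched as follows; the linear threshold "model", its gradient-times-input attributions, and the sample size are illustrative assumptions, since the paper's experiments use DNNs with Shapley-value attributions.

```python
import numpy as np

def abc_confidence(predict, attributions, x, baseline, n_samples=200, seed=0):
    """Algorithm 1 sketch: sample neighbors by mutating features chosen with
    probability proportional to |A_j / x_j|, then report the fraction of
    neighbors whose predicted label conforms to the original one."""
    rng = np.random.default_rng(seed)
    label = predict(x)
    # Importance weights |A_j / x_j| (guarding against division by zero).
    w = np.abs(np.divide(attributions, x, out=np.zeros_like(x), where=x != 0))
    if w.sum() == 0:
        return 1.0  # no feature to mutate; trivially conformant
    p = w / w.sum()
    conform = 0
    for _ in range(n_samples):
        j = rng.choice(len(x), p=p)
        neighbor = x.copy()
        neighbor[j] = baseline[j]      # mutate one high-attribution feature
        conform += predict(neighbor) == label
    return conform / n_samples

# Hypothetical model: sign of a weighted sum (a binary "classifier").
w_model = np.array([3.0, 0.1, 0.1, 0.1])
predict = lambda x: int(np.dot(w_model, x) > 0)

x = np.array([1.0, 0.5, 0.5, 0.5])
baseline = np.zeros_like(x)
attr = w_model * x                      # gradient*input attribution for a linear model

c = abc_confidence(predict, attr, x, baseline)
print(c)                                # fraction of conforming neighbors in [0, 1]
```

Here every single-feature mutation leaves the label unchanged, so the sketch reports full confidence; on a brittle input, flipping a high-attribution feature would lower the conformance fraction.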
The three examples show how the ABC metric reflects the confusability of inputs.\n\nFigure 4: ABC for rotated-background-MNIST and rotated-MNIST at different angles. Selected examples from rotated-background-MNIST with confidence showing quantitative analysis of the ABC metric.\n\nFigure 5: Comparison with a calibrated scaling model. Cumulative data fraction vs. ABC for FashionMNIST and notMNIST compared with MNIST. (Panels: calibrated scaling and attribution-based confidence on MNIST rotated by 50 degrees; axes: cumulative fraction of data vs. confidence.) Only 19% of the FashionMNIST dataset and 26% of the notMNIST dataset have a confidence higher than 0.85, while 70% of the MNIST dataset had confidence higher than 0.85.\n\nWe compare the ABC metric with a trained calibrated scaling model and also evaluate it on the out-of-distribution FashionMNIST and notMNIST datasets in Figure 5.\n\nAdversarial FGSM and PGD attacks on MNIST. Figure 6 illustrates how the decreased ABC metric reflects the decrease in accuracy on adversarial examples. 
The accuracy with the PGD attack drops close to zero and so, we only plot the fraction of data at different confidence levels.\n\nFigure 6: ABC metric for FGSM and PGD attacks on MNIST. (Axes: cumulative fraction of data vs. attribution-based confidence.)\n\nComparison of attribution methods. We also compare different attribution methods in Figure 7.\n\nFigure 7: Comparing the ABC metric using different attribution methods: Gradients (Grad), Integrated Gradients (IG), and DeepSHAP (DS). 
For out-of-distribution examples (FashionMNIST and notMNIST), results with DeepSHAP are slightly better than IG (which is better than Gradients).\n\nFigure 8: ABC metric for FGSM, PGD, DeepFool and CW attacks on ImageNet. (Axes: cumulative fraction of data vs. attribution-based confidence.)\n\nDigital adversarial attacks on ImageNet [69]: FGSM, PGD, CW and DeepFool. Figure 8 illustrates how the ABC metric reflects the decrease in accuracy under adversarial attack.\n\nPhysically realizable adversarial patch attacks on ImageNet. We apply physically realizable adversarial patch [22] and LaVAN [46] attacks on 1000 images from ImageNet. For the adversarial patch attack, we used a patch size of 25% for two patch types: banana and toaster. For LaVAN, we used a baseball patch of size 50 \u00d7 50 pixels. 
Figure 9 illustrates how the ABC metric is low for most of the adversarial examples, reflecting the decrease in the accuracy of the model.\n\nFigure 9: Cumulative data fraction vs. ABC metric for ImageNet and adversarial patch attacks (LaVAN, 25% toaster and 25% banana patches).\n\n6 Conclusion\n\nWe employ an attribution-driven sampling of the neighborhood of a given input and measure the conformance of the model's predictions to compute the attribution-based confidence (ABC) metric for the DNN prediction on this input. While directly sampling the neighborhood of a high-dimensional input is challenging, our approach uses attribution-based dimensionality reduction for finding locally relevant features in the vicinity of the input, which enables effective sampling. We theoretically motivate the proposed ABC metric from the axioms of Shapley values, and experimentally evaluate its utility over out-of-distribution data and adversarial examples.\n\nAcknowledgement: The authors acknowledge support from the U.S. Army Research Laboratory Cooperative Research Agreement W911NF-17-2-0196, DARPA Assured Autonomy under contract FA8750-19-C-0089, U.S. National Science Foundation (NSF) grants #1422257, #1740079, #1750009, #1822976, #1836978, #1804648, ARO grant W911NF-17-1-0405, Royal Bank of Canada, Cyber Florida, U.S. Air Force Young Investigator award, and FA9550-18-1-0166. 
The views, opinions and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

References

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017. ISSN 0001-0782. doi: 10.1145/3065386. URL http://doi.acm.org/10.1145/3065386.

[2] Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent trends in deep learning based natural language processing. CoRR, abs/1708.02709, 2017.

[3] Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Michael L Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. Toward human parity in speech recognition. TASLP, pages 2410–2423, 2017.

[4] George Saon, Hong-Kwang Jeff Kuo, Steven J. Rennie, and Michael Picheny. The IBM 2015 English conversational telephone speech recognition system. CoRR, abs/1505.05899, 2015.

[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV, pages 1026–1034, 2015.

[6] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.

[7] Nicolas Papernot, Ian Goodfellow, Ryan Sheatsley, Reuben Feinman, and Patrick McDaniel. Cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768, 2016.

[8] Xiaoqian Jiang, Melanie Osl, Jihoon Kim, and Lucila Ohno-Machado. Calibrating predictive model estimates to support personalized medicine. Journal of the American Medical Informatics Association, 19(2):263–274, 2011.

[9] Kihong Park, Seungryong Kim, and Kwanghoon Sohn.
Unified multi-spectral pedestrian detection based on probabilistic fusion networks. Pattern Recognition, 80:143–155, 2018.

[10] Danbing Seto, Bruce Krogh, Lui Sha, and Alongkrit Chutinan. The simplex architecture for safe online control system upgrades. In Proceedings of the American Control Conference, volume 6, pages 3504–3508. IEEE, 1998.

[11] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1321–1330. JMLR.org, 2017.

[12] Volodymyr Kuleshov and Percy S Liang. Calibrated structured prediction. In Advances in Neural Information Processing Systems, pages 3474–3482, 2015.

[13] Yarin Gal. Uncertainty in deep learning. PhD thesis, University of Cambridge, 2016.

[14] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.

[15] Alexandru Niculescu-Mizil and Rich Caruana. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 625–632. ACM, 2005.

[16] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.

[17] John S Denker and Yann Lecun. Transforming neural-net output levels to probability distributions. In Advances in Neural Information Processing Systems, pages 853–859, 1991.

[18] John Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods.
Advances in Large Margin Classifiers, 10(3):61–74, 1999.

[19] Nicolas Papernot and Patrick McDaniel. Deep k-nearest neighbors: Towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765, 2018.

[20] Heinrich Jiang, Been Kim, Melody Guan, and Maya Gupta. To trust or not to trust a classifier. In Advances in Neural Information Processing Systems, pages 5541–5552, 2018.

[21] Richard O Duda, Peter E Hart, et al. Pattern classification and scene analysis, volume 3. Wiley New York, 1973.

[22] Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.

[23] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, pages 2574–2582, 2016.

[24] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[25] Ian J. Goodfellow et al. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[26] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In ISSP, pages 39–57. IEEE, 2017.

[27] Thomas Bengtsson, Peter Bickel, Bo Li, et al. Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems. In Probability and statistics: Essays in honor of David A. Freedman, pages 316–334. Institute of Mathematical Statistics, 2008.

[28] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[29] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra.
Grad-cam: Visual explanations from deep networks via gradient-based localization. In CVPR, pages 618–626, 2017.

[30] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In ICML, pages 3319–3328. JMLR.org, 2017.

[31] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.

[32] Pradeep Dubey. On the uniqueness of the Shapley value. International Journal of Game Theory, 4(3):131–139, 1975.

[33] Jonathan St BT Evans and Keith Frankish. In two minds: Dual processes and beyond, volume 10. Oxford University Press Oxford, 2009.

[34] Philip M Groves and Richard F Thompson. Habituation: a dual-process theory. Psychological review, 77(5):419, 1970.

[35] Daniel Kahneman. Thinking, fast and slow. Macmillan, 2011.

[36] Niki Kilbertus, Giambattista Parascandolo, and Bernhard Schölkopf. Generalization in anti-causal learning. arXiv preprint arXiv:1812.00524, 2018.

[37] Bianca Zadrozny and Charles Elkan. Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers. In ICML, volume 1, pages 609–616. Citeseer, 2001.

[38] Bianca Zadrozny and Charles Elkan. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 694–699. ACM, 2002.

[39] Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. Obtaining well calibrated probabilities using bayesian binning. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[40] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles.
In Advances in Neural Information\nProcessing Systems, pages 6402\u20136413, 2017.\n\n[41] David JC MacKay. Probable networks and plausible predictions\u2014a review of practical bayesian\nmethods for supervised neural networks. Network: Computation In Neural Systems, 6(3):\n469\u2013505, 1995.\n\n[42] Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing\nmodel uncertainty in deep learning. In International Conference on Machine Learning, pages\n1050\u20131059, 2016.\n\n[43] Andras Rozsa, Ethan M Rudd, and Terrance E Boult. Adversarial diversity and hard positive\ngeneration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition\nWorkshops, pages 25\u201332, 2016.\n\n[44] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo\nLi. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on\nComputer Vision and Pattern Recognition, pages 9185\u20139193, 2018.\n\n[45] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale.\n\narXiv preprint arXiv:1611.01236, 2016.\n\n[46] Danny Karmon, Daniel Zoran, and Yoav Goldberg. LaVAN: Localized and visible adversarial\n\nnoise. arXiv preprint arXiv:1801.02608, 2018.\n\n[47] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial\n\nexamples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.\n\n[48] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and\nAnanthram Swami. Practical black-box attacks against machine learning. In ACCS\u201917, 2017.\n\n[49] Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the\n\nrobustness of adversarial logit pairing. arXiv preprint arXiv:1807.10272, 2018.\n\n[50] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation\n\nas a defense to adversarial perturbations against deep neural networks. 
In ISSP'16, 2016.

[51] Andrew Ilyas, Ajil Jalal, Eirini Asteri, Constantinos Daskalakis, and Alexandros G Dimakis. The robust manifold defense: Adversarial training using generative models. arXiv preprint arXiv:1712.09196, 2017.

[52] Susmit Jha, Uyeong Jang, Somesh Jha, and Brian Jalaian. Detecting adversarial examples using data manifolds. In MILCOM, pages 547–552. IEEE, 2018.

[53] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.

[54] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.

[55] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. arXiv:1901.08573, 2019.

[56] Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. Output range analysis for deep feedforward neural networks. In NASA Formal Methods Symposium, pages 121–138. Springer, 2018.

[57] Lukas Schott, Jonas Rauber, Matthias Bethge, and Wieland Brendel. Towards the first adversarially robust neural network model on MNIST. International Conference on Learning Representations (ICLR), May 2019.

[58] Guanbin Li and Yizhou Yu. Visual saliency based on multiscale deep features. In CVPR, pages 5455–5463, 2015.

[59] Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua. Lift: Learned invariant feature transform. In ECCV, pages 467–483. Springer, 2016.

[60] Susmit Jha, Vasumathi Raman, Alessandro Pinto, Tuhin Sahai, and Michael Francis. On learning sparse Boolean formulae for explaining AI decisions. In NASA Formal Methods Symposium, pages 99–114.
Springer, 2017.

[61] Susmit Jha, Tuhin Sahai, Vasumathi Raman, Alessandro Pinto, and Michael Francis. Explaining AI decisions using efficient methods for learning sparse Boolean formulae. Journal of Automated Reasoning, 63(4):1055–1075, 2019.

[62] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In NIPS, pages 9525–9536, 2018.

[63] Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. arXiv preprint arXiv:1710.10547, 2017.

[64] Susmit Jha, Sunny Raj, Steven Fernandes, Sumit Kumar Jha, Somesh Jha, Brian Jalaian, Gunjan Verma, and Ananthram Swami. Attribution-driven causal analysis for detection of adversarial examples. Safe Machine Learning workshop at ICLR (No Proceedings), 2019.

[65] Yann LeCun, Corinna Cortes, and Christopher JC Burges. The MNIST database. URL http://yann.lecun.com/exdb/mnist, 1998.

[66] Yaroslav Bulatov. NotMNIST dataset. Google (Books/OCR), Tech. Rep. [Online]. Available: http://yaroslavvb.blogspot.it/2011/09/notmnist-dataset.html, 2011.

[67] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

[68] University of Montreal. The rotated MNIST with background image. URL https://sites.google.com/a/lisa.iro.umontreal.ca/public_static_twiki/variations-on-the-mnist-digits, 1998.

[69] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
doi: 10.1007/s11263-015-0816-y.