{"title": "Reducing Network Agnostophobia", "book": "Advances in Neural Information Processing Systems", "page_first": 9157, "page_last": 9168, "abstract": "Agnostophobia, the fear of the unknown, can be experienced by deep learning engineers while applying their networks to real-world applications. Unfortunately, network behavior is not well defined for inputs far from a networks training set. In an uncontrolled environment, networks face many instances that are not of interest to them and have to be rejected in order to avoid a false positive. This problem has previously been tackled by researchers by either a) thresholding softmax, which by construction cannot return \"none of the known classes\", or b) using an additional background or garbage class. In this paper, we show that both of these approaches help, but are generally insufficient when previously unseen classes are encountered. We also introduce a new evaluation metric that focuses on comparing the performance of multiple approaches in scenarios where such unseen classes or unknowns are encountered. Our major contributions are simple yet effective Entropic Open-Set and Objectosphere losses that train networks using negative samples from some classes. These novel losses are designed to maximize entropy for unknown inputs while increasing separation in deep feature space by modifying magnitudes of known and unknown samples. Experiments on networks trained to classify classes from MNIST and CIFAR-10 show that our novel loss functions are significantly better at dealing with unknown inputs from datasets such as Devanagari, NotMNIST, CIFAR-100 and SVHN.", "full_text": "Reducing Network Agnostophobia\n\nAkshay Raj Dhamija, Manuel G\u00a8unther, and Terrance E. 
Boult\n\nVision and Security Technology Lab, University of Colorado Colorado Springs\n\n{adhamija | mgunther | tboult} @ vast.uccs.edu\n\nAbstract\n\nAgnostophobia, the fear of the unknown, can be experienced by deep learning\nengineers while applying their networks to real-world applications. Unfortunately,\nnetwork behavior is not well de\ufb01ned for inputs far from a networks training set. In\nan uncontrolled environment, networks face many instances that are not of interest\nto them and have to be rejected in order to avoid a false positive. This problem\nhas previously been tackled by researchers by either a) thresholding softmax,\nwhich by construction cannot return none of the known classes, or b) using an\nadditional background or garbage class. In this paper, we show that both of these\napproaches help, but are generally insuf\ufb01cient when previously unseen classes are\nencountered. We also introduce a new evaluation metric that focuses on comparing\nthe performance of multiple approaches in scenarios where such unseen classes\nor unknowns are encountered. Our major contributions are simple yet effective\nEntropic Open-Set and Objectosphere losses that train networks using negative\nsamples from some classes. These novel losses are designed to maximize entropy\nfor unknown inputs while increasing separation in deep feature space by modifying\nmagnitudes of known and unknown samples. 
Experiments on networks trained to\nclassify classes from MNIST and CIFAR-10 show that our novel loss functions\nare signi\ufb01cantly better at dealing with unknown inputs from datasets such as\nDevanagari, NotMNIST, CIFAR-100, and SVHN.\n\n1\n\nIntroduction and Problem Formulation\n\nEver since a convolutional neural network (CNN) [19] won the ImageNet Large Scale Visual\nRecognition Challenge (ILSVRC) in 2012 [33], the extraordinary increase in the performance of\ndeep learning architectures has contributed to the growing application of computer vision algorithms.\nMany of these algorithms presume detection before classi\ufb01cation or directly belong to the category\nof detection algorithms, ranging from object detection [13, 12, 32, 23, 31], face detection [17],\npedestrian detection [42] etc. Interestingly, though each year new state-of-the-art-algorithms emerge\nfrom each of these domains, a crucial component of their architecture remains unchanged \u2013 handling\nunwanted or unknown inputs.\nObject detectors have evolved over time from using feature-based detectors to sliding windows [34],\nregion proposals [32], and, \ufb01nally, to anchor boxes [31]. The majority of these approaches can be\nseen as having two parts, the proposal network and the classi\ufb01cation network. During training, the\nclassi\ufb01cation network includes a background class to identify a proposal as not having an object of\ninterest. However, even for the state-of-the-art systems it has been reported that the object proposals\nto the classi\ufb01er \u201cstill contain a large proportion of background regions\u201d and \u201cthe existence of many\nbackground samples makes the feature representation capture less intra-category variance and more\ninter-category variance (...) 
causing many false positives between ambiguous object categories\u201d [41].\nIn a system that both detects and recognizes objects, the ability to handle unknown samples is crucial.\nOur goal is to improve the ability to classify correct classes while reducing the impact of unknown\ninputs. In order to better understand the problem, let us assume Y \u2282 N be the in\ufb01nite label space of\nall classes, which can be broadly categorized into:\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00b4eal, Canada.\n\n\f(a) Softmax\n\n(b) Background\n\n(c) Objectosphere\n\n(d) Open-Set Recognition Curve\n\nFigure 1: LENET++ RESPONSES TO KNOWNS AND UNKNOWNS. The network in (a) was only trained\nto classify the 10 MNIST classes (D(cid:48)\nc) using softmax, while the networks in (b) and (c) added NIST letters [15]\nas known unknowns (D(cid:48)\nb) trained with softmax or our novel Objectosphere loss. In the feature representation\nplots on top, colored dots represent test samples from the ten MNIST classes (Dc), while black dots represent\nsamples from the Devanagari[28] dataset (Da), and the dashed gray-white lines indicate class borders where\nsoftmax scores for neighboring classes are equal. This paper addresses how to improve recognition by reducing\nthe overlap of network features from known samples Dc with features from unknown samples Du. The \ufb01gures in\nthe bottom are histograms of softmax probability values for samples of Dc and Da with a logarithmic vertical\naxis. For known samples Dc, the probability of the correct class is used, while for samples of Da the maximum\nprobability of any known class is displayed. In an application, a score threshold \u03b8 should be chosen to optimally\nseparate unknown from known samples. Unfortunately, such a threshold is dif\ufb01cult to \ufb01nd for either (a) or (b),\na better separation is achievable with the Objectosphere loss (c). 
The proposed Open-Set Classi\ufb01cation Rate\n(OSCR) curve in (d) depicts the high accuracy of our approach even at a low false positive rate.\n\n\u2022 C = {1, . . . , C} \u2282 Y: The known classes of interest that the network shall identify.\n\u2022 U = Y \\ C: The unknown classes containing all types of classes the network needs to reject.\nSince Y is in\ufb01nite and C is \ufb01nite, U is also in\ufb01nite. The set U can further be divided:\n1. B \u2282 U: The background, garbage, or known unknown classes. Since U is in\ufb01nitely\nlarge, during training only a small subset B can be used.\n2. A = U \\ B = Y \\ (C \u222a B): The unknown unknown classes, which represent the rest\nof the in\ufb01nite space U, samples from which are not available during training, but only\noccur at test time.\n\nLet the samples seen during training belonging to B be depicted as D(cid:48)\nb and the ones seen during\ntesting depicted as Db. Similarly, the samples seen during testing belonging to A are represented as\nDa. The samples belonging to the known classes of interest C, seen during training and testing are\nrepresented as D(cid:48)\nc and Dc, respectively. Finally, we call the unknown test samples Du = Db \u222a Da.\nIn this paper, we introduce two novel loss functions that do not directly focus on rejecting unknowns,\nbut on developing deep features that are more robust to unknown inputs. When training our models\nwith samples from the background D(cid:48)\nb, we do not add an additional softmax output for the background\nclass. Instead, for x \u2208 D(cid:48)\nb, the Entropic Open-Set loss maximizes entropy at the softmax layer. The\nObjectosphere loss additionally reduces deep feature magnitude, which in turn minimizes the softmax\nresponses of unknown samples. Both yield networks where thresholding on the softmax scores is\neffective at rejecting unknown samples x \u2208 Du. 
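Such softmax thresholding can be sketched in a few lines. The following is a minimal pure-Python illustration of score-based rejection (our own sketch, not the authors' code; the function names are ours):

```python
import math

def softmax(logits):
    # Numerically stable softmax over the known classes in C.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_with_rejection(logits, theta):
    # Predict the arg-max class, or return None ("unknown") when the
    # maximum softmax score does not exceed the threshold theta.
    scores = softmax(logits)
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best if scores[best] > theta else None
```

With ten classes and near-uniform logits, every score is close to 1/10, so any threshold θ > 0.1 rejects the sample; a confidently classified input keeps its arg-max label.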
Our approach is largely orthogonal to and could be integrated with multiple prior works such as [1, 11, 20], all of which build upon network outputs. The novel model of this paper may also be used to improve the performance of the classification module of detection networks by better handling false positives from the region proposal network.

Our Contributions: In this paper, we make four major contributions: a) we derive a novel loss function, the Entropic Open-Set loss, which increases the entropy of the softmax scores for background training samples and improves the handling of background and unknown inputs; b) we extend that loss into the Objectosphere loss, which further increases softmax entropy and performance by minimizing the Euclidean length of deep representations of unknown samples; c) we propose a new evaluation metric for comparing the performance of different approaches in the presence of unknown samples; and d) we show that the new loss functions advance the state of the art for open-set image classification. Our code is publicly available.¹

¹http://github.com/Vastlab/Reducing-Network-Agnostophobia

2 Background and Related Work

For traditional learning systems, learning with rejection or background classes has been around for decades [6, 5]. Recently, approaches for deep networks have been developed that more formally address the rejection of samples x ∈ Du. These approaches are called open-set [1, 4, 3], outlier rejection [40, 25], out-of-distribution detection [38], or selective prediction [11].
In addition, there is also active research in network-based uncertainty estimation [14, 10, 36, 20].

In these prior works there are two goals. First, for a sample x of class ĉ ∈ C, P(c | x) is computed such that argmax_c P(c | x) = ĉ. Second, for a sample x of class u ∈ U, either the system provides an uncertainty score P(U | x), or the system provides a low P(c | x) from which argmax_c P(c | x) is thresholded to reject the sample as unknown. Rather than approximating P(u | x), this paper aims at reducing P(c | x) for unknown samples x ∈ Du by improving the feature representation and network output to be more robust to unknown samples.

We review a few details of the most related approaches to which we compare: thresholding softmax scores, estimating uncertainty, taking an open-set approach, and using a background class.

Thresholding Softmax Scores: This approach assumes that samples from a class on which the network was not trained would have probability scores distributed across all the known classes, hence making the maximum softmax score for any of the known classes low. Therefore, if the system thresholds the maximum score, it may avoid classifying such a sample as one of the known classes. While rejecting unknown inputs by thresholding some type of score is common [24, 7, 9], thresholding softmax is problematic. Almost since its inception [2], softmax has been known to bias the probabilities towards a certain class even though the difference between the logit values of the winner class and its neighboring classes is minimal. This was highlighted by Matan et al. [24], who noted that softmax would increase scores for a particular class even though there may be very limited activation on the logit level. In order to train the network to provide better logit values, they included an additional parameter α in the softmax loss, modifying the loss function as:

S_c(x) = \log \frac{e^{l_c(x)}}{e^{\alpha} + \sum_{c' \in C} e^{l_{c'}(x)}}.

This modification forces the network to have a higher loss when the logit values l_c(x) are smaller than α during training, and decreases the softmax scores when all logit values are smaller than α. This additional parameter can also be interpreted as an additional node in the output layer that is not updated during backpropagation. The authors also viewed this node as a representation of none of the above, i.e., the node accounts for x ∈ Du.

Uncertainty Estimation: In 2017, Lakshminarayanan et al. [20] introduced an approach to predict uncertainty estimates using MLP ensembles trained with MNIST digits and their adversarial examples. Rather than approximating P(u | x), their approach focuses on reducing max_c P(c | x) whenever x ∈ Du, which they solved using a network ensemble. We compare our approach to their results using their evaluation processes as well as using our Open-Set Classification Rate (OSCR) curve.

Open-Set Approach OpenMax: The OpenMax approach introduced by Bendale and Boult [1] tackles deep networks in a similar way to softmax as it does not use background samples during training, i.e., D′b = ∅. OpenMax aims at directly estimating P(U | x). Using the deep features of training samples, it builds per-class probabilistic models of the input not belonging to the known classes, and combines these in the OpenMax estimate of each class probability, including P(U | x). Though this approach provided the first steps to formally address the open-set issue for deep networks, it is an offline solution applied after the network has already been trained.
It does not improve the feature\nrepresentation to better detect unknown classes.\n\nBackground Class:\nInterestingly, none of the previous works compared with or combined with the\nbackground class modeling approach that dominates state-of-the-art detection approaches, i.e., most of\nthe above approaches assumed Db = \u2205. The background class approach can be seen as an extension\nof the softmax approach by Matan et al. [24] seen above. In this variation, the network is trained to\n\ufb01nd an optimal value of \u03b1 for each sample such that the resulting plane separates unknown samples\nfrom the rest of the classes. Systems trained with a background class use samples from D(cid:48)\nb, hoping that\nthese samples are suf\ufb01ciently representative of Du, so that after training the system correctly labels\nunknown, i.e., they assume \u2200x \u2208 D(cid:48)\nb : P (U | x) \u2248 1 =\u21d2 \u2200z \u2208 Da, c \u2208 C : P (U | z) > P (c | z).\n\n3\n\n\fWhile this is true by construction for most of the academic datasets like PASCAL [8] and MS-COCO\n[22], where algorithms are often evaluated, it is a likely source of \u201dnegative\u201d dataset bias [37] and\ndoes not necessarily hold true in the real world where the negative space has near in\ufb01nite variety of\ninputs that need to be rejected. To the best of our knowledge, this approach has never been formally\ntested for open-set effectiveness, i.e., handling unknown unknown samples from Da.\nThough all of these approaches provide partial solutions to address the problem of unknown samples,\nwe show that our novel approach advances the state of the art in open-set image classi\ufb01cation.\n\n3 Visualizing Deep Feature Responses to Unknown Samples\n\nIn order to highlight some of the issues and understand the response of deep networks to out of\ndistribution or unknown samples, we create a visualization of the responses from deep networks\nwhile encountering known and unknown samples. 
We use the LeNet++ network [39], which aims\nat classifying the samples in the MNIST hand-written digits database [21] while representing each\nsample in a two dimensional deep feature space. This allows for an easy visualization of the deep\nfeature space which can be seen as an imitation of the response of the network.\nWe train the network to classify the MNIST digits (Dc) and then feed characters from an unknown\ndataset (Da) to obtain the response of the network. In Fig. 1, we sampled unknowns (black points)\nfrom the Devanagari[28] dataset, while other plots in the supplemental material use samples from\nother unknown datasets. As seen in Figure 1(a), when using the standard softmax approach there is\nquite an overlap between features from Dc and Da. Furthermore, from the histogram of the softmax\nscores it is clear that majority of unknown samples have a high softmax score for one of the known\nclasses. This means that if a probability threshold \u03b8 has to be chosen such that we get a low number\nof false positives i.e.\nless unknown samples are identi\ufb01ed as a known class, we would also be\nrejecting most of the known samples since they would be below \u03b8. Clearly, when a network is not\nexplicitly trained to identify unknown samples it can result in signi\ufb01cant confusion between known\nand unknown samples.\nThe background class approach explicitly trains with out of distribution samples D(cid:48)\nb, while learning\nto classify Dc. Here, the goal is to account for any unknown inputs Da that occur at test time. In\nFig. 1(b) we display results from such an approach where during training NIST letters [15] were\nused as D(cid:48)\nb. It can be observed that majority of the unknown samples Da, from the Devanagari\ndataset, fall within the region of the background class. 
However, there are still many samples from Da overlapping the samples from Dc, mostly at the origin, where low probabilities are to be expected. Many Da samples also overlap with the neighboring known classes far from the origin and, therewith, obtain high prediction probabilities for those known classes.

For our Objectosphere approach, we follow the same training protocol as in the background approach, i.e., training with D′b. Here, we aim at mapping samples from D′b to the origin while pushing the lobes representing the MNIST digits Dc farther away from the origin. This results in a much clearer separation between the known samples Dc and the unknown samples Da, as visible in Fig. 1(c).

4 Approach

One of the limitations of training with a separate background class is that the features of all unknown samples are required to be in one region of the feature space. This restriction is independent of the similarity a sample might have to one of the known classes. An important question not addressed in prior work is whether there exists a better and simpler representation, especially one that is more effective at creating a separation between known and unknown samples.

From the depiction of the test sets of the MNIST and Devanagari datasets in Figs. 1(a) and 2(a), we observe that the magnitudes of unknown samples in deep feature space are often lower than those of known samples. This observation leads us to believe that the magnitude of the deep feature vector captures information about a sample being unknown. We want to exploit and exaggerate this property to develop a network where, for x ∈ D′b, we reduce the deep feature magnitude ‖F(x)‖ and maximize the entropy of the softmax scores in order to separate unknown from known samples. This allows the network to have unknown samples that share features with known classes, as long as they have a small feature magnitude.
It might also allow the network to focus its learning capacity on responding to the known classes instead of spending effort on learning specific features for unknown samples. We do this in two stages. First, we introduce the Entropic Open-Set loss to maximize the entropy of unknown samples by making their softmax responses uniform. Second, we expand this loss into the Objectosphere loss, which requires the samples of D′c to have a feature magnitude above a specified minimum while driving the magnitude of the features of samples from D′b to zero, providing a margin in both magnitude and entropy between known and unknown samples.

In the following, for classes c ∈ C, let S_c(x) = e^{l_c(x)} / \sum_{c' \in C} e^{l_{c'}(x)} be the standard softmax score, where l_c(x) represents the logit value for class c. Let F(x) be the deep feature representation from the fully connected layer that feeds into the logits. For brevity, we do not show the dependency on the input x when it is obvious.

4.1 Entropic Open-Set Loss

In deep networks, the most commonly used loss function is the standard softmax loss given above. While we keep the softmax loss calculation untouched for samples of D′c, we modify it for training with the samples from D′b, seeking to equalize their logit values l_c, which will result in equal softmax scores S_c. The intuition here is that if an input is unknown, we know nothing about which classes it relates to or which features we want it to have and, hence, we want the maximum-entropy distribution of uniform probabilities over the known classes.
Let S_c be the softmax score as above; our Entropic Open-Set loss J_E is defined as:

J_E(x) = \begin{cases} -\log S_c(x) & \text{if } x \in D'_c \text{ is from class } c \\ -\frac{1}{C} \sum_{c=1}^{C} \log S_c(x) & \text{if } x \in D'_b \end{cases}   (1)

We first show that the minimum of the loss J_E for a sample x ∈ D′b is achieved when the softmax scores S_c(x) for all known classes are identical.

Lemma 1. For an input x ∈ D′b, the loss J_E(x) is minimized when all softmax responses S_c(x) are equal: ∀c ∈ C : S_c(x) = S = 1/C.

For x ∈ D′b, the loss J_E(x) is similar in form to entropy over the per-class softmax scores. Thus, based on Shannon [35], it is intuitive that the loss is minimized when all values are equal. However, since J_E(x) is not exactly identical to entropy, a formal proof is given in the supplementary material.

Lemma 2. When the logit values are equal, the loss J_E(x) is minimized.

Proof. If the logits are equal, say l_c = η, then each softmax score has an equivalent numerator (e^η) and, hence, all softmax scores are equal, so the loss is minimized by Lemma 1.

Theorem 1. For networks whose logit layer does not have bias terms, and for x ∈ D′b, the loss J_E(x) is minimized when the deep feature vector F(x) that feeds into the logit layer is the zero vector, at which point the softmax responses S_c(x) are equal: ∀c ∈ C : S_c(x) = S = 1/C, and the entropy of the softmax scores is maximized.

Proof. Let F ∈ R^M be our deep feature vector, and let W_c ∈ R^M be the weights in the layer that connects F to the logit l_c. Since the network does not have bias terms, l_c = W_c · F, so when F is the zero vector, the logits are all equal to zero: ∀c : l_c = 0. By Lemma 2, when the logits are all equal, the loss J_E(x) is minimized, the softmax scores are equal, and the entropy is maximized.

Note that the theorem does not show that the zero vector is the only minimum, because it is possible that there exists a subspace of the feature space that is orthogonal to all W_c. Minimizing the loss J_E(x) may, but does not have to, result in a small feature magnitude on unknown inputs. A small perturbation away from such a subspace may quickly decrease entropy, so we seek a more stable solution.

4.2 Objectosphere Loss

Following the above theorem, the Entropic Open-Set loss produces a network that generally represents unknown samples with very low feature magnitudes, while also producing high softmax entropy. This can be seen in Fig. 2(b), where magnitudes of known unknown test samples (Db) are well-separated from magnitudes of known samples (Dc). However, there is often a modest overlap between the feature magnitudes of known and unknown samples. This should not be surprising, as nothing forces known samples to have a large feature magnitude or unknown samples to have a small feature magnitude. Thus, we attempt to put a distance margin between them.

(a) Softmax  (b) Entropic Open-Set Loss  (c) Objectosphere Loss

Figure 2: NORMALIZED HISTOGRAMS OF DEEP FEATURE MAGNITUDES. In (a) the magnitudes of the unknown samples (Da) are generally lower than the magnitudes of the known samples (Dc) for a typical deep network. Using our novel Entropic Open-Set loss (b) we are able to further decrease the magnitudes of unknown samples, and using our Objectosphere loss (c) we are able to create an even better separation between known and unknown samples.
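The Entropic Open-Set loss of Eq. (1) is straightforward to implement per sample; the following pure-Python sketch (our own illustrative code, not the authors' implementation) also recovers the minimum value log C predicted by Lemma 1:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over the known classes.
    m = max(logits)
    log_total = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_total for l in logits]

def entropic_openset_loss(logits, target):
    # target is a known class index for x in D'_c, or None for a
    # background sample x in D'_b.
    log_probs = log_softmax(logits)
    if target is not None:
        return -log_probs[target]      # standard softmax loss for knowns
    C = len(logits)
    return -sum(log_probs) / C         # uniform target over all known classes
```

For a background sample with equal logits, the loss attains its minimum log C (for ten classes, log 10 ≈ 2.303), and any deviation from uniform logits increases it.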
In particular, we seek to push known samples into what we call the Objectosphere, where they have large feature magnitude and low entropy; that is, we train the network to have a large response to known classes. Also, we penalize ‖F(x)‖ for x ∈ D′b to minimize the feature length and maximize entropy, with the goal of producing a network that does not respond strongly to anything other than samples of the known classes. Targeting the deep feature layer helps to ensure that there are no accidental minima. Formally, the Objectosphere loss is calculated as:

J_R = J_E + \lambda \begin{cases} \max(\xi - \|F(x)\|, 0)^2 & \text{if } x \in D'_c \\ \|F(x)\|^2 & \text{if } x \in D'_b \end{cases}   (2)

which penalizes both the known classes if their feature magnitude is inside the boundary ξ of the Objectosphere, and the unknown classes if their magnitude is greater than zero. We now prove this has only one minimum.

Theorem 2. For networks whose logit layer does not have bias terms, given a known unknown input x ∈ D′b, the loss J_R(x) is minimized if and only if the deep feature vector F is the zero vector, which in turn ensures that the softmax responses S_c(x) are equal: ∀c ∈ C : S_c(x) = S = 1/C, and maximizes entropy.

Proof. The "if" follows directly from Theorem 1 and the fact that adding 0 does not change the minimum; given F = 0, the logits are zero and the softmax scores must be equal. For the "only if", observe that of all possible features F with ∀c ∈ C : W_c · F = 0 that minimize J_E, the added term ‖F(x)‖² ensures that the only minimum is at F = 0.

The parameter ξ sets the margin, but it also implicitly increases scaling and can impact the learning rate; in practice, one can determine ξ using cross-class validation.
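A minimal sketch of the magnitude penalty in Eq. (2) follows (illustrative pure Python, not the authors' code; the hyperparameter values shown are placeholders, to be set e.g. by cross-class validation):

```python
import math

def objectosphere_penalty(feature, is_known, xi):
    # Margin penalty on the deep feature magnitude ||F(x)|| (Eq. 2):
    # known samples are pushed outside radius xi, while background
    # samples are driven toward the origin.
    mag = math.sqrt(sum(f * f for f in feature))
    if is_known:
        return max(xi - mag, 0.0) ** 2
    return mag ** 2

def objectosphere_loss(entropic_loss, feature, is_known, xi=50.0, lam=0.0001):
    # J_R = J_E + lambda * penalty; xi and lam are placeholder values.
    return entropic_loss + lam * objectosphere_penalty(feature, is_known, xi)
```

A known sample with ‖F(x)‖ ≥ ξ incurs no penalty, while a background sample is penalized by its squared magnitude, so its only penalty-free feature is the zero vector, consistent with Theorem 2.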
Note that larger ξ values will generally scale up the deep features, including those of unknown samples, but what matters is the overall separation. As seen in the histogram plots of Fig. 2(c), compared to the Entropic Open-Set loss, the Objectosphere loss provides an improved separation in feature magnitudes.

Finally, instead of thresholding just the final softmax score S_c(x) of our Objectosphere network, we can use the fact that we forced known and unknown samples to have different deep feature magnitudes and multiply the softmax score with the deep feature magnitude: S_c(x) · ‖F(x)‖. Thresholding this product is more reasonable and justifiable.

4.3 Evaluating Open-Set Systems

An open-set system has a two-fold goal: it needs to reject samples belonging to unknown classes Du as well as classify the samples from the correct classes Dc. This makes evaluating open-set systems more complex. Various evaluation metrics attempt to handle the unknown classes Du in their own way, but each has certain drawbacks, which we discuss for each of these measures individually.

(a) AUC of PR  (b) Precision v/s Confidence  (c) Open-Set Classification Rate Curve

Figure 3: COMPARISON OF EVALUATION METRICS. In (a) we depict the Area Under the Curve (AUC) of Precision-Recall curves applied to the data from the CriteoLabs display ad challenge on Kaggle². The algorithm with the maximum AUC (I5) does not have the best performance at almost any recall. In (b), our algorithms are compared to the MLP ensemble of [20] using their accuracy v/s confidence curve, with MNIST as known and NotMNIST as unknown samples. In (c), the proposed Open-Set Classification Rate curves are provided for the same algorithms. While MLP and Squared MLP actually come from the same algorithm, they show different performance in (b), but are identical in (c).

Accuracy v/s Confidence Curve: In this curve, the accuracy or precision of a given classification network is plotted against the threshold on the softmax scores, which are assumed to be confidences. This curve was recently applied by Lakshminarayanan et al. [20] to compare algorithms on their robustness to unknown samples Du. This measure has the following drawbacks:

1. Separation between performance on Du and Dc: In order to provide a quantitative value on the vertical axis, i.e., accuracy or precision, samples belonging to the unknown classes Du are considered as members of an additional class that represents all classes of Du, making the total number of classes for the purpose of this evaluation C + 1. Therefore, this measure can be highly sensitive to dataset bias, since the number of samples belonging to Du may be many times higher than the number belonging to Dc. One may argue that a simplified weighted accuracy might solve this issue, but it still would not provide a measure of the number of samples from Du that were classified as belonging to a known class c ∈ C.

2. Algorithms are Incomparable: Since confidences across algorithms cannot be meaningfully related, comparing algorithms based on their individual confidences becomes tricky. One algorithm may classify the same number of unknown samples of Du as one of the known classes at a confidence of 0.1 as another algorithm does at a confidence of 0.9, yet both can still have the same precision, since different numbers of known samples of Dc are being classified correctly.

3. Prone to Scaling Errors: An ideal evaluation metric should be independent of any monotonic re-normalization of the scores. In Fig.
3(b), we added a curve labeled squared MLP with Ensemble,\nwhich appears to be better than the MLP ensemble though it is the same curve with softmax scores\nscaled by simply squaring them.\n\nArea Under the Curve (AUC) of a Precision Recall Curve AUC is another evaluation metric\ncommonly found in research papers from various \ufb01elds. In object detection, it is popularly calculated\nfor a precision-recall (PR) curve and referred to as average precision (AP). The application of any\nalgorithm to a real world problem involves the selection of an operating point, with the natural choices\non a PR curve being either high precision (low number of false positives) or high recall (high number\nof true positives). Let us consider the PR curves in Fig. 3(a), which are created from real data. When\nhigh precision of 0.8 is chosen as an operating point, the algorithm I13 provides a better recall than\nI5 but this information is not clear from the AUC measure since I5 has a larger AUC than I13. In fact,\neven though I11 has same AUC as I13, it can not operate at a precision > 0.5. A similar situation\nexists when selecting an operating point based on the high recall. This clearly depicts that, though the\nAUC of a PR curve is a widely used measure in the research community, it cannot be reliably used\nfor selecting one algorithm over another. Also other researchers have pointed out that \u201cAP cannot\ndistinguish between very different [PR] curves\u201d [27]. Moreover as seen in Fig. 3(a), the PR curves are\nnon-monotonic by default. When object detection systems are evaluated, the PR curves are manually\nmade monotonic by assigning maximum precision value at any given recall for all recalls larger than\nthe current one, which provides an over-optimistic estimation of the \ufb01nal AUC value.\n\nRecall@K According to Plummer et al. 
[30], "Recall@K (K = 1, 5, 10) [is] the percentage of queries for which a correct match has rank of at most K." This measure has the same issue of not separating performance on Du from performance on Dc as the accuracy vs. confidence curve. Recall@K can only be regarded as an open-set evaluation metric due to the presence of the background class in that paper, i.e., the total number of classes for the purpose of this evaluation is also C + 1. Furthermore, in a detection setup the number of region proposals of two algorithms depends on their underlying approaches. Therefore, Recall@K is not comparable across two algorithms, since the numbers of samples being compared differ.

| Algorithm         | Dc Entropy   | Da Entropy   | Dc Magnitude  | Da Magnitude  |
|-------------------|--------------|--------------|---------------|---------------|
| Softmax           | 0.015 ± .084 | 0.318 ± .312 | 94.90 ± 27.47 | 32.27 ± 18.47 |
| Entropic Open-Set | 0.050 ± .159 | 1.984 ± .394 | 50.14 ± 17.36 |  1.50 ±  2.50 |
| Objectosphere     | 0.056 ± .168 | 2.031 ± .432 | 76.80 ± 28.55 |  2.19 ±  4.73 |

Table 1: ENTROPY AND FEATURE MAGNITUDE. Mean and standard deviation of entropy and feature magnitudes for known and unknown test samples are presented for different algorithms on Experiment #1 (LeNet++). As predicted by the theory, Objectosphere has the highest entropy for unknown samples (Da) and the greatest separation between known (Dc) and unknown (Da) samples, for both entropy and deep feature magnitude.

Our Evaluation Approach To properly address the evaluation of an open-set system, we introduce the Open-Set Classification Rate (OSCR) curve as shown in Fig.
3(c), which is an adaptation of the Detection and Identification Rate (DIR) curve used in open-set face recognition [29]. For evaluation, we split the test samples into samples from known classes Dc and samples from unknown classes Da. Let θ be a score threshold. For samples from Dc, we calculate the Correct Classification Rate (CCR) as the fraction of samples for which the correct class ĉ has the maximum probability and that probability is greater than θ. We compute the False Positive Rate (FPR) as the fraction of samples from Da that are classified as any known class c ∈ C with a probability greater than θ:

    CCR(θ) = |{x | x ∈ Dc ∧ arg max_c P(c|x) = ĉ ∧ P(ĉ|x) > θ}| / |Dc| ,
    FPR(θ) = |{x | x ∈ Da ∧ max_c P(c|x) ≥ θ}| / |Da| .                      (3)

Finally, we plot CCR versus FPR, varying the probability threshold θ from large values on the left to small values on the right. For the smallest θ, the CCR is identical to the closed-set classification accuracy on Dc. Unlike the evaluation measures discussed above, which are prone to dataset bias, OSCR is not, since its CCR axis is computed solely from samples belonging to Dc. Moreover, when algorithms exposed to different numbers of samples from Da need to be compared, rather than using the normalized FPR with an algorithm-specific Da, we may use the raw number of false positives on the horizontal axis [16].

5 Experiments

Our experiments demonstrate the application of our Entropic Open-Set loss and Objectosphere loss while comparing them to the background-class and standard softmax-thresholding approaches for two types of network architectures. The first set of experiments uses the two-dimensional LeNet++ architecture, for which the experimental setup is detailed in Sec. 3.
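Before turning to the results, note that the OSCR evaluation of Eq. (3) is straightforward to compute from class probabilities. The following minimal sketch illustrates it; the function and variable names are our own and not taken from any released code:

```python
import numpy as np

def oscr_points(known_probs, known_labels, unknown_probs, thresholds):
    """Compute (FPR, CCR) pairs per Eq. (3) for a list of thresholds.

    known_probs:   (N_c, C) class probabilities for known test samples (D_c)
    known_labels:  (N_c,)   ground-truth class indices for those samples
    unknown_probs: (N_a, C) class probabilities for unknown test samples (D_a)
    """
    pred = known_probs.argmax(axis=1)        # predicted class per known sample
    pred_score = known_probs.max(axis=1)     # probability of that prediction
    unk_score = unknown_probs.max(axis=1)    # highest class probability on unknowns
    points = []
    for theta in thresholds:
        # CCR: the correct class attains the maximum probability and exceeds theta
        ccr = float(np.mean((pred == known_labels) & (pred_score > theta)))
        # FPR: an unknown sample scores at least theta for some known class
        fpr = float(np.mean(unk_score >= theta))
        points.append((fpr, ccr))
    return points
```

Sweeping θ over all observed scores traces the full CCR-versus-FPR curve; at the smallest θ, CCR reduces to the closed-set accuracy on Dc, as noted above.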
We also compare our approaches to the recent state-of-the-art OpenMax [1] approach, which we significantly outperform, as seen in Fig. 1(d). Our experiments include the Devanagari, CIFAR-10 and NotMNIST datasets as Da; the results are summarized in Tab. 2, with more visualizations in the supplemental material. In Fig. 3(c), we use NotMNIST as Da and significantly outperform the adversarially trained MLP ensemble from Lakshminarayanan et al. [20].
The second set of experiments uses a ResNet-18 architecture with a 1024-dimensional feature layer to classify the ten classes of CIFAR-10 [18] as Dc. We use the super classes of the CIFAR-100 [18] dataset to create a custom split for our Db and Da samples. We split the super classes into two equal parts; all samples from one of these splits are used for training as Db, while samples from the other split are used only during testing as Da. Additionally, we test on the Street View House Numbers (SVHN) [26] dataset as Da. The results for both the CIFAR-100 and SVHN datasets are summarized in Tab. 2. In addition to the Entropic Open-Set and Objectosphere losses, we also test the Scaled Objectosphere approach mentioned in Sec. 4.2.

6 Discussion and Conclusion

The experimental evidence provided in Tab. 1 supports the theory that samples from unseen classes generally have a lower feature magnitude and higher softmax entropy.
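As an illustration of how the quantities in Tab. 1 can be computed, the following sketch derives per-sample softmax entropy and deep feature magnitude and summarizes them as mean ± standard deviation; the helper names are our own, not from any released code:

```python
import numpy as np

def softmax_entropy(logits):
    """Shannon entropy of the softmax distribution of each row of logits."""
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=1)

def feature_magnitude(features):
    """Euclidean norm of each deep feature vector (rows of features)."""
    return np.linalg.norm(features, axis=1)

def summarize(values):
    """Mean and standard deviation, as reported in Tab. 1."""
    return float(values.mean()), float(values.std())
```

Applying `summarize` to these per-sample values over the Dc and Da test sets yields entries of the kind shown in Tab. 1; for C = 10 known classes the entropy is bounded by ln 10 ≈ 2.3, which the unknown-sample entropies of the proposed losses approach.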
Our Entropic Open-Set loss and Objectosphere loss utilize this default behavior by further increasing entropy and decreasing magnitude for unknown inputs. This improves network robustness towards out-of-distribution samples, as supported by our experimental results in Tab. 2. Here, we summarize the results of the Open-Set Classification Rate (OSCR) curve by providing the Correct Classification Rates (CCR) at various False Positive Rate (FPR) values.

²Challenge website: http://www.kaggle.com/c/criteo-display-ad-challenge
Algorithm details: http://www.kellygwiseman.com/criteo-labs-advertising-challenge

| Experiment | Unknowns Da (# samples) | Algorithm | CCR at FPR 10^-4 | 10^-3 | 10^-2 | 10^-1 |
|---|---|---|---|---|---|---|
| LeNet++ architecture trained with MNIST digits as Dc and NIST Letters as Db | Devanagari (10032) | Softmax | 0.0 | 0.0 | 0.0777 | 0.9007 |
|  |  | Background | 0.0 | 0.4402 | 0.7527 | 0.9313 |
|  |  | Entropic Open-Set | 0.7142 | 0.8746 | 0.9580 | 0.9788 |
|  |  | Objectosphere | **0.7350** | **0.9108** | **0.9658** | **0.9791** |
|  | NotMNIST (18724) | Softmax | 0.0 | 0.3397 | 0.4954 | 0.8288 |
|  |  | Background | 0.3806 | 0.7179 | 0.9068 | 0.9624 |
|  |  | Entropic Open-Set | 0.4201 | 0.8578 | 0.9515 | **0.9780** |
|  |  | Objectosphere | **0.512** | **0.8965** | **0.9563** | 0.9773 |
|  | CIFAR-10 (10000) | Softmax | 0.7684 | 0.8617 | 0.9288 | 0.9641 |
|  |  | Background | 0.8232 | 0.9546 | 0.9726 | 0.973 |
|  |  | Entropic Open-Set | **0.973** | **0.9787** | **0.9804** | **0.9806** |
|  |  | Objectosphere | 0.9656 | 0.9735 | 0.9785 | 0.9794 |
| ResNet-18 architecture trained with CIFAR-10 classes as Dc and CIFAR-100 subset as Db | SVHN (26032) | Softmax | 0.1924 | 0.2949 | 0.4599 | 0.6473 |
|  |  | Background | 0.2012 | 0.3022 | 0.4803 | 0.6981 |
|  |  | Entropic Open-Set | 0.1071 | 0.2338 | 0.4277 | 0.6214 |
|  |  | Objectosphere | 0.1862 | 0.3387 | 0.5074 | 0.6886 |
|  |  | Scaled Objecto | **0.2547** | **0.3896** | **0.5454** | **0.7013** |
|  | CIFAR-100 subset (4500) | Softmax | N/A | 0.0706 | 0.2339 | 0.5139 |
|  |  | Background | N/A | 0.1598 | 0.3429 | 0.6049 |
|  |  | Entropic Open-Set | N/A | 0.1776 | 0.3501 | 0.5855 |
|  |  | Objectosphere | N/A | 0.1866 | 0.3595 | 0.6345 |
|  |  | Scaled Objecto | N/A | **0.2584** | **0.4334** | **0.6647** |

Table 2: EXPERIMENTAL RESULTS. Correct Classification Rates (CCR) at different False Positive Rates (FPR) are given for multiple algorithms tested on different datasets. For each experiment and at each FPR, the best performance is in bold. We show Scaled Objectosphere only when it was better than Objectosphere; magnitude scaling does not help in the 2D feature space of LeNet++.

The proposed solutions are not, however, without their limitations, which we now discuss. Though training with the Entropic Open-Set loss has about the same complexity as training with a background class, the additional magnitude restriction of the Objectosphere loss can make the network a bit more complicated to train.
The Objectosphere loss requires determining λ, which balances the two elements of the loss, as well as choosing ξ, the minimum feature magnitude for known samples. These can be chosen systematically using cross-class calibration, where one trains with a subset of the background classes, say half of them, and then tests on the remaining unseen background classes. However, this adds complexity and computational cost.
We also observe that, in case of the LeNet++ architecture, some random initializations during training result in the Entropic Open-Set loss having better or equivalent performance to the Objectosphere loss. This may be attributed to the narrow two-dimensional feature space used by the network. In a high-dimensional feature space, as in the ResNet-18 architecture, the background class performs better than Entropic Open-Set and Objectosphere at very low FPR, but is beaten by the Scaled Objectosphere approach, highlighting the importance of low feature magnitudes for unknowns.
During our experiments, we also found that the choice of unknown samples used during training is important. E.g., in the LeNet++ experiment, training with CIFAR samples as the unknowns D′b does not provide robustness to unknowns from the NIST Letters dataset as Da, whereas training with NIST Letters as D′b does provide robustness against CIFAR images as Da. This is because CIFAR images are distinctly different from the MNIST digits, whereas NIST letters have attributes very similar to them.
This \ufb01nding is consistent with the well known importance of hard-negatives in\ndeep network training.\nWhile there was considerable prior work on using a background or garbage class in detection, as well\nas work on open set, rejection, out-of-distribution detection, or uncertainty estimation, this paper\n\n9\n\n\fpresents the \ufb01rst theoretically grounded signi\ufb01cant steps to an improved network representation to\naddress unknown classes and, hence, reduce network agnostophobia.\n\n7 Acknowledgements\n\nThis research is based upon work funded in part by NSF IIS-1320956 and in part by the Of\ufb01ce of\nthe Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity\n(IARPA), via IARPA R&D Contract No. 2014-14071600012. The views and conclusions contained\nherein are those of the authors and should not be interpreted as necessarily representing the of\ufb01cial\npolicies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government.\nThe U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes\nnotwithstanding any copyright annotation thereon.\n\nReferences\n[1] Abhijit Bendale and Terrance E. Boult. Towards open set deep networks. In Conference on\n\nComputer Vision and Pattern Recognition (CVPR). IEEE, 2016.\n\n[2] John S. Bridle. Probabilistic interpretation of feedforward classi\ufb01cation network outputs, with\nrelationships to statistical pattern recognition. Neuro-Computing: Algorithms, Architectures,\n1989.\n\n[3] P. Panareda Busto and J\u00a8urgen Gall. Open set domain adaptation. In International Conference\n\non Computer Vision (ICCV). IEEE, 2017.\n\n[4] Douglas O. Cardoso, Jo\u02dcao Gama, and Felipe M.G. Franc\u00b8a. Weightless neural networks for open\n\nset recognition. Machine Learning, 106(9-10):1547\u20131567, 2017.\n\n[5] Eric I. Chang and Richard P. Lippmann. Figure of merit training for detection and spotting. 
In Advances in Neural Information Processing Systems (NIPS), 1994.

[6] Chi-Keung Chow. An optimum character recognition system using decision functions. IRE Transactions on Electronic Computers, (4):247–254, 1957.

[7] Claudio De Stefano, Carlo Sansone, and Mario Vento. To reject or not to reject: that is the question - an answer in case of neural classifiers. Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 30(1):84–94, 2000.

[8] Mark Everingham, Luc van Gool, Christopher K.I. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (VOC) challenge. International Journal of Computer Vision (IJCV), 88(2):303–338, 2010.

[9] Giorgio Fumera and Fabio Roli. Support vector machines with embedded reject option. In Pattern Recognition with Support Vector Machines, pages 68–82. Springer, 2002.

[10] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (ICML), 2016.

[11] Yonatan Geifman and Ran El-Yaniv. Selective classification for deep neural networks. In Advances in Neural Information Processing Systems (NIPS), 2017.

[12] Ross Girshick. Fast R-CNN. In International Conference on Computer Vision (ICCV). IEEE, 2015.

[13] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2014.

[14] Alex Graves. Practical variational inference for neural networks. In Advances in Neural Information Processing Systems (NIPS), 2011.

[15] Patrick Grother and Kayee Hanaoka. NIST special database 19 handprinted forms and characters 2nd edition.
Technical report, National Institute of Standards and Technology (NIST), 2016.

[16] Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Cezary Stankiewicz, and Terrance E. Boult. Unconstrained face detection and open-set face recognition challenge. In International Joint Conference on Biometrics (IJCB), 2017.

[17] Peiyun Hu and Deva Ramanan. Finding tiny faces. In Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.

[18] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[19] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012.

[20] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems (NIPS), 2017.

[21] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits, 1998.

[22] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV). Springer, 2014.

[23] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In European Conference on Computer Vision (ECCV). Springer, 2016.

[24] Ofer Matan, R.K. Kiang, C.E. Stenard, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, and Yann Le Cun. Handwritten character recognition using neural network architectures.
In USPS Advanced Technology Conference, 1990.\n\n[25] Noam Mor and Lior Wolf. Con\ufb01dence prediction for lexicon-free OCR. In Winter Conference\n\non Applications of Computer Vision (WACV). IEEE, 2018.\n\n[26] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng.\nReading digits in natural images with unsupervised feature learning. In Advances in Neural\nInformation Processing Systems (NIPS) Workshop, 2011.\n\n[27] Kemal Oksuz, Baris Cam, Emre Akbas, and Sinan Kalkan. Localization recall precision (LRP):\nA new performance metric for object detection. In European Conference on Computer Vision\n(ECCV), 2018.\n\n[28] Ashok Kumar Pant, Sanjeeb Prasad Panday, and Shashidhar Ram Joshi. Off-line Nepali\nhandwritten character recognition using multilayer perceptron and radial basis function neural\nnetworks. In Asian Himalayas International Conference on Internet (AH-ICI). IEEE, 2012.\n\n[29] P. Jonathon Phillips, Patrick Grother, and Ross Micheals. Handbook of Face Recognition,\n\nchapter Evaluation Methods in Face Recognition. Springer, 2nd edition, 2011.\n\n[30] Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and\nSvetlana Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer\nimage-to-sentence models. In International Conference on Computer Vision (ICCV). IEEE,\n2015.\n\n[31] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Uni\ufb01ed,\nreal-time object detection. In Conference on Computer Vision and Pattern Recognition (CVPR).\nIEEE, 2016.\n\n[32] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time\nobject detection with region proposal networks. In Advances in Neural Information Processing\nSystems (NIPS), 2015.\n\n[33] Olga Russakovsky, Jia Deng, Zhiheng Huang, Alexander C. Berg, and Li Fei-Fei. Detecting\navocados to zucchinis: what have we done, and where are we going? 
In International Conference on Computer Vision (ICCV). IEEE, 2013.

[34] Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. Overfeat: Integrated recognition, localization and detection using convolutional networks. 2013.

[35] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:623–656, 1948.

[36] Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, and Frank Hutter. Bayesian optimization with robust Bayesian neural networks. In Advances in Neural Information Processing Systems (NIPS), 2016.

[37] Tatiana Tommasi, Novi Patricia, Barbara Caputo, and Tinne Tuytelaars. A deeper look at dataset bias. In Domain Adaptation in Computer Vision Applications, pages 37–55. Springer, 2017.

[38] Apoorv Vyas, Nataraj Jammalamadaka, Xia Zhu, Dipankar Das, Bharat Kaul, and Theodore L. Willke. Out-of-distribution detection using an ensemble of self supervised leave-out classifiers. In European Conference on Computer Vision (ECCV). Springer, 2018.

[39] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In European Conference on Computer Vision (ECCV). Springer, 2016.

[40] Li Xu, Jimmy S.J. Ren, Ce Liu, and Jiaya Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems (NIPS), 2014.

[41] Bin Yang, Junjie Yan, Zhen Lei, and Stan Z Li. Craft objects from images. In Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016.

[42] Shanshan Zhang, Rodrigo Benenson, Mohamed Omran, Jan Hosang, and Bernt Schiele. Towards reaching human performance in pedestrian detection.
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 40(4):973–986, 2018.