{"title": "Are Labels Required for Improving Adversarial Robustness?", "book": "Advances in Neural Information Processing Systems", "page_first": 12214, "page_last": 12223, "abstract": "Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This result is a key hurdle in the deployment of robust machine learning models in many real world applications where labeled data is expensive. Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models. Theoretically, we show that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors. On standard datasets like CIFAR- 10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled data improves robust accuracy by 21.7% over using 4K supervised examples alone, and captures over 95% of the improvement from the same number of labeled examples. Finally, we report an improvement of 4% over the previous state-of-the- art on CIFAR-10 against the strongest known attack by using additional unlabeled data from the uncurated 80 Million Tiny Images dataset. 
This demonstrates that our finding extends as well to the more realistic case where unlabeled data is also uncurated, therefore opening a new avenue for improving adversarial training.", "full_text": "Are Labels Required for Improving\n\nAdversarial Robustness?\n\nJonathan Uesato\u21e4\n\nJean-Baptiste Alayrac\u21e4\n\nPo-Sen Huang\u21e4\n\nRobert Stanforth\n\nAlhussein Fawzi\n\nPushmeet Kohli\n\n{juesato,jalayrac,posenhuang}@google.com\n\nDeepMind\n\nAbstract\n\nRecent work has uncovered the interesting (and somewhat surprising) \ufb01nding that\ntraining models to be invariant to adversarial perturbations requires substantially\nlarger datasets than those required for standard classi\ufb01cation. This result is a key\nhurdle in the deployment of robust machine learning models in many real world\napplications where labeled data is expensive. Our main insight is that unlabeled\ndata can be a competitive alternative to labeled data for training adversarially robust\nmodels. Theoretically, we show that in a simple statistical setting, the sample\ncomplexity for learning an adversarially robust model from unlabeled data matches\nthe fully supervised case up to constant factors. On standard datasets like CIFAR-\n10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled\ndata improves robust accuracy by 21.7% over using 4K supervised examples alone,\nand captures over 95% of the improvement from the same number of labeled\nexamples. Finally, we report an improvement of 4% over the previous state-of-the-\nart on CIFAR-10 against the strongest known attack by using additional unlabeled\ndata from the uncurated 80 Million Tiny Images dataset. 
This demonstrates that our finding extends as well to the more realistic case where unlabeled data is also uncurated, therefore opening a new avenue for improving adversarial training.\n\n1 Introduction\n\nDeep learning has revolutionized many areas of research such as natural language processing, speech recognition and computer vision. Systems based on these techniques are now being developed and deployed for a wide variety of applications, from recommending/ranking content on the web [13, 22] to autonomous driving [7] and even medical diagnostics [14]. The safety-critical nature of some of these tasks necessitates ensuring that the deployed models are robust and generalize well to all sorts of variations that can occur in the inputs. Yet, it has been shown that commonly used deep learning models are vulnerable to adversarial perturbations of the input [43]: e.g., it is possible to fool an image classifier into predicting arbitrary classes by carefully choosing perturbations imperceptible to the human eye.\n\nSince the discovery of these results, many approaches have been developed to prevent this type of behaviour. One of the most effective and popular is supervised adversarial training [17, 30], which works by generating adversarial samples in an online manner through an inner optimization procedure and then using them to augment the standard training set.\n\n*Equal contribution, random order.\n\u2020The authors declare that the present paper is independent of \u201cUnlabeled Data Improves Adversarial Robustness\u201d [10].\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\nDespite substantial work in this space, accuracy of classifiers on adversarial inputs remains much lower than that on normal inputs. 
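The supervised adversarial training loop described above can be sketched in a few lines. The following is our own minimal illustration, not the paper's implementation: a linear logistic model stands in for the paper's deep networks, a PGD adversary maximizes the loss inside an L-infinity ball around each input, and the model is then trained on the perturbed batch.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps, step, n_steps):
    """Multi-step projected gradient ascent on the logistic loss,
    constrained to the L-infinity ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        z = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-z))            # P(y = 1 | x) for a linear model
        grad_x = (p - y)[:, None] * w[None, :]  # d(loss)/dx, analytic for this model
        x_adv = x_adv + step * np.sign(grad_x)  # ascent step (signed gradient)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the eps-ball
    return x_adv

def adversarial_training_step(x, y, w, b, eps, lr=0.1):
    """One step of supervised adversarial training: generate adversarial
    examples online, then take a gradient step on the perturbed batch."""
    x_adv = pgd_attack(x, y, w, b, eps, step=eps / 4, n_steps=10)
    z = x_adv @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    grad_w = (p - y) @ x_adv / len(y)
    grad_b = float(np.mean(p - y))
    return w - lr * grad_w, b - lr * grad_b
```

On well-separated data, iterating `adversarial_training_step` yields a model whose predictions are stable on the whole eps-ball around the training points, which is the property the paper's deep-network version targets.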
Recent theoretical work has offered a reason for this discrepancy, and argues\nthat training models to be invariant to adversarial perturbations requires substantially larger datasets\nthan those required for the standard classi\ufb01cation task [41].\nThis result is a key hurdle in the development and deployment of robust machine learning models\nin many real world applications where labeled data is expensive. Our central hypothesis is that\nadditional unlabeled examples may suf\ufb01ce for adversarial robustness. Intuitively, this is based on\ntwo related observations, explained in Sections 3 and 4.1.2. First, adversarial robustness depends on\nthe smoothness of the classi\ufb01er around natural images, which can be estimated from unlabeled data.\nSecond, only a relatively small amount of labeled data is needed for standard generalization. Thus, if\nadversarial training is robust to label noise, labels from supervised examples can be propagated to\nunsupervised examples to train a smoothed classi\ufb01er with improved adversarial robustness.\nMotivated by this, we explore Unsupervised Adversarial Training (UAT) to use unlabeled data for\nadversarial training. We study this algorithm in a simple theoretical setting, proposed by [41] to\nstudy adversarial generalization. We show that once we are given a single labeled example, the\nsample complexity of UAT matches the fully supervised case up to constant factors. In independent\nand concurrent work, [10, 53, 33] also study the use of unlabeled data for improving adversarial\nrobustness, which we discuss in Section 2.\nExperimentally, we \ufb01nd strong support for our main hypothesis. On CIFAR-10 and SVHN, with\nvery limited annotated data, our method reaches robust accuracies of 54.1% and 84.4% respectively\nagainst the FGSM20 attack [26]. 
These numbers represent a significant improvement over purely supervised approaches (32.5% and 66.0%) on the same amount of data, and almost match methods that have access to full supervision (55.5% and 86.2%), capturing over 95% of their improvement without labels. Further, we show that we can successfully leverage realistically obtained unsupervised and uncurated data to improve the state-of-the-art on CIFAR-10 at \u03b5 = 8/255 from 52.58% to 56.30% against the strongest known attack.\n\nContributions. (i) In Section 3, we propose a simple and theoretically grounded strategy, UAT, to leverage unsupervised data for adversarial training. (ii) We provide, in Section 4.1, strong empirical support for our initial hypothesis that unlabeled data can be competitive with labeled data when it comes to training adversarially robust classifiers, therefore opening a new avenue for improving adversarial training. (iii) Finally, by leveraging noisy and uncurated data obtained from web queries, we set a new state-of-the-art on CIFAR-10 without depending on any additional human labeling.\n\n2 Related Work\n\nAdversarial Robustness. [5, 43] observed that neural networks which achieve extremely high accuracy on a randomly sampled test set may nonetheless be vulnerable to adversarial examples, or small but highly optimized perturbations of the data which cause misclassification. Since then, many papers have proposed a wide variety of defenses against these so-called adversarial attacks, though few have proven robust against stronger attacks [1, 8, 46]. One of the most successful approaches for obtaining classifiers that are adversarially robust is adversarial training [1, 46]. Adversarial training directly minimizes the adversarial risk by approximately solving an inner maximization problem with projected gradient descent (PGD) to generate small perturbations that increase the prediction error, and uses these perturbed examples for training [17, 26, 30]. 
The TRADES approach in [54] improves these results by instead minimizing a surrogate loss which upper bounds the adversarial risk. Their objective is very similar to the one used in UAT-OT (UAT with online targets, introduced in Section 3.1) but is estimated purely on labeled rather than unlabeled data.\n\nCommon to all these approaches, a central challenge is adversarial generalization. For example, on CIFAR-10 with a perturbation of \u03b5 = 8/255, the adversarially trained model in [30] achieves an adversarial accuracy of 46%, despite near 100% adversarial accuracy on the train set. For comparison, standard models can achieve natural accuracy of 96% on CIFAR-10 [52]. Several recent papers have studied generalization bounds for adversarial robustness [2, 23, 41, 51]. Of particular relevance to our work, [41] argues that adversarial generalization may require more data than natural generalization. One solution explored in [21] is to use pretraining on ImageNet, a large supervised dataset, to improve adversarial robustness. In this work, we study whether more labeled data is necessary, or whether unlabeled data can suffice. While to our knowledge this question has not been directly studied, several works such as [20, 40, 42] propose using generative models to detect or denoise adversarial examples, which can in principle be learned on unlabeled data. However, so far, such approaches have not proven to be robust to strong attacks [1, 46].\n\nSemi-supervised learning. Learning from unlabeled data is an active area of research. The semi-supervised learning approach [11], which, in addition to labeled data, also uses unlabeled data to learn better models, is particularly relevant to our work. One of the most effective techniques for semi-supervised learning is smoothness regularization: the model is trained to be invariant to small perturbations applied to unsupervised samples [3, 4, 27, 32, 39, 50]. 
Of particular relevance to UAT, [32] also uses adversarial perturbations to smooth the model outputs. In addition, co-training [6] and recent extensions [12, 37] use the most confident predictions on unlabeled data to iteratively construct additional labeled training data. These works all focus on improving standard generalization, whereas we explore the use of similar ideas in the context of adversarial generalization.\n\nSemi-supervised learning for adversarial robustness. The observation that adversarial robustness can be optimized without labels was made independently and concurrently by [10, 33, 53]. Of particular interest, [10] proposes a meta-algorithm, Robust Self-Training (RST), similar to UAT. Indeed, the particular instantiation of RST used in [10] and the fixed-target variant of UAT are nearly equivalent: the difference is whether the base algorithm minimizes the robust loss from [54] or the vanilla adversarial training objective [30]. Their results also provide strong, independent evidence that unlabeled and uncurated examples improve robustness on both CIFAR-10 and SVHN.\n\n3 Unsupervised Adversarial Training (UAT)\n\nIn this section, we introduce and motivate our approach, Unsupervised Adversarial Training (UAT), which enables the use of unlabeled data to train robust classifiers.\n\nNotation. Consider the classification problem of learning a predictor $f_\theta$ mapping inputs $x \in \mathcal{X}$ to labels $y \in \mathcal{Y}$. In this work, $f$ is of the form $f_\theta(x) = \arg\max_{y \in \mathcal{Y}} p_\theta(y|x)$, where $p_\theta(\cdot|x)$ is parameterized by a neural network. We assume data points $(x, y)$ are i.i.d. samples from the data-generating joint distribution $P(X, Y)$ over $\mathcal{X} \times \mathcal{Y}$. $P(X)$ denotes the unlabeled distribution over $\mathcal{X}$ obtained by marginalizing out $Y$ in $P(X, Y)$. 
We assume access to a labeled training set $S_n = \{(x_i, y_i)\}_{1 \le i \le n}$, where $(x_i, y_i) \sim P(X, Y)$, and an unlabeled training set $U_m = \{x_i\}_{1 \le i \le m}$, where $x_i \sim P(X)$.\n\nEvaluation of Adversarial Robustness. The natural risk is $L_{\mathrm{nat}}(\theta) = \mathbb{E}_{(x,y) \sim P(X,Y)}\, \ell(y, f_\theta(x))$, where $\ell$ is the 0-1 loss. Our primary objective is minimizing the adversarial risk $L_{\mathrm{adv}}(\theta) = \mathbb{E}_{P(X,Y)} \sup_{x' \in N_\epsilon(x)} \ell(y, f_\theta(x'))$. As is common, the neighborhood $N_\epsilon(x)$ is taken in this work to be the $L_\infty$ ball $N_\epsilon(x) = \{x' : \|x' - x\|_\infty \le \epsilon\}$. Because the inner maximization cannot be solved exactly, we report the surrogate adversarial risk $L_g(\theta) = \mathbb{E}_{P(X,Y)}\, \ell(f_\theta(x'), y)$, where $x' = g(x, y, \theta)$ is an approximate solution to the inner maximization computed by some fixed adversary $g$. Typically, $g$ is (a variant of) projected gradient descent (PGD) with a fixed number of iterations.\n\n3.1 Unsupervised Adversarial Training (UAT)\n\nMotivation. As discussed in the introduction, a central challenge for adversarial training has been the difficulty of adversarial generalization. Previous work has argued that adversarial generalization may simply require more data than natural generalization. We ask a simple question: is more labeled data necessary, or is unsupervised data sufficient? This is of particular interest in the common setting where unlabeled examples are dramatically cheaper to acquire than labeled examples ($m \gg n$). For example, for large-scale image classification problems, unlabeled examples can be acquired by scraping images off the web, whereas gathering labeled examples requires hiring human labelers.\n\nWe now consider two algorithms to study this question. Both approaches are simple \u2013 we emphasize the point that large unlabeled datasets can help bridge the gap between natural and adversarial generalization. 
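As a concrete, simplified illustration of the evaluation protocol above: the surrogate adversarial risk $L_g$ is a Monte-Carlo average of the 0-1 loss at points produced by a fixed adversary $g$. The harness below is our own sketch (the function names and the weak uniform-noise adversary are illustrative stand-ins; a real evaluation would plug in a PGD-style $g$).

```python
import numpy as np

def surrogate_adversarial_risk(model_predict, attack, xs, ys):
    """Estimate the surrogate adversarial risk L_g: the 0-1 loss measured
    at x' = g(x, y, theta), for a fixed adversary `attack`."""
    x_adv = attack(xs, ys)
    return float(np.mean(model_predict(x_adv) != ys))

def linf_random_adversary(eps, rng):
    """A deliberately weak fixed adversary: uniform noise in the L-inf ball
    N_eps(x). Any stronger attack with the same (xs, ys) -> x_adv signature,
    e.g. multi-step PGD, can be slotted in unchanged."""
    def attack(xs, ys):
        return xs + rng.uniform(-eps, eps, size=xs.shape)
    return attack
```

Because $g$ only lower-bounds the true inner maximization, reported numbers are upper bounds on robustness; this is why the paper later prefers the stronger MultiTargeted attack over FGSM20.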
Later, in Sections 3.2 and 4, we show that both in a simple theoretical model and empirically, unlabeled data is in fact competitive with labeled data. In other words, for a fixed number of additional examples, we observe similar improvements in adversarial robustness regardless of whether or not they are labeled.\n\nStrategy 1: Unsupervised Adversarial Training with Online Targets (UAT-OT). We note that the adversarial risk can be bounded as $L_{\mathrm{adv}} = L_{\mathrm{nat}} + (L_{\mathrm{adv}} - L_{\mathrm{nat}}) \le L_{\mathrm{nat}} + \mathbb{E}_{P(X,Y)} \sup_{x' \in N_\epsilon(x)} \ell(f_\theta(x'), f_\theta(x))$, similarly to the decomposition in [54]. We refer to the first term as the classification loss, and the second term as the smoothness loss. Even for adversarially trained models, it has been observed that the smoothness loss dominates the classification loss on the test set, suggesting that controlling the smoothness loss is the key to adversarial generalization. For example, the adversarially trained model in [30] achieves natural accuracy of 87% but adversarial accuracy of 46% on CIFAR-10 at \u03b5 = 8/255.\n\nNotably, the smoothness loss has no dependence on labels, and thus can be minimized purely through unsupervised data. UAT-OT directly minimizes a differentiable surrogate of the smoothness loss on the unlabeled data. Formally, we use the loss introduced in [32] and also used in [54]:\n\n$L^{OT}_{\mathrm{unsup}}(\theta) = \mathbb{E}_{x \sim P(X)} \sup_{x' \in N_\epsilon(x)} D(p_{\hat\theta}(\cdot|x), p_\theta(\cdot|x')), \quad (1)$\n\nwhere $D$ is the Kullback-Leibler divergence, and $\hat\theta$ indicates a fixed copy of the parameters $\theta$ in order to stop the gradients from propagating. While [32], which primarily focuses on natural generalization, uses a single-step approximation of the inner maximization, we use an iterative PGD adversary, since prior work indicates strong adversaries are crucial for effective adversarial training [26, 30].\n\nStrategy 2: Unsupervised Adversarial Training with Fixed Targets (UAT-FT). 
This strategy directly leverages the gap between standard generalization and adversarial generalization. The main idea is to first train a base classifier for standard generalization on the supervised set $S_n$. Then, this model is used to estimate labels, hence fixed targets, on the unsupervised set $U_m$. This allows us to employ standard supervised adversarial training using these fixed targets. Formally, it corresponds to using the following loss:\n\n$L^{FT}_{\mathrm{unsup}}(\theta) = \mathbb{E}_{x \sim P(X)} \sup_{x' \in N_\epsilon(x)} \mathrm{xent}(\hat{y}(x), p_\theta(\cdot|x')), \quad (2)$\n\nwhere xent is the cross-entropy loss and $\hat{y}(x)$ is a pseudo-label obtained from a model trained for standard generalization on $S_n$ alone. Thus, provided a sufficiently large unlabeled dataset, UAT-FT recovers a smoothed version of the base classifier, which matches the predictions of the base classifier on clean data while maintaining stability of the predictions within local neighborhoods of the data.\n\nOverall training. For the overall objective, we use a weighted combination of the supervised loss and the chosen unsupervised loss, controlled by a hyperparameter $\lambda$: $L(\theta) = L_{\mathrm{sup}}(\theta) + \lambda L_{\mathrm{unsup}}(\theta)$. The unsupervised loss can be either $L^{OT}_{\mathrm{unsup}}$ (UAT-OT), $L^{FT}_{\mathrm{unsup}}$ (UAT-FT), or both (UAT++). Finally, note that the unsupervised loss can also be used on the samples of the supervised set, by simply adding the $x_i$'s of $S_n$ to $U_m$. The pseudocode and implementation details are described in Appendix A.1.\n\n3.2 Theoretical model\n\nTo improve our understanding of the effects of unlabeled data, we study the simple setting proposed by [41] to analyze the required sample complexity of adversarial robustness.\n\nDefinition 1 (Gaussian model [41]). Let $\theta^* \in \mathbb{R}^d$ be the per-class mean vector and let $\sigma > 0$ be the variance parameter. 
Then the $(\theta^*, \sigma)$-Gaussian model is defined by the following distribution over $(x, y) \in \mathbb{R}^d \times \{\pm 1\}$: first, draw a label $y \in \{\pm 1\}$ uniformly at random; then sample the data point $x \in \mathbb{R}^d$ from $\mathcal{N}(y \cdot \theta^*, \sigma^2 I)$.\n\nIn [41], this setting was chosen to model the empirical observation that adversarial generalization requires more data than natural generalization. They provide an algorithm which achieves fixed, arbitrary (say, 1%) accuracy using a single sample. However, to achieve the same adversarial accuracy, they show that any algorithm requires at least $c_1 \epsilon^2 \sqrt{d} / \log d$ samples, and provide an algorithm requiring $n \ge c_2 \epsilon^2 \sqrt{d}$ samples, for fixed constants $c_1, c_2$.\n\nHere, we show that this sample complexity can be dramatically improved by replacing labeled examples with unlabeled samples. We first define an analogue of UAT-FT to leverage unlabeled data in this setting. For training an adversarially robust classifier, the algorithm in [41] computes a sample mean of per-point estimates. We straightforwardly adapt this procedure for unlabeled data, as in UAT-FT: we first estimate a base classifier from the labeled examples, then compute a sample mean using fixed targets from this base classifier.\n\nDefinition 2 (Gaussian UAT-FT). Given $n$ labeled examples $(x_1, y_1), \ldots, (x_n, y_n)$ and $m$ unlabeled examples $x_{n+1}, \ldots, x_{n+m}$, let $\hat{w}_{\mathrm{sup}}$ denote the sample mean estimator on labeled examples: $\hat{w}_{\mathrm{sup}} = \sum_{i=1}^{n} y_i x_i$. The UAT-FT estimator is then defined as the sample mean $\hat{w} = \sum_{i=n+1}^{n+m} \hat{y}_i x_i$, where $\hat{y}_i = f_{\hat{w}_{\mathrm{sup}}}(x_i)$.\n\nTheorem 1 states that in contrast to the purely supervised setting, which requires $O(\sqrt{d} / \log d)$ examples, in the semi-supervised setting a single labeled example, along with $O(\sqrt{d})$ unlabeled examples, is sufficient to achieve fixed, arbitrary accuracy.\n\nTheorem 1. Consider the $(\theta^*, \sigma)$-Gaussian model with $\|\theta^*\|_2 = \sqrt{d}$ and $\sigma \le \frac{1}{32} d^{1/4}$. Let $\hat{w}$ be the UAT-FT estimator as in Definition 2. Then with high probability, for $n = 1$, the linear classifier $f_{\hat{w}}$ has $\ell_\infty^\epsilon$-robust classification error at most 1% if $m \ge c\, \epsilon^2 \sqrt{d}$, where $c$ is a fixed, universal constant.\n\nThe proof is deferred to Appendix G. For ease of comparison, we consider the same Gaussian model parameters $\|\theta^*\|_2$ and $\sigma$ as used in [41]. The sample complexity in Theorem 1 matches the sample complexity of the algorithm provided in [41] up to constant factors, despite using unlabeled rather than labeled examples. We now turn to empirical investigation of whether this result is reflected in practical settings.\n\n4 Experiments\n\nIn Section 4.1, we first investigate our primary question: for adversarial robustness, can unlabeled examples be competitive with labeled examples? These experiments operate in the standard semi-supervised setting, where we use a small fraction of the original training set as $S_n$ and provide varying amounts of the remainder as $U_m$. After observing high robustness, particularly for UAT-FT and UAT++, we run several controlled experiments in Section 4.1.2 to understand why this approach works well. In Section 4.2, we explore the robustness of UAT to shift in the distribution $P(X)$. Finally, we use UAT to improve the existing state-of-the-art adversarial robustness on CIFAR-10, using the 80 Million Tiny Images dataset as our source of unlabeled data.\n\n4.1 Adversarial robustness with few labels\n\nExperimental setup. We run experiments on the CIFAR-10 and SVHN datasets, with $L_\infty$ constraints of \u03b5 = 8/255 and \u03b5 = 0.01 respectively, which are standard for studying adversarial robustness of image classifiers [18, 30, 54, 48]. For adversarial evaluation, we report against 20-step iterative FGSM [26], for consistency with previous state-of-the-art [54]. 
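To make Definition 2 concrete, the Gaussian UAT-FT estimator can be simulated in a few lines. This is our own illustrative sketch (not the paper's code), using a small dimension and sample sizes for speed: fit a sample-mean base classifier on the labeled data, pseudo-label the unlabeled points with it, then average the pseudo-labeled points.

```python
import numpy as np

def sample_gaussian_model(theta, sigma, n, rng):
    """(theta*, sigma)-Gaussian model: y uniform in {+-1}, x ~ N(y*theta, sigma^2 I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * theta[None, :] + sigma * rng.normal(size=(n, len(theta)))
    return x, y

def gaussian_uat_ft(x_labeled, y_labeled, x_unlabeled):
    """Gaussian UAT-FT estimator (Definition 2)."""
    w_sup = (y_labeled[:, None] * x_labeled).sum(axis=0)  # w_sup = sum_i y_i x_i
    y_hat = np.sign(x_unlabeled @ w_sup)                  # fixed targets f_{w_sup}(x_i)
    y_hat[y_hat == 0] = 1.0                               # break ties arbitrarily
    return (y_hat[:, None] * x_unlabeled).sum(axis=0)     # w = sum_i y_hat_i x_i
```

With a single labeled example (n = 1) and a moderate pool of unlabeled samples, the resulting linear classifier sign(x @ w) already classifies fresh draws from the model accurately, which is the qualitative content of Theorem 1.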
In our later experiments, for Section 4.2.2, we also evaluate against a much stronger attack, MultiTargeted [19], which provides a more accurate proxy for the adversarial risk. As we demonstrate in Appendix E.1, the MultiTargeted attack is significantly stronger than an expensive PGD attack with random restarts, which is in turn significantly stronger than FGSM20. We follow previous work [30, 54] for our choices of model architecture, data preprocessing, and hyperparameters, which are detailed in Appendix A.\n\nTo study the effect of unlabeled data, we randomly split the existing training set into a small supervised set $S_n$ and use the remaining $N - n$ training examples as a source of unlabeled data. We also split out 10000 examples from the training set to use as validation, for both CIFAR-10 and SVHN, since neither dataset comes with a validation set. We then study the effect on robust accuracy of increasing $m$, the number of unsupervised samples, across different regimes ($m \approx n$ vs. $m \gg n$).\n\nBaselines. We compare results with the two strongest existing supervised approaches, standard adversarial training [30] and TRADES [54], which do not use unsupervised data. We also compare to VAT [32], which was designed for standard semi-supervised learning but can be adapted for unsupervised adversarial training as explained in Appendix B. Finally, to compare the benefits of labeled and unlabeled data, we compare to a supervised oracle, which represents the best possible performance: the model is provided the ground-truth label even for samples from $U_m$.\n\n4.1.1 Main results\n\nWe first test the hypothesis that for adversarial robustness, additional unlabeled data is competitive with additional labeled data. Figure 1 summarizes the results. 
We report the adversarial accuracy for varying $m$, with $n$ fixed to 4000 and 1000 for CIFAR-10 and SVHN, respectively.\n\nFigure 1: Comparison of labeled data and unsupervised data for improving adversarial generalization on CIFAR-10 (left, a) and SVHN (right, b)\n\nComparison to baselines. All models show significant improvements in adversarial robustness over the baselines for all numbers of unsupervised samples. With the maximum number (32k / 60k) of unlabeled images, even the weakest UAT model, UAT-OT, shows a 12.9% / 16.9% improvement over the baselines not leveraging unlabeled data, and a 6.4% / 1.6% improvement over VAT, on CIFAR-10 and SVHN respectively.\n\nComparison between UAT variants. We compare the results of three different UAT variants: UAT-OT, UAT-FT, and UAT++. Comparing UAT-FT and UAT-OT at larger numbers of unsupervised samples, we observe that UAT-FT shows a significant improvement over UAT-OT on CIFAR-10, while UAT-OT performs similarly to UAT-FT on SVHN. With smaller numbers of unsupervised samples, the two approaches perform similarly. Empirically, we observe that UAT++, which combines the two approaches, outperforms either individually. We thus primarily use UAT++ for our later experiments.\n\nComparison to the oracle. Figure 1 provides strong support for our main hypothesis. In particular, we observe that when using large unsupervised datasets, UAT++ performs nearly as well as the supervised oracle. In Fig. 1a, with 32K unlabeled examples, UAT++ achieves 54.1% on CIFAR-10, which is 1.4% lower than the supervised oracle. Similarly, with 60K unlabeled examples, in Fig. 1b, UAT++ achieves 84.4% on SVHN, which is 1.8% lower than the supervised oracle.\n\nConclusion. We demonstrate that, leveraging large amounts of unlabeled examples, UAT++ achieves adversarial robustness similar to the supervised oracle, which uses label information. 
In particular, without requiring labels, UAT++ captures over 97.6% / 97.9% of the improvement from 32K / 60K additional examples relative to the supervised oracle, on CIFAR-10 and SVHN respectively.\n\n4.1.2 Label noise analysis\n\nGiven the effectiveness of UAT-FT and UAT++, we perform an ablation study on the impact of label noise on UAT for adversarial robustness.\n\nExperimental setup. To do so, we first divide the CIFAR-10 training set into halves, where the first 20K examples are used for training the base classifier and the latter 20K are used to train a UAT model. Of the latter 20K, we treat 4K examples as labeled, as in Section 4.1.1, and the remaining 16K as unlabeled. We consider two different approaches to introducing label noise. For UAT-FT (Correlated), we produce pseudo-labels using the UAT-FT procedure, where the number of training examples used for the base classifier varies between 500 and 20K. This produces base classifiers with error rates between 7% and 48%. For UAT-FT (Random), we randomly flip labels to a randomly selected incorrect class. The results are shown in Figure 2.\n\nAnalysis. In Fig. 2a, in the UAT-FT (Random) case, adversarial accuracy is relatively flat for noise levels between 1% and 20%. Even with 50% of the examples mislabeled, the decrease in robust accuracy is less than 10%. At the highest level of noise, UAT-FT still obtains an 8.0% improvement in robust accuracy over the strongest baseline which does not exploit unsupervised data. 
Similarly, in the UAT-FT (Correlated) case, robust accuracy is relatively flat for noise levels between 7% and 23%, and even at 48% corrupted labels, UAT-FT outperforms the purely supervised baselines by 6.3%.\n\nFigure 2: Effects of label noise on adversarial (left, a) and natural (right, b) accuracies, on CIFAR-10\n\nTo understand these results, we believe that the main function of the unsupervised data in UAT is to improve generalization of the smoothness loss, rather than the classification loss. While examples with corrupted labels have limited utility for improving classification accuracy, they can still be leveraged to improve the smoothness loss. This is most obvious for UAT-OT, which has no dependence on the predicted labels (and is thus a flat line in Figure 2a). However, Figure 2a supports the hypothesis that UAT-FT works similarly, given its effectiveness even in cases where up to half of the labels are corrupted. As mentioned in Section 3, because the generalization gap of the classification loss is typically already small, controlling generalization of the smoothness loss is key to improved adversarial robustness.\n\nComparison to standard generalization. We compare the robustness of UAT-FT to label noise with that of an analogous pseudo-labeling technique applied to natural generalization. Comparing Figures 2a and 2b, we observe that with increasing label noise, the rate of degradation in robustness of adversarially trained models is much lower than the rate of degradation in accuracy of models obtained with standard training. In particular, while standard training procedures can be robust to random label noise, as observed in previous work [36, 38], accuracy decreases almost one-to-one (slope -0.78) with correlated errors. 
This is natural, as with a very large unsupervised dataset, we expect to recover the base classifier (modulo the 4k additional supervised examples).\n\nConclusion. UAT shows significant robustness to label noise, achieving an 8.0% improvement over the baseline even with nearly 50% error in the base classifier. We hypothesize that this is because UAT operates primarily on the smoothness loss, rather than the classification loss, and is thus less dependent on the pseudo-labels.\n\n4.2 Unsupervised data with distribution shift\n\nMotivation. In Section 4.1, we studied the standard semi-supervised setting, where $P(X)$ is the marginal of the joint distribution $P(X, Y)$. As pointed out in [35], real-world unlabeled datasets may involve varying degrees of distribution shift from the labeled distribution. For example, images from CIFAR-10, even without labels, required human curation not only to restrict to images of the chosen 10 classes, but also to ensure that selected images were photo-realistic (line drawings were rejected) and that only one instance of the object was present (see Appendix C of [25] for the full labeler instruction sheet). We thus study whether our approach is robust to such distribution shift, allowing us to fully leverage data which is not only unlabeled, but also uncurated.\n\nWe use the 80 Million Tiny Images dataset [44] (hereafter, 80m) as our uncurated data source: a large dataset obtained by web queries for 75,062 words. Because collecting this dataset required no human filtering, it provides a perfect example of uncurated data that is cheaply available at scale. Notably, CIFAR-10 is a human-labeled subset of 80m, which has been restricted to 10 classes.\n\nPreprocessing. Because the majority of 80m contains images distinct from the CIFAR-10 classes, we apply an automated filtering technique similar to [50], detailed in Appendix C. 
Method | Sup. Data | Unsup. Data | Network | A_nat | A_FGSM20 | A_MultiTar.
[48] | CIFAR-10 | \u2717 | WRN-28 | 27.07% | 23.54% | -
AT [30] | CIFAR-10 | \u2717 | - | 87.30% | 47.04% | 44.54%
[55] | CIFAR-10 | \u2717 | - | 94.64% | 0.15% | -
[26] | CIFAR-10 | \u2717 | - | 85.25% | 45.89% | -
[21] | ImageNet + CIFAR-10 | \u2717 | WRN-28 | 87.1% | 57.40% | \u226452.9%*
AT-Reimpl. [30] | CIFAR-10 | \u2717 | WRN-34 | 87.08% | 52.93% | 47.10%
TRADES [54] | CIFAR-10 | \u2717 | WRN-34 | 84.92% | 57.11% | 52.58%
UAT++ | CIFAR-10 | 80m@100K | WRN-34 | 86.04% | 59.41% | 52.64%
UAT++ | CIFAR-10 | 80m@200K | WRN-34 | 85.85% | 62.18% | 53.35%
UAT++ | CIFAR-10 | 80m@500K | WRN-34 | 78.34% | 58.04% | 48.99%
UAT++ | CIFAR-10 | 80m@200K | WRN-70 | 86.75% | 62.89% | 55.04%
UAT++ | CIFAR-10 | 80m@200K | WRN-106 | 86.46% | 63.65% | 56.30%
\n\nTable 1: Experimental results using the 80m Tiny Images dataset (as unsupervised data) and CIFAR-10 (as supervised data), where A_nat denotes the original test accuracy, A_FGSM20 the adversarial accuracy under 20-step FGSM, and A_MultiTar. the adversarial accuracy under the strong MultiTargeted attack. WRN-k denotes the Wide-ResNet with depth k. '*' indicates the number is from [21], using 100 PGD steps with 1000 random restarts, an attack that we have found to be weaker than the MultiTargeted attack.\n\nBriefly, we first restrict to images obtained from web queries matching the CIFAR-10 classes, and filter out near duplicates of the CIFAR-10 test set using GIST features [34, 15]. 
For each class, we rank the images based on the prediction confidence of a WideResNet-28-10 model pretrained on the CIFAR-10 dataset. We then take the top 10k, 20k, or 50k images per class to create the 80m@100K, 80m@200K, and 80m@500K datasets, respectively.
Overview. We first conduct a preliminary study of the impact of distribution shift in a low-data regime (Section 4.2.1), and then demonstrate how UAT can be used to leverage large-scale, realistic uncurated data (Section 4.2.2).

4.2.1 Preliminary study: Low data regime
To study the effect of having unsupervised data from a different distribution, we repeat the experimental setup described in Section 4.1.1, where we draw Um from 80m@200K rather than CIFAR-10. Results are given in Figure 3. For simplicity, we report our best performing method, UAT++, in both settings: Um drawn from 80m@200K and Um drawn from CIFAR-10. First, we observe that when using 32K images of unsupervised data, both UAT++ (80m@200K) and UAT++ (CIFAR-10) outperform the baseline, TRADES [54], which uses only the 4K supervised examples. Specifically, UAT++ with 80m@200K achieves 48.6% robust accuracy, a 16.2% improvement over TRADES. On the other hand, UAT++ performs substantially better when Um is drawn from CIFAR-10 rather than 80m@200K, by a margin of 5.5% with 32K unlabeled examples.
Conclusion. While unlabeled data from the same distribution is significantly better than off-distribution unlabeled data, off-distribution unlabeled data is still much better than no unsupervised data at all. In the next section, we explore scaling up the off-distribution case.

4.2.2 Large scale regime
We now study whether uncurated data alone can be leveraged to improve the state of the art for adversarial robustness on CIFAR-10. For these experiments, we use subsets of 80m in conjunction with the full CIFAR-10 training set. Table 1 summarizes the results.
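The per-class, confidence-based selection used to build the 80m@100K/200K/500K subsets can be sketched in a few lines. This is a minimal numpy illustration under stated assumptions (the `logits` array and the function name are hypothetical stand-ins for the pretrained WideResNet-28-10 predictions), not the paper's pipeline:

```python
import numpy as np

def select_top_k_per_class(logits, candidate_class, k):
    """Rank candidate images by a pretrained model's confidence in
    `candidate_class` and return the indices of the top-k images.

    logits: (num_images, num_classes) array of classifier outputs.
    """
    # Softmax over classes (shifted by the row max for numerical stability).
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Confidence assigned to the class of interest, most confident first.
    confidence = probs[:, candidate_class]
    return np.argsort(-confidence)[:k]

# Toy example: 4 candidate images, 2 classes; keep the 2 images the
# model is most confident belong to class 0.
logits = np.array([[5.0, 0.0], [0.0, 5.0], [3.0, 0.0], [1.0, 0.0]])
top2 = select_top_k_per_class(logits, candidate_class=0, k=2)  # -> [0, 2]
```

Repeating this for each of the 10 classes with k = 10k, 20k, or 50k, after the query restriction and near-duplicate filtering, yields datasets of the shape of 80m@100K, 80m@200K, and 80m@500K.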
We report adversarial accuracies against two attacks. First, we consider the FGSM [17, 26] attack with 20 steps (FGSM20) to allow for direct comparison with the previous state of the art [54]. Second, we evaluate against the MultiTargeted attack, which we find to be significantly stronger than the commonly used PGD attack with random restarts. Details are provided in Appendix E.1.

[Figure 3: Distribution shift on CIFAR-10. Adversarial accuracy under FGSM20 vs. the number of unsupervised samples m (4k-32k) for UAT++ with Um drawn from CIFAR-10, UAT++ with Um drawn from 80m@200K, and TRADES.]

Baselines. For baseline models, we evaluate the models released by [30, 54]. For a fair comparison with our setup, we also reimplement adversarial training (AT-Reimpl. [30]) using the same attack we use for UAT, which we found to be slightly more efficient than the original attack. This is detailed in Appendix A.3. We also compare to [21], which uses more labeled data by pretraining on ImageNet. All other reported numbers are taken from [54].
Comparison with same model. First, we compare UAT++ with three different sets of unsupervised data (80m@100K, 80m@200K, and 80m@500K) using the same model architecture (WRN-34) as in TRADES. In all cases, we outperform TRADES under FGSM20. When using 80m@200K, we improve upon TRADES by 5.07% under FGSM20 and 0.77% under the MultiTargeted attack. We note the importance of leveraging more unsupervised data when going from 80m@100K to 80m@200K. However, performance degrades when using 80m@500K, which we attribute to the fact that 80m@500K contains significantly more out-of-distribution images. Finally, comparing with the recent work of [21], we note that using more unsupervised data can outperform using additional supervised data for pretraining.
Further analysis. We run several additional checks against gradient masking [1, 45, 46], detailed in Appendix E.
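For reference, the multi-step FGSM evaluation attack described above (sign-of-gradient ascent steps projected back onto the L-infinity ball, i.e. PGD-style) can be sketched as follows. This is a minimal numpy illustration on a toy quadratic loss; the step-size heuristic, `grad_fn`, and the toy loss are assumptions of the sketch, not the paper's evaluation code:

```python
import numpy as np

def fgsm_k(x, grad_fn, epsilon, num_steps=20, step_size=None):
    """Multi-step FGSM: ascend the loss via sign-of-gradient steps,
    projecting back into the L-infinity ball of radius epsilon.

    grad_fn(x) returns the gradient of the attacked loss w.r.t. x.
    """
    if step_size is None:
        step_size = 2.5 * epsilon / num_steps  # a common heuristic, assumed here
    x_orig, x_adv = x.copy(), x.copy()
    for _ in range(num_steps):
        x_adv = x_adv + step_size * np.sign(grad_fn(x_adv))
        # Projection step: clip each coordinate to [x_orig - eps, x_orig + eps].
        x_adv = np.clip(x_adv, x_orig - epsilon, x_orig + epsilon)
    return x_adv

# Toy loss to attack: L(x) = 0.5 * ||x - target||^2, with gradient x - target.
target = np.array([1.0, -1.0])
x_adv = fgsm_k(np.zeros(2), lambda x: x - target, epsilon=0.1)
# x_adv is pushed to the corner of the epsilon-ball farthest from target,
# approximately [-0.1, 0.1].
```

A targeted variant of the same loop, run once per target class, is the core of multi-target attacks such as MultiTargeted.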
We show that a gradient-free attack, SPSA [46], does not lower accuracy compared to untargeted PGD (Appendix E.2), visualize loss landscapes (Appendix E.3), and empirically analyze attack convergence (Appendix E.4). Overall, we do not find evidence that other attacks could outperform the MultiTargeted attack.
A new state of the art on CIFAR-10. Finally, when using these significantly larger training sets, we observe significant underfitting: robust accuracy is low even on the training set. We thus also explore deeper models. We observe that UAT++ trained on the 80m@200K unsupervised dataset using WRN-106 achieves state-of-the-art performance, +6.54% under FGSM20 and +3.72% under the MultiTargeted attack, compared to TRADES [54]. Our trained model is available in our repository.1

5 Conclusion

Despite the promise of adversarial training, its reliance on large numbers of labeled examples has presented a major challenge to developing robust classifiers. In this paper, we hypothesize that annotated data might not be as important as commonly believed for training adversarially robust classifiers. To validate this hypothesis, we introduce two simple UAT approaches, which we test on two standard image classification benchmarks. These experiments reveal that one can indeed reach near state-of-the-art adversarial robustness with as few as 4K labels for CIFAR-10 (10 times fewer than the original dataset) and as few as 1K labels for SVHN (100 times fewer than the original dataset). Further, we demonstrate that our method can also be applied to uncurated data obtained from simple web queries. This approach improves the state of the art on CIFAR-10 by 4% against the strongest known attack. These findings open a new avenue for improving adversarial robustness using unlabeled data.
We believe this could be especially important for domains such as medical applications, where robustness is essential and gathering labels is particularly costly [16].

1 https://github.com/deepmind/deepmind-research/tree/master/unsupervised_adversarial_training

Acknowledgements. We would like to especially thank Sven Gowal for helping us evaluate with the MultiTargeted attack and for the loss landscape visualizations, as well as for insightful discussions throughout this project. We would also like to thank Andrew Zisserman, Catherine Olsson, Chongli Qin, Relja Arandjelović, Sam Smith, Taylan Cemgil, Tom Brown, and Vlad Firoiu for helpful discussions throughout this work.

References
[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
[2] I. Attias, A. Kontorovich, and Y. Mansour. Improved generalization bounds for robust learning. In ALT, 2018.
[3] P. Bachman, O. Alsharif, and D. Precup. Learning with pseudo-ensembles. In NeurIPS, 2014.
[4] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. Raffel. MixMatch: A holistic approach to semi-supervised learning. arXiv:1905.02249, 2019.
[5] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.
[6] A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 92–100. ACM, 1998.
[7] M. Bojarski, D. D. Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, X. Zhang, J. Zhao, and K. Zieba. End to end learning for self-driving cars. CoRR, abs/1604.07316, 2016.
[8] N. Carlini and D. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 3–14. ACM, 2017.
[9] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 39–57. IEEE, 2017.
[10] Y. Carmon, A. Raghunathan, L. Schmidt, P. Liang, and J. C. Duchi. Unlabeled data improves adversarial robustness. In NeurIPS, 2019.
[11] O. Chapelle, B. Scholkopf, and A. Zien. Semi-Supervised Learning. MIT Press, 2009.
[12] D.-D. Chen, W. Wang, W. Gao, and Z.-H. Zhou. Tri-net for semi-supervised deep learning. In IJCAI, 2018.
[13] P. Covington, J. Adams, and E. Sargin. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 191–198. ACM, 2016.
[14] J. De Fauw, J. R. Ledsam, B. Romera-Paredes, S. Nikolov, N. Tomasev, S. Blackwell, H. Askham, X. Glorot, B. O'Donoghue, D. Visentin, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine, 24(9):1342, 2018.
[15] M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of GIST descriptors for web-scale image search. In Proceedings of the ACM International Conference on Image and Video Retrieval, page 19. ACM, 2009.
[16] S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane. Adversarial attacks on medical machine learning. Science, 2019.
[17] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[18] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato, R. Arandjelovic, T. Mann, and P. Kohli. On the effectiveness of interval bound propagation for training verifiably robust models. arXiv preprint arXiv:1810.12715, 2018.
[19] S. Gowal, J. Uesato, C. Qin, P.-S. Huang, T. Mann, and P. Kohli. An alternative surrogate loss for PGD-based adversarial testing. 2019.
[20] S. Gu and L. Rigazio. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.
[21] D. Hendrycks, K. Lee, and M. Mazeika. Using pre-training can improve model robustness and uncertainty. In ICML, 2019.
[22] P.-S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 2333–2338. ACM, 2013.
[23] J. Khim and P.-L. Loh. Adversarial risk bounds for binary classification via function transformation. arXiv preprint arXiv:1810.09519, 2018.
[24] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[25] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, 2009.
[26] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial machine learning at scale. In ICLR, 2017.
[27] S. Laine and T. Aila. Temporal ensembling for semi-supervised learning. In ICLR, 2017.
[28] B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302–1338, 2000.
[29] Y. Liu, X. Chen, C. Liu, and D. Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
[30] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
[31] G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.