{"title": "Model-Agnostic Private Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 7102, "page_last": 7112, "abstract": "We design differentially private learning algorithms that are agnostic to the learning model assuming access to limited amount of unlabeled public data. First, we give a new differentially private algorithm for answering a sequence of $m$ online classification queries (given by a sequence of $m$ unlabeled public feature vectors) based on a private training set. Our private algorithm follows the paradigm of subsample-and-aggregate, in which any generic non-private learner is trained on disjoint subsets of the private training set, then for each classification query, the votes of the resulting classifiers ensemble are aggregated in a differentially private fashion. Our private aggregation is based on a novel combination of distance-to-instability framework [Smith & Thakurta 2013] and the sparse-vector technique [Dwork et al. 2009, Hardt & Talwar 2010].  We show that our algorithm makes a conservative use of the privacy budget. In particular, if the underlying non-private learner yields classification error at most $\\alpha\\in (0, 1)$, then our construction answers more queries, by at least a factor of $1/\\alpha$ in some cases, than what is implied by a straightforward application of the advanced composition theorem for differential privacy. Next, we apply the knowledge transfer technique to construct a private learner that outputs a classifier, which can be used to answer unlimited number of queries. In the PAC model, we analyze our construction and prove upper bounds on the sample complexity for both the realizable and the non-realizable cases. 
As in non-private sample complexity, our bounds are completely characterized by the VC dimension of the concept class.", "full_text": "Model-Agnostic Private Learning\n\nRaef Bassily\u2217\n\nOm Thakkar\u2020\n\nAbhradeep Thakurta\u2021\n\nAbstract\n\nWe design differentially private learning algorithms that are agnostic to the learn-\ning model assuming access to a limited amount of unlabeled public data. First,\nwe provide a new differentially private algorithm for answering a sequence of m\nonline classi\ufb01cation queries (given by a sequence of m unlabeled public feature\nvectors) based on a private training set. Our algorithm follows the paradigm of\nsubsample-and-aggregate, in which any generic non-private learner is trained on\ndisjoint subsets of the private training set, and then for each classi\ufb01cation query,\nthe votes of the resulting classi\ufb01ers ensemble are aggregated in a differentially\nprivate fashion. Our private aggregation is based on a novel combination of the\ndistance-to-instability framework [26], and the sparse-vector technique [15, 18].\nWe show that our algorithm makes a conservative use of the privacy budget. In\nparticular, if the underlying non-private learner yields a classi\ufb01cation error of at\nmost \u03b1 \u2208 (0, 1), then our construction answers more queries, by at least a fac-\ntor of 1/\u03b1 in some cases, than what is implied by a straightforward application\nof the advanced composition theorem for differential privacy. Next, we apply the\nknowledge transfer technique to construct a private learner that outputs a classi\ufb01er,\nwhich can be used to answer an unlimited number of queries. In the PAC model,\nwe analyze our construction and prove upper bounds on the sample complexity\nfor both the realizable and the non-realizable cases. 
Similar to non-private sample\ncomplexity, our bounds are completely characterized by the VC dimension of the\nconcept class.\n\n1\n\nIntroduction\n\nThe main goal in the standard setting of differentially private learning is to design a differentially\nprivate learner that, given a private training set as input, outputs a model (or, a classi\ufb01er) that is\nsafe to publish. Despite being a natural way to de\ufb01ne the private learning problem, there are several\nlimitations with this standard approach. First, there are pessimistic lower bounds in various learning\nproblems implying that the error associated with the \ufb01nal private model will generally have neces-\nsary dependence on the dimensionality of the model [2], or the size of the model class [9]. Second,\nthis approach often requires non-trivial, white-box modi\ufb01cation of the existing non-private learn-\ners [19, 11, 21, 26, 2, 27, 1], which can make some of these constructions less practical since they\nrequire making changes in the infrastructure of the existing systems. Third, designing algorithms\nfor this setting often requires knowledge about the underlying structure of the learning problem,\ne.g., speci\ufb01c properties of the model class [4, 5, 9]; or convexity, compactness, and other geometric\nproperties of the model space [2, 27].\nWe study the problem of differentially private learning when the learner has access to a limited\namount of public unlabeled data. Our central goal is to characterize in a basic model, such as the\nstandard PAC model, the improvements one can achieve for private learning in such a relaxed setting\n\n\u2217Department of Computer Science & Engineering, The Ohio State University. bassily.1@osu.edu\n\u2020Department of Computer Science, Boston University. omthkkr@bu.edu\n\u2021Department of Computer Science, University of California Santa Cruz. 
aguhatha@ucsc.edu\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00b4eal, Canada.\n\n\fcompared to the aforementioned standard setting. Towards this goal, we \ufb01rst consider a simpler\nproblem, namely, privately answering classi\ufb01cation queries given by a sequence of public unlabeled\ndata Q = {x1,\u00b7\u00b7\u00b7 , xm}. In this problem, one is given a private labeled dataset denoted by D,\nand the goal is to design an (\u0001, \u03b4)-differentially private algorithm that labels all m public feature\nvectors in Q. In designing such an algorithm, there are four main goals we aim to achieve: (i) We\nwish to provide an algorithm that enables answering as many classi\ufb01cation queries as possible while\nensuring (\u0001, \u03b4)-differential privacy. This is a crucial property for the utility of such an algorithm\nsince the utility in this problem is limited by the number of queries we can answer while satisfying\nthe target privacy guarantee. (ii) We want to have a modular design paradigm in which the private\nalgorithm can use any generic non-private algorithm (learner) in a black-box fashion, i.e., it only\nhas oracle access to the non-private algorithm. This property is very attractive from a practical\nstandpoint as the implementation of such an algorithm does not require changing the internal design\nof the existing non-private algorithms. (iii) We want to have a design paradigm that enables us to\neasily and formally transfer the accuracy guarantees of the underlying non-private algorithm into\nmeaningful accuracy guarantees for the private algorithm. The most natural measure of accuracy in\nthat setting would be the misclassi\ufb01cation rate. (iv) We want to be able to use such an algorithm\ntogether with the public unlabeled data to construct a differentially private learner that outputs a\nclassi\ufb01er, which can then be used to answer as many classi\ufb01cation queries as we wish. 
In particular,\ngiven the second goal above, the \ufb01nal private learner would be completely agnostic to the intricacies\nof the underlying non-private learner and its model. Namely, it would be oblivious to whether the\nmodel is simple logistic regression, or a multi-layer deep neural network.\nGiven the above goals, a natural framework to consider is knowledge aggregation and transfer,\nwhich is inspired by the early work of Breiman [8]. The general idea is to train a non-private\nlearner on different subsamples from the private dataset to generate an ensemble of classi\ufb01ers. The\nensemble is collectively used in a differentially private manner to generate privatized labels for the\ngiven unlabeled public data. Finally, the public data together with the private labels are used to train\na non-private learner, which produces a \ufb01nal classi\ufb01er that is safe to publish.\nRelated Work: For private learning via knowledge aggregation and transfer, Hamm et al. [17]\nexplored a similar technique, however their construction deviated from the above description. In\nparticular, it was a white-box construction with weak accuracy guarantees; their guarantees also in-\nvolved making strong assumptions about the learning model and the loss function used in training. In\na recent work [23, 24], of which [24] is independent from our work, Papernot et al. gave algorithms\nthat follow the knowledge transfer paradigm described above. Their constructions are black-box.\nHowever, only empirical evaluations are given for their constructions; no formal utility guarantees\nare provided. For the query-answering setting, a recent independent work [12] considers the prob-\nlem of private prediction, but only in the single-query setting, whereas we study the multiple-query\nsetting. 
The earliest idea of using ensemble classi\ufb01ers to provide differentially private prediction\ncan be traced to Dwork, Rothblum, and Thakurta from 2013.\nOur Techniques: In this work, we give a new construction for privately answering classi\ufb01cation\nqueries that is based on a novel framework combining two special techniques in the literature of dif-\nferential privacy, namely, the subsampling stability framework [22, 26] and the sparse vector tech-\nnique [15, 18, 16]. Our construction also follows the knowledge aggregation and transfer paradigm,\nbut it exploits the stability properties of good non-private learners in a quanti\ufb01able and formal man-\nner. Our construction is based on the following idea: if a good learner is independently trained k\ntimes on equally sized, independent training sets, then one would expect the corresponding output\nclassi\ufb01ers h1,\u00b7\u00b7\u00b7 , hk to predict \u201csimilarly\u201d on a new example from the same distribution. Using\nthis idea, we show that among m classi\ufb01cation queries, one only needs to \u201cpay the price of privacy\u201d\nfor the queries for which there is signi\ufb01cant disagreement among the k classi\ufb01ers. Using our con-\nstruction and the unlabeled public data, we also provide a \ufb01nal private learner. 
We show via formal\nand quanti\ufb01able guarantees that our construction achieves our four main goals stated earlier.\nWe note that our framework is not restricted to classi\ufb01cation queries; it can be used for privately\nanswering any sequence of online queries that satisfy certain stability properties in the sense of [26].\nDue to space limitations, and to avoid distracting the reader from the main results of this work, we\ndefer the description of the generic framework to the full version, and focus here on the special case\nof classi\ufb01cation queries.\n\n2\n\n\f1.1 Our Contributions\n\nAnswering online classi\ufb01cation queries using the privacy budget conservatively: In Section 3,\nwe give our (\u0001, \u03b4)-differentially private construction for answering a sequence of online classi\ufb01cation\nqueries. Our construction uses any generic non-private learner in a black-box fashion. The privacy\nguarantee is completely independent of the non-private learner and its accuracy. Moreover, the\naccuracy guarantee can be obtained directly from the accuracy of the non-private learner, i.e., the\nconstruction allows us to directly and formally \u201ctransform\u201d the accuracy guarantee for the non-\nprivate learner into an accuracy guarantee for the \ufb01nal private algorithm.\nWe provide a new privacy analysis for the novel framework combining subsampling stability and\nsparse vector techniques. We analyze the accuracy of our algorithm in terms of its misclassi\ufb01cation\nrate, de\ufb01ned as the ratio of misclassi\ufb01ed queries to the total number of queries, in the standard (ag-\nnostic) PAC model. Our accuracy analysis is new, and is based on a simple counting argument. We\nconsider both the realizable and non-realizable (agnostic) cases. In the realizable case, the underly-\ning non-private learner is assumed to be a PAC learner for a hypothesis class H of VC-dimension\nV . 
The private training set consists of n labeled examples, where the labels are generated by some unknown hypothesis $h^* \\in H$. The queries are given by a sequence of m i.i.d. unlabeled domain points drawn from the same distribution as the domain points in the training set. We show that, with high probability, our private algorithm can answer up to $\\approx n/V$ queries with a misclassification rate of $\\approx V/n$, which is essentially the optimal misclassification rate attainable without privacy. Thus, answering those queries essentially comes with no cost for privacy. When answering $m > n/V$ queries, the misclassification rate is $\\approx mV^2/n^2$. A straightforward application of the advanced composition theorem of differential privacy would have led to a misclassification rate $\\approx \\sqrt{m}\\,V/n$, which can be significantly larger than our rate. This is because our construction pays a privacy cost only for \u201chard\u201d queries for which the PAC learner tends to be incorrect. Our result for the realizable case is summarized below. We also provide an analogous statement for the non-realizable case (Theorem 3.5).\nInformal Theorem 1.1 (Corresponding to Theorem 3.4). Given a PAC learner for a class H of VC-dimension V, a private training set of size n, and assuming realizability, our private construction (Algorithm 2) answers a sequence of up to $\\tilde{\\Omega}(n/V)$ binary classification queries such that, with high probability, the misclassification rate is $\\tilde{O}(V/n)$. When the number of queries m is beyond $\\tilde{\\Omega}(n/V)$, then with high probability, the misclassification rate is $\\tilde{O}(mV^2/n^2)$.\n\nA model-agnostic private learner with formal guarantees: In Section 4, we use the knowledge transfer technique to bootstrap a private learner from our construction above. The idea is to use our private construction to label a sufficient number of public feature vectors. 
Then, we use these newly labeled public data for training a non-private learner to finally output a classifier. Since there is no privacy constraint associated with the public data, the overall construction remains private as differential privacy is closed under post-processing. Note that this construction also uses the non-private learner as a black box, and hence it is agnostic to the structure of such a learner and the associated model. This general technique has also been adopted in [23]. Our main contribution here is that we provide formal and explicit utility guarantees for the final private learner in the standard (agnostic) PAC model. Our guarantees are in terms of upper bounds on the sample complexity in both realizable and non-realizable cases. Let $\\mathrm{err}(h;\\mathcal{D}) \\triangleq \\mathbb{P}_{(x,y)\\sim\\mathcal{D}}[h(x) \\neq y]$ denote the true classification error of a hypothesis h. Given black-box access to an agnostic PAC learner for a class H of VC-dimension V, we obtain the following results:\nInformal Theorem 1.2 (Corresponding to Theorems 4.2, 4.3). Let $0 < \\alpha < 1$. Let n be the size of the private training set. There exists an $(\\varepsilon, \\delta)$-differentially private algorithm (Algorithm 3) that, given access to $m = \\tilde{O}(V/\\alpha^2)$ unlabeled public data points, w.h.p. outputs a classifier $\\hat{h} \\in H$ such that the following guarantees hold: (i) Realizable case: $\\mathrm{err}(\\hat{h};\\mathcal{D}) \\le \\alpha$ for $n = \\tilde{O}(V^{3/2}/\\alpha^{3/2})$, and (ii) Agnostic case: $\\mathrm{err}(\\hat{h};\\mathcal{D}) \\le \\alpha + O(\\gamma)$ for $n = \\tilde{O}(V^{3/2}/\\alpha^{5/2})$, where $\\gamma = \\min_{h \\in H} \\mathrm{err}(h;\\mathcal{D})$.\nOur bounds are only a factor of $\\tilde{O}(\\sqrt{V/\\alpha})$ worse than the corresponding optimal non-private bounds. 
In the agnostic case, however, we note that the accuracy of the output hypothesis in our case has a suboptimal dependency (by a small constant factor) on $\\gamma \\triangleq \\min_{h \\in H} \\mathrm{err}(h;\\mathcal{D})$.\n\nWe note that the same construction can serve as a private learner in a less restrictive setting where only the labels of the training set are considered private information. This setting is known as label-private learning, and it has been explored before in [10] and [6]. Both works have only considered pure, i.e., $(\\varepsilon, 0)$, differentially private learners, and their constructions are white-box, i.e., they do not allow for using a black-box non-private learner. The bounds in [10] involve smoothness assumptions on the underlying distribution. In [6], an upper bound on the sample complexity is derived for the realizable case. Their bound is a factor of $O(1/\\alpha)$ worse than the optimal non-private bound for the realizable case.\n\n2 Preliminaries\n\nIn this section, we formally define the notation, provide important definitions, and state the main existing results used in this work.\nWe denote the data universe by $U = X \\times Y$, where X denotes an abstract domain for unlabeled data (feature-vector space) and $Y = \\{0, 1\\}$. An n-element dataset is denoted by $D = \\{(x_1, y_1), (x_2, y_2), \\ldots, (x_n, y_n)\\} \\in U^n$. For any two datasets $D, D' \\in U^*$, we denote the symmetric difference between them by $D \\Delta D'$.\nWe will use the standard notion of agnostic PAC learning [20] (see the full version for a definition). We will also use the following parameterized version of the definition of agnostic PAC learning.\nDefinition 2.1 ($(\\alpha, \\beta, n)$-learner for a class H). Let $\\alpha, \\beta \\in (0, 1)$ and $n \\in \\mathbb{N}$. An algorithm $\\Theta$ is an $(\\alpha, \\beta, n)$ (agnostic PAC) learner if, given an input dataset D of n i.i.d. 
examples from the underlying unknown distribution $\\mathcal{D}$, with probability $1 - \\beta$ it outputs a hypothesis $h_D$ with $\\mathrm{err}(h_D;\\mathcal{D}) \\le \\gamma + \\alpha$, where $\\mathrm{err}(h_D;\\mathcal{D}) \\triangleq \\mathbb{P}_{(x,y)\\sim\\mathcal{D}}[h_D(x) \\neq y]$ and $\\gamma \\triangleq \\min_{h \\in H} \\mathrm{err}(h;\\mathcal{D})$.\n\nNext, we define the notion of differential privacy.\nDefinition 2.2 ($(\\varepsilon, \\delta)$-Differential Privacy [13, 14]). A (randomized) algorithm M with input domain $U^*$ and output range R is $(\\varepsilon, \\delta)$-differentially private (DP) if for all pairs of datasets $D, D' \\in U^*$ s.t. $|D \\Delta D'| = 1$, and every measurable $S \\subseteq R$, we have that: $\\Pr(M(D) \\in S) \\le e^{\\varepsilon} \\cdot \\Pr(M(D') \\in S) + \\delta$, where the probability is over the coin flips of M.\n\n2.1 Distance to Instability Framework\n\nNext, we describe the distance to instability framework from [26] that releases the exact value of a function on a dataset while preserving differential privacy, provided the function is sufficiently stable on the dataset. We define the notion of stability first, and provide pseudocode for a private estimator for any function via this framework in Algorithm 1.\nDefinition 2.3 (k-stability [26]). A function $f : U^* \\to R$ is k-stable on dataset D if adding or removing any k elements from D does not change the value of f, i.e., $\\forall D'$ s.t. $|D \\Delta D'| \\le k$, we have $f(D) = f(D')$. 
We say f is stable on D if it is (at least) 1-stable on D, and unstable otherwise.\nThe distance to instability of a dataset $D \\in U^*$ with respect to a function f is the number of elements that must be added to or removed from D to reach a dataset that is not stable.\n\nAlgorithm 1 Astab [26]: Private release of a classification query via distance to instability\nInput: Dataset $D \\in U^*$, a function $f : U^* \\to R$ for some range R, distance to instability function associated with f: $\\mathrm{dist}_f : U^* \\to R$, threshold: $\\Gamma$, privacy parameter $\\varepsilon > 0$\n1: $\\widehat{\\mathrm{dist}} \\leftarrow \\mathrm{dist}_f(D) + \\mathrm{Lap}(1/\\varepsilon)$\n2: If $\\widehat{\\mathrm{dist}} > \\Gamma$, then output $f(D)$, else output $\\perp$\n\nTheorem 2.4 (Privacy guarantee for Astab). If the threshold $\\Gamma = \\log(1/\\delta)/\\varepsilon$, and the distance to instability function $\\mathrm{dist}_f(D) = \\arg\\max_k [f(D) \\text{ is } k\\text{-stable}]$, then Astab (Algorithm 1) is $(\\varepsilon, \\delta)$-DP.\nTheorem 2.5 (Utility guarantee for Astab). If the threshold $\\Gamma = \\log(1/\\delta)/\\varepsilon$, the distance to instability function is chosen as in Theorem 2.4, and $f(D)$ is $((\\log(1/\\delta) + \\log(1/\\beta))/\\varepsilon)$-stable, then Algorithm 1 outputs $f(D)$ with probability at least $1 - \\beta$.\nThe proofs of the above two theorems follow from [26, Proposition 3].\n\n3 Privately Answering Classification Queries\n\nIn this section, we instantiate the distance to instability framework (Algorithm 1) with the subsample and aggregate framework [22, 26], and then combine it with the sparse vector technique [15, 16] to obtain a construction for privately answering classification queries with a conservative use of the privacy budget (Algorithm 2 below). We consider here the case of binary classification for simplicity. 
However, we note that one can easily extend the construction (and obtain analogous guarantees) for multi-class classification.\nNote: The full version [3] contains a detailed discussion of a more general framework together with a more modular and generic description of the algorithmic techniques. The description of the algorithms in this short version involves only classification queries.\nA private training set, denoted by D, is a set of n private binary-labeled data points $\\{(x_1, y_1), \\ldots, (x_n, y_n)\\} \\in (X \\times Y)^n$ drawn i.i.d. from some (arbitrary unknown) distribution $\\mathcal{D}$ over $X \\times Y$. We will refer to the induced marginal distribution over X as $\\mathcal{D}_X$. We consider a sequence of (online) classification queries defined by a sequence of m unlabeled points from X, $Q = \\{\\tilde{x}_1, \\ldots, \\tilde{x}_m\\} \\in X^m$, drawn i.i.d. from $\\mathcal{D}_X$, and let $\\{\\tilde{y}_1, \\ldots, \\tilde{y}_m\\} \\in \\{0, 1\\}^m$ be the corresponding true unknown labels. Algorithm 2 has oracle access to a non-private learner $\\Theta$ for a hypothesis class H. We will consider both realizable and non-realizable cases of the standard PAC model. In particular, $\\Theta$ is assumed to be an (agnostic) PAC learner for H.\n\nAlgorithm 2 AbinClas: Private Online Binary Classification via subsample and aggregate; and sparse vector\nInput: Private dataset: D, sequence of online unlabeled public data (defining the classification queries) $Q = \\{\\tilde{x}_1, \\ldots, \\tilde{x}_m\\}$, oracle access to a non-private learner $\\Theta : U^* \\to H$ for a hypothesis class H, cutoff parameter: T, privacy parameters $\\varepsilon, \\delta > 0$, failure probability: $\\beta$\n1: $c \\leftarrow 0$, $\\lambda \\leftarrow \\sqrt{32\\,T \\log(2/\\delta)}/\\varepsilon$, and $k \\leftarrow 34\\sqrt{2}\\,\\lambda \\cdot \\log(4mT/\\min(\\delta, \\beta/2))$\n2: $w \\leftarrow 2\\lambda \\cdot \\log(2m/\\delta)$, and $\\hat{w} \\leftarrow w + \\mathrm{Lap}(\\lambda)$\n3: Arbitrarily split D into k non-overlapping chunks of size n/k. 
Call them $D_1, \\ldots, D_k$\n4: for $j \\in [k]$, train $\\Theta$ on $D_j$ to get a classifier $h_j \\in H$\n5: for $i \\in [m]$ and $c \\le T$ do\n6: Let $S_i = \\{h_1(\\tilde{x}_i), \\ldots, h_k(\\tilde{x}_i)\\}$, and for $y \\in \\{0, 1\\}$, let $\\mathrm{ct}(y)$ = # times y appears in $S_i$\n7: $\\hat{q}_{\\tilde{x}_i}(D) \\leftarrow \\arg\\max_{y \\in \\{0,1\\}} [\\mathrm{ct}(y)]$, $\\mathrm{dist}_{\\hat{q}_{\\tilde{x}_i}} \\leftarrow \\max\\{0, \\mathrm{ct}(\\hat{q}_{\\tilde{x}_i}) - \\mathrm{ct}(1 - \\hat{q}_{\\tilde{x}_i}) - 1\\}$\n8: $\\mathrm{out}_i \\leftarrow$ Astab$(D, \\hat{q}_{\\tilde{x}_i}, \\mathrm{dist}_{\\hat{q}_{\\tilde{x}_i}}, \\Gamma = \\hat{w}, \\varepsilon = 1/2\\lambda)$\n9: If $\\mathrm{out}_i = \\perp$, then $c \\leftarrow c + 1$ and $\\hat{w} \\leftarrow w + \\mathrm{Lap}(\\lambda)$\n10: Output $\\mathrm{out}_i$\n\nTheorem 3.1 (Privacy guarantee for AbinClas). Algorithm AbinClas (Algorithm 2) is $(\\varepsilon, \\delta)$-DP.\nThe proof of this theorem follows from combining the guarantees of the distance to instability framework [26], and the sparse vector technique [16]. The idea is that in each round of query response, if the algorithm outputs a label in $\\{0, 1\\}$, then there is \u201cno loss in privacy\u201d in terms of $\\varepsilon$ (as there is sufficient consensus). However, when the output is $\\perp$, there is a loss of privacy. This argument is formalized via the distance to instability framework. Sparse vector helps account for the privacy loss across all the m queries. A formal proof of this theorem is deferred to the full version.\nTheorem 3.2. Let $\\alpha, \\beta \\in (0, 1)$, and $\\gamma \\triangleq \\min_{h \\in H} \\mathrm{err}(h;\\mathcal{D})$. (Note that in the realizable case $\\gamma = 0$.) In Algorithm AbinClas (Algorithm 2), suppose we set the cutoff parameter as $T = 3\\left((\\gamma + \\alpha)m + \\sqrt{(\\gamma + \\alpha)\\,m \\log(m/\\beta)/2}\\right)$. 
If $\\Theta$ is an $(\\alpha, \\beta/k, n/k)$-agnostic PAC learner (Definition 2.1), where k is as defined in AbinClas, then i) with probability at least $1 - 2\\beta$, AbinClas does not halt before answering all the m queries in Q, and outputs $\\perp$ for at most T queries; and ii) the misclassification rate of AbinClas is at most $T/m = O(\\gamma + \\alpha)$.\nProof. First, notice that $\\Theta$ is an $(\\alpha, \\beta/k, n/k)$-agnostic PAC learner, hence w.p. $\\ge 1 - \\beta$, the misclassification rate of $h_j$ for all $j \\in [k]$ is at most $\\gamma + \\alpha$. So, by the standard Chernoff bound, with probability at least $1 - \\beta$ none of the $h_j$\u2019s misclassify more than $(\\gamma + \\alpha)m + \\sqrt{(\\gamma + \\alpha)\\,m \\log(m/\\beta)/2} \\triangleq B$ queries in Q. Now, we use the following Markov-style counting argument (Lemma 3.3) to bound the number of queries for which at least k/3 classifiers in the ensemble $\\{h_1, \\ldots, h_k\\}$ result in a misclassification.\nLemma 3.3. Consider a set of $\\{(\\tilde{x}_1, \\tilde{y}_1), \\ldots, (\\tilde{x}_m, \\tilde{y}_m)\\} \\subset X \\times Y$, and k binary classifiers $h_1, \\ldots, h_k$, where each classifier is guaranteed to make at most B mistakes in predicting the m labels $\\{\\tilde{y}_1, \\ldots, \\tilde{y}_m\\}$. For any $\\xi \\in (0, 1/2]$,\n\n$\\left|\\left\\{i \\in [m] : |\\{j \\in [k] : h_j(\\tilde{x}_i) \\neq \\tilde{y}_i\\}| > \\xi k\\right\\}\\right| < B/\\xi$.\n\nTherefore, there are at most 3B queries $\\tilde{x}_i \\in Q$, where the votes of the ensemble $\\{h_1(\\tilde{x}_i), \\ldots, h_k(\\tilde{x}_i)\\}$ have number of ones (or, zeros) $> k/3$ (i.e., they significantly disagree). 
Now, to prove part (i) of the theorem, observe that to satisfy the distance to instability condition (in Theorem 2.5) for the remaining $m - 3B$ queries, it would suffice to have $k/3 \\ge 32 \\log(4mT/\\min(\\delta, \\beta/2)) \\sqrt{2T \\log(2/\\delta)}/\\varepsilon$ (taking into account the noise in the threshold passed to Astab in Step 8 of AbinClas; see footnote 4). This condition on k is satisfied by the setting of k in AbinClas. For part (ii), note that by the same lemma above, w.p. 
$1 - \\beta$, there are at least 2k/3 classifiers that output the correct label in each of the remaining $m - 3B$ queries. Hence, w.p. $\\ge 1 - 2\\beta$, Algorithm AbinClas will correctly classify such queries. This completes the proof.\nRemark 1. A natural question for using Theorem 3.2 in the agnostic case is how one would know the value of $\\gamma$ in practice, in order to set the right value for T. One simple approach is to set aside half the training dataset, and compute the empirical misclassification rate with differential privacy to get a sufficiently accurate estimate for $\\gamma + \\alpha$ (as in standard validation techniques [25]), and use it to set T. Since the sensitivity of misclassification rate is small, the amount of noise added would not affect the accuracy of the estimation. Furthermore, with a large enough training dataset, the asymptotics of Theorem 3.2 would not change either.\n\nExplicit misclassification rate: In Theorem 3.2, it might seem that there is a circular dependency of the following terms: $T \\to \\alpha \\to k \\to T$. However, the number of independent relations is equal to the number of parameters, and hence, we can set them meaningfully to obtain non-trivial misclassification rates. We now obtain an explicit misclassification rate for AbinClas in terms of the VC-dimension of H. Let V denote the VC-dimension of H. First, we consider the realizable case ($\\gamma = 0$). Our result for this case is formally stated in the following theorem.\nTheorem 3.4 (Misclassification rate in the realizable case). For any $\\beta \\in (0, 1)$, there exists $M = \\tilde{\\Omega}(\\varepsilon n/V)$, and a setting for $T = \\tilde{O}(\\bar{m}^2 V^2/(\\varepsilon^2 n^2))$, where $\\bar{m} \\triangleq \\max(M, m)$, such that w.p. 
$\\ge 1 - \\beta$, AbinClas yields the following misclassification rate: (i) $\\tilde{O}(V/(\\varepsilon n))$ for up to M queries, and (ii) $\\tilde{O}(mV^2/(\\varepsilon^2 n^2))$ for $m > M$ queries.\n\nProof. By standard uniform convergence arguments [25], there is an $(\\alpha, \\beta, n/k)$-PAC learner with misclassification rate $\\alpha = \\tilde{O}(kV/n)$. Setting T as in Theorem 3.2 with the aforementioned setting of $\\alpha$, and setting k as in Algorithm AbinClas gives the setting of T in the theorem statement. For up to $m = \\tilde{\\Omega}(\\varepsilon n/V)$ queries, the setting of T becomes $T = O(1)$, and hence Theorem 3.2 implies AbinClas yields a misclassification rate $\\tilde{O}(V/(\\varepsilon n))$, which is essentially the same as the optimal non-private rate. Beyond $\\tilde{\\Omega}(\\varepsilon n/V)$ queries, $T = \\tilde{O}(m^2 V^2/(\\varepsilon^2 n^2))$, and hence, Theorem 3.2 implies that the misclassification rate of AbinClas is $\\tilde{O}(mV^2/(\\varepsilon^2 n^2))$.\n\nWe note that the attainable misclassification rate is significantly smaller than the rate of $\\tilde{O}(\\sqrt{m}\\,V/(\\varepsilon n))$ implied by a direct application of the advanced composition theorem of differential privacy. Next, we provide an analogous statement for the non-realizable case ($\\gamma > 0$).\nTheorem 3.5 (Misclassification rate in the non-realizable case). For any $\\beta \\in (0, 1)$, there exists $M = \\tilde{\\Omega}\\left(\\min\\left\\{1/\\gamma, \\sqrt{\\varepsilon n/V}\\right\\}\\right)$, and a setting for $T = O(\\bar{m}\\gamma) + \\tilde{O}(\\bar{m}^{4/3} V^{2/3}/(\\varepsilon^{2/3} n^{2/3}))$, where $\\bar{m} \\triangleq \\max\\{M, m\\}$, such that w.p. 
$\\ge 1 - \\beta$, AbinClas yields the following misclassification rate: (i) $O(\\gamma) + \\tilde{O}(\\sqrt{V/(\\varepsilon n)})$ for up to M queries, and (ii) $O(\\gamma) + \\tilde{O}(m^{1/3} V^{2/3}/(\\varepsilon^{2/3} n^{2/3}))$ for $m > M$ queries.\n\nFootnote 4: A detailed argument is given in the full version.\n\nProof. Again, by a standard argument, $\\Theta$ is an $(\\alpha, \\beta, n/k)$-agnostic PAC learner with $\\alpha = \\tilde{O}(\\sqrt{kV/n})$, and hence, it has a misclassification rate of $\\approx \\gamma + \\tilde{O}(\\sqrt{kV/n})$ when trained on a dataset of size n/k. Setting T as in Theorem 3.2 with this value of $\\alpha$, and setting k as in AbinClas, and then solving for T in the resulting expression, we get the setting of T as in the theorem statement (it would help here to consider the cases where $\\gamma > \\alpha$ and $\\gamma \\le \\alpha$ separately). For up to $m = \\tilde{\\Omega}\\left(\\min\\left\\{1/\\gamma, \\sqrt{\\varepsilon n/V}\\right\\}\\right)$ queries, the setting of T becomes $T = O(1)$, and hence Theorem 3.2 implies AbinClas yields a misclassification rate $O(\\gamma) + \\tilde{O}(\\sqrt{V/(\\varepsilon n)})$, which is essentially the same as the optimal non-private rate. 
Beyond $\\tilde{\\Omega}\\left(\\min\\left\\{1/\\gamma, \\sqrt{\\varepsilon n/V}\\right\\}\\right)$ queries, $T = O(m\\gamma) + \\tilde{O}(m^{4/3} V^{2/3}/(\\varepsilon^{2/3} n^{2/3}))$, and hence, Theorem 3.2 implies that the misclassification rate of AbinClas is $O(\\gamma) + \\tilde{O}(m^{1/3} V^{2/3}/(\\varepsilon^{2/3} n^{2/3}))$.\n\n4 From Answering Queries to Model-agnostic Private Learning\n\nIn this section, we build on our algorithm and results in Section 3 to achieve a stronger objective. In particular, we bootstrap from our previous algorithm an $(\\varepsilon, \\delta)$-differentially private learner that publishes a final classifier. The idea is based on a knowledge transfer technique: we use our private construction above to generate labels for a sufficient number of unlabeled domain points. Then, we use the resulting labeled set as a new training set for any standard (non-private) learner, which in turn outputs a classifier. We prove explicit sample complexity bounds for the final private learner in both PAC and agnostic PAC settings.\nOur final construction can also be viewed as a private learner in the less restrictive setting of label-private learning where the learner is only required to protect the privacy of the labels in the training set. Note that any construction for our original setting can be used as a label-private learner simply by splitting the training set into two parts and throwing away the labels of one of them.\nLet $h_{\\mathrm{priv}}$ denote the mapping defined by AbinClas (Algorithm 2) on a single query (unlabeled data point). That is, for $x \\in X$, $h_{\\mathrm{priv}}(x) \\in \\{0, 1, \\perp\\}$ denotes the output of AbinClas on a single input query x. Note that w.l.o.g., we can view $h_{\\mathrm{priv}}$ as a binary classifier by replacing $\\perp$ with a uniformly random label in $\\{0, 1\\}$. 
Our private learner is described in Algorithm 3 below.\n\nAlgorithm 3 APriv: Private Learner\nInput: Unlabeled set of m i.i.d. feature vectors: $Q = \\{\\tilde{x}_1, \\ldots, \\tilde{x}_m\\}$, oracle access to our private classifier hpriv, oracle access to an agnostic PAC learner \u0398 for a class H.\n1: for t = 1, . . . , m do\n2:     $\\hat{y}_t \\leftarrow h_{\\mathrm{priv}}(\\tilde{x}_t)$\n3: Output $\\hat{h} \\leftarrow \\Theta(\\tilde{D})$, where $\\tilde{D} = \\{(\\tilde{x}_1, \\hat{y}_1), \\ldots, (\\tilde{x}_m, \\hat{y}_m)\\}$\n\nNote that since differential privacy is closed under post-processing, APriv is (\u03b5, \u03b4)-DP w.r.t. the original dataset (input to AbinClas). Note also that the mapping hpriv is independent of Q; it only depends on the input training set D (in particular, on $h_1, \\ldots, h_k$), and the internal randomness of AbinClas. We now make the following claim about hpriv.\n\nClaim 4.1. Let 0 < \u03b2 \u2264 \u03b1 < 1, and m \u2265 4 log(1/\u03b1\u03b2)/\u03b1. Suppose that \u0398 in AbinClas (Algorithm 2) is an (\u03b1, \u03b2/k, n/k)-(agnostic) PAC learner for the hypothesis class H. Then, with probability at least 1 \u2212 2\u03b2 (over the randomness of the private training set D, and the randomness in AbinClas), we have $\\mathrm{err}(h_{\\mathrm{priv}}; \\mathcal{D}) \\leq 3\\gamma + 7\\alpha = O(\\gamma + \\alpha)$, where $\\gamma = \\min_{h \\in \\mathcal{H}} \\mathrm{err}(h; \\mathcal{D})$.\n\nProof. The proof largely relies on the proof of Theorem 3.2. First, note that w.p. \u2265 1 \u2212 \u03b2 (over the randomness of the input dataset D), for all j \u2208 [k], we have $\\mathrm{err}(h_j; \\mathcal{D}) \\leq \\alpha$. For the remainder of the proof, we will condition on this event. Let $\\tilde{x}_1, \\ldots, \\tilde{x}_m$ be a sequence of i.i.d. domain points, and $\\tilde{y}_1, \\ldots, \\tilde{y}_m$ be the corresponding (unknown) labels. Now, for every t \u2208 [m], define $v_t \\triangleq \\mathbf{1}\\left(|\\{j \\in [k] : h_j(\\tilde{x}_t) \\neq \\tilde{y}_t\\}| > k/3\\right)$. Note that since $(\\tilde{x}_1, \\tilde{y}_1), \\ldots, (\\tilde{x}_m, \\tilde{y}_m)$ are i.i.d., it follows that $v_1, \\ldots, v_m$ are i.i.d. 
(this is true conditioned on the original dataset D). As in the proof of Theorem 3.2, we have:\n\n$$\\mathbb{P}_{\\tilde{x}_1, \\ldots, \\tilde{x}_m}\\left[\\frac{1}{m}\\sum_{t=1}^{m} v_t > 3\\left(\\alpha + \\gamma + \\sqrt{\\frac{\\log(m/\\beta)}{2m(\\alpha+\\gamma)}}\\right)\\right] < \\beta.$$\n\nHence, for any t \u2208 [m], we have\n\n$$\\mathbb{E}_{\\tilde{x}_t}[v_t] = \\mathbb{E}_{\\tilde{x}_1, \\ldots, \\tilde{x}_m}\\left[\\frac{1}{m}\\sum_{t=1}^{m} v_t\\right] < \\beta + 3\\left(\\alpha + \\gamma + \\sqrt{\\frac{\\log(m/\\beta)}{2m(\\alpha+\\gamma)}}\\right) \\leq 7\\alpha + 3\\gamma.$$\n\nLet $\\bar{v}_t = 1 - v_t$. Using the same technique as in the proof of Theorem 3.2, we can show that w.p. at least 1 \u2212 \u03b2 over the internal randomness in Algorithm 2, we have $\\bar{v}_t = 1 \\Rightarrow h_{\\mathrm{priv}}(\\tilde{x}_t) = \\tilde{y}_t$. Hence, conditioned on this event, we have\n\n$$\\mathbb{P}_{\\tilde{x}_t}\\left[h_{\\mathrm{priv}}(\\tilde{x}_t) \\neq \\tilde{y}_t\\right] \\leq \\mathbb{P}_{\\tilde{x}_t}[v_t = 1] = \\mathbb{E}_{\\tilde{x}_t}[v_t] \\leq 7\\alpha + 3\\gamma.$$\n\nWe now state and prove the main results of this section. Let V denote the VC-dimension of H.\n\nTheorem 4.2 (Sample complexity bound in the realizable case). Let 0 < \u03b2 \u2264 \u03b1 < 1. Let m be such that \u0398 is an (\u03b1, \u03b2, m)-agnostic PAC learner of H, i.e., $m = O\\left(\\frac{V + \\log(1/\\beta)}{\\alpha^2}\\right)$. Let the parameter T of AbinClas (Algorithm 2) be set as in Theorem 3.4. There exists $n = \\tilde{O}\\left(V^{3/2} / (\\epsilon\\, \\alpha^{3/2})\\right)$ for the size of the private dataset such that, w.p. \u2265 1 \u2212 3\u03b2, the output hypothesis $\\hat{h}$ of APriv (Algorithm 3) satisfies $\\mathrm{err}(\\hat{h}; \\mathcal{D}) = O(\\alpha)$.\n\nProof. Let $h^* \\in \\mathcal{H}$ denote the true labeling hypothesis. We will denote the true distribution $\\mathcal{D}$ as $(\\mathcal{D}_{\\mathcal{X}}, h^*)$. Note that since T is set as in Theorem 3.4, and given the value of m in the theorem statement, we get $k = \\tilde{O}\\left(V^2 / (\\epsilon^2 \\alpha^2 n)\\right)$. Hence, there is a setting $n = \\tilde{O}\\left(V^{3/2} / (\\epsilon\\, \\alpha^{3/2})\\right)$ such that \u0398 is an (\u03b1, \u03b2/k, n/k)-PAC learner for H (in particular, the sample complexity in the realizable case is $n/k = \\tilde{O}(V/\\alpha)$). Hence, by Claim 4.1, w.p. \u2265 1 \u2212 2\u03b2, $\\mathrm{err}(h_{\\mathrm{priv}}; \\mathcal{D}) \\leq 7\\alpha$. For the remainder of the proof, we will condition on this event. Note that each $(\\tilde{x}_t, \\hat{y}_t)$, t \u2208 [m], is drawn independently from $(\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})$. Now, since \u0398 is also an (\u03b1, \u03b2, m)-agnostic PAC learner for H, w.p. \u2265 1 \u2212 \u03b2 (over the new set $\\tilde{D}$), the output hypothesis $\\hat{h}$ satisfies\n\n$$\\mathrm{err}(\\hat{h}; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) - \\mathrm{err}(h^*; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) \\leq \\mathrm{err}(\\hat{h}; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) - \\min_{h \\in \\mathcal{H}} \\mathrm{err}(h; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) \\leq \\alpha.$$\n\nObserve that\n\n$$\\mathrm{err}(h^*; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) = \\mathbb{E}_{x \\sim \\mathcal{D}_{\\mathcal{X}}}\\left[\\mathbf{1}(h^*(x) \\neq h_{\\mathrm{priv}}(x))\\right] = \\mathrm{err}(h_{\\mathrm{priv}}; (\\mathcal{D}_{\\mathcal{X}}, h^*)) = \\mathrm{err}(h_{\\mathrm{priv}}; \\mathcal{D}) \\leq 7\\alpha,$$\n\nwhere the last inequality follows from Claim 4.1 (with \u03b3 = 0). Hence, $\\mathrm{err}(\\hat{h}; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) \\leq 8\\alpha$. Furthermore, observe that\n\n$$\\mathrm{err}(\\hat{h}; \\mathcal{D}) = \\mathbb{E}_{x \\sim \\mathcal{D}_{\\mathcal{X}}}\\left[\\mathbf{1}(\\hat{h}(x) \\neq h^*(x))\\right] \\leq \\mathbb{E}_{x \\sim \\mathcal{D}_{\\mathcal{X}}}\\left[\\mathbf{1}(\\hat{h}(x) \\neq h_{\\mathrm{priv}}(x)) + \\mathbf{1}(h_{\\mathrm{priv}}(x) \\neq h^*(x))\\right] = \\mathrm{err}(\\hat{h}; (\\mathcal{D}_{\\mathcal{X}}, h_{\\mathrm{priv}})) + \\mathrm{err}(h_{\\mathrm{priv}}; \\mathcal{D}) \\leq 15\\alpha.$$\n\nHence, w.p. \u2265 1 \u2212 3\u03b2, we have $\\mathrm{err}(\\hat{h}; \\mathcal{D}) \\leq 15\\alpha$.\n\nRemark 2. In Theorem 4.2, if \u0398 is an ERM learner, then the value of m can be reduced to $\\tilde{O}(V/\\alpha)$. Hence, the resulting sample complexity would be $n = \\tilde{O}\\left(V^{3/2} / (\\epsilon\\, \\alpha)\\right)$, saving us a factor of $1/\\sqrt{\\alpha}$. This is because the disagreement rate in the labels produced by AbinClas is \u2248 \u03b1, and agnostic learning with such a low disagreement rate can be done using $\\tilde{O}(V/\\alpha)$ samples if the learner is an ERM [7, Corollary 5.2].\n\nRemark 3. Our result involves using an agnostic PAC learner \u0398. 
Agnostic PAC learners with optimal sample complexity can be computationally inefficient. One way to give an efficient construction in the realizable case (with a slightly worse sample complexity) is to use a PAC learner (rather than an agnostic one) in APriv with target accuracy \u03b1 (and hence, $m = \\tilde{O}(V/\\alpha)$), but then train the PAC learner in AbinClas towards a target accuracy 1/m. Hence, the misclassification rate of AbinClas can be driven to zero. This yields a sample complexity bound $n = \\tilde{O}\\left(V^2 / (\\epsilon\\, \\alpha)\\right)$.\n\nTheorem 4.3 (Sample complexity bound in the non-realizable case). Let 0 < \u03b2 \u2264 \u03b1 < 1, and $m = O\\left(\\frac{V + \\log(1/\\beta)}{\\alpha^2}\\right)$. Let T be set as in Theorem 3.5. There exists $n = \\tilde{O}\\left(V^{3/2} / (\\epsilon\\, \\alpha^{5/2})\\right)$ such that, w.p. \u2265 1 \u2212 3\u03b2, the output hypothesis $\\hat{h}$ of APriv (Algorithm 3) satisfies $\\mathrm{err}(\\hat{h}; \\mathcal{D}) = O(\\alpha + \\gamma)$.\n\nProof. The proof is similar to the proof of Theorem 4.2.\n\n5 Discussion\n\nImplications, and comparison to prior work on label privacy: Our results also apply to the setting of label-private learning, where the learner is only required to protect the privacy of the labels in the training set. That is, in this setting, all unlabeled features in the training set can be viewed as public information. This is a less restrictive setting than the setting we consider in this paper. In particular, our construction can be directly used as a label-private learner simply by splitting the training set into two parts and discarding the labels in one of them. The above theorems give sample complexity upper bounds that are only a factor of $\\tilde{O}\\left(\\sqrt{V/\\alpha}\\right)$ worse than the optimal non-private sample complexity bounds. We note, however, that our sample complexity upper bound for the agnostic case has a suboptimal dependency (by a small constant factor) on $\\gamma \\triangleq \\min_{h \\in \\mathcal{H}} \\mathrm{err}(h; \\mathcal{D})$.\n\nLabel-private learning has been considered before in [10] and [6]. Both works have only considered pure, i.e., (\u03b5, 0), differentially private learners for those settings, and the constructions in both works are white-box, i.e., they do not allow for modular construction based on black-box access to a non-private learner. The work of [10] gave upper and lower bounds on the sample complexity in terms of the doubling dimension. Their upper bound involves a smoothness condition on the distribution of the features $\\mathcal{D}_{\\mathcal{X}}$. The work of [6] showed that the sample complexity (of pure differentially label-private learners) can be characterized in terms of the VC dimension. They proved an upper bound on the sample complexity for the realizable case. The bound of [6] is only a factor of O(1/\u03b1) worse than the optimal non-private bound for the realizable case.\n\nBeyond standard PAC learning with binary loss: In this paper, we used our algorithmic framework to derive sample complexity bounds for the standard (agnostic) PAC model with the binary 0-1 loss. However, it is worth pointing out that our framework is applicable in more general settings. In particular, if a surrogate loss (e.g., hinge loss or logistic loss) is used instead of the binary loss, then our framework can be instantiated with any non-private learner with respect to that loss. That is, our construction does not necessarily require an (agnostic) PAC learner. However, in such a case, the accuracy guarantees of our construction will be different from what we have here for the standard PAC model. In particular, in the surrogate loss model, one often needs to invoke some weak assumptions on the data distribution in order to bound the optimization error [25]. 
One can still provide meaningful accuracy guarantees since our framework allows for transferring the classification error guarantee of the underlying non-private learner to a classification error guarantee for the final private learner.\n\nAcknowledgement: The authors would like to thank Vitaly Feldman and Adam Smith for helpful discussions during the course of this project. In particular, the authors are grateful for Vitaly\u2019s ideas about the possible extensions of the results in Section 4, which we outlined in Remarks 2 and 3. This work is supported by NSF grants TRIPODS-1740850, TRIPODS+X-1839317, and IIS-1447700, a grant from the Sloan foundation, and start-up supports from OSU and UC Santa Cruz.\n\nReferences\n\n[1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308\u2013318. ACM, 2016.\n\n[2] Raef Bassily, Adam Smith, and Abhradeep Thakurta. Private empirical risk minimization: Efficient algorithms and tight error bounds. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 464\u2013473. IEEE, 2014.\n\n[3] Raef Bassily, Om Thakkar, and Abhradeep Thakurta. Model-agnostic private learning via stability. arXiv preprint arXiv:1803.05101, 2018.\n\n[4] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the Sample Complexity for Private Learning and Private Data Release. In TCC, pages 437\u2013454. Springer, 2010.\n\n[5] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Characterizing the sample complexity of private learners. In ITCS. ACM, 2013.\n\n[6] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. 
Theory of Computing, 12(1):1\u201361, 2016.\n\n[7] St\u00e9phane Boucheron, Olivier Bousquet, and G\u00e1bor Lugosi. Theory of classification: A survey of some recent advances. ESAIM: probability and statistics, 9:323\u2013375, 2005.\n\n[8] Leo Breiman. Bagging predictors. Machine learning, 24(2):123\u2013140, 1996.\n\n[9] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Differentially private release and learning of threshold functions. In Venkatesan Guruswami, editor, IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pages 634\u2013649. IEEE Computer Society, 2015.\n\n[10] Kamalika Chaudhuri and Daniel Hsu. Sample complexity bounds for differentially private learning. In Proceedings of the 24th Annual Conference on Learning Theory, pages 155\u2013186, 2011.\n\n[11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. JMLR, 2011.\n\n[12] Cynthia Dwork and Vitaly Feldman. Privacy-preserving prediction. arXiv preprint arXiv:1803.10266, 2018.\n\n[13] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, 2006.\n\n[14] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265\u2013284. Springer, 2006.\n\n[15] Cynthia Dwork, Moni Naor, Omer Reingold, Guy Rothblum, and Salil Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In STOC, pages 381\u2013390, 2009.\n\n[16] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211\u2013407, 2014.\n\n[17] Jihun Hamm, Yingjun Cao, and Mikhail Belkin. 
Learning privately from multiparty data. In International Conference on Machine Learning, pages 555\u2013563, 2016.\n\n[18] Moritz Hardt and Guy N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In FOCS, 2010.\n\n[19] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? In FOCS, pages 531\u2013540. IEEE Computer Society, 2008.\n\n[20] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, USA, 1994.\n\n[21] Daniel Kifer, Adam Smith, and Abhradeep Thakurta. Private convex empirical risk minimization and high-dimensional regression. Journal of Machine Learning Research, 1:41, 2012.\n\n[22] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In STOC, 2007.\n\n[23] Nicolas Papernot, Mart\u00edn Abadi, \u00dalfar Erlingsson, Ian Goodfellow, and Kunal Talwar. Semi-supervised knowledge transfer for deep learning from private training data. stat, 1050, 2017.\n\n[24] Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and \u00dalfar Erlingsson. Scalable private learning with pate. arXiv preprint arXiv:1802.08908, 2018.\n\n[25] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.\n\n[26] Adam Smith and Abhradeep Thakurta. Differentially private feature selection via stability arguments, and the robustness of the lasso. In COLT, 2013.\n\n[27] Kunal Talwar, Abhradeep Thakurta, and Li Zhang. 
Nearly optimal private lasso. In NIPS, 2015.\n", "award": [], "sourceid": 3534, "authors": [{"given_name": "Raef", "family_name": "Bassily", "institution": "The Ohio State University"}, {"given_name": "Om", "family_name": "Thakkar", "institution": "Boston University"}, {"given_name": "Abhradeep", "family_name": "Guha Thakurta", "institution": "University of California Santa Cruz"}]}