{"title": "Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization", "book": "Advances in Neural Information Processing Systems", "page_first": 6343, "page_last": 6354, "abstract": "Distributed learning allows a group of independent data owners to collaboratively learn a model over their data sets without exposing their private data. We present a distributed learning approach that combines differential privacy with secure multi-party computation. We explore two popular methods of differential privacy, output perturbation and gradient perturbation, and advance the state-of-the-art for both methods in the distributed learning setting. In our output perturbation method, the parties combine local models within a secure computation and then add the required differential privacy noise before revealing the model. In our gradient perturbation method, the data owners collaboratively train a global model via an iterative learning algorithm.  At each iteration, the parties aggregate their local gradients within a secure computation, adding sufficient noise to ensure privacy before the gradient updates are revealed. For both methods, we show that the noise can be reduced in the multi-party setting by adding the noise inside the secure computation after aggregation, asymptotically improving upon the best previous results. 
Experiments on real world data sets demonstrate that our methods provide substantial utility gains for typical privacy requirements.", "full_text": "Distributed Learning without Distress:\n\nPrivacy-Preserving Empirical Risk Minimization\n\nBargav Jayaraman\n\nDepartment of Computer Science\n\nUniversity of Virginia\n\nCharlottesville, VA 22903\nbj4nq@virginia.edu\n\nDavid Evans\n\nDepartment of Computer Science\n\nUniversity of Virginia\n\nCharlottesville, VA 22903\nevans@virginia.edu\n\nLingxiao Wang\n\nDepartment of Computer Science\n\nUniversity of California, Los Angeles\n\nLos Angeles, CA 90095\nlingxw@cs.ucla.edu\n\nQuanquan Gu\n\nDepartment of Computer Science\n\nUniversity of California, Los Angeles\n\nLos Angeles, CA 90095\n\nqgu@cs.ucla.edu\n\nAbstract\n\nDistributed learning allows a group of independent data owners to collaboratively\nlearn a model over their data sets without exposing their private data. We present a\ndistributed learning approach that combines differential privacy with secure multi-\nparty computation. We explore two popular methods of differential privacy, output\nperturbation and gradient perturbation, and advance the state-of-the-art for both\nmethods in the distributed learning setting. In our output perturbation method,\nthe parties combine local models within a secure computation and then add the\nrequired differential privacy noise before revealing the model. In our gradient\nperturbation method, the data owners collaboratively train a global model via an\niterative learning algorithm. At each iteration, the parties aggregate their local\ngradients within a secure computation, adding suf\ufb01cient noise to ensure privacy\nbefore the gradient updates are revealed. For both methods, we show that the noise\ncan be reduced in the multi-party setting by adding the noise inside the secure\ncomputation after aggregation, asymptotically improving upon the best previous\nresults. 
Experiments on real world data sets demonstrate that our methods provide\nsubstantial utility gains for typical privacy requirements.\n\n1 Introduction\n\nIn many applications, such as medical research and financial fraud detection, it is valuable to\nbuild machine learning models by training on sensitive data. This raises privacy concerns since\nadversaries may be able to infer information about the training data from the learned model. Model\nparameters can reveal sensitive information about individual records, ranging from specific features of the\nrecords [20] to the presence of particular records in the data set [47]. In the case of neural networks,\nthe model parameters can also inadvertently store sensitive parts of the training data [8]. Differential\nprivacy [19, 16] aims to thwart such analysis. It provides statistical privacy for individual records\nby adding random noise to the model parameters. Many works have shown that differential privacy\ncan be used to enable privacy-preserving machine learning in the centralized setting where a single\norganization owns all the data [10, 11, 29, 30, 51, 58].\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nThe problem becomes more acute when the data is owned by different organizations that wish to\ncollaboratively learn from their private data. For instance, multiple hospitals may want to collaboratively\ntrain a classifier over their patient medical records without disclosing their own records to other\nhospitals. 
The goal of distributed machine learning (also referred to as federated learning [36]) is to\nenable a group of independent data owners to develop a model from their combined data without\nexposing that data to others.\nMulti-party computation (MPC) protocols allow participants to jointly compute a functionality over\ntheir private inputs by employing cryptographic techniques like homomorphic encryption, secret\nsharing, and oblivious transfer. Lindell and Pinkas [31] proposed one of the earliest approaches to use\nMPC for private data mining, which was followed by several works considering different adversarial\nmodels or applications [55, 50, 33, 42]. A recent focus has been to achieve practical and efficient\ndistributed machine learning using MPC protocols [12, 52, 34], and in certain settings such methods\nhave been shown to scale to learning tasks with hundreds of millions of records [39, 21]. However,\nunlike approaches using differential privacy on the model, these approaches only protect the training\ndata during the learning process; they provide no protection against inference attacks on the resulting\nmodel.\nPathak et al. [41] proposed the first differentially-private machine learning method for the distributed setting. Their\nmethod securely aggregates local models and uses output perturbation to achieve differential privacy.\nHowever, the noise scale is inversely proportional to the smallest data set size of the m parties. This can\nbe improved by a factor of \u221am by first training differentially-private local models using the method\nof Chaudhuri et al. [11], and then performing na\u00efve aggregation of the local models. In this work, we\npropose an output perturbation method that improves over Pathak et al.\u2019s method by a factor of m by\nadding the noise inside an MPC with the scale of noise required roughly inversely proportional to the\nsize of the entire data set. 
Recent works on distributed noise generation [17, 3, 24, 45] try to achieve\na similar bound by requiring parties to add partial noise locally, and combining these noises to ensure\ndifferential privacy. However, these methods require additional noise to tolerate corruptions and\ncollusion. More concretely, with a minimum of k honest parties out of m, their noise bound is worse\nthan ours by a factor of \u221a(m/k). On the other hand, in our approach the noise is generated inside the\nMPC such that any honest participant can be assured that sufficient noise is added to protect their\nown privacy even if all other participants are dishonest and colluding.\nWhile these model aggregation approaches are computationally efficient, they tend to produce less\naccurate global models compared to the centralized setting, especially when the number of data\nowners is large (in the extreme, when each party has only one training instance). For such scenarios,\ndistributed iterative learning with gradient perturbation is a better option. Shokri and Shmatikov [46]\nprovide such a solution for deep learning, where the local gradients are perturbed and then revealed\nfor updating the global model. Their privacy budget is per parameter, however, and not for the\nentire training, so huge total privacy budgets are required. Abadi et al. [1] proposed a tighter bound\non the privacy budget using the moments accountant, which is applicable to the centralized setting. Wang\net al. [51] used the moments accountant to propose iterative learning with gradient perturbation\nfor the centralized setting. We propose a method for the distributed setting using zero-concentrated\ndifferential privacy [6] which achieves a similarly tight bound on the privacy budget. Moreover, we add\nnoise inside the MPC after gradient aggregation, thus reducing the noise by a factor of \u221am compared\nto the na\u00efve aggregation of noisy gradients. While Chase et al. 
[9] also achieve a similar bound on the\nnoise in the distributed learning setting, their method considers only the convex case. We achieve a\ndifferent (and tighter) utility bound for the strong convexity case. Further, Chase et al. use differential\nprivacy which has different composition properties than the zero-concentrated differential privacy\nthat we consider. We also note that the method proposed by Rajkumar and Agarwal [43] has similar\nobjectives, but their protocol requires a trusted third party to execute the SGD algorithm, whereas\nour method does not depend on any trusted party. In addition, although their method has the same\nscale of noise as ours, in their method each party samples local noise which is aggregated by the\ntrusted third party. This is not secure in the presence of colluding parties as noted by Bindschaedler\net al. [3] and Shi et al. [45]. In our method, parties collaboratively generate noise within the MPC.\nFinally, their method requires noise from two sources: the Gaussian noise \u03b7 and the Laplace noise \u03c1.\nGeneration of \u03c1 consumes \u03b5 privacy budget per iteration, as opposed to using \u03b5 budget for the entire\nlearning process, and hence violates the privacy constraints.\nIn this paper, we introduce differentially-private distributed machine learning protocols using both\noutput perturbation and gradient perturbation where the noise is added within a secure multi-party\ncomputation. Our output perturbation method securely aggregates the local models and achieves\n\u03b5-differential privacy by adding Laplace noise to the aggregated model parameters. In our gradient\nperturbation method, the parties collaboratively run an iterative, gradient-based learning algorithm\nwhere they securely aggregate the local gradients at each iteration. This provides (\u03b5, \u03b4)-differential\nprivacy by adding Gaussian noise to the aggregated gradients. 
In both methods the sampled\nnoise is (roughly) inversely proportional to the size of the entire data set. While the first method is\ncomputationally efficient, requiring only a single invocation of MPC, its accuracy decreases compared\nto the centralized method when the number of parties is large relative to the total amount of data; this\nis inherent to any model aggregation based method. The iterative gradient perturbation method, on\nthe other hand, does not suffer from accuracy degradation but requires one MPC protocol execution\nper iteration. Both methods achieve accuracy close to their non-private counterparts where no noise\nis added and no data privacy is provided.\n\n1.1 Contributions\n\nThis work makes the following contributions, which address challenges in distributed learning.\n\nOutput Perturbation and Gradient Perturbation Methods. We propose two approaches to privately\ntrain accurate machine learning models in the distributed setting. While the output perturbation\nmethod (Section 3.1) is computationally more efficient, the gradient perturbation method (Section 3.2)\nmaintains high accuracy regardless of how the data is partitioned.\n\nReduced Noise Bounds. We give noise bounds for each method that are smaller than the best\nprevious approaches for output perturbation [41] and gradient perturbation [46], while ensuring\ndifferential privacy in the distributed setting (Theorem 3.1 for output perturbation and Theorem 3.4\nfor gradient perturbation). For gradient perturbation, we use zero-concentrated differential privacy\nto achieve the lowest known bound on the privacy budget. Moreover, we generate the noise within\nthe MPC protocol. This allows us to add only a single copy of noise, compared to previous works\nthat combine noise from each participant [17, 4, 3, 24, 45]. 
We provide a theoretical analysis of our\nmethods\u2019 error bounds which match the state-of-the-art error bounds in centralized settings.\n\nExperimental Evaluation on Real Data Sets. We implement regularized logistic regression and\nregularized linear regression models for classification and regression tasks respectively. We report\nresults from experiments performed on the KDDCup99 and Adult data sets for classification and the\nKDDCup98 data set for regression. We compare our methods with previous work on distributed\nlearning, varying the number of parties and local data set sizes. Our methods produce models that are\nclosest to the non-private models in terms of model accuracy and generalization error since we add\nless noise than previous distributed learning methods.\n\n2 Background on Differential Privacy and Multi-Party Computation\n\nThis section introduces differential privacy (including the zero-concentrated differential privacy\nnotion we use), and secure multi-party computation.\n\nNotation: For any d-dimensional vector $x = [x_1, \\ldots, x_d]^\\top$, we use $\\|x\\| = (\\sum_{i=1}^{d} |x_i|^2)^{1/2}$ to denote\nits $\\ell_2$-norm. Given two sequences $\\{a_n\\}$ and $\\{b_n\\}$, we write $a_n = O(b_n)$ if there exists a constant\n$0 < C < \\infty$ such that $a_n \\le C b_n$, and we use $\\tilde{O}(\\cdot)$ to hide the logarithmic factors.\n\n2.1 Differential Privacy\n\nDifferential privacy was introduced by Dwork [18] and is defined as follows:\nDefinition 2.1 ((\u03b5, \u03b4)-Differential Privacy). Given two adjacent data sets $D, D^{\\prime} \\in \\mathcal{D}^n$ differing by\na single element, a randomized mechanism $M : \\mathcal{D}^n \\to \\mathbb{R}^d$ provides (\u03b5, \u03b4)-differential privacy if it\nproduces a response in any set $S$ with probability $P[M(D) \\in S] \\le e^{\\epsilon} P[M(D^{\\prime}) \\in S] + \\delta$.\n\nThe above definition reduces to \u03b5-Differential Privacy (\u03b5-DP) when \u03b4 = 0. 
We can achieve \u03b5-DP\nand (\u03b5, \u03b4)-DP by adding noise sampled from Laplace and Gaussian distributions respectively,\nwhere the noise is proportional to the sensitivity of $M$, given as $\\Delta M = \\|M(D) - M(D^{\\prime})\\|$.\nThroughout this paper we assume the $\\ell_2$-sensitivity which considers the upper bound on the $\\ell_2$-norm\nof $M(D) - M(D^{\\prime})$.\n\nZero-Concentrated Differential Privacy. While the notion of differential privacy performs well for\nmethods like output perturbation, it is not suitable for gradient perturbation methods which require\nrepeated sampling of noise in the iterative training procedure. Zero-concentrated differential privacy [6]\n(zCDP) has a tight composition bound and hence is a better option for gradient perturbation.\nWe first define the privacy loss random variable which is used in the definition of zCDP.\nDefinition 2.2. For two adjacent data sets $D, D^{\\prime} \\in \\mathcal{D}^n$ differing by one sample, a randomized\nmechanism $M : \\mathcal{D}^n \\to \\mathbb{R}^d$, and an outcome $o \\in \\mathbb{R}^d$, the privacy loss random variable $Z$ is defined as\n\n$$Z = \\log \\frac{P[M(D) = o]}{P[M(D^{\\prime}) = o]}. \\qquad (1)$$\n\nDefinition 2.3. A randomized mechanism $M : \\mathcal{D}^n \\to \\mathbb{R}^d$ satisfies $\\rho$-zCDP if for any two adjacent\ndata sets $D, D^{\\prime} \\in \\mathcal{D}^n$ differing by one sample, it holds that for all $\\alpha \\in (1, \\infty)$,\n\n$$E\\left[e^{(\\alpha - 1) Z}\\right] \\le e^{(\\alpha - 1) \\alpha \\rho}. \\qquad (2)$$\n\nNote that (2) implies that $P[Z > \\lambda + \\rho] \\le e^{-\\lambda^2/(4\\rho)}$ for all $\\lambda > 0$, which suggests that the privacy\nloss $Z$ is tightly concentrated around zero mean, and hence an adversary is unlikely to distinguish $D$ from $D^{\\prime}$\ngiven their outputs.\nBun and Steinke [6] give the following lemmas to achieve zCDP with the Gaussian mechanism.\nLemma 2.1 bounds the amount of Gaussian noise to guarantee \u03c1-zCDP. Lemma 2.2 gives the\ncomposition of multiple zCDP mechanisms. 
Finally, Lemma 2.3 specifies the mapping from \u03c1-zCDP\nto (\u03b5, \u03b4)-DP.\nLemma 2.1. Given a function $q : \\mathcal{D}^n \\to \\mathbb{R}^d$, the Gaussian Mechanism $M = q(D) + u$, where\n$u \\sim N(0, \\sigma^2 I_d)$, satisfies $\\Delta_2(q)^2/(2\\sigma^2)$-zCDP.\nLemma 2.2. Consider two randomized mechanisms $M_1 : \\mathcal{D}^n \\to \\mathbb{R}^d$ and $M_2 : \\mathcal{D}^n \\times \\mathbb{R}^d \\to \\mathbb{R}^d$. If $M_1$\nsatisfies $\\rho_1$-zCDP and $M_2$ satisfies $\\rho_2$-zCDP, then $M_2(D, M_1(D))$ satisfies $(\\rho_1 + \\rho_2)$-zCDP.\nLemma 2.3. If a randomized mechanism $M : \\mathcal{D}^n \\to \\mathbb{R}^d$ satisfies $\\rho$-zCDP, then it satisfies\n$(\\rho + 2\\sqrt{\\rho \\log(1/\\delta)}, \\delta)$-differential privacy for any $\\delta > 0$.\n\n2.2 Secure Multi-Party Computation\n\nOur threat model considers semi-honest participants who wish to compute a joint functionality\nwithout revealing their individual inputs to other participants. In this threat model, while the parties\ndo not tamper with the joint functionality or provide garbage inputs, they may passively try to\ninfer information about the inputs of other parties based on the protocol execution. We use generic multi-party\ncomputation protocols to securely aggregate local models and gradients. A multi-party computation\n(MPC) protocol enables two or more parties to jointly compute a function of their private inputs,\nwithout disclosing any information about those inputs other than their size and whatever can be\ninferred from the revealed output [56]. The notion of MPC goes back to a series of talks given by\nAndrew Yao in the 1980s. The protocol he introduced, now known as Yao\u2019s garbled circuits protocol,\ncan compute any function securely. Numerous other secure multi-party computation protocols have\nbeen devised since then (e.g., [22, 32, 14, 37]), and many tools have been developed for efficiently\nimplementing MPC computations (e.g., [35, 13, 7, 27, 26, 44, 53, 57]). 
It is now practical to execute\ntwo-party protocols with millions of inputs [21, 23], and global-scale, many-party protocols with\nmalicious level security for small inputs [54].\nSecure aggregation of local classification models using MPC was shown to be practical by Tian et\nal. [49]. This work used a two-party computation, with a semi-honest threat model and non-colluding\nservers. A similar approach has been used to scale multi-party regressions [38, 21]. We can use\nthese methods to achieve secure aggregation. For scenarios where the risks of collusion are too high,\nmany-party MPC protocols can be used that provide security to a single honest participant even if all\nother participants are malicious. In this work, we do not focus on improving or evaluating the MPC\nexecution, since the methods we propose can be implemented using well known MPC techniques.\nAppendix C provides information on the MPC implementation we use and its cost.\n\n3 Multi-Party Machine Learning\n\nIn this section we describe our output perturbation and gradient perturbation methods in detail along\nwith a theoretical analysis of differential privacy and generalization error bounds.\nWe consider the following empirical risk minimization (ERM) objective:\n\n$$J_D(\\theta) = \\frac{1}{n} \\sum_{i=1}^{n} \\ell(\\theta, x_i, y_i) + \\lambda N(\\theta),$$\n\nwhere $\\ell(\\theta)$ is a convex loss function that is G-Lipschitz and L-smooth over $\\theta \\in \\mathbb{R}^d$. $N(\\cdot)$ is a\nregularization term. We consider $J(\\cdot)$ to be \u03bb-strongly convex. Each data instance $(x_i, y_i) \\in D$ lies\nin a unit ball. For a party j, with data set $D_j$ of size $n_j$, we denote its data instance as $(x_i^{(j)}, y_i^{(j)})$.\n\n3.1 Model Aggregation with Output Perturbation\n\nWe extend the differential privacy bound of Chaudhuri et al. 
[10] to the multi-party setting, ensuring\nsufficient noise to preserve the privacy of each participant\u2019s data throughout the multi-party\ncomputation, including the final output.\n\nConsider m parties, each having a data set $D_j$ of size $n_j$ and the corresponding local model estimator $\\hat{\\theta}^{(j)}$\nobtained by minimizing the local objective function: $J_{D_j}(\\theta) = \\frac{1}{n_j} \\sum_{i=1}^{n_j} \\ell(\\theta, x_i^{(j)}, y_i^{(j)}) + \\lambda N(\\theta)$.\nThe perturbed aggregate model estimator is given as $\\hat{\\theta}_{priv} = \\frac{1}{m} \\sum_{j=1}^{m} \\hat{\\theta}^{(j)} + \\eta$, where $\\eta$ is the\nLaplace noise added to the aggregate model estimator to preserve differential privacy. Secure model\naggregation can be performed using the framework of Tian et al. [49] as mentioned in Section 2.2.\nThe next theorem provides a bound on the noise magnitude needed to achieve differential privacy:\n\nTheorem 3.1. Given a perturbed aggregate model estimator $\\hat{\\theta}_{priv} = \\frac{1}{m} \\sum_{j=1}^{m} \\hat{\\theta}^{(j)} + \\eta$ where\n$\\hat{\\theta}^{(j)} = \\arg\\min_{\\theta} \\frac{1}{n_j} \\sum_{i=1}^{n_j} \\ell(\\theta, x_i^{(j)}, y_i^{(j)}) + \\lambda N(\\theta)$ and the data lie in a unit ball and $\\ell(\\cdot)$ is G-Lipschitz,\nthen $\\hat{\\theta}_{priv}$ is \u03b5-differentially private if\n\n$$\\eta = \\mathrm{Lap}\\left(\\frac{2G}{m n_{(1)} \\lambda \\epsilon}\\right),$$\n\nwhere $n_{(1)}$ is the size of the smallest data set among the m parties, \u03bb is the regularization parameter\nand \u03b5 is the differential privacy budget.\n\nProof. 
Let there be m parties such that one record of party j changes in the neighbouring data sets,\nthen\n\n$$\\frac{\\Pr(\\hat{\\theta} \\mid D)}{\\Pr(\\hat{\\theta} \\mid D^{\\prime})} = \\frac{\\Pr\\left(\\frac{1}{m} \\sum_{i \\neq j} \\hat{\\theta}^{(i)} + \\frac{1}{m} \\hat{\\theta}^{(j)} + \\eta \\mid D\\right)}{\\Pr\\left(\\frac{1}{m} \\sum_{i \\neq j} \\hat{\\theta}^{(i)} + \\frac{1}{m} \\hat{\\theta}^{\\prime(j)} + \\eta \\mid D^{\\prime}\\right)} = \\frac{\\exp\\left[\\frac{m n_{(1)} \\epsilon \\lambda}{2G} \\cdot \\frac{\\|\\hat{\\theta}^{(j)}\\|}{m}\\right]}{\\exp\\left[\\frac{m n_{(1)} \\epsilon \\lambda}{2G} \\cdot \\frac{\\|\\hat{\\theta}^{\\prime(j)}\\|}{m}\\right]} \\le \\exp\\left[\\frac{n_{(1)} \\epsilon \\lambda}{2G} \\|\\hat{\\theta}^{(j)} - \\hat{\\theta}^{\\prime(j)}\\|\\right] \\le \\exp\\left[\\frac{n_{(1)} \\epsilon \\lambda}{2G} \\cdot \\frac{2G}{n_j \\lambda}\\right] \\le \\exp(\\epsilon),$$\n\nwhere the second inequality follows from Corollary 8 of Chaudhuri et al. [11].\n\nWe now provide a bound on the excess empirical risk and true risk similar to Pathak et al. [41]. Our\nbounds are tighter than Pathak et al.\u2019s as we require less differential privacy noise.\n\nTheorem 3.2. Given a perturbed aggregate model estimator $\\hat{\\theta}_{priv} = \\frac{1}{m} \\sum_{j=1}^{m} \\hat{\\theta}^{(j)} + \\eta$ where\n$\\hat{\\theta}^{(j)} = \\arg\\min_{\\theta} \\frac{1}{n_j} \\sum_{i=1}^{n_j} \\ell(\\theta, x_i^{(j)}, y_i^{(j)}) + \\lambda N(\\theta)$ and an optimal model estimator $\\theta^*$ trained on\nthe centralized data such that the data lie in a unit ball and $\\ell(\\cdot)$ is G-Lipschitz and L-smooth, then\nthe bound on excess empirical risk is given as:\n\n$$J(\\hat{\\theta}_{priv}) \\le J(\\theta^*) + C_1 \\frac{G^2 (\\lambda + L)}{n_{(1)}^2 \\lambda^2} \\left(m^2 + \\frac{d^2 \\log^2(d/\\delta)}{m^2 \\epsilon^2} + \\frac{d \\log(d/\\delta)}{\\epsilon}\\right),$$\n\nwhere $C_1$ is an absolute constant.\n\nThe proof of Theorem 3.2 follows from Pathak et al. [41]. 
The main difference is that we use the\nsensitivity bound as $2G/(m n_{(1)} \\lambda)$ instead of $2G/(n_{(1)} \\lambda)$ and thereby achieve a tighter bound. The\nfull proof is given in Appendix A.1.\n\nTheorem 3.3. Given a perturbed aggregate model estimator $\\hat{\\theta}_{priv} = \\frac{1}{m} \\sum_{j=1}^{m} \\hat{\\theta}^{(j)} + \\eta$ where\n$\\hat{\\theta}^{(j)} = \\arg\\min_{\\theta} \\frac{1}{n_j} \\sum_{i=1}^{n_j} \\ell(\\theta, x_i^{(j)}, y_i^{(j)}) + \\lambda N(\\theta)$ and an optimal model estimator $\\theta^*$ trained on\nthe centralized data such that the data lie in a unit ball and $\\ell(\\cdot)$ is G-Lipschitz and L-smooth, then\nthe following bound on true excess risk holds with probability at least $1 - \\gamma$:\n\n$$E[\\tilde{J}(\\hat{\\theta}_{priv})] - \\min_{\\theta} \\tilde{J}(\\theta) \\le C_1 \\frac{G^2 (\\lambda + L)}{n_{(1)}^2 \\lambda^2} \\left(m^2 + \\frac{d^2 \\log^2(d/\\delta)}{m^2 \\epsilon^2} + \\frac{d \\log(d/\\delta)}{\\epsilon}\\right) + C_2 \\frac{G^2 \\log(1/\\gamma)}{\\lambda n},$$\n\nwhere $n$ is the size of the centralized data set, $\\tilde{J}(\\theta) = E_{x,y}[\\ell(\\theta, x, y)] + \\lambda N(\\theta)$, $C_1, C_2$ are absolute\nconstants, and the expectation is taken with respect to the noise $\\eta$.\n\nSee Appendix A.2 for the proof of Theorem 3.3. The true excess risk bound in Theorem 3.3 implies\nthat the private output of our algorithm converges to the population optimum at the order of 1/n.\n\n3.2 Iterative Learning with Gradient Perturbation\n\nWe consider this centralized ERM objective for m parties, each with a data set $D_j$ of size $n_j$:\n\n$$J_D(\\theta) = \\min_{\\theta} \\frac{1}{m} \\sum_{j=1}^{m} \\frac{1}{n_j} \\sum_{i=1}^{n_j} \\ell(\\theta, x_i^{(j)}, y_i^{(j)}) + \\lambda N(\\theta).$$\n\nThe parties can collaboratively learn a differentially private model via iterative learning by adding\nnoise to the aggregated gradients within the MPC in each iteration with the following noise bound.\nTheorem 3.4. 
Given a centralized model estimator $\\theta_T$ obtained by minimizing $J_D(\\theta)$ after T\niterations of a gradient descent algorithm executed jointly by m parties each having dataset $D^{(j)}$\nof size $n_j$ where each data instance $(x_i^{(j)}, y_i^{(j)}) \\in D^{(j)}$ lies in a unit ball and $\\ell(\\theta)$ is G-Lipschitz\nand L-smooth over $\\theta \\in C$. If the learning rate is 1/L and the gradients are perturbed with noise\n$z \\sim N(0, \\sigma^2 I_d)$, then $\\theta_T$ is (\u03b5, \u03b4)-differentially private if\n\n$$\\sigma^2 = \\frac{8 G^2 T \\log(1/\\delta)}{m^2 n_{(1)}^2 \\epsilon^2}, \\qquad (3)$$\n\nwhere $n_{(1)}$ is the size of the smallest data set among the m parties.\n\nProof. Given a gradient at step t,\n\n$$M_t = \\nabla J(\\theta, D) + N(0, \\sigma^2 I_d) = \\frac{1}{m} \\sum_{j=1}^{m} \\frac{1}{n_j} \\sum_{i=1}^{n_j} \\nabla \\ell(\\theta, x_i^{(j)}, y_i^{(j)}) + N(0, \\sigma^2 I_d).$$\n\nWe assume that only one data instance of one party changes in neighbouring datasets $D$ and $D^{\\prime}$.\nHence the sensitivity bound is $\\|\\nabla J(\\theta, D) - \\nabla J(\\theta, D^{\\prime})\\| \\le \\frac{2G}{m n_{(1)}}$.\n\nThus, using Lemma 2.1, $M_t$ is $\\rho$-zCDP with $\\rho = \\frac{2G^2}{m^2 n_{(1)}^2 \\sigma^2}$. By composition from Lemma 2.2, we\nobserve that $\\theta_T$ is $T\\rho$-zCDP. Applying Lemma 2.3, we obtain $T\\rho + 2\\sqrt{T\\rho \\log(1/\\delta)} = \\epsilon$. Solving\nfor the roots of this equation, we obtain\n\n$$\\rho \\approx \\frac{\\epsilon^2}{4 T \\log(1/\\delta)} \\implies \\sigma^2 = \\frac{8 G^2 T \\log(1/\\delta)}{m^2 n_{(1)}^2 \\epsilon^2}.$$\n\nThus, $\\theta_T$ is (\u03b5, \u03b4)-differentially private for the above value of $\\sigma^2$.\n\nAdditionally, we observe that differential privacy is also guaranteed for each intermediate model\nestimator:\n\nCorollary 3.5. Intermediate model estimator $\\theta_t$ at each iteration $t \\in [1, T]$ is $(\\sqrt{t/T}\\,\\epsilon, \\delta)$-differentially private.\n\nHence, an adversary cannot obtain additional information from the intermediate computations. 
See\nAppendix A.3 for the proof of Corollary 3.5.\nNext, we provide theoretical bounds on the excess empirical risk and the true excess risk of our\nproposed method.\nTheorem 3.6. Given a centralized model estimator $\\theta_T$ obtained by minimizing $J_D(\\theta)$ after T\niterations of a gradient descent algorithm executed jointly by m parties each having dataset $D^{(j)}$\nof size $n_j$ where each data instance $(x_i^{(j)}, y_i^{(j)}) \\in D^{(j)}$ lies in a unit ball and $\\ell(\\theta)$ is G-Lipschitz\nand L-smooth over $\\theta \\in C$. If the learning rate is 1/L and the gradients are perturbed with noise\n$z \\sim N(0, \\sigma^2 I_d)$ with $\\sigma^2$ defined in (3), and if we choose the iteration number as\n\n$$T = \\tilde{O}\\left(\\log\\left(\\frac{m^2 n_{(1)}^2 \\epsilon^2}{d G^2 \\log(1/\\delta)}\\right)\\right),$$\n\nthen we have a bound on excess empirical risk:\n\n$$E[J(\\theta_T)] - J(\\theta^*) \\le C_1 \\frac{G^2 L d \\log^2(m n_{(1)}) \\log(1/\\delta)}{m^2 n_{(1)}^2 \\lambda^2 \\epsilon^2},$$\n\nwhere the expectation is taken with respect to the noise \u03b7, $n_{(1)}$ is the size of the smallest data set\namong the m parties, and $C_1$ is an absolute constant.\n\nAppendix A.4 provides the proof.\nBased on the excess empirical risk, we next derive the true excess risk.\nTheorem 3.7. Given a centralized model estimator $\\theta_T$ obtained by minimizing $J_D(\\theta)$ after T\niterations of a gradient descent algorithm executed jointly by m parties each having dataset $D^{(j)}$ of\nsize $n_j$ where each data instance $(x_i^{(j)}, y_i^{(j)}) \\in D^{(j)}$ lies in a unit ball and $\\ell(\\theta)$ is G-Lipschitz and\nL-smooth over $\\theta \\in C$. 
If we choose the learning rate, noise level, and iteration number as suggested\nin Theorem 3.6, with probability at least $1 - \\gamma$, we have the following bound on true excess risk:\n\n$$E[\\tilde{J}(\\theta_T)] - \\min_{\\theta} \\tilde{J}(\\theta) \\le C_1 \\frac{G^2 L d \\log^2(m n_{(1)}) \\log(1/\\delta)}{m^2 n_{(1)}^2 \\lambda^2 \\epsilon^2} + C_2 \\frac{G^2 \\log(1/\\gamma)}{\\lambda n},$$\n\nwhere $n$ is the size of the centralized data set, $n_{(1)}$ is the size of the smallest data set among the m\nparties and $C_1, C_2$ are absolute constants.\n\nTheorem 3.7 (proof in Appendix A.5) suggests that the output of our iterative gradient perturbation\nmethod converges to the population optimum at the order of 1/n. Note that our true excess risk\nbound is comparable to that of Wang et al. [51] in the centralized setting.\n\n4 Experiments\n\nWe report on experiments for both classification and regression tasks. For classification, we use a\nregularized logistic regression model over the KDDCup99 [25] data set (additional experiments on\nthe Adult [2] data set yield similar results, described in Appendix B.3). The KDDCup99 data set\ncontains around 5,000,000 network instances. The task is to predict whether a network connection\nis a denial-of-service attack or not. We randomly sample 70,000 records and divide them into a training\nset of 50,000 records and a test set of 20,000 records. We pre-processed the data according to the\nprocedure of Chaudhuri et al. [11], resulting in records with 122 features. For regression, we train a\nridge regression model over the KDDCup98 [40] data set, consisting of demographic and other related\ninformation of approximately 200,000 American veterans. The task is to predict the donation amount\nof an individual in dollars. We randomly sample 70,000 records and divide them into a training set of\n50,000 records and a test set of 20,000 records. 
We perform the same pre-processing as in the case of\nprevious data sets and additionally perform feature selection using PCA to retain around 100 features.\nAfter pre-processing, each record consists of 95 features.\n\nTable 1: Comparison of noise magnitudes for various multi-party differential privacy methods.\n\nMethod | Analytical Bound (L - Laplace, N - Gaussian) | Noise Generation Input (m = 100, n(1) = 500, \u03bb = 0.01, \u03b5 = 0.5, G = 1 and T = 100) | Generated Noise (standard deviation over 1000 samples)\nPathak | L(2G / (n(1)\u03bb\u03b5)) | 800 \u00d710\u22123 | 1150 \u00d710\u22123\nLocal Out P | L(2G / (\u221am n(1)\u03bb\u03b5)) | 80.0 \u00d710\u22123 | 112 \u00d710\u22123\nLocal Obj P | L(2G / (n(1)\u03b5)) | 8.00 \u00d710\u22123 | 12.2 \u00d710\u22123\nMPC Out P | L(2G / (m n(1)\u03bb\u03b5)) | 8.00 \u00d710\u22123 | 11.6 \u00d710\u22123\nLocal Grad P | N(\u221a(2T) G / (\u221am n(1)\u03b5)) | 5.66 \u00d710\u22123 | 5.63 \u00d710\u22123\nMPC Grad P | N(\u221a(2T) G / (m n(1)\u03b5)) | 0.57 \u00d710\u22123 | 0.572 \u00d710\u22123\n\nFor all the experiments, we set Lipschitz constant G = 1, learning rate \u03b7 = 1, regularization\ncoefficient \u03bb = 0.001, privacy budget \u03b5 = 0.5, failure probability \u03b4 = 0.001 and total number of\niterations T = 1,500 for gradient descent. We compare our methods with the baselines in terms of\noptimality gap and relative accuracy loss. Optimality gap is the measure of empirical risk bound\nJ(\u03b8) \u2212 J(\u03b8\u2217) over the training data, where \u03b8\u2217 is the optimal non-private model in the centralized\nsetting. Relative accuracy loss is the difference in the accuracy (mean square error in case of\nregression) of \u03b8 and \u03b8\u2217 over the test data. We measure the optimality gap and relative accuracy loss\nof all the models up to 1,500 iterations of gradient descent training and report the results for different\npartitioning of training data sets. 
We vary the number of parties m from 100 (where each party has\n500 data instances) to 1,000 parties (with each party having 50 data instances) and up to 50,000\nparties (each having only one data instance).\n\nBaselines for comparison. For the model aggregation method, we compare to the method of Pathak\net al. [41] (denoted as Pathak), and the other differential privacy baselines are obtained by applying\nthe output perturbation (denoted as Local Out P) and objective perturbation (denoted as Local Obj P)\nmethods of Chaudhuri et al. [11] on each local model estimator $\\hat{\\theta}^{(j)}$ to obtain a differentially private\nlocal model estimator, and then the model aggregation is performed to obtain the differentially private\naggregate model $\\hat{\\theta}_{\\mathrm{priv}}$. For the iterative learning method, we consider the baseline of aggregation of\nlocally perturbed gradients similar to that of Shokri and Shmatikov [46] (denoted as Local Grad P),\nexcept that we improve the noise bound by using zCDP. We also include the method of Rajkumar\nand Agarwal [43] (denoted as Rajkumar and Agarwal) in our comparison, though note that their\nmethod does not provide the same level of privacy as our method. Our output perturbation based\nmodel aggregation method and gradient perturbation based iterative learning method are denoted as\nMPC Out P and MPC Grad P respectively. All the above methods consume a total privacy budget\nof \u03f5 = 0.5, except Rajkumar and Agarwal, which consumes a budget of \u03f5 = 0.5 in each iteration. Table 1\nsummarizes the amount of noise each method needs to preserve differential privacy. As the table\nshows, our methods add the least amount of noise. Though Local Obj P adds noise in the same\nrange as our methods, it uses the noise in a fundamentally different way. 
While the other methods\nadd the sampled noise (either via output perturbation or via gradient perturbation) to the optimal\nnon-private model that minimizes the required objective function J(\u03b8), Local Obj P adds the sampled\nnoise directly to the objective function J(\u03b8) and hence optimizes an altogether different objective\nfunction J\u2032(\u03b8) = J(\u03b8) + Lap(2G / (n(1)\u03f5)), which explains why its optimality gap increases as the\nlocal data set size n(1) decreases.\n\nResults. Figures 1 and 2 show the results for m = 1,000; Appendix B includes plots for other\nnumbers of parties. For both the classification and regression tasks, our proposed methods perform\nbetter than the baselines both in terms of optimality gap and relative accuracy loss. For the classification\ntask (Figure 1), MPC Grad P achieves an optimality gap on the order of 10^\u22123 in 500 iterations\nand a relative accuracy loss on the order of 10^\u22124 within 200 iterations, and MPC Out P also achieves\nvalues in the same range. Rajkumar and Agarwal adds noise of the same order as our methods and\nhence achieves performance close to ours, but as noted earlier, their method consumes \u03f5 budget per\niteration. Our methods perform orders of magnitude better than the other baselines.\nFor the regression task (Figure 2), MPC Grad P gradually converges to an optimality gap on the\norder of 10^\u22123 and a relative accuracy loss on the order of 10^\u22122. MPC Out P incurs loss due to data\npartitioning (which is unavoidable even for non-private aggregation methods) but still outperforms\nthe baselines of model aggregation by orders of magnitude.\n\nFigure 1: Optimality Gap and Relative Accuracy Loss Comparison on KDDCup99 (m = 1,000). (All\nmodels have privacy budget \u03f5 = 0.5, except Rajkumar and Agarwal, which consumes \u03f5 = 0.5 privacy\nbudget each iteration.)\n\nFigure 2: Optimality Gap and Relative Accuracy Loss Comparison on KDDCup98 (m = 1,000). 
(As\nin Figure 1, all models have privacy budget \u03f5 = 0.5, except Rajkumar and Agarwal.)\n\n5 Conclusions\n\nOur work shows how the noise required for a distributed-learning setting can be reduced by\ngenerating and adding noise within a secure computation. Our output perturbation model aggregation\nmethod achieves \u03f5-differential privacy, and our iterative gradient perturbation method provides\n(\u03f5, \u03b4)-differential privacy. Both methods improve on the best previous utility bounds for privacy-preserving\ndistributed learning. While our model aggregation method requires only a single secure aggregation\n(and hence is efficient), our iterative learning method maintains accuracy regardless of the data\npartitioning. Our approach of secure aggregation using MPC is general enough to support any\nmachine learning algorithm. Our long-term goal is to improve understanding of the utility-privacy\ntrade-off in distributed learning, and provide mechanisms for maximizing utility while satisfying\nprivacy requirements.\n\nCode: https://github.com/bargavj/distributedMachineLearning.git\n\nAcknowledgements. This work was partially supported by the National Science Foundation (Awards\n#1111781, #1717950, and #1804603) and research awards from Google, Intel, and Amazon.\n\nReferences\n\n[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar,\nand Li Zhang. Deep learning with differential privacy. In ACM SIGSAC Conference on\nComputer and Communications Security, 2016.\n\n[2] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.\n\n[3] Vincent Bindschaedler, Shantanu Rane, Alejandro E Brito, Vanishree Rao, and Ersin Uzun.\nAchieving differential privacy in secure multiparty data aggregation protocols on star networks.\nIn Seventh ACM on Conference on Data and Application Security and Privacy, 2017.\n\n[4] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan,\nSarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation\nfor privacy-preserving machine learning. In ACM SIGSAC Conference on Computer and\nCommunications Security, 2017.\n\n[5] George E. P. Box and Mervin E. Muller. A note on the generation of random normal deviates.\nThe Annals of Mathematical Statistics, 29(2):610\u2013611, 1958.\n\n[6] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions,\nand lower bounds. In Theory of Cryptography Conference, 2016.\n\n[7] Martin Burkhart, Mario Strasser, Dilip Many, and Xenofontas Dimitropoulos. SEPIA: Privacy-\npreserving aggregation of multi-domain network events and statistics. In USENIX Security\nSymposium, 2010.\n\n[8] Nicholas Carlini, Chang Liu, Jernej Kos, \u00dalfar Erlingsson, and Dawn Song. The secret sharer:\nMeasuring unintended neural network memorization & extracting secrets. arXiv preprint\n1802.08232, 2018.\n\n[9] Melissa Chase, Ran Gilad-Bachrach, Kim Laine, Kristin Lauter, and Peter Rindal. Private\ncollaborative neural network learning. Technical report, Cryptology ePrint Archive, Report\n2017/762, 2017.\n\n[10] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Advances\nin Neural Information Processing Systems, 2009.\n\n[11] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical\nrisk minimization. 
Journal of Machine Learning Research, 2011.\n\n[12] Yi-Ruei Chen, Amir Rezapour, and Wen-Guey Tzeng. Privacy-preserving ridge regression on\ndistributed data. Information Sciences, 451:34\u201349, 2018.\n\n[13] Ivan Damg\u00e5rd, Martin Geisler, Mikkel Kr\u00f8igaard, and Jesper Buus Nielsen. Asynchronous\nmultiparty computation: Theory and implementation. In International Workshop on Public Key\nCryptography, 2009.\n\n[14] Ivan Damg\u00e5rd, Valerio Pastro, Nigel Smart, and Sarah Zakarias. Multiparty computation from\nsomewhat homomorphic encryption. In Advances in Cryptology\u2014CRYPTO, 2012.\n\n[15] Jack Doerner. Absentminded Crypto Kit. https://bitbucket.org/jackdoerner/absentminded-crypto-kit, 2017.\n\n[16] Cynthia Dwork. Differential Privacy: A Survey of Results. In International Conference on\nTheory and Applications of Models of Computation, 2008.\n\n[17] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our\ndata, ourselves: Privacy via distributed noise generation. In Annual International Conference\non the Theory and Applications of Cryptographic Techniques (EuroCrypt), 2006.\n\n[18] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to\nSensitivity in Private Data Analysis. In Theory of Cryptography Conference, 2006.\n\n[19] Cynthia Dwork and Kobbi Nissim. Privacy-Preserving Datamining on Vertically Partitioned\nDatabases. In Advances in Cryptology\u2014CRYPTO, 2004.\n\n[20] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart.\nPrivacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In\n23rd USENIX Security Symposium, 2014.\n\n[21] Adri\u00e0 Gasc\u00f3n, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Jack Doerner, Samee\nZahur, and David Evans. Privacy-preserving distributed linear regression on high-dimensional\ndata. 
Proceedings on Privacy Enhancing Technologies, 2017(4):345\u2013364, 2017.\n\n[22] Oded Goldreich, Silvio Micali, and Avi Wigderson. How to play any mental game, or\na completeness theorem for protocols with an honest majority. In 19th ACM Symposium on\nTheory of Computing, 1987.\n\n[23] Trinabh Gupta, Henrique Fingler, Lorenzo Alvisi, and Michael Walfish. Pretzel: Email encryption\nand provider-supplied functions are compatible. In Conference of the ACM Special Interest\nGroup on Data Communication (SIGCOMM), 2017.\n\n[24] Mikko Heikkil\u00e4, Yusuke Okimoto, Samuel Kaski, Kana Shimizu, and Antti Honkela.\nDifferentially private Bayesian learning on distributed data. arXiv preprint arXiv:1703.01106,\n2017.\n\n[25] S. Hettich and S. D Bay. UCI machine learning repository, 1999.\n\n[26] Andreas Holzer, Martin Franz, Stefan Katzenbeisser, and Helmut Veith. Secure two-party\ncomputations in ANSI C. In ACM Conference on Computer and Communications Security,\n2012.\n\n[27] Yan Huang, David Evans, Jonathan Katz, and Lior Malka. Faster secure two-party computation\nusing garbled circuits. In 20th USENIX Security Symposium, 2011.\n\n[28] Yan Huang, Jonathan Katz, and David Evans. Quid-pro-quo-tocols: Strengthening semi-honest\nprotocols with dual execution. In IEEE Symposium on Security and Privacy, 2012.\n\n[29] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning.\nIn 25th Annual Conference on Learning Theory, 2012.\n\n[30] Prateek Jain and Abhradeep Thakurta. Differentially private learning with kernels. International\nConference on Machine Learning, 2013.\n\n[31] Yehuda Lindell and Benny Pinkas. Privacy Preserving Data Mining. In Advances in Cryptology\u2013CRYPTO, 2000.\n\n[32] Yehuda Lindell and Benny Pinkas. An Efficient Protocol for Secure Two-Party Computation in\nthe Presence of Malicious Adversaries. In Advances in Cryptology\u2014EUROCRYPT, 2007.\n\n[33] Yehuda Lindell and Benny Pinkas. 
Secure Multiparty Computation for Privacy-Preserving Data\nMining. Journal of Privacy and Confidentiality, 2009.\n\n[34] Xu Ma, Fangguo Zhang, Xiaofeng Chen, and Jian Shen. Privacy preserving multi-party\ncomputation delegation for deep learning in cloud computing. Information Sciences, 2018.\n\n[35] Dahlia Malkhi, Noam Nisan, Benny Pinkas, and Yaron Sella. Fairplay-secure two-party\ncomputation system. In USENIX Security Symposium, 2004.\n\n[36] H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. Learning differentially\nprivate language models without losing accuracy. In 6th International Conference on Learning\nRepresentations, 2018.\n\n[37] Jesper Buus Nielsen, Peter Sebastian Nordholt, Claudio Orlandi, and Sai Sheshank Burra. A\nNew Approach to Practical Active-Secure Two-Party Computation. In Advances in Cryptology\u2014CRYPTO, 2012.\n\n[38] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft. Privacy-preserving\nridge regression on hundreds of millions of records. In IEEE Symposium on Security and\nPrivacy, 2013.\n\n[39] Valeria Nikolaenko, Udi Weinsberg, Stratis Ioannidis, Marc Joye, Dan Boneh, and Nina Taft.\nPrivacy-preserving ridge regression on hundreds of millions of records. In IEEE Symposium on\nSecurity and Privacy, 2013.\n\n[40] Ismail Parsa and Ken Howes. UCI machine learning repository, 1998.\n\n[41] Manas Pathak, Shantanu Rane, and Bhiksha Raj. Multiparty Differential Privacy via Aggregation\nof Locally Trained Classifiers. In Advances in Neural Information Processing Systems, 2010.\n\n[42] Benny Pinkas, Thomas Schneider, Nigel P Smart, and Stephen C Williams. Secure Two-\nParty Computation Is Practical. In International Conference on the Theory and Application of\nCryptology and Information Security, 2009.\n\n[43] Arun Rajkumar and Shivani Agarwal. A differentially private stochastic gradient descent\nalgorithm for multiparty classification. 
In Artificial Intelligence and Statistics, 2012.\n\n[44] Aseem Rastogi, Matthew A Hammer, and Michael Hicks. Wysteria: A programming language\nfor generic, mixed-mode multiparty computations. In 35th IEEE Symposium on Security and\nPrivacy, 2014.\n\n[45] Elaine Shi, T.-H. Hubert Chan, Eleanor Rieffel, and Dawn Song. Distributed private data\nanalysis: Lower bounds and practical constructions. ACM Transactions on Algorithms,\n13(4):50:1\u201350:38, December 2017.\n\n[46] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In ACM Conference on\nComputer and Communications Security, 2015.\n\n[47] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference\nattacks against machine learning models. In IEEE Symposium on Security and Privacy, 2017.\n\n[48] Karthik Sridharan, Shai Shalev-Shwartz, and Nathan Srebro. Fast rates for regularized objectives.\nIn Advances in Neural Information Processing Systems, 2009.\n\n[49] Lu Tian, Bargav Jayaraman, Quanquan Gu, and David Evans. Aggregating private sparse\nlearning models using multi-party computation. In NIPS Workshop on Private Multi-Party\nMachine Learning, 2016.\n\n[50] Jaideep Vaidya, Murat Kantarc\u0131o\u011flu, and Chris Clifton. Privacy-preserving na\u00efve Bayes\nclassification. The VLDB Journal, 2008.\n\n[51] Di Wang, Minwei Ye, and Jinhui Xu. Differentially private empirical risk minimization revisited:\nFaster and more general. In Advances in Neural Information Processing Systems, 2017.\n\n[52] Qian Wang, Minxin Du, Xiuying Chen, Yanjiao Chen, Pan Zhou, Xiaofeng Zhou, and Xinyi\nHuang. Privacy-preserving collaborative model learning: The case of word vector training.\nIEEE Transactions on Knowledge and Data Engineering, 2018.\n\n[53] Xiao Wang, Alex J. Malozemoff, and Jonathan Katz. EMP-toolkit: Efficient multiparty\ncomputation toolkit. https://github.com/emp-toolkit, 2016.\n\n[54] Xiao Wang, Samuel Ranellucci, and Jonathan Katz. 
Global-scale secure multiparty computation.\nIn ACM SIGSAC Conference on Computer and Communications Security, 2017.\n\n[55] Zhiqiang Yang, Sheng Zhong, and Rebecca N. Wright. Privacy-preserving classification of\ncustomer data without loss of accuracy. In SIAM International Conference on Data Mining,\n2005.\n\n[56] Andrew C Yao. Protocols for secure computations. In Symposium on Foundations of Computer\nScience, 1982.\n\n[57] Samee Zahur and David Evans. Obliv-C: A language for extensible data-oblivious computation.\nCryptology ePrint Archive, Report 2015/1153, 2015.\n\n[58] Jiaqi Zhang, Kai Zheng, Wenlong Mou, and Liwei Wang. Efficient private ERM for smooth\nobjectives. In 26th International Joint Conference on Artificial Intelligence, 2017.\n", "award": [], "sourceid": 3133, "authors": [{"given_name": "Bargav", "family_name": "Jayaraman", "institution": "University of Virginia"}, {"given_name": "Lingxiao", "family_name": "Wang", "institution": "University of California, Los Angeles"}, {"given_name": "David", "family_name": "Evans", "institution": "University of Virginia"}, {"given_name": "Quanquan", "family_name": "Gu", "institution": "UCLA"}]}