{"title": "Differentially Private k-Means with Constant Multiplicative Error", "book": "Advances in Neural Information Processing Systems", "page_first": 5431, "page_last": 5441, "abstract": "We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly better error guarantees than the previous state-of-the-art. In addition, in the local model, our algorithm significantly reduces the number of interaction rounds.\n\nAlthough the problem has been widely studied in the context of differential privacy, all of the existing constructions achieve only super-constant approximation factors. We present, for the first time, efficient private algorithms for the problem with constant multiplicative error. Furthermore, we show how to modify our algorithms so they compute private coresets for k-means clustering in both models.", "full_text": "Differentially Private k-Means with Constant Multiplicative Error

Haim Kaplan
Tel Aviv University and Google
haimk@post.tau.ac.il

Uri Stemmer∗
Ben-Gurion University
u@uri.co.il

Abstract

We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly better error guarantees than the previous state-of-the-art. In addition, in the local model, our algorithm significantly reduces the number of interaction rounds.
Although the problem has been widely studied in the context of differential privacy, all of the existing constructions achieve only super-constant approximation factors. We present, for the first time, efficient private algorithms for the problem with constant multiplicative error. 
Furthermore, we show how to modify our algorithms so they compute private coresets for k-means clustering in both models.

1 Introduction

Clustering, and in particular center-based clustering, is a central problem in unsupervised learning. Several cost objectives have been intensively studied for center-based clustering, such as minimizing the sum or the maximum of the distances of the input points to the centers. Most often the data is embedded in Euclidean space and the distances we work with are Euclidean distances. In particular, probably the most studied center-based clustering problem is the Euclidean k-means problem. In this problem we are given a set of n input points in R^d and our goal is to find k centers that minimize the sum of squared distances between each input point and its nearest center.2 When privacy is not a concern, one usually solves this problem by running Lloyd's algorithm [25] initialized by k-means++ [4]. This produces k centers whose cost is no worse than O(log k) times the cost of the optimal centers, and typically much lower in practice.

The huge applicability of k-means clustering, together with the increasing awareness of and demand for user privacy, has motivated the study of privacy-preserving k-means algorithms. It is especially desirable to achieve differential privacy [12], a privacy notion which has been widely adopted by the academic community as well as by big corporations such as Google, Apple, and Microsoft. Indeed, constructions of differentially private k-means algorithms have received a lot of attention over the last 14 years [8, 28, 14, 18, 27, 33, 31, 32, 17, 6, 29, 21]. In this work we design new differentially private k-means algorithms, both for the centralized model (where a trusted curator collects the sensitive information and analyzes it with differential privacy) and for the local model (where each respondent randomizes her answers to the data curator to protect her privacy). 
In both models, our algorithms offer significant improvements over the previous state-of-the-art.

∗Work done while the second author was a postdoctoral researcher at the Weizmann Institute of Science, supported by a Koshland fellowship, and by the Israel Science Foundation (grants 950/16 and 5219/17).

2The sum of squares is nice to work with since we do not have to compute square roots. Furthermore, for a given cluster, its center of mass is the minimizer of the sum of the squared distances. These properties make k-means the favorite cost objective for center-based clustering.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Reference                      | Multiplicative Error | Additive Error
Feldman et al. (2009) [14]     | O(√d)                | Õ((kd)^{2d})
Nock et al. (2016) [31]        | O(log k)             | O(n / log² n)
Feldman et al. (2017) [17]     | O(k log n)           | Õ(√d · k^{1.5})
Balcan et al. (2017) [6]       | O(log³ n)            | Õ(d + k²)
Nissim and Stemmer (2018) [29] | O(k)                 | Õ(d^{0.51} · k^{1.51})
This work                      | O(1)                 | Õ(k^{1.01} · d^{0.51} + k^{1.5})

Table 1: Private algorithms for k-means. Here n is the number of input points, k is the number of centers, and d is the dimension. For simplicity, we assume that input points come from the unit ball, and omit the dependency on ε, as well as logarithmic factors in k, n, d, β, δ, from the additive error.

Before describing our new results, we define our setting more precisely. Consider an input database S = (x_1, . . . , x_n) ∈ (R^d)^n containing n points in R^d, where every point x_i ∈ S is the (sensitive) information of one individual. The goal is to identify a set of k centers C = {c_1, . . . 
, c_k} in R^d approximately minimizing the following quantity, referred to as the cost of the centers:

costS(C) = Σ_{i=1}^{n} min_{j∈[k]} ‖x_i − c_j‖²₂.

The privacy requirement is that the output of our algorithm (the set of centers) does not reveal information that is specific to any single individual. Formally,

Definition 1.1 ([12]). A randomized algorithm A : X^n → Y is (ε, δ)-differentially private if for every two databases S, S′ ∈ X^n that differ in one point, and every set T ⊆ Y, we have Pr[A(S) ∈ T] ≤ e^ε · Pr[A(S′) ∈ T] + δ.

Combining the utility and privacy requirements, we seek a computationally efficient differentially private algorithm that identifies a set of k centers C such that w.h.p. costS(C) ≤ γ · OPT_S + η, where OPT_S is the optimal cost. We want γ and η to be as small as possible, as a function of the number of input points n, the dimension d, the number of centers k, the failure probability β, and the privacy parameters ε, δ.

We remark that a direct consequence of the definition of differential privacy is that, unlike in the non-private literature, every private algorithm for this problem must have additive error η > 0. In fact, if all points reside within the d-dimensional ball B(0, Λ) of radius Λ around the origin (as we assume in this paper), then η must be at least Λ². To see this, consider k + 1 locations p_1, . . . , p_{k+1} at pairwise distances Λ, and consider the following two neighboring datasets. The first dataset S_1 contains n − k + 1 copies of p_1, and (one copy of each of) p_2, . . . , p_k. The second dataset S_2 is obtained from S_1 by replacing p_k with p_{k+1}. Since in both cases there are only k distinct input points, the optimal cost for each of these datasets is zero. 
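As a quick sanity check of the construction just described, the following is an illustrative sketch (not code from the paper; all names and the choice k = 2 are ours). It builds the two neighboring datasets with an equilateral triangle supplying the k + 1 = 3 pairwise-equidistant locations, and verifies that each dataset has optimal k-means cost exactly zero.

```python
# Illustrative sketch of the lower-bound instance described above (not the
# paper's code): two neighboring datasets with only k distinct points each,
# hence optimal k-means cost zero for both.
import math

def kmeans_cost(points, centers):
    # cost_S(C) = sum over points of the squared distance to the nearest center
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
        for p in points
    )

k, n, lam = 2, 10, 1.0                     # lam plays the role of Lambda
p1, p2 = (0.0, 0.0), (lam, 0.0)
p3 = (lam / 2, lam * math.sqrt(3) / 2)     # k + 1 = 3 pairwise-equidistant points

S1 = [p1] * (n - k + 1) + [p2]             # n - k + 1 copies of p1, one copy of p2
S2 = S1[:-1] + [p3]                        # neighboring dataset: p2 replaced by p3

cost1 = kmeans_cost(S1, [p1, p2])          # k centers cover all distinct points
cost2 = kmeans_cost(S2, [p1, p3])          # both costs are exactly zero
```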
On the other hand, by the constraint of differential privacy, the set of centers we compute essentially cannot be affected by this change. Therefore we must have an expected error of Ω(Λ²) on at least one of these inputs. To simplify the presentation, we assume that Λ = 1 in the rest of the introduction.

Traditionally, in the non-private literature, the goal is to minimize the multiplicative error γ, with the current state-of-the-art (non-private) algorithm achieving a multiplicative error of γ = 6.357 (with no additive error) [2]. In contrast, in spite of the long line of works on private k-means [8, 28, 14, 18, 27, 33, 31, 32, 17, 6, 29, 21], all of the existing polynomial-time private algorithms for the problem obtain only a super-constant multiplicative error. We present the first polynomial-time differentially private algorithm for the Euclidean k-means problem with constant multiplicative error, while essentially keeping the additive error the same as in previous state-of-the-art results. See Table 1 for a comparison.

1.1 Locally private k-means

In the local model of differential privacy (LDP), there are n users and an untrusted server. Each user i holds a private input item x_i (a point in R^d in our case), and the server's goal is to compute some function of the inputs (approximate the k-means in our case). However, in this model, the users do not send their data as is to the server. Instead, every user randomizes her data locally and sends a differentially private report to the server, who aggregates all the reports. Informally, the privacy requirement is that the input of user i has almost no effect on the distribution of the messages that user i sends to the server. 
This is the model used by Apple, Google, and Microsoft in practice to ensure that private data never reaches their servers in the clear.

With increasing demand from industry, the local model of differential privacy is becoming more and more popular. Nevertheless, the only currently available k-means algorithm in this model (with provable utility guarantees) is that of Nissim and Stemmer [29], with O(k) multiplicative error. We present a new LDP algorithm for k-means achieving constant multiplicative error. In addition, the protocol of [29] requires O(k log n) rounds of interaction between the server and the users, whereas our protocol uses only O(1) such rounds.

1.2 Classical algorithms are far from being private

We highlight some of the challenges that arise when trying to construct private variants of existing (non-private) algorithms. Recall, for example, the classical (non-private) Lloyd's algorithm, where in every iteration the input points are grouped by their proximity to the current centers, and the points in every group are averaged to obtain the centers for the next round. One barrier to constructing a private analogue of this algorithm is that, with differential privacy, the privacy parameters deteriorate with the number of (private) computations that we apply to the dataset. So, even if we were able to construct a private analogue of every single iteration, our approximation guarantees would not necessarily improve with every iteration. In more detail, composition theorems for differential privacy [13] allow for applying O(n²) private computations before exhausting the privacy budget completely. Lloyd's algorithm, however, might perform a much larger number of iterations (exponential in n in the worst case). Even the bounds on its smoothed complexity are much larger than n² (currently ≈ n³² is known [3]). 
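For concreteness, here is a minimal non-private sketch of the Lloyd iteration discussed above (group points by nearest center, then recenter each group at its mean). The data and names are illustrative only; nothing here is differentially private.

```python
# Minimal, non-private sketch of one Lloyd iteration (illustrative only).

def lloyd_step(points, centers):
    """Assign each point to its nearest center, then move each center to the
    mean of its assigned points (the mean minimizes the within-group sum of
    squared distances)."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
        groups[j].append(p)
    new_centers = []
    for ctr, grp in zip(centers, groups):
        if grp:
            dim = len(ctr)
            new_centers.append(tuple(sum(p[i] for p in grp) / len(grp)
                                     for i in range(dim)))
        else:
            new_centers.append(ctr)  # keep a center whose group is empty
    return new_centers

points = [(0.0, 0.0), (0.2, 0.0), (1.0, 1.0), (1.2, 1.0)]
centers = lloyd_step(points, [(0.0, 0.0), (1.0, 1.0)])
# centers are now the two group means, (0.1, 0.0) and (1.1, 1.0)
```

Note that both the grouping step (which points land in which group) and the averaging step are sensitive to a single input point changing, which is exactly the difficulty discussed in this section.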
In addition, classical techniques for reducing the number of iterations often involve computations which are highly sensitive to a change of a small number of input points. For example, recall that in k-means++ [4] the initial k centers (with which Lloyd's algorithm is typically initialized) are chosen from the data points themselves, an operation which cannot be applied as is when the data points are private.

These challenges are reflected in the recent work of Nock et al. [31], who constructed a private variant of the k-means++ algorithm. While their private algorithm achieves a relatively low multiplicative error of O(log k), its additive error is Õ(n). In this work we aim for an additive error that is at most polylogarithmic in n. Note that an additive error of n is meaningless, since if the points come from the unit ball then every choice of k centers has cost at most O(n).

1.3 On the evolution of private k-means algorithms

The starting point of our work is the observation that by combining ideas from three previous works [18, 6, 29] we can obtain a differentially private k-means algorithm (in the centralized model) with constant multiplicative error, but with a relatively large additive error which is polynomial in n (as we will see in Section 1.4). Most of our technical efforts (in the centralized model) are devoted to reducing the additive error while keeping the multiplicative error constant. We now describe the results of [18, 6, 29].

Gupta et al. [18] constructed a private variant of the classical local search heuristic [5, 24] for k-medians and k-means. In this local search heuristic, we start with an arbitrary choice of k centers, and then proceed in iterations, where in every iteration we replace one of our current centers with a new one, so as to reduce the k-means cost. Gupta et al. 
[18] constructed a private variant of the local search heuristic by using the (generally inefficient) exponential mechanism of McSherry and Talwar [26] in order to privately choose a replacement center in every step. While the algorithm of Gupta et al. [18] obtains superb approximation guarantees,3 its runtime is exponential in the representation length of domain elements. Specifically, it is designed for a discrete version of the problem, in which centers come from a finite set Y, and the runtime of their algorithm is at least linear in |Y|. In particular, when applying their algorithm to Euclidean space, one must first discretize the space of possible centers, and account for the error introduced by this discretization. For example,

3The algorithm of [18] obtains O(1) multiplicative error and Õ(k²d) additive error.

Gupta et al. mention that one can take Y to be a discretization net of the unit d-dimensional ball. However, to ensure a small discretization error, such a net would need to be of size |Y| ≈ n^d, and hence would result in an inefficient algorithm (since the runtime is linear in |Y|).

Balcan et al. [6] suggested the following strategy in order to adapt the techniques of Gupta et al. [18] to Euclidean space while maintaining efficiency. Instead of having a fixed (data-independent) discretization of the unit ball, Balcan et al. suggested to first identify (in a differentially private manner) a small set Y ⊆ R^d of candidate centers such that Y contains a subset of k candidate centers with low k-means cost. Then, apply the techniques of Gupta et al. in order to choose k centers from Y. If |Y| = poly(n), then the resulting algorithm would be efficient. As the algorithm of Gupta et al. has very good approximation guarantees, the bottleneck for the approximation error in the algorithm of Balcan et al. is in the construction of Y. 
Namely, the overall error is dominated by the error of the best choice of k centers out of Y (compared to the cost of the best choice of k centers from R^d). At first glance, this might seem easy to achieve, since for non-private k-means one can simply take the input points themselves as the set of candidate centers (this set is of size n and incurs a multiplicative error of at most 2 compared to centers from R^d). However, for private k-means clustering this is not possible – the centers cannot be a subset of the input points, because otherwise removing a point may significantly change the computed centers.

Balcan et al. then constructed a differentially private algorithm for identifying a set of candidate centers Y based on the Johnson–Lindenstrauss transform [23]. However, their construction gives a set of candidate centers such that the best choice of k centers from these candidates is only guaranteed to have a multiplicative error of O(log³ n), leading to a private k-means algorithm with O(log³ n) multiplicative error.

A different approach to obtaining a good k-means clustering privately is via algorithms for the 1-cluster problem, where given a set of n input points in R^d and a parameter t ≤ n, the goal is to identify a ball of the smallest radius that encloses at least t of the input points. It was shown by Feldman et al. [17] that the Euclidean k-means problem can be reduced to the 1-cluster problem, by iterating the 1-cluster algorithm multiple times to find several balls that cover most of the data points. Feldman et al. then applied their reduction to the private 1-cluster algorithm of [30], and obtained a private k-means algorithm with multiplicative error O(k log n). 
Following that work, Nissim and Stemmer [29] presented an improved algorithm for the 1-cluster problem which, when combined with the reduction of Feldman et al., gives a private k-means algorithm with multiplicative error O(k).

1.4 Our techniques

Let S ∈ (R^d)^n be an input database and let u*_1, . . . , u*_k ∈ R^d denote an optimal set of centers for S. We use S*_j ⊆ S to denote the cluster induced by u*_j, i.e., S*_j = {x ∈ S : j = argmin_ℓ ‖x − u*_ℓ‖}.

We observe that the techniques that Nissim and Stemmer [29] applied to the 1-cluster problem can be extended to privately identify a set of candidate centers Y that "captures" every "big enough" cluster S*_j. Informally, let j be such that |S*_j| ≥ n^a (for some constant a > 0). We will construct a set of candidate centers Y such that there is a candidate center y_j ∈ Y that is "close enough" to the optimal center u*_j, in the sense that the cost of y_j w.r.t. S*_j is at most a constant times bigger than the cost of u*_j. That is, costS*_j({y_j}) = O(costS*_j({u*_j})). By simply ignoring clusters of smaller sizes, this means that Y contains a subset D of k candidate centers such that costS(D) ≤ O(1) · OPT_S + k · n^a. There are two reasons for the poly(n) additive error incurred here. First, this technique effectively ignores every cluster of size less than n^a, and we pay n^a additive error for every such cluster. Second, this technique only succeeds with polynomially small probability, and boosting the confidence using repetitions causes the privacy parameters to degrade.

We show that it is possible to boost the success probability of the above strategy without degrading the privacy parameters. 
To that end, we apply the repetitions to disjoint samples of the input points, and show that the sampling process will not incur a poly(n) error. In order to "capture" smaller clusters, we apply the above strategy repeatedly, where in every iteration we exclude from the computation the input points closest to the set of centers that we have already identified. We show that this technique allows us to "capture" much smaller clusters. By combining this with the techniques of Balcan et al. and Gupta et al. for privately choosing k centers out of Y, we get our new construction for k-means in the centralized model of differential privacy (see Table 1).

A construction for the local model. Recall that the algorithm of Gupta et al. (the private variant of the local search) applies the exponential mechanism of McSherry and Talwar [26] in order to privately choose a replacement center in every step. This use of the exponential mechanism is tailored to the centralized model, and it is not clear if the algorithm of Gupta et al. can be implemented in the local model. In addition, since the local search algorithm is iterative with a relatively large number of iterations (roughly k log n iterations), a local implementation of it, if it exists, may have a large number of rounds of interaction between the users and the untrusted server.

To overcome these challenges, in our locally private algorithm for k-means we first identify a set of candidate centers Y (in a similar way to the centralized construction). Afterwards, we estimate the weight of every candidate center, where the weight of a candidate center y is the number of input points x ∈ S s.t. y is the nearest candidate center to x. We show that the weighted set of candidate centers can be post-processed to obtain an approximation to the k-means of the input points. 
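The weighted-candidates view just described can be sketched as follows. This is illustrative only: the weights here are computed exactly, whereas in the actual protocol they are only estimated under LDP (and are therefore noisy); all function names are ours.

```python
# Hedged sketch of the weighted candidate centers described above
# (illustrative only; the real protocol estimates the weights under LDP).

def nearest_index(p, candidates):
    # index of the candidate center closest to point p
    return min(range(len(candidates)),
               key=lambda j: sum((x - y) ** 2 for x, y in zip(p, candidates[j])))

def candidate_weights(points, candidates):
    """weight(y) = number of input points whose nearest candidate center is y."""
    weights = [0] * len(candidates)
    for p in points:
        weights[nearest_index(p, candidates)] += 1
    return weights

def weighted_cost(candidates, weights, centers):
    # k-means cost of `centers` w.r.t. the weighted candidate set: each
    # candidate contributes its weight times its squared distance to the
    # nearest center.
    return sum(
        w * min(sum((y - c) ** 2 for y, c in zip(cand, ctr)) for ctr in centers)
        for cand, w in zip(candidates, weights)
    )

points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
candidates = [(0.05, 0.0), (1.0, 1.0), (5.0, 5.0)]
weights = candidate_weights(points, candidates)   # [2, 1, 0]
```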
In order to estimate the weights we define a natural extension of the well-studied heavy-hitters problem under LDP, which reduces our incurred error.

Private coresets. A coreset [1] of a set of input points S is a small (weighted) set of points P that captures some geometric properties of S. Coresets can be used to speed up computations, since if the coreset P is much smaller than S, then optimization problems can be solved much faster by running algorithms on P instead of S. In the context of k-means, the geometric property that we want P to preserve is the k-means cost of every possible set of centers. That is, for every set of k centers D ⊆ R^d we want that costP(D) ≈ costS(D) (where in costP(D) we multiply each squared distance by the weight of the corresponding point). Coresets for k-means and k-medians have been the subject of many recent papers, such as [10, 16, 19, 20, 7, 11, 15]. Private coresets for k-means and k-medians have been considered in [14] and in [17]. We show that our techniques result in new constructions of private coresets for k-means and k-medians, both for the centralized and for the local model of differential privacy. In the local model, this results in the first private coreset scheme with provable utility guarantees. In the centralized model, our new construction achieves significantly improved error rates compared to the previous state-of-the-art. We omit our results for private coresets due to space restrictions; see the full version of this work for more details.

2 Preliminaries from [18, 6]

As we described in the introduction, we use a private variant of the local search algorithm by Gupta et al. and Balcan et al. We now state its guarantees. Let Y ⊆ R^d be our precomputed set of candidate centers. Given a set of points S ∈ (R^d)^n, consider the task of identifying a subset C ⊆ Y of size k with the lowest possible cost. 
That is, instead of searching for k centers in R^d, we are searching for k centers in Y, and our runtime is allowed to depend polynomially on |Y|. We write OPT_S(Y) to denote the lowest possible cost of k centers from Y. That is, OPT_S(Y) = min_{C⊆Y, |C|=k} {costS(C)}. Recall that we denote the lowest cost of k centers out of R^d as OPT_S, i.e., OPT_S = OPT_S(R^d).

Theorem 2.1 ([18, 6]). Let β, ε, δ > 0 and k ∈ N, and let Y ⊆ R^d be a finite set of centers. There exists an (ε, δ)-differentially private algorithm that takes a database S containing n points from the d-dimensional ball B(0, Λ), and outputs a subset D ⊆ Y of size |D| = k s.t. with probability at least (1 − β) we have that

costS(D) ≤ O(1) · OPT_S(Y) + O( (k^{1.5} Λ² / ε) · log(n) · log(n|Y|/β) · √(log(1/δ)) ).

In light of Theorem 2.1, in order to privately identify an approximation to the k-means of the input set S, it suffices to privately identify a set of candidate centers Y ⊆ R^d such that |Y| = poly(n) and, in addition, Y contains a subset with low k-means cost (that is, OPT_S(Y) is comparable to OPT_S). We remark that Y must be computed using a differentially private algorithm, and that in particular, taking Y = S will not lead to a differentially private algorithm (even though Y = S is an excellent set of candidate centers in terms of utility). To see this, let us denote the algorithm from Theorem 2.1 as A. Its inputs are the database S and the set of candidate centers Y, and the differential privacy guarantee is only with respect to the database S. In other words, for every fixed set Y, the algorithm A_Y(S) = A(S, Y) is differentially private as a function of S. 
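To make the quantity OPT_S(Y) concrete, here is an illustrative brute-force computation over all size-k subsets of Y. It is exponential in k and entirely non-private; it only spells out the minimum taken in Theorem 2.1 (the data and candidate set are ours).

```python
# Illustrative brute-force computation of OPT_S(Y) = min over D ⊆ Y with
# |D| = k of cost_S(D). Non-private and exponential in k; for exposition only.
from itertools import combinations

def kmeans_cost(points, centers):
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
        for p in points
    )

def opt_over_candidates(points, Y, k):
    return min(kmeans_cost(points, D) for D in combinations(Y, k))

S = [(0.0,), (0.2,), (1.0,), (1.2,)]      # 1-dimensional points, for readability
Y = [(0.1,), (0.6,), (1.1,), (3.0,)]      # candidate centers
best = opt_over_candidates(S, Y, 2)       # attained by the subset {(0.1,), (1.1,)}
```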
Known composition theorems for differential privacy [13] show that for every differentially private algorithm B that takes a database S and outputs a set of centers Y, the composition A(S, B(S)) satisfies differential privacy. On the other hand, there is no guarantee that A(S, S) is differentially private, and in general it is not.

3 Private k-means – the centralized setting

In this section we present some of the components of our algorithm for approximating the k-means in the centralized model of differential privacy. All of the missing details appear in the full version of this work, as well as our algorithm for the local model and our construction of a private coreset.

Consider an input database S, and let u*_1, . . . , u*_k ∈ R^d denote an optimal set of k centers for S. Our starting point is the observation that, extending the techniques of Nissim and Stemmer [29], we can identify a set of candidate centers that contains a "close enough" candidate center to every optimal center u*_j, provided that the optimal cluster induced by u*_j is "big enough". We call this algorithm Private-Centers, and the following lemma specifies its properties precisely.

Lemma 3.1 (Algorithm Private-Centers). There exists an (ε, δ)-differentially private algorithm such that the following holds. Assume we apply the algorithm to a database S containing n points in the d-dimensional ball B(0, Λ), with parameters β, ε, δ. Let P ⊆ S be a fixed subset (unknown to the algorithm) s.t. for a global constant Γ we have

|P| ≥ (Γ/ε) · √d · n^{0.1} · ln(1/β) · √(ln(1/δ)).

The algorithm outputs a set of at most εn centers, s.t. 
with probability at least 1 − β, a ball of radius O(diam(P) + Λ/n) around one of these centers contains all of P.

The idea behind Algorithm Private-Centers is to use locality sensitive hashing [22] in order to isolate clustered points, and then to average clustered points with differential privacy. Algorithm Private-Centers captures all large clusters, whereas the algorithm of [29] only captures one large cluster. We omit the proof of Lemma 3.1 due to space restrictions. In the next section we use this lemma iteratively in order to capture much smaller clusters.

3.1 Capturing smaller and smaller clusters

We are now ready to present the main component of our construction for the centralized model – Algorithm Private-k-Means. The algorithm privately identifies a set of polynomially many candidate centers that contains a subset of k candidate centers with low k-means cost. For readability, we have added inline comments throughout the description of Private-k-Means, which will be helpful for the analysis. These comments are not part of the algorithm. Recall that u*_1, . . . , u*_k denote an optimal set of centers w.r.t. the set of input points S, and that S*_1, . . . , S*_k ⊆ S denote the clusters induced by these optimal centers. (These optimal centers and clusters are unknown to the algorithm; they are only used in the inline comments and in the analysis.)

Throughout the execution, we use the inline comments in order to prescribe a feasible (but not necessarily optimal) assignment of the data points to (a subset of k of) the current candidate centers. Specifically, we maintain an array ASSIGN, where we write ASSIGN[j] = u (for some center u in our current set of candidate centers) to denote that all of the points in the optimal cluster S*_j are assigned to the candidate center u. 
We write ASSIGN[j] = \u22a5 to denote that points in S\u2217\nj have not been assigned\nto a center yet. For every j we have that ASSIGN[j] = \u22a5 at the beginning of the execution, and that\nASSIGN[j] is changed exactly once during the execution, at which point the jth cluster is assigned to\na center. In the analysis we argue that at the end of the execution the resulting assignment has low\nk-means cost.\nNotation. For a point x \u2208 S, we write ASSIGN(x) to denote the candidate center to which x is\nassigned at a given moment of the execution. That is, ASSIGN(x) = ASSIGN[j], where j is s.t. x \u2208 S\u2217\nj .\n\nConsider the execution of the Algorithm Private-k-Means. For readability, we have summarized\nsome of the notations that are speci\ufb01ed in the algorithm in Table 2. We \ufb01rst show that the number of\nunassigned points reduces quickly in every iteration.\n\n6\n\n\fAlgorithm Private-k-Means\nInput: Database S containing n points in the d-dimensional ball B(0, \u039b), failure probability \u03b2,\nprivacy parameters \u03b5, \u03b4.\n% Let u\u2217\n\nj be the cluster induced by u\u2217\n\n1, . . . , u\u2217\n\nk denote an optimal set of centers for S, and let S\u2217\nj =\n\n(cid:96)(cid:107)}. For j \u2208 [k] let r\u2217\n\nj , i.e.,\nj(cid:107)2, and let\n\n(cid:113) 2|S\u2217\n\n(cid:107)x \u2212 u\u2217\n\n(cid:80)\n\nj |\n\nx\u2208S\u2217\n\nj\n\nj = {x \u2208 S : j = argmin(cid:96)(cid:107)x \u2212 u\u2217\nS\u2217\nj = B(u\u2217\nP \u2217\n\nj ) \u2229 S\u2217\nj .\n\nj , r\u2217\n\n1. Initiate C = \u2205, and denote S1 = S and n1 = n.\n% Initiate ASSIGN[j] = \u22a5 for every j \u2208 [k].\n2. For i = 1 to log log n do\n\n(a) Run\n\u03b5\n\nalgorithm Private-Centers\n\non\n\nthe\n\ndatabase Si with\n\nparameters\n\nlog log n ,\n\n\u03b4\n\nlog log n , \u03b2\n\nk , and add the returned set of centers to C.\n\nASSIGN[j] = uj.\n\n% For every j \u2208 [k]: if ASSIGN[j] = \u22a5 and if \u2203uj \u2208 C s.t. 
(cid:107)uj \u2212 u\u2217\nj(cid:107) \u2264 O(r\u2217\n(b) Let Si+1 \u2286 Si be a subset of Si containing ni+1 = 2(T + 1)wk \u00b7 n0.1\n\npoints with the\nlargest distance to the centers in C, where w = w(n, d, k, \u03b2, \u03b5, \u03b4) and T = T (n) will be\nspeci\ufb01ed in the analysis.\nj \\ Si+1, let uj =\nargminu\u2208C(cid:107)pj \u2212 u(cid:107), and set ASSIGN[j] = uj.\n\n% For every j \u2208 [k]: if ASSIGN[j] = \u22a5 and if P \u2217\n\n(cid:54)\u2286 Si+1, then let pj \u2208 P \u2217\n\nn ), then set\n\nj + \u039b\n\ni\n\nj\n\n3. Output C.\n% For every j \u2208 [k]: if ASSIGN[j] = \u22a5, then arbitrarily choose uj \u2208 C and set ASSIGN[j] = uj.\n\n(cid:113) 2|S\u2217\n\nk \u2208 Rd\nk \u2286 S\nk \u2208 R\u22650\n\nS\nu\u2217\n1, . . . , u\u2217\nS\u2217\n1 , . . . , S\u2217\n1, . . . , r\u2217\nr\u2217\nP \u2217\n1 , . . . , P \u2217\nSi \u2286 S, i \u2208 [log log n]\nni = |Si|, i \u2208 [log log n] The number of remaining input points during the ith iteration.\nC\nASSIGN[j], j \u2208 [k]\n\nThe input database.\n(cid:80)\nAn optimal set of centers for S.\n1, . . . , u\u2217\nThe clusters induced by u\u2217\nk.\n(cid:107)x \u2212 u\u2217\nj(cid:107)2.\nr\u2217\nj =\nx\u2208S\u2217\nj |\nj ) \u2229 S\u2217\nj = B(u\u2217\nP \u2217\nj , r\u2217\nj .\nThe set of remaining input points during the ith iteration.\n\nThe current set of candidate centers.\nThe assignment constructed in the inline comments.\nTable 2: Notations for the analysis of algorithm Private-k-Means\n\nk\n\nj\n\n(cid:17)(cid:114)\n\n(cid:16) k\n\n(cid:16) log log n\n\n(cid:17)\n\ni\n\ni\n\n\u03b4\n\n\u03b5\n\nd\n\n\u03b2\n\nlog\n\n\u00b7 log log(n) \u00b7 log\n\nj with smallest distances to u\u2217\n\nj ). If during some iteration i we have that all of P \u2217\n\nj \u2286 S be an optimal cluster, and let P \u2217\nj as the subset of the |S\u2217\n\nClaim 3.2. Denote w = \u0393\u00b7\u221a\n, where \u0393 is the constant from\nLemma 3.1. 
With probability at least 1 − β, for every i ∈ [log log n], before Step 2b of the ith iteration there are at most 2kw · n_i^{0.1} unassigned points in S, i.e., |{x ∈ S : ASSIGN(x) = ⊥}| ≤ 2kw · n_i^{0.1}.

The intuition behind Claim 3.2 is as follows. Let S*_j ⊆ S be an optimal cluster, and let P*_j ⊆ S*_j be defined as in the first comment in the algorithm (we can think of P*_j as the subset of the |S*_j|/2 points in S*_j with smallest distances to u*_j). If during some iteration i we have that all of P*_j is contained in our current set of input points, S_i, and if |P*_j| ≥ w · n_i^{0.1}, then a center for S*_j is discovered in the ith iteration by the properties of Private-Centers. Moreover, by construction, if even a single point from P*_j is missing, then S*_j must have already been assigned to a center before the ith iteration. See the full version of this work for more details.

Notation. For i ∈ [log log n] we denote by A_i ⊆ S and B_i ⊆ S the subsets of input points whose cluster is assigned to a center during the ith iteration in the comments after Step 2a and after Step 2b, respectively. Observe that A_1, B_1, ..., A_{log log n}, B_{log log n} are mutually disjoint.

Let r*_1, ..., r*_k be the radii of the centers u*_1, ..., u*_k as defined in the first comment in algorithm Private-k-Means. For a point x ∈ R^d, let u*(x) denote x's nearest optimal center, and r*(x) its corresponding radius. The next observation is immediate from the construction.

Observation 3.3. For every i ∈ [log log n] and for every x ∈ A_i, at the end of the execution we have

  ‖x − ASSIGN(x)‖² ≤ O( ‖x − u*(x)‖² + (r*(x))² + Λ²/n² ).

We charge the cost of points x ∈ B_i to points that were already assigned to centers in some iteration j ≤ i in the sense specified by the following lemma.

Lemma 3.4.
With probability at least 1 − β, for every iteration i ∈ [log log n] and for every x ∈ B_i there exists a set of input points Q(x) ⊆ S such that

1. For every i ∈ [log log n] and for every x ∈ B_i it holds that |Q(x)| = T, where T = O(log log n).
2. For every i ∈ [log log n] and for every x, y ∈ B_i, if x ≠ y then Q(x) ∩ Q(y) = ∅.
3. For every i ∈ [log log n] and for every x ∈ B_i, at the end of the execution it holds that

  ‖x − ASSIGN(x)‖² ≤ O( ‖x − u*(x)‖² + (r*(x))² + (1/T) · Σ_{q ∈ Q(x)} ‖q − ASSIGN(q)‖² ).

Intuitively, Lemma 3.4 follows from the fact that in every iteration i, for every unassigned point in S there are at least T assigned points in S_i. We omit the proof due to space restrictions.

Lemma 3.5. If Algorithm Private-k-Means is applied to a database S containing n points in the d-dimensional ball B(0, Λ), then it outputs a set C of at most εn log(k/β) centers, s.t. with probability at least 1 − β

  OPT_S(C) = min_{D ⊆ C, |D| = k} { cost_S(D) } ≤ O(1) · OPT_S + O( (Twk)^{1.12} ) · Λ²,

where w is defined in Claim 3.2, and T = Θ(log log n). The exponent 1.12 is arbitrary and can be reduced to any constant a > 1.

Proof. We show that the stated bound holds for the assignment described in the inline comments throughout the algorithm (the array ASSIGN) at the end of the execution. First observe that by Claim 3.2 and by the fact that there are log log n iterations, at the end of the execution there could be at most O( (2(T + 1)wk)^{1.12} ) unassigned input points. Let us denote the set of unassigned points as H.
The distance from each unassigned point to an arbitrary center is trivially at most Λ. For every assigned point x, by Observation 3.3 and by Lemma 3.4, either ‖x − ASSIGN(x)‖² = O( ‖x − u*(x)‖² + (r*(x))² + Λ²/n² ), or

  ‖x − ASSIGN(x)‖² ≤ O( ‖x − u*(x)‖² + (r*(x))² + (1/T) · Σ_{q ∈ Q(x)} ‖q − ASSIGN(q)‖² ).

Hence,

  cost_S({ASSIGN[j] : j ∈ [k]})
    = Σ_{x ∈ S} ‖x − ASSIGN(x)‖²
    = Σ_{x ∈ H} ‖x − ASSIGN(x)‖² + Σ_{i ∈ [log log n]} Σ_{x ∈ A_i} ‖x − ASSIGN(x)‖² + Σ_{i ∈ [log log n]} Σ_{x ∈ B_i} ‖x − ASSIGN(x)‖²
    ≤ O( (2(T + 1)wk)^{1.12} ) · Λ² + Σ_{i ∈ [log log n]} Σ_{x ∈ A_i} O( ‖x − u*(x)‖² + (r*(x))² + Λ²/n² )
        + Σ_{i ∈ [log log n]} Σ_{x ∈ B_i} O( ‖x − u*(x)‖² + (r*(x))² + (1/T) · Σ_{q ∈ Q(x)} ‖q − ASSIGN(q)‖² )
    ≤ O( (2(T + 1)wk)^{1.12} ) · Λ² + Σ_{x ∈ S} O( ‖x − u*(x)‖² + (r*(x))² ) + Σ_{i ∈ [log log n]} Σ_{x ∈ B_i} (1/T) · Σ_{q ∈ Q(x)} O( ‖q − ASSIGN(q)‖² )
    ≤ O( (2(T + 1)wk)^{1.12} ) · Λ² + O(1) · OPT_S + Σ_{i ∈ [log log n]} Σ_{x ∈ B_i} (1/T) · Σ_{q ∈ Q(x)} O( ‖q − ASSIGN(q)‖² )    (1)

Now recall that for every i ∈ [log log n] and for every x ≠ y ∈ B_i it holds that Q(x) ∩ Q(y) = ∅. Hence, every point q ∈ S contributes at most log log n times to the last summation above.
So,

  (1) ≤ O( (2(T + 1)wk)^{1.12} ) · Λ² + O(1) · OPT_S + (log log n / T) · Σ_{q ∈ S} O( ‖q − ASSIGN(q)‖² ).

For T = Θ(log log n) (large enough) we get that the last term above is at most half of the left-hand side of the inequality, and hence,

  cost_S({ASSIGN[j] : j ∈ [k]}) ≤ O( (2(T + 1)wk)^{1.12} ) · Λ² + O(1) · OPT_S.

Lemma 3.6. Algorithm Private-k-Means is (ε, δ)-differentially private.

The privacy analysis of Algorithm Private-k-Means is standard, and is omitted due to space restrictions. Intuitively, in every iteration, Step 2a satisfies differential privacy by the properties of Algorithm Private-Centers, and we use the following technique for arguing about Step 2b: Let X be an ordered data domain and let A be a differentially private algorithm that operates on a multiset of m elements from X. Then for any n ≥ m, the algorithm that takes a multiset S of n elements from X and runs A on the smallest (or largest) m elements in S is differentially private. The intuition is that changing at most one element in S can change at most one element of the multiset that we give to A, and this change is "hidden" by the privacy properties of A. See [9] for more details and applications of this technique.

Combining Lemmas 3.5 and 3.6 with Theorem 2.1 yields the following theorem.

Theorem 3.7. There is an (ε, δ)-differentially private algorithm that, given a database S containing n points in the d-dimensional ball B(0, Λ), identifies with probability 1 − β a (γ, η)-approximation for the k-means of S, where γ = O(1) and η = poly( log(n), log(1/β), log(1/δ), d, 1/ε, k ) · Λ².
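To make the structure of Algorithm Private-k-Means concrete, the following is a minimal, non-private Python sketch of its peeling loop: in each of roughly log log n rounds it computes candidate centers on the current point set, then recurses on the points farthest from all candidates found so far. Everything here is illustrative only: the names `candidate_centers`, `private_kmeans_sketch`, and `shrink` are our own, `candidate_centers` is a non-private k-means++-style stand-in for the private subroutine Private-Centers, and the shrink schedule n_{i+1} = 2(T + 1)wk · n_i^{0.1} is simplified to a fixed fraction, so the sketch carries none of the paper's privacy or utility guarantees.

```python
import numpy as np

def candidate_centers(points, k, rng):
    """Stand-in for the private subroutine Private-Centers:
    non-private k-means++-style seeding on the current points."""
    centers = [points[rng.integers(len(points))]]
    for _ in range(k - 1):
        # squared distance of each point to its nearest chosen center
        d2 = ((points[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    return np.asarray(centers)

def private_kmeans_sketch(points, k, shrink=0.5, seed=0):
    """Peeling loop of Algorithm Private-k-Means (structure only):
    in each of ~log log n rounds, add candidate centers computed on
    the current point set, then keep only the points farthest from
    every candidate found so far."""
    rng = np.random.default_rng(seed)
    C = []                      # accumulated candidate centers
    S_i = points                # S_1 = S
    rounds = max(1, int(np.ceil(np.log2(max(2.0, np.log2(len(points)))))))
    for _ in range(rounds):
        C.append(candidate_centers(S_i, k, rng))          # Step 2a
        all_c = np.concatenate(C)
        # distance of each remaining point to its nearest candidate
        d2 = ((S_i[:, None, :] - all_c[None, :, :]) ** 2).sum(-1).min(axis=1)
        # the paper keeps the n_{i+1} = 2(T+1)wk * n_i^{0.1} farthest
        # points; a fixed fraction keeps the sketch simple
        n_next = max(k, int(len(S_i) * shrink))
        S_i = S_i[np.argsort(d2)[-n_next:]]               # Step 2b
    return np.concatenate(C)
```

For the final k centers one would additionally select k candidates out of the returned set, which the analysis handles via OPT_S(C) = min_{D ⊆ C, |D| = k} cost_S(D); the sketch stops at the candidate set.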
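The order-statistics technique used to argue privacy of Step 2b (running a private algorithm on the m largest elements of a larger multiset) can be written as a small wrapper; a minimal sketch, where `run_on_largest` is our own name and `private_alg` stands for any differentially private algorithm A expecting exactly m elements:

```python
def run_on_largest(private_alg, S, m, key=None):
    """Run a private algorithm A, which expects m elements, on the m
    largest elements of a larger multiset S.  Changing one element of
    S changes at most one element of the top-m multiset, so the
    wrapper inherits A's differential-privacy guarantee."""
    top_m = sorted(S, key=key)[-m:]
    return private_alg(top_m)
```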
Acknowledgments. We would like to thank Moni Naor for helpful discussions, and the anonymous reviewers for useful suggestions and corrections.

References

[1] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. J. ACM, 51(4):606–635, July 2004.

[2] S. Ahmadian, A. Norouzi-Fard, O. Svensson, and J. Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. In 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 61–72, 2017.

[3] D. Arthur, B. Manthey, and H. Röglin. k-means has polynomial smoothed complexity. In Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science, FOCS '09, pages 405–414, Washington, DC, USA, 2009. IEEE Computer Society.

[4] D. Arthur and S. Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pages 1027–1035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

[5] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544–562, 2004.

[6] M.-F. Balcan, T. Dick, Y. Liang, W. Mou, and H. Zhang. Differentially private clustering in high-dimensional Euclidean spaces. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 322–331, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.

[7] A. Barger and D. Feldman. k-means for streaming and distributed big sparse data. In Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, Florida, USA, May 5-7, 2016, pages 342–350, 2016.

[8] A. Blum, C. Dwork, F. McSherry, and K. Nissim.
Practical privacy: The SuLQ framework. In C. Li, editor, PODS, pages 128–138. ACM, 2005.

[9] M. Bun, K. Nissim, U. Stemmer, and S. P. Vadhan. Differentially private release and learning of threshold functions. In FOCS, pages 634–649, 2015.

[10] K. Chen. On k-median clustering in high dimensions. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA '06, pages 1177–1185, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.

[11] E. Cohen, S. Chechik, and H. Kaplan. Clustering small samples with quality guarantees: Adaptivity with one2all PPS. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[12] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876 of Lecture Notes in Computer Science, pages 265–284. Springer, 2006.

[13] C. Dwork, G. N. Rothblum, and S. P. Vadhan. Boosting and differential privacy. In FOCS, pages 51–60. IEEE Computer Society, 2010.

[14] D. Feldman, A. Fiat, H. Kaplan, and K. Nissim. Private coresets. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 361–370, 2009.

[15] D. Feldman and M. Langberg. A unified framework for approximating and clustering data. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pages 569–578, 2011.

[16] D. Feldman, M. Monemizadeh, and C. Sohler. A PTAS for k-means clustering based on weak coresets. In Proceedings of the Twenty-third Annual Symposium on Computational Geometry, SCG '07, pages 11–18, New York, NY, USA, 2007. ACM.

[17] D. Feldman, C. Xiang, R. Zhu, and D. Rus. Coresets for differentially private k-means clustering and applications to privacy in mobile sensor networks.
In Proceedings of the 16th ACM/IEEE International Conference on Information Processing in Sensor Networks, IPSN '17, pages 3–15, New York, NY, USA, 2017. ACM.

[18] A. Gupta, K. Ligett, F. McSherry, A. Roth, and K. Talwar. Differentially private combinatorial optimization. In Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '10, pages 1106–1125, Philadelphia, PA, USA, 2010. Society for Industrial and Applied Mathematics.

[19] S. Har-Peled and A. Kushal. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3–19, Jan 2007.

[20] S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. In Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, STOC '04, pages 291–300, New York, NY, USA, 2004. ACM.

[21] Z. Huang and J. Liu. Optimal differentially private algorithms for k-means clustering. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10-15, 2018, pages 395–408, 2018.

[22] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 604–613, New York, NY, USA, 1998. ACM.

[23] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. 1984.

[24] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. A local search approximation algorithm for k-means clustering. Computational Geometry, 28(2):89–112, 2004. Special Issue on the 18th Annual Symposium on Computational Geometry - SoCG2002.

[25] S. P. Lloyd. Least squares quantization in PCM. IEEE Trans. Information Theory, 28:129–136, 1982.

[26] F. McSherry and K. Talwar. Mechanism design via differential privacy.
In FOCS, pages 94–103. IEEE, Oct 20–23 2007.

[27] P. Mohan, A. Thakurta, E. Shi, D. Song, and D. Culler. GUPT: Privacy preserving data analysis made easy. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 349–360, New York, NY, USA, 2012. ACM.

[28] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC, pages 75–84. ACM, 2007.

[29] K. Nissim and U. Stemmer. Clustering algorithms for the centralized and local models. In F. Janoos, M. Mohri, and K. Sridharan, editors, Proceedings of Algorithmic Learning Theory, volume 83 of Proceedings of Machine Learning Research, pages 619–653. PMLR, 07–09 Apr 2018.

[30] K. Nissim, U. Stemmer, and S. P. Vadhan. Locating a small cluster privately. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 413–427, 2016.

[31] R. Nock, R. Canyasse, R. Boreli, and F. Nielsen. k-variates++: more pluses in the k-means++. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 145–154, 2016.

[32] D. Su, J. Cao, N. Li, E. Bertino, and H. Jin. Differentially private k-means clustering. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, CODASPY '16, pages 26–37, New York, NY, USA, 2016. ACM.

[33] Y. Wang, Y.-X. Wang, and A. Singh. Differentially private subspace clustering. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15, pages 1000–1008, Cambridge, MA, USA, 2015.
MIT Press.