{"title": "Differentially Private Robust Low-Rank Approximation", "book": "Advances in Neural Information Processing Systems", "page_first": 4137, "page_last": 4145, "abstract": "In this paper, we study the following robust low-rank matrix approximation problem: given a matrix $A \\in \\R^{n \\times d}$, find a rank-$k$ matrix $B$, while satisfying differential privacy, such that \n$ \\norm{  A - B }_p \\leq \\alpha \\mathsf{OPT}_k(A) + \\tau,$ where \n$\\norm{  M }_p$ is the entry-wise $\\ell_p$-norm \nand $\\mathsf{OPT}_k(A):=\\min_{\\mathsf{rank}(X) \\leq k} \\norm{  A - X}_p$. \nIt is well known that low-rank approximation w.r.t. entrywise $\\ell_p$-norm, for $p \\in [1,2)$, yields robustness to gross outliers in the data.  We propose an algorithm that guarantees $\\alpha=\\widetilde{O}(k^2), \\tau=\\widetilde{O}(k^2(n+kd)/\\varepsilon)$, runs in $\\widetilde O((n+d)\\poly~k)$ time and uses $O(k(n+d)\\log k)$ space. We study extensions to the streaming setting where entries of the matrix arrive in an arbitrary order and output is produced at the very end or continually. We also study the related problem of differentially private robust principal component analysis (PCA), wherein we return a rank-$k$ projection matrix $\\Pi$ such that $\\norm{  A - A \\Pi }_p \\leq \\alpha \\mathsf{OPT}_k(A) + \\tau.$", "full_text": "Differentially Private Robust Low-Rank\n\nApproximation\n\nRaman Arora\n\nJohns Hopkins University\n\nBaltimore, MD-21201\narora@cs.jhu.edu\n\nVladimir Braverman\nJohns Hopkins University\n\nBaltimore, MD-21201\nvova@cs.jhu.edu\n\nAbstract\n\nJalaj Upadhyay\n\nJohns Hopkins University\n\nBaltimore, MD-21201\n\njalaj@jhu.edu\n\nIn this paper, we study the following robust low-rank matrix approximation prob-\nlem: given a matrix A 2 Rn\u21e5d, \ufb01nd a rank-k matrix M, while satisfying dif-\nferential privacy, such that kA  Mkp \uf8ff \u21b5 \u00b7 OPTk(A) + \u2327, where kBkp is the\nentry-wise `p-norm of B and OPTk(A) := minrank(X)\uf8ffk kA  Xkp. 
It is well known that low-rank approximation w.r.t. the entrywise $\ell_p$-norm, for $p \in [1, 2)$, yields robustness to gross outliers in the data. We propose an algorithm that guarantees $\alpha = \widetilde{O}(k^2)$ and $\tau = \widetilde{O}(k^2(n + kd)/\varepsilon)$, runs in $\widetilde{O}((n + d)\,\mathrm{poly}\,k)$ time, and uses $O(k(n + d)\log k)$ space. We study extensions to the streaming setting where entries of the matrix arrive in an arbitrary order and output is produced at the very end or continually. We also study the related problem of differentially private robust principal component analysis (PCA), wherein we return a rank-$k$ projection matrix $\Pi$ such that $\|A - A\Pi\|_p \leq \alpha \cdot \mathsf{OPT}_k(A) + \tau$.

1 Introduction

Low-rank matrix approximation is a well-studied problem: given a data matrix $A$, the goal is to find a low-rank matrix $B$ that approximates $A$ in the sense that $\mu(A - B)$ is small under some function $\mu(\cdot)$. It finds application in numerous machine learning tasks, such as recommendation systems [10], clustering [9, 25], and learning distributions [2].

Often, the real-world data used in these applications is plagued with gross outliers, and it is desirable to impart robustness to low-rank approximation algorithms against such corruptions. Furthermore, these applications increasingly rely on sensitive data, which raises the need to preserve the privacy of the underlying data. The focus of this paper, therefore, is to compute a low-rank approximation of a given matrix under a strong privacy guarantee while being robust to outliers in the data.

For robustness to outliers, we choose the measure $\mu(\cdot)$ to be the entrywise $\ell_p$-norm for $p \in [1, 2)$, defined as $\|A\|_p = (\sum_{i,j} |A_{i,j}|^p)^{1/p}$. It is well known that low-rank approximation w.r.t. the entrywise $\ell_p$-norm, for $p \in [1, 2)$, yields robustness to gross outliers in the data [5, 7, 22, 23, 24, 29].
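As a quick numerical illustration of why the $\ell_2$ (Frobenius) objective is fragile, the following numpy sketch (our own illustration, not an algorithm from this paper) plants a single gross outlier in a rank-1 matrix and checks which direction the Frobenius-optimal rank-1 approximation picks up; the dimensions and outlier magnitude are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 20
u, v = rng.standard_normal(n), rng.standard_normal(d)
A = np.outer(u, v)        # true rank-1 signal u v^T
A[0, 0] += 1e5            # a single grossly corrupted entry

# The Frobenius-optimal rank-1 direction is the top right singular
# vector of A; with the corruption present, it locks onto the
# outlier's coordinate axis e_1 instead of the true factor v.
_, _, Vt = np.linalg.svd(A)
top = Vt[0]
cos_outlier = abs(top[0])                       # alignment with e_1
cos_signal = abs(top @ v) / np.linalg.norm(v)   # alignment with v
print(cos_outlier, cos_signal)
```

The top singular direction aligns almost perfectly with the outlier's axis rather than with $v$; this is exactly the sensitivity that the entrywise $\ell_p$ objective with $p \in [1, 2)$ is meant to avoid.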
To address the need for privacy, we rely on the notion of differential privacy [11], which has become the de facto standard for private data analysis in recent years. Formally, we define differential privacy as follows.

Definition 1. A randomized algorithm $M$ is said to be $(\varepsilon, \delta)$-differentially private if for all neighboring datasets $A$ and $A'$, and all subsets $S \subseteq \mathrm{range}(M)$ in the range of $M$, we have that $\Pr[M(A) \in S] \leq e^{\varepsilon} \Pr[M(A') \in S] + \delta$.

The notion of what makes two datasets neighboring determines the granularity of differential privacy [13]. At the finest scale, we consider two matrices as neighboring if they differ in at most one entry by a unit value [17, 19, 20]; this corresponds to feature-level privacy. At the coarsest granularity, two matrices are deemed neighboring if they differ in one row by a unit norm [18, 14]; this corresponds to user-level privacy.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Note that since we do not make any boundedness assumption on the entries of the data matrix, we need to establish a normalized scale to limit the influence of a single entry or a single row of a given matrix. In this paper, we say that two matrices $A$ and $A'$ are neighboring if the matrices are within a unit (entrywise) $\ell_1$ ball of each other, i.e., $\|A - A'\|_1 \leq 1$. This notion of neighboring datasets provides stronger guarantees than feature-level privacy.

We are interested in private robust data analysis, specifically, robust low-rank approximation of a matrix with respect to the entrywise $\ell_p$-norm for $p \in [1, 2)$, under the constraints of differential privacy. Even without privacy, low-rank matrix approximation with respect to the entrywise $\ell_p$-norm for $p \neq 2$ is a non-trivial problem: it does not have a closed-form solution, and computing the optimal low-rank approximation with respect to the $\ell_1$-norm is known to be NP-hard [16].
A natural question then is whether we can compute a good enough approximation to the best rank-$k$ approximation. This question has formed the basis for many recent results [5, 7, 22, 23, 24, 29]. However, prior to this work, differentially private low-rank approximation with respect to the entrywise $\ell_p$-norm has been an open problem. We give the first time- and space-efficient differentially private algorithm for low-rank matrix approximation with respect to the entrywise $\ell_p$-norm.

1.1 Formal Problem Statement and Contributions

In this section, we formally define the problem of differentially private robust low-rank matrix approximation, and state our main results. For ease of presentation, we assume that $1/\delta = \Theta(n \log n)$. We use the notation $\widetilde{O}(\cdot)$ to hide poly-logarithmic factors.

Definition 2 (Robust low-rank approximation). Given a matrix $A \in \mathbb{R}^{n \times d}$ and $p \in [1, 2)$, output a rank-$k$ matrix $M$ such that, with probability at least $1 - \beta$,

$\|A - M\|_p \leq \alpha \, \mathsf{OPT}_k(A) + \tau$, where $\mathsf{OPT}_k(A) := \min_{\mathsf{rank}(X) \leq k} \|A - X\|_p$. (1)

Our first contribution is Algorithm 1, ROBUST-LRA, which given an input matrix $A \in \mathbb{R}^{n \times d}$ returns a differentially private rank-$k$ approximation to $A$ with a multiplicative approximation factor of $\alpha = O((k \log k)^{2(2-p)/p} \log d \log n)$ and an additive approximation error of $\tau = \widetilde{O}(\varepsilon^{-1} k^2(n + kd))$. In particular, for $p = 1$, we have $\alpha = O(k^2 \log^2 k \log d \log n)$ and $\tau = \widetilde{O}(\varepsilon^{-1} k^2(n + kd))$. We note that the best known algorithm in the non-private setting [29] achieves the same multiplicative factor, albeit with no additive error. Therefore, the price we pay for privacy is an additional additive error.

In many machine learning problems, e.g., feature selection and representation learning, all we are interested in is recovering the low-dimensional subspace spanned by the data. One such example is principal component analysis using data with gross outliers or corruptions (e.g.
face recognition in the presence of occlusions). Of course, the proposed Algorithm 1 can also output the projection matrix associated with the right singular vectors of the matrix $M$, with the same accuracy guarantee as for robust low-rank approximation (see Remark 1 for more details). However, the additive error we incur still scales with $n$, whereas intuitively, forming a basis for a $k$-dimensional subspace of $\mathbb{R}^d$ should require adding noise proportional only to $k \cdot d$. This motivates a slightly different treatment of the robust principal component analysis problem, which can be formulated as follows.

Definition 3 (Robust principal component analysis). Given a matrix $A \in \mathbb{R}^{n \times d}$, output a rank-$k$ orthonormal projection matrix $\Pi$ such that, with probability at least $1 - \beta$,

$\|A - A\Pi\|_p \leq \alpha \, \mathsf{OPT}_k(A) + \tau$, where $\mathsf{OPT}_k(A) := \min_{\mathsf{rank}(X) \leq k} \|A - X\|_p$. (2)

The second main contribution of this paper is an algorithm that returns a differentially private rank-$k$ orthonormal projection matrix with $\alpha = O((kd \log k)^{(2-p)/p} \log^3 d \log n)$ and $\tau = \widetilde{O}(k^2 d/\varepsilon)$.

Many variants of differentially private low-rank approximation have been studied in the literature [14, 18, 19, 17, 20, 21, 31, 32], for both the Frobenius norm and the spectral norm. We give the first $(\varepsilon, \delta)$-differentially private algorithm for robust PCA. Unlike PCA under the Frobenius and spectral norms, computing an exact robust PCA is computationally hard (NP-hard when $p = 1$).

Besides the objective function, our work differs from existing work also in terms of privacy granularity and efficiency.
A detailed comparison and review of previous works is presented in Table 1.

Table 1: Comparison of models for differentially private rank-$k$ approximation ($u$ and $v$ are unit vectors, $e_s$ is the $s$-th standard basis vector, $\eta$ is an arbitrary constant, $\omega_k := \sigma_k(A) - \sigma_{k+1}(A)$ is the singular-value separation, $\mu$ is the coherence of the matrix $A$, and $p \in [1, 2)$).

Work | Metric | Accuracy $(\alpha, \tau)$ | Assumptions
Theorem 10 | $\ell_p$-norm | $(\widetilde{O}(k^{2(2-p)/p} \log k \log d),\ \widetilde{O}(k^2(n + kd)/\varepsilon))$ | $\|A - A'\|_1 = 1$
Hardt-Roth [18] | Frobenius | $(\sqrt{2},\ \widetilde{O}(\varepsilon^{-1}\sqrt{kn} + k(\mu \|A\|_F/\varepsilon)^{1/2} d^{1/2} n^{1/4}))$ | $A - A' = e_s v^\top$, $\mu$-coherence
Upadhyay [32] | Frobenius | $((1 + \eta),\ \widetilde{O}(\varepsilon^{-1}(\sqrt{kn} + \sqrt{kd})))$ | $A - A' = u v^\top$
Kapralov-Talwar [21] | Spectral | $(1,\ \widetilde{O}(nk^3 \varepsilon^{-1}))$ | $\|A - A'\|_{\mathrm{op}} = 1$
Hardt-Price [17] | Spectral | $(1,\ \widetilde{O}(\varepsilon^{-1}\sqrt{k\mu} \log(dk)/\omega_k))$ | $\sigma$-value separation, $A - A' = e_s e_t^\top$, $\mu$-coherence
Dwork et al. [14] | Spectral | $(1,\ \widetilde{O}(\varepsilon^{-1} k\sqrt{n}))$ | $A - A' = e_s v^\top$
Jiang et al. [20] | Spectral | $(1,\ \widetilde{O}(n \varepsilon^{-1}))$ | $A - A' = e_s e_t^\top$

2 Basic Preliminaries

One of the key features of differential privacy is that it is preserved under arbitrary post-processing; i.e., an analyst, without additional information about the private database, cannot compute a function that makes an output less differentially private. This is formalized in the following lemma.

Lemma 4 (Dwork et al. [11]). Let $M(D)$ be an $(\varepsilon, \delta)$-differentially private algorithm for a data matrix $D$, and let $h$ be any function; then the mechanism $M' := h(M(D))$ is also $(\varepsilon, \delta)$-differentially private.

A key ingredient in our algorithms is a $p$-stable distribution, which can be defined in terms of a limit of normalized sums of i.i.d. random variables [33].

Definition 5 ($p$-stable distribution). A distribution $\mathcal{D}_p$ over $\mathbb{R}$ is called $p$-stable if there exists $p \geq 0$ such that for any $(v_1, \cdots, v_n) \in \mathbb{R}^n$ and $n$ i.i.d.
random variables $X_1, \cdots, X_n$ with distribution $\mathcal{D}_p$, the random variable $\sum_i v_i X_i$ has the same distribution as the variable $\|v\|_p X$, where $X \sim \mathcal{D}_p$.

We use the notation $\mathcal{D}_p^{(r,c)}$ to denote a distribution over $r \times c$ random matrices, where every entry of the matrix is sampled from the distribution $\mathcal{D}_p$. It is known that $p$-stable distributions exist for all $p \in (0, 2]$ [33], that the Gaussian distribution is 2-stable, and that the Cauchy distribution is 1-stable. Moreover, one can use the method of Chambers et al. [8] to sample from $\mathcal{D}_p$ for $1 < p < 2$.

Our analysis uses the fact that $S \sim \mathcal{D}_p^{(r,c)}$ satisfies the no-dilation and no-contraction properties [28].

Definition 6 (No-dilation [28]). Given a matrix $A \in \mathbb{R}^{n \times d}$, if a matrix $S \in \mathbb{R}^{m \times n}$ satisfies $\|SA\|_p \leq c_1 \|A\|_p$, then $S$ has at most $c_1$ dilation on $A$ with respect to the entrywise $\ell_p$-norm.

Definition 7 (No-contraction [28]). Given a matrix $A \in \mathbb{R}^{n \times d}$, a matrix $S \in \mathbb{R}^{m \times n}$ has $c_2$-contraction on $A$ with respect to the entrywise $\ell_p$-norm if $\forall x \in \mathbb{R}^d$, $\|SAx\|_p \geq c_2^{-1} \|Ax\|_p$.

Our analysis uses recent results from matrix sketching. In particular, we use the fact that we can approximately solve the $\ell_p$-regression problem using random matrix sketches [29].

Lemma 8 (Song et al. [29]). Let $\Phi \in \mathbb{R}^{\gamma \times n}$ be a projection matrix that preserves the $\ell_p$-norm of a vector for $p \in [1, 2)$, and let $B \in \mathbb{R}^{n \times d}$ and $C \in \mathbb{R}^{n \times c}$ be any matrices. Let $\widetilde{X} := \mathrm{argmin}_{X \in \mathbb{R}^{d \times c}} \|\Phi(BX - C)\|_p$ and $\widehat{X} := \mathrm{argmin}_{X \in \mathbb{R}^{d \times c}} \|BX - C\|_p$; then $\|B\widetilde{X} - C\|_p \leq C_0 \|B\widehat{X} - C\|_p$ for some constant $C_0$ that depends only on $\log d$.

Lemma 9 (Song et al. [29]). Given matrices $L, N, A$ of appropriate dimensions, let $X^* := \mathrm{argmin}_X \|LXN - A\|_p$. Suppose $S$ and $T$ satisfy the $c_1$-dilation property on $LX^*N - A$ and the $c_2$-contraction property on $L$.
Further, if $\widehat{X}$ is such that $\|S(L\widehat{X}N - A)T\|_p \leq c \cdot \min_{\mathsf{rank}(X) \leq k} \|S(LXN - A)T\|_p$, then we have that $\|L\widehat{X}N - A\|_p \leq O(c_1 c_2 c) \cdot \min_{\mathsf{rank}(X) \leq k} \|LXN - A\|_p$.

Algorithm 1 ROBUST-LRA
Input: data matrix $A \in \mathbb{R}^{n \times d}$, target rank $k$
Output: rank-$k$ matrix $M \in \mathbb{R}^{n \times d}$
1: Initialization: Set the variables $\gamma, s, t, C_\Phi, C_\Psi, C_S, C_T$ as in Table 2.
2: Initialization: Sample $\Phi \in \mathbb{R}^{\gamma \times n}$, $\Psi \in \mathbb{R}^{d \times \gamma}$, $S \in \mathbb{R}^{s \times n}$, and $T \in \mathbb{R}^{d \times t}$ from the distributions $\mathcal{D}_p^{(\gamma, n)}$, $\mathcal{D}_p^{(d, \gamma)}$, $\mathcal{D}_p^{(s, n)}$, and $\mathcal{D}_p^{(d, t)}$, respectively. All these matrices are made public.
3: Sample: $N_1 \in \mathbb{R}^{\gamma \times d}$, $N_2 \in \mathbb{R}^{n \times \gamma}$, and $N_3 \in \mathbb{R}^{s \times t}$ such that $N_1 \sim \mathsf{Lap}(0, C_\Phi/\varepsilon)^{\gamma \times d}$, $N_2 \sim \mathsf{Lap}(0, C_\Psi/\varepsilon)^{n \times \gamma}$, and $N_3 \sim \mathsf{Lap}(0, C_S C_T/\varepsilon)^{s \times t}$. Keep $N_1, N_2, N_3$ private.
4: Sketch: Compute $Y_r = \Phi A + N_1$ and $Y_c = A\Psi + N_2$.
5: Sketch: Compute $Z_r = Y_r T$, $Z_c = S Y_c$, and $Z = SAT + N_3$.
6: SVD: Compute $[U_c, \Sigma_c, V_c] = \mathsf{SVD}(Z_c)$ and $[U_r, \Sigma_r, V_r] = \mathsf{SVD}(Z_r)$.
7: $\ell_2$-LRA: Compute $\widehat{X} = V_c \Sigma_c^\dagger [U_c^\top Z V_r]_k \Sigma_r^\dagger U_r^\top$, where $[B]_k = \mathrm{argmin}_{\mathsf{rank}(X) \leq k} \|B - X\|_F$.
8: Output: $M = Y_c \widehat{X} Y_r$.

Table 2: Values of the variables: $\gamma, s, t = O(k \log k \log(1/\delta))$; $C_\Phi, C_S = O(\log d)$; $C_\Psi, C_T = O(\log n)$.

3 Differentially private robust LRA

In this section, we give an $(\varepsilon, \delta)$-differentially private polynomial-time algorithm for robust low-rank approximation. We first discuss the algorithmic challenges in extending known techniques and analyses to our problem. We present the proposed algorithm and main results in Section 3.1, and discuss extensions to the general turnstile model and the continual release model in Section 3.2. Proofs of all results are deferred to the supplementary material of this paper.

Two common approaches to preserving privacy are output perturbation [11] and input perturbation [3, 30] of the objective function. In output perturbation, we first compute the output (e.g.
rank-$k$ approximation of a given matrix) non-privately and then add appropriately scaled noise to preserve privacy. In input perturbation, we add noise to the private matrix and then compute the output on the noisy matrix. Both approaches require adding noise to every entry of the given input matrix or to every entry of the non-private output matrix. Consequently, both methods would incur an additive error of $O(nd)$. On the other hand, most existing non-private algorithms for robust low-rank approximation either use heuristics and do not have provable guarantees, or make additional assumptions on the input matrix; the only exception is the work of Song et al. [29]. Again, a naive mechanism to make the algorithm of Song et al. [29] private would incur an additive error of $O(nd)$.

3.1 Proposed Algorithm

It is somewhat tantalizing, from a computational perspective, to attempt approximating a solution to the robust LRA problem using a low-rank approximation with respect to the $\ell_2$-norm; however, it is well understood that the latter is quite sensitive to even a single outlier. A key idea behind the proposed solution is the following observation: we can approximate the output of robust low-rank approximation using low-rank approximation with respect to the $\ell_2$-norm after sketching the matrix using $S \sim \mathcal{D}_p^{(s, n)}$ and $T \sim \mathcal{D}_p^{(d, t)}$ for some choice of $s$ and $t$. In particular, the $p$-stable distribution imparts robustness, and the effect of outliers is reduced in the lower-dimensional space.

In summary, the proposed algorithms are based on the following three algorithmic primitives: (a) sketching the row space and column space of the input matrix, (b) formulating the low-rank matrix approximation problem as a regression problem, and (c) approximating the solution to the $\ell_p$ regression problem by the corresponding $\ell_2$ regression problem.
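These three primitives can be sketched, in a heavily simplified and non-private form for $p = 1$ (Cauchy matrices are 1-stable), as follows. The sketch sizes, the use of numpy's pseudo-inverse with an explicit cutoff, and the closed-form rank-constrained $\ell_2$ step are illustrative choices of ours, not the exact parameters or noise mechanism of Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 80, 60, 3
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # rank-k input

# (a) Sketch the column and row space with Cauchy (1-stable) matrices.
g, s, t = 6 * k, 8 * k, 8 * k                 # illustrative sketch sizes
Phi, Psi = rng.standard_cauchy((g, n)), rng.standard_cauchy((d, g))
S, T = rng.standard_cauchy((s, n)), rng.standard_cauchy((d, t))
Yr, Yc = Phi @ A, A @ Psi                     # row/column sketches
Zc, Zr, Z = S @ Yc, Yr @ T, S @ A @ T         # fully sketched matrices

# (b) Pose LRA as the regression min_{rank(X)<=k} ||Yc X Yr - A||_p, and
# (c) replace it by the sketched l2 problem min ||Zc X Zr - Z||_F, whose
# rank-constrained minimizer has a closed form: project Z onto the column
# space of Zc and row space of Zr, truncate to rank k, then invert.
def trunc(B, k):
    U, sv, Vt = np.linalg.svd(B, full_matrices=False)
    return (U[:, :k] * sv[:k]) @ Vt[:k]

Uc, _, _ = np.linalg.svd(Zc, full_matrices=False)
_, _, Vrt = np.linalg.svd(Zr, full_matrices=False)
proj_Z = Uc @ (Uc.T @ Z @ Vrt.T) @ Vrt
X = np.linalg.pinv(Zc, rcond=1e-8) @ trunc(proj_Z, k) @ np.linalg.pinv(Zr, rcond=1e-8)
M = Yc @ X @ Yr                               # rank-<=k approximation of A

rel_err = np.abs(A - M).sum() / np.abs(A).sum()
print(rel_err)
```

When $A$ is exactly rank $k$, $M$ recovers it up to numerical error; the point of the $p$-stable sketches is that the same pipeline degrades gracefully, rather than catastrophically, once gross outliers are added.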
The analysis then carefully bounds the error in approximation for each of the steps above, as well as the error resulting from the privacy mechanism. The pseudo-code of the proposed algorithm (ROBUST-LRA) is presented as Algorithm 1, and the values of the various variables used in the algorithm are given in Table 2. Our main result is as follows.

Theorem 10. Algorithm ROBUST-LRA (see Algorithm 1) is $(\varepsilon, \delta)$-differentially private. Furthermore, given a matrix $A \in \mathbb{R}^{n \times d}$, it runs in $\mathrm{poly}(k, n, d)$ time, uses $\widetilde{O}(k(n + d))$ space, and outputs a rank-$k$ matrix $M$ such that, with probability $9/10$ over the randomness of the algorithm,

$\|A - M\|_p \leq O((k \log k \log(1/\delta))^{2(2-p)/p} \log d \log n) \, \mathsf{OPT}_k(A) + \widetilde{O}(k^2(n + kd) \log^2(1/\delta)/\varepsilon)$,

where $\mathsf{OPT}_k(A) := \min_{\mathsf{rank}(X) \leq k} \|A - X\|_p$. In particular, for $p = 1$, we get

$\|A - M\|_p \leq O(k^2 \log^2 k \log^2(1/\delta) \log d \log n) \, \mathsf{OPT}_k(A) + \widetilde{O}(k^2(n + kd) \log^2(1/\delta)/\varepsilon)$.

Remark 1. Algorithm ROBUST-LRA (Algorithm 1) outputs a low-rank matrix; however, it is possible to output a low-rank factorization without any loss in efficiency. This can be done by computing the SVD $[U_{\widehat{X}}, \Sigma_{\widehat{X}}, V_{\widehat{X}}]$ of $\widehat{X}$, and the QR decompositions of $Y_c$ and $Y_r$ to obtain an orthonormal basis $U$ of the column space of $Y_c$ and an orthonormal basis $V$ of the row space of $Y_r$. The algorithm then outputs $[U U_{\widehat{X}}, \Sigma_{\widehat{X}}, V V_{\widehat{X}}]$ as a low-rank factorization. The extra running time is $O(\gamma^2(n + d)) = \widetilde{O}(k^2(n + d))$, which is smaller than the $O(nd^2)$ time needed to naively factorize $M$.

Remark 2 (Additive error). The additive error in Theorem 10 has a quadratic dependence on $k$, and there is an implicit tradeoff between the additive and multiplicative errors as $k$ increases. When $k$ is small, the error due to $\mathsf{OPT}_k(A)$ is higher, and when $k$ is large, the additive error is high. For instance, when $k$ equals the rank of the matrix, we have zero multiplicative error, but the additive error is of order $O(k^2 n)$.
Note that $O(kn)$ additive error is unavoidable because we are trying to hide every single entry of the matrix $A$. Without additional strong assumptions, such as (a) stochastic data, and/or (b) incoherence, and/or (c) bounded norms, $O(kn)$ additive error is perhaps the best we can hope for. Intuitively, we have to privatize a $k$-dimensional latent representation of our data and therefore must add noise proportional to at least $kn$.

3.2 Extension to Other Models of Differential Privacy

ROBUST-LRA can be easily extended to the streaming model of computation [32] and the continual release model [12]. We first define the basic streaming model of computation that we study.

Definition 11 (General turnstile update model). In the general turnstile update model, a matrix $A \in \mathbb{R}^{n \times d}$ is streamed in the form of tuples $(\Delta_t, i_t, j_t)$, where $1 \leq i_t \leq n$, $1 \leq j_t \leq d$, and $\Delta_t \in \mathbb{R}$. An update is of the form $A_{i_t, j_t} \leftarrow A_{i_t, j_t} + \Delta_t$. The curator is required to output a robust PCA or robust subspace for the matrix at the end of the stream.

[Figure: a matrix streamed as a sequence of updates; the server folds each update into a small sketch using an update function U, and at the end of the stream runs an algorithm S on the sketch; the result is released to the analyst.]

For example, in the figure, the server receives an update of 6 to $A_{1,1}$ and updates the small sketch using an update function $U$. At the end of the stream, the server uses the small sketch and runs an algorithm $S$ to compute the function (low-rank approximation in our context).

We call two streams neighboring if they are formed by neighboring matrices. Note that the private matrix is stored only in the form of linear sketches; therefore, to get an algorithm in the general turnstile streaming model, we initialize $Y_r = N_1$, $Y_c = N_2$, and $Z = N_3$. Then, when we receive $(\Delta_t, i_t, j_t) \in \mathbb{R} \times [n] \times [d]$, we construct a matrix $A^{(t)}$ with all entries zero except $A^{(t)}_{i_t, j_t} = \Delta_t$. We then update the sketches as follows: $Y_r = Y_r + \Phi A^{(t)}$, $Y_c = Y_c + A^{(t)} \Psi$, and $Z = Z + S A^{(t)} T$.
Once all the updates are made, we simply run the last three steps of ROBUST-LRA. As a result, we get the following corollary.

Corollary 12 (Informal). Algorithm ROBUST-LRA is an $(\varepsilon, \delta)$-differentially private algorithm that, on input a private matrix $A$ in the general turnstile update model, outputs a rank-$k$ matrix $M$ with the same accuracy guarantee as in Theorem 10.

ROBUST-LRA can also be extended to the following continual release setting [12].

Definition 13 (Continual release model). In the continual release model, a matrix $A \in \mathbb{R}^{n \times d}$ is streamed in the form of tuples $(\Delta_t, i_t, j_t)$, where $1 \leq i_t \leq n$, $1 \leq j_t \leq d$, and $\Delta_t \in \mathbb{R}$. An update is of the form $A_{i_t, j_t} \leftarrow A_{i_t, j_t} + \Delta_t$. The curator is required to output a robust PCA or robust subspace for the matrix streamed up until any time $t \leq T$.

Algorithm 2 ROBUST-PCA
Input: data matrix $A \in \mathbb{R}^{d \times n}$, target rank $k$
Output: rank-$k$ projection matrix $\Pi \in \mathbb{R}^{d \times d}$
1: Initialization: Set the variables $\gamma, s, t, C_\Phi, C_\Psi, C_T$ as in Table 2.
2: Initialization: Sample $\Phi \in \mathbb{R}^{\gamma \times d}$, $\Psi \in \mathbb{R}^{n \times \gamma}$, $S \in \mathbb{R}^{s \times d}$, and $T \in \mathbb{R}^{n \times t}$ from the distributions $\mathcal{D}_p^{(\gamma, d)}$, $\mathcal{D}_p^{(n, \gamma)}$, $\mathcal{D}_p^{(s, d)}$, and $\mathcal{D}_p^{(n, t)}$, respectively. All these matrices are made public.
3: Sample: $N_1 \in \mathbb{R}^{\gamma \times t}$ and $N_2 \in \mathbb{R}^{d \times \gamma}$ such that $N_1 \sim \mathsf{Lap}(0, C_\Phi C_T/\varepsilon)^{\gamma \times t}$ and $N_2 \sim \mathsf{Lap}(0, C_\Psi/\varepsilon)^{d \times \gamma}$. Keep $N_1, N_2$ private.
4: Sketch: Compute $Y_r = \Phi A T + N_1$ and $Y_c = A\Psi + N_2$. Set $Z_c = Y_c$ and $Z = Y_r$.
5: SVD: Compute $[U_c, \Sigma_c, V_c] = \mathsf{SVD}(Z_c)$.
6: SVD: Compute $[U_r, \Sigma_r, V_r] = \mathsf{SVD}(Y_r)$.
7: $\ell_2$-LRA: Compute $\widehat{X} = V_c \Sigma_c^\dagger [U_c^\top Z V_r]_k \Sigma_r^\dagger U_r^\top$, where $[B]_k = \mathrm{argmin}_{\mathsf{rank}(X) \leq k} \|B - X\|_F$.
8: Pick: a permutation matrix $Q \in \mathbb{R}^{\gamma \times \gamma}$.
9: Compute: the full SVD $[U', \Sigma', V']$ of $Y_c \widehat{X}$.
Set $U = U'Q$, $\Sigma = \Sigma'Q$, and $P = \Phi^\dagger (U\Sigma)^\dagger$.
10: Output: $\Pi = P U \Sigma (P U \Sigma)^\dagger$.

For outputting a low-rank approximation in the continual release model, we can use the generic transformation that stores a binary tree constructed over the privatized sketches of the updates as its leaves [12]. When a new query for a range of updates is made, we accumulate the sketches of the dyadic partition of the range to compute the sketch for that range, and then compute the last three steps of ROBUST-LRA. We have the following result.

Corollary 14. Algorithm ROBUST-LRA is an $(\varepsilon, \delta)$-differentially private algorithm that, on an input matrix $A$ given in a streaming manner, runs in time $\mathrm{poly}(k, n, d, \log T)$ and outputs a rank-$k$ matrix $M^{(t)}$ in the continual release model over $T$ time epochs, such that, with probability at least $9/10$,

$\|A^{(t)} - M^{(t)}\|_p \leq O((k \log k \log(1/\delta))^{2(2-p)/p} \log d \log n) \, \mathsf{OPT}_k(A^{(t)}) + \widetilde{O}(k^2(n + kd) \log T)$,

where $\mathsf{OPT}_k(\cdot)$ is as in Theorem 10, and $A^{(t)}$ is the matrix streamed up to time epoch $t$.

4 Differentially Private Robust Principal Component Analysis

In this section, we focus on the problem of robust PCA under the constraints of differential privacy. We first present the proposed algorithm and then discuss extensions to the general turnstile model and the continual release model. Proofs of all results are deferred to the supplementary material of this paper.

The key ideas underlying the proposed algorithm, ROBUST-PCA (see Algorithm 2 for the pseudocode), and its analysis essentially follow the techniques developed in the previous section for ROBUST-LRA, but with a couple of small modifications to obtain a better additive error. First, we generate only two sketches, $Y_r = \Phi A^\top T + N_1$ and $Y_c = A^\top \Psi + N_2$, where $\Phi, \Psi, T$ are random sketching matrices and $N_1, N_2$ are noise matrices as defined in Algorithm 2. Second, we solve a
Second, we solve a\nslightly different optimization problem:\n\nmin\n\nrank(Y )\uf8ffkAT  (P U \u2303)Y (AT )F ,\n\nwhere P, U, \u2303 are as formed in Algorithm 2. We show that (U \u2303P )\u2020 is an approximate solution\n\nto minX(AT  P U \u2303XAT )Tp. The rest of the proof then follows the same steps as in the\n\nproof of Theorem 10. In addition, we also show that \u21e7 is an orthonormal rank-k projection matrix.\nThe above exposition focuses on the non-private setting for the sake of simplicity. The proof is more\ninvolved due to noise matrices added for privacy.\nWe show the following guarantee for the proposed algorithm.\nTheorem 15. Algorithm ROBUST-PCA, (see Algorithm 2), is (\", )-differentially private. Further,\ngiven a matrix A 2 Rn\u21e5d with OPTk(A) := minrank(X)\uf8ffk kA  Xkp, it runs in time poly(k, n, d),\nspace eO(k(n + d)), and outputs a rank k orthonormal projection matrix \u21e7 such that, with probability\n9/10 over the random coin tosses of the algorithm,\n\nkA  A\u21e7kp \uf8ff O((k log k log(1/))2(2p)/p log n log3 d)OPTk(A) + eO(k2d log n/\").\n\n6\n\n\fIn particular, when p = 1, we have the following guarantee:\n\nkA  A\u21e7kp \uf8ff O(k2 log n log3 d log2 k log2(1/))OPTk(A) + eO(k2d log n/\").\n\nWe note that ROBUST-PCA yields a smaller additive error than ROBUST-LRA by a factor of n/d,\nbut at the expense of an additional multiplicative factor of log2(d). Therefore, in settings where\nOPTk(A) is small (e.g. when A is nearly low rank), ROBUST-PCA enjoys a much better accuracy\nguarantee.\nExtension to Other Models of Differential Privacy. We can extend ROBUST-PCA to the streaming\nmodel of computation [32] and the continual release model [12] as in Section 3.2. We can also extend\nROBUST-PCA to the local model of differential privacy. Local differential privacy has gained a lot\nof attention recently [1, 15]. 
In the local privacy model, there is no central database of private data. Instead, each individual holds their own data element (a database of size one) and sends a report based on their own datum in a differentially private manner.

Formally, we consider the database $X = [x_1, \cdots, x_n]^\top$ as a collection of $n$ elements (rows) from some domain $\mathcal{X} \subseteq \mathbb{R}^d$, with each row held by a different individual. The $i$-th individual has access to an $\varepsilon_i$-local randomizer $R_i : \mathcal{X} \to W$, which is an $\varepsilon_i$-differentially private algorithm that takes as input a database of size $n = 1$. We assume that the algorithms may interact with the database only through local randomizers. We can then define local differential privacy as follows [13]: an algorithm is $\varepsilon$-locally differentially private if it accesses the database $X$ via the local randomizers $R_1(x_1), \ldots, R_n(x_n)$, where $\max\{\varepsilon_1, \ldots, \varepsilon_n\} \leq \varepsilon$.

What we have defined above is a non-interactive local differential privacy algorithm, where an individual sends only a single message to the server. It was argued in Smith et al. [27] that it is desirable to have as few rounds of interaction as possible from an implementation point of view; in fact, existing large-scale deployments are limited to ones that are non-interactive. Therefore, we limit our study to what is possible in the non-interactive variant of local differential privacy.

We extend Algorithm 2 to an $\varepsilon$-locally differentially private protocol, LOCAL-ROBUST-PCA, where every user $1 \leq i \leq n$ holds a row $A_{i:}$ of the data matrix and sends only one message to the server. We show that the output produced by the server after a run of LOCAL-ROBUST-PCA is a rank-$k$ orthonormal projection matrix $\Pi \in \mathbb{R}^{d \times d}$ such that

$\|A - A\Pi\|_p \leq O(\log n \log^3 d \, (k \log k \log(1/\delta))^{2(2-p)/p}) \, \mathsf{OPT}_k(A) + \widetilde{O}(k^2 nd/\varepsilon)$.

The above guarantee is non-trivial when $\|A\|_p \geq nd$.
Such an assumption is often valid in practical settings with large corruptions to the data matrix.

5 Discussion

In this paper, we present differentially private algorithms for robust low-rank approximation and robust principal component analysis. In addition, we study extensions of our algorithms to the continual release model, the streaming model of computation, and the local model of differential privacy.

The bounds we provide involve a multiplicative factor that depends on the target rank $k$. Such a dependence was shown to be necessary in the non-private setting. In particular, Song et al. [29] show that if the exponential time hypothesis is true, then any linear-sketch-based polynomial-time algorithm for robust rank-$k$ factorization incurs an $\Omega(k^{1/2 - \gamma})$ multiplicative approximation for some $\gamma \in (0, 0.5)$ that can be arbitrarily small. It is not immediately clear whether such a result still holds when we allow an additive error in the approximation, as is the case here.

Acknowledgements

This research was supported in part by NSF BIGDATA grant IIS-1546482, NSF BIGDATA grant IIS-1838139, NSF Career CCF-1652257, and ONR Award N00014-18-1-2364.

References

[1] Apple tries to peek at user habits without violating privacy. The Wall Street Journal, 2016.
[2] Dimitris Achlioptas and Frank McSherry. On spectral learning of mixtures of distributions. In Learning Theory, pages 458–469. Springer, 2005.
[3] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. In FOCS, pages 410–419, 2012.
[4] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM), 60(2):12, 2013.
[5] J. Paul Brooks, José H. Dulá, and Edward L. Boone.
A pure l1-norm principal component analysis. Computational Statistics & Data Analysis, 61:83–98, 2013.
[6] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy. arXiv preprint arXiv:1711.04740, 2017.
[7] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM (JACM), 58(3):11, 2011.
[8] John M. Chambers, Colin L. Mallows, and B. W. Stuck. A method for simulating stable random variables. Journal of the American Statistical Association, 71(354):340–344, 1976.
[9] Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In STOC, pages 163–172. ACM, 2015.
[10] Petros Drineas, Iordanis Kerenidis, and Prabhakar Raghavan. Competitive recommendation systems. In STOC, pages 82–90. ACM, 2002.
[11] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284. Springer, 2006.
[12] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In STOC, pages 715–724. ACM, 2010.
[13] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
[14] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis. In STOC, pages 11–20, 2014.
[15] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067. ACM, 2014.
[16] Nicolas Gillis and Stephen A. Vavasis.
On the complexity of robust PCA and l1-norm low-rank matrix approximation. arXiv preprint arXiv:1509.09236, 2015.
[17] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. In Advances in Neural Information Processing Systems 27, pages 2861–2869. Curran Associates, Inc., 2014.
[18] Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In STOC, pages 1255–1268. ACM, 2012.
[19] Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation. In STOC, pages 331–340. ACM, 2013.
[20] Wuxuan Jiang, Cong Xie, and Zhihua Zhang. Wishart mechanism for differentially private principal components analysis. arXiv preprint arXiv:1511.05680, 2015.
[21] Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In SODA. SIAM, 2013.
[22] Qifa Ke and Takeo Kanade. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. In CVPR, volume 1, pages 739–746. IEEE, 2005.
[23] Eunwoo Kim, Minsik Lee, Chong-Ho Choi, Nojun Kwak, and Songhwai Oh. Efficient l1-norm-based low-rank matrix approximations for large-scale problems using alternating rectified gradient method. IEEE Transactions on Neural Networks and Learning Systems, 26(2):237–251, 2015.
[24] Panos P. Markopoulos, Sandipan Kundu, Shubham Chamadia, and Dimitrios Pados. Efficient l1-norm principal-component analysis via bit flipping. IEEE Transactions on Signal Processing, 2017.
[25] Frank McSherry. Spectral partitioning of random graphs. In FOCS, pages 529–537.
IEEE, 2001.
[26] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103. IEEE, 2007.
[27] A. Smith, A. Thakurta, and J. Upadhyay. Is interaction necessary for distributed private learning? To appear in IEEE Symposium on Security & Privacy, 2017.
[28] Christian Sohler and David P. Woodruff. Subspace embeddings for the l1-norm with applications. In STOC, pages 755–764. ACM, 2011.
[29] Zhao Song, David P. Woodruff, and Peilin Zhong. Low rank approximation with entrywise l1-norm error. In STOC, pages 688–701, 2017.
[30] Jalaj Upadhyay. Random projections, graph sparsification, and differential privacy. In ASIACRYPT (1), pages 276–295, 2013.
[31] Jalaj Upadhyay. Differentially private linear algebra in the streaming model. arXiv preprint arXiv:1409.5414, 2014.
[32] Jalaj Upadhyay. The price of privacy for low-rank factorization. In Advances in Neural Information Processing Systems, pages 4180–4191, 2018.
[33] Vladimir M. Zolotarev. One-Dimensional Stable Distributions, volume 65. American Mathematical Society, 1986.
", "award": [], "sourceid": 2051, "authors": [{"given_name": "Raman", "family_name": "Arora", "institution": "Johns Hopkins University"}, {"given_name": "Vladimir", "family_name": "braverman", "institution": "Johns Hopkins University"}, {"given_name": "Jalaj", "family_name": "Upadhyay", "institution": "Johns Hopkins University"}]}