{"title": "Controlling privacy in recommender systems", "book": "Advances in Neural Information Processing Systems", "page_first": 2618, "page_last": 2626, "abstract": "Recommender systems involve an inherent trade-off between accuracy of recommendations and the extent to which users are willing to release information about their preferences. In this paper, we explore a two-tiered notion of privacy where there is a small set of ``public'' users who are willing to share their preferences openly, and a large set of ``private'' users who require privacy guarantees. We show theoretically and demonstrate empirically that a moderate number of public users with no access to private user information already suffices for reasonable accuracy. Moreover, we introduce a new privacy concept for gleaning relational information from private users while maintaining a first order deniability. We demonstrate gains from controlled access to private user preferences.", "full_text": "Controlling privacy in recommender systems\n\nYu Xin\n\nCSAIL, MIT\n\nyuxin@mit.edu\n\nTommi Jaakkola\n\nCSAIL, MIT\n\ntommi@csail.mit.edu\n\nAbstract\n\nRecommender systems involve an inherent trade-off between accuracy of recom-\nmendations and the extent to which users are willing to release information about\ntheir preferences. In this paper, we explore a two-tiered notion of privacy where\nthere is a small set of \u201cpublic\u201d users who are willing to share their preferences\nopenly, and a large set of \u201cprivate\u201d users who require privacy guarantees. We\nshow theoretically and demonstrate empirically that a moderate number of public\nusers with no access to private user information already suf\ufb01ces for reasonable\naccuracy. Moreover, we introduce a new privacy concept for gleaning relational\ninformation from private users while maintaining a \ufb01rst order deniability. 
We\ndemonstrate gains from controlled access to private user preferences.\n\n1\n\nIntroduction\n\nRecommender systems exploit fragmented information available from each user. In a realistic sys-\ntem there\u2019s also considerable \u201cchurn\u201d, i.e., users/items entering or leaving the system. The core\nproblem of transferring the collective experience of many users to an individual user can be under-\nstood in terms of matrix completion ([13, 14]). Given a sparsely populated matrix of preferences,\nwhere rows and columns of the matrix correspond to users and items, respectively, the goal is to\npredict values for the missing entries.\nMatrix completion problems can be solved as convex regularization problems, using trace norm\nas a convex surrogate to rank. A number of algorithms are available for solving large-scale trace-\nnorm regularization problems. Such algorithms typically operate by iteratively building the matrix\nfrom rank-1 components (e.g., [7, 17]). Under reasonable assumptions (e.g., boundedness, noise,\nrestricted strong convexity), the resulting empirical estimators have been shown to converge to the\nunderlying matrix with high probability ([12, 8, 2]). Consistency guarantees have mostly involved\nmatrices of \ufb01xed dimension, i.e., generalization to new users is not considered. In this paper, we\nreformulate the regularization problem in a manner that depends only on the item (as opposed to\nuser) features, and characterize the error for out-of-sample users.\nThe completion accuracy depends directly on the amount of information that each user is will-\ning to share with the system ([1]).\nIt may be possible in some cases to side-step this statistical\ntrade-off by building Peer-to-Peer networks with homomorphic encryption that is computationally\nchallenging([3, 11]). We aim to address the statistical question directly.\nThe statistical trade-off between accuracy and privacy further depends on the notion of privacy we\nadopt. 
A commonly used privacy concept is Differential Privacy (DP) ([6]), \ufb01rst introduced to\nprotect information leaked from database queries. In a recommender context, users may agree to a\ntrusted party to hold and aggregate their data, and perform computations on their behalf. Privacy\nguarantees are then sought for any results published beyond the trusted party (including back to\nthe users). In this setting, differential privacy can be achieved through obfuscation (adding noise)\nwithout a signi\ufb01cant loss of accuracy ([10]).\n\n1\n\n\fIn contrast to [10], we view the system as an untrusted entity, and assume that users wish to guard\ntheir own data. We depart from differential privacy and separate computations that can be done\nlocally (privately) by individual users and computations that must be performed by the system (e.g.,\naggregation). For example, in terms of low rank matrices, only the item features have to be solved by\nthe system. The corresponding user features can be obtained locally by the users and subsequently\nused for ranking.\nFrom this perspective, we divide the set of users into two pools, the set of public users who openly\nshare their preferences, and the larger set of private users who require explicit privacy guarantees.\nWe show theoretically and demonstrate empirically that a moderate number of public users suf\ufb01ce\nfor accurate estimation of item features. The remaining private users can make use of these item\nfeatures without any release of information. Moreover, we propose a new 2nd order privacy concept\nwhich uses limited (2nd order) information from the private users as well, and illustrate how recom-\nmendations can be further improved while maintaining marginal deniability of private information.\n\n2 Problem formulation and summary of results\n\nRecommender setup without privacy Consider a recommendation problem with n users and\nm items. 
The underlying complete rating matrix to be recovered is \u02daX \u2208 R^{n\u00d7m}. If only a few\nlatent factors affect user preferences, \u02daX can be assumed to have low rank. As such, it is also\nrecoverable from a small number of observed entries. We assume that entries are observed with\nnoise. Specifically,\n\n$$Y_{i,j} = \\mathring{X}_{i,j} + \\epsilon_{i,j}, \\quad (i,j) \\in \\Omega \\qquad (1)$$\n\nwhere \u03a9 denotes the set of observed entries. The noise is assumed to be i.i.d. and to follow a\nzero-mean sub-Gaussian distribution with parameter \u2016\u03b5\u2016_{\u03c8_2} = \u03c3. Following [16], we refer to the noise\ndistribution as Sub(\u03c3^2).\nTo bias our estimated rating matrix X to have low rank, we use a convex relaxation of rank in the form\nof the trace norm. The trace norm is the sum of the singular values of the matrix, i.e., \u2016X\u2016_\u2217 = \u2211_i \u03c3_i(X).\n\nThe basic estimation problem, without any privacy considerations, is then given by\n\n$$\\min_{X \\in \\mathbb{R}^{n \\times m}} \\; \\frac{1}{N} \\sum_{(i,j) \\in \\Omega} (Y_{i,j} - X_{i,j})^2 + \\frac{\\lambda}{\\sqrt{mn}} \\|X\\|_* \\qquad (2)$$\n\nwhere \u03bb is a regularization parameter and N = |\u03a9| is the total number of observed ratings. The\nfactor \u221a(mn) ensures that the regularization does not grow with either dimension.\nThe above formulation requires the server to explicitly obtain predictions for each user, i.e., solve\nfor X. We can instead write X = U V^T and \u03a3 = (1/\u221a(mn)) V V^T, and solve for \u03a3 only. If the server\nthen communicates the resulting low rank \u03a3 (or just V) to each user, the users can reconstruct the\nrelevant part of U locally, and reproduce X as it pertains to them. To this end, let \u03c6_i = {j : (i, j) \u2208\n\u03a9} be the set of observed entries for user i, and let Y_{i,\u03c6_i} be a column vector of user i\u2019s ratings. 
Then\nwe can show that Eq.(2) is equivalent to solving\n\n$$\\min_{\\Sigma \\in S^+} \\; \\sum_{i=1}^{n} Y_{i,\\phi_i}^T (\\lambda' I + \\Sigma_{\\phi_i,\\phi_i})^{-1} Y_{i,\\phi_i} + \\sqrt{nm}\\, \\|\\Sigma\\|_* \\qquad (3)$$\n\nwhere S^+ is the set of positive semi-definite m \u00d7 m matrices and \u03bb\u2032 = \u03bbN/\u221a(nm). Given the solution \u02c6\u03a3,\nwe can predict ratings for unobserved items (index set \u03c6_i^c for user i) by\n\n$$\\hat{X}_{i,\\phi_i^c} = \\Sigma_{\\phi_i^c,\\phi_i} (\\lambda' I + \\Sigma_{\\phi_i,\\phi_i})^{-1} Y_{i,\\phi_i} \\qquad (4)$$\n\nNote that we have yet to address any privacy concerns. The solution to Eq.(3) still requires access\nto full ratings Y_{i,\u03c6_i} for each user.\n\nRecommender setup with privacy Our privacy setup assumes an untrusted server. Any user\ninterested in guarding their data must therefore keep and process their data locally, releasing information\nto the server only in a controlled manner. We will initially divide users into two broad\ncategories, public and private. Public users are willing to share all their data with the server while\nprivate users are unwilling to share any. This strict division is removed later when we permit private\nusers to release, in a controlled manner, limited information pertaining to their ratings (2nd order\ninformation) so as to improve recommendations.\nAny data made available to the server enables the server to model the collective experience of users,\nfor example, to solve Eq.(3). We will initially consider the setting where Eq.(3) is solved on the\nbasis of public users only. We use an EM type algorithm for training. In the E-step, the current \u03a3\nis sent to public users, who complete their rating vectors and send them back to the server. In the M-step,\n\u03a3 is then updated based on these full rating vectors. 
The resulting \u02c6\u03a3 (or \u02c6V ) can be subsequently\nshared with the private users, enabling the private users (their devices) to locally rank candidate\nitems without any release of private information. The estimation of \u02c6\u03a3 is then improved by asking\nprivate users to share 2nd order relational information about their ratings without any release of\nmarginal selections/ratings.\nNote that we do not consider privacy beyond ratings. In other words, we omit any subsequent release\nof information due to users exploring items recommended to them.\n\nSummary of contributions We outline here our major contributions towards characterizing the\nrole of public users and the additional controlled release of information from private users.\n\n(cid:112) \u02daX T \u02daX/\n\n\u221a\n\n1) We show that \u02da\u03a3 =\nnm can be estimated in a consistent, accurate manner on the basis\nof public users alone. In particular, we express the error (cid:107) \u02c6\u03a3\u2212 \u02da\u03a3(cid:107)F as a function of the total number\nof observations. Moreover, if the underlying public user ratings can be thought of as i.i.d. samples,\nwe also bound (cid:107)\u02da\u03a3 \u2212 \u03a3\u2217(cid:107)F in terms of the number of public users. Here \u03a3\u2217 is the true limiting\nestimate. See section 3.1 for details.\n2) We show how the accuracy of predicted ratings \u02c6Xi,\u03c6c\nfor private users relates to the accuracy of\nestimating \u02c6\u03a3 (primarily from public users). Since the ratings for user i may not be related to the\nsubspace that \u02c6\u03a3 lies in, we can only characterize the accuracy when suf\ufb01cient overlap exists. We\nquantify this overlap, and show how (cid:107) \u02c6Xi,\u03c6c\n(cid:107) depends on this overlap, accuracy of \u02c6\u03a3, and\nthe observation noise. 
See section 3.2 for details.\n3) Having established the accuracy of predictions based on public users alone, we go one step further\nand introduce a new privacy mechanism and algorithms for gleaning additional relational (2nd order)\ninformation from private users. This 2nd order information is readily used by the server to estimate\n\u02c6\u03a3. The privacy concept constructively maintains \ufb01rst order (marginal) deniability for private users.\nWe demonstrate empirically the gains from the additional 2nd order information. See section 4.\n\n\u2212 \u02daXi,\u03c6c\n\ni\n\ni\n\ni\n\n3 Analysis\n\n3.1 Statistical Consistency of \u02c6\u03a3\n\nmn\n\n(cid:112) \u02c6X T \u02c6X we can \ufb01rst analyze errors in \u02c6X and then relate them to \u02c6\u03a3. To this end,\n\nLet \u02c6X be a solution to Eq.(2). We can write \u02c6X = \u02c6U \u02c6V T , where \u02c6U T \u02c6U = \u02c6Im with 0/1 diagonal.\nSince \u02c6\u03a3 = 1\u221a\nwe follow the restricted strong convexity (RSC) analysis[12]. However, their result depends on\nthe inverse of the minimum number of ratings of all users and items. In practice (see below), the\nnumber of ratings decays exponentially across sorted users, making such a result loose. We provide\na modi\ufb01ed analysis that depends only on the total number of observations N.\nThroughout the analysis, we assume that each row vector \u02daXi,\u00b7 belongs to a \ufb01xed r dimensional\nsubspace. We also assume that both noiseless and noisy entries are bounded, i.e. |Yi,j|,| \u02daXi,j| \u2264\n(i,j)\u2208\u2126(Yi,j \u2212 Xi,j)2 . The\n\u03b1,\u2200(i, j). 
For brevity, we use (cid:107)Y \u2212 X(cid:107)2\nrestricted strong convexity property (RSC) assumes that there exists a constant \u03ba > 0 such that\n\n\u2126 to denote the empirical loss(cid:80)\n\n\u03ba\nmn\n\n(cid:107) \u02c6X \u2212 \u02daX(cid:107)2\n\nF \u2264 1\nN\n\n(cid:107) \u02c6X \u2212 \u02daX(cid:107)2\n\n\u2126\n\n(5)\n\n3\n\n\ffor \u02c6X \u2212 \u02daX in a certain subset. RSC provides the step from approximating observations to ap-\nproximating the full underlying matrix.\nIt is satis\ufb01ed with high probability provided that N =\n(m + n) log(m + n)).\nAssume the SVD of \u02daX = \u02daP S \u02daQT , and let row(X) and col(X) denote the row and column spaces of\nX. We de\ufb01ne the following two sets,\n\nA(P, Q)\n\n:= {X, row(X) \u2286 \u02daP , col(X) \u2286 \u02daQ}\n:= {X, row(X) \u2286 \u02daP \u22a5, col(X) \u2286 \u02daQ\u22a5}\n\nB(P, Q)\n\n(6)\nLet \u03c0A(X) and \u03c0B(X) be the projection of X onto sets A and B, respectively, and \u03c0A = I \u2212 \u03c0A,\n\u03c0B = I \u2212 \u03c0B. Let \u2206 = \u02c6X \u2212 \u02daX be the difference between the estimated and the underlying rating\nmatrices. Our \ufb01rst lemma demonstrates that \u2206 lies primarily in a restricted subspace and the second\none guarantees that the noise remains bounded.\nLemma 3.1. Assume \u0001i,j for (i, j) \u2208 \u2126 are i.i.d. sub-gaussian with \u03c3 = (cid:107)\u0001i,j(cid:107)\u03c81. Then with\nprobability 1 \u2212 e\nlog2 N. Here h > 0 is an absolute\nconstant associated with the sub-gaussian noise.\n\nN 4ch , (cid:107)\u03c0B(\u2206)(cid:107)\u2217 \u2264 (cid:107)\u03c0B(\u2206)(cid:107)\u2217 + 2c2\u03c32\u221a\nN = b log N(cid:112) n\n(cid:112) mn\n, then c2\u03c32\u221a\n\nIf \u03bb = \u03bb0c\u03c3 log2 N\u221a\nN where we leave the de-\npendence on n explicit. Let D(b, n, N ) denote the set of difference matrices that satisfy lemma 3.1\nabove. By combining the lemma and the RSC property, we obtain the following theorem.\nTheorem 3.2. 
Assume RSC for the set D(b, n, N ) with parameter \u03ba > 0 where b = c\u03c3\n\u03bb = \u03bb0c\u03c3 log N\u221a\n, then we have\nwhere h, c > 0 are constants.\n\n. Let\nwith probability at least 1\u2212 e\n\nmn(cid:107)\u2206(cid:107)F \u2264 2c\u03c3( 1\u221a\n1\u221a\n\n\u221a\n\u03ba ) log N\u221a\n\nmn log N\nN \u03bb\n\n= c\u03c3 log N\n\n\u03ba +\n\nN 4ch\n\n\u221a\n\n\u03bb0\n\nm\n\nmn\n\nN \u03bb\n\n2r\n\nN\n\nN\n\n\u03bb0\n\nN\n\nThe bound in the theorem consists of two terms, pertaining to the noise and the regularization. In\ncontrast to [12], the terms only relate to the total number of observations N.\nWe now turn our focus on the accuracy of \u02c6\u03a3. First, we map the accuracy of \u02c6X to that of \u02c6\u03a3 using a\nperturbation bound for polar decomposition (see [9]).\nLemma 3.3. If\n\nmn(cid:107) \u02c6X \u2212 \u02daX(cid:107)F \u2264 \u03b4, then (cid:107) \u02c6\u03a3 \u2212 \u02da\u03a3(cid:107)F \u2264 \u221a\n\n1\u221a\n\n2\u03b4\n\nThis completes our analysis in terms of recovering \u02da\u03a3 for a \ufb01xed size underlying matrix \u02daX. As a\n\ufb01nal step, we turn to the question of how the estimation error changes as the number of users or n\ngrows. Let \u02daXi be the underlying rating vector for user i and de\ufb01ne \u0398n = 1\n\u02daXi. Then\nmn\n2 . We bound the distance between \u02da\u03a3 and \u03a3\u2217.\n\u02da\u03a3 = (\u0398n) 1\nTheorem 3.4. Assume \u02daXi are i.i.d samples from a distribution with support only in a subspace\nof dimension r and bounded norm (cid:107) \u02daXi(cid:107) \u2264 \u03b1\nm. Let \u03b21 and \u03b2r be the smallest and largest\neigenvalues of \u03a3\u2217. Then, for large enough n, with probability at least 1 \u2212 r\nn2 ,\n\n2 . 
If \u0398\u2217 is the limit of \u0398n, then \u03a3\u2217 = (\u0398\u2217) 1\n\n\u02daX T\ni\n\n\u221a\n\ni=1\n\n(cid:80)n\n\n(cid:115)\n\n\u221a\n(cid:107)\u02da\u03a3 \u2212 \u03a3\u2217(cid:107)F \u2264 2\n\nr\u03b1\n\n\u03b2r log n\n\n\u03b21n\n\n+ o(\n\nlog n\n\nn\n\n)\n\n(7)\n\nCombining the two theorems and using triangle inequality, we obtain a high probability bound on\n(cid:107) \u02c6\u03a3 \u2212 \u03a3\u2217(cid:107)F . Assuming the number of ratings for each user is larger than \u03bem, then N > \u03benm and\nn) with \u03b7 being a constant that depends on \u03be. For large\nthe bound grows in the rate of \u03b7(log n/\nenough \u03be, the required n to achieve a certain error bound is small. Therefore a few public users with\nlarge number of ratings could be enough to obtain a good estimate of \u03a3\u2217.\n\n\u221a\n\n3.2 Prediction accuracy\n\nWe are \ufb01nally ready to characterize the error in the predicted ratings \u02c6Xi,\u03c6c\nfor all users as de\ufb01ned in\nEq.(4). For brevity, we use \u03b4 to represent the bound on (cid:107) \u02c6\u03a3\u2212 \u03a3\u2217(cid:107) obtained on the basis of our results\nabove. We also use x\u03c6 and x\u03c6c as shorthands for Xi,\u03c6i and Xi,\u03c6c\nwith the idea that x\u03c6 typically\nrefers to a new private user.\n\ni\n\ni\n\n4\n\n\fThe key issue for us here is that the partial rating vector x\u03c6 may be of limited use. For example,\nif the number of observed ratings is less than rank r, then we would be unable to identify a rating\nvector in the r dimensional subspace even without noise. We seek to control this in our analysis by\nassuming that the observations have enough signal to be useful. Let SVD of \u03a3\u2217 be Q\u2217S\u2217(Q\u2217)T ,\nand \u03b21 be its minimum eigenvalue. We constrain the index set of observations \u03c6 such that it belongs\nto the set\n\n(cid:26)\n\n(cid:27)\nF ,\u2200x \u2208 row((Q\u2217)T )\n\nD(\u03b3) =\n\n\u03c6 \u2286 {1, . . . 
, m}, s.t.(cid:107)x(cid:107)2\n\nF \u2264 \u03b3\n\nm\n\n|\u03c6|(cid:107)x\u03c6(cid:107)2\n\nThe parameter \u03b3 depends on how the low dimensional sub-space is aligned with the coordinate axes.\nWe are only interested in characterizing prediction errors for observations that are in D(\u03b3). This is\nquite different from the RSC property. Our main result is then\nTheorem 3.5. Suppose (cid:107)\u03a3 \u2212 \u03a3\u2217(cid:107)F \u2264 \u03b4 (cid:28) \u03b21, \u03c6 \u2208 D(\u03b3). For any \u02dax \u2208 row((Q\u2217)T ), our\nobservation x\u03c6 = \u02dax\u03c6 + \u0001\u03c6 where \u0001\u03c6 \u223c Sub(\u03c32) is the noise vector. The predicted ratings over\nthe remaining entries are given by \u02c6x\u03c6c = \u03a3\u03c6c,\u03c6(\u03bb(cid:48)I + \u03a3\u03c6,\u03c6)\u22121x\u03c6. Then, with probability at least\n1 \u2212 exp(\u2212c2 min(c4\n\n1,(cid:112)|\u03c6|c2\n\n1)),\n\n(cid:114)\n\n\u221a\n(cid:107)x\u03c6c \u2212 \u02dax\u03c6c(cid:107)F \u2264 2\n\n\u03bb(cid:48) + \u03b4(\n\n\u03b3\n\nm\n|\u03c6| + 1)(\n\n(cid:107)\u02dax(cid:107)F\u221a\n\n\u03b21\n\n2c1\u03c3|\u03c6| 1\n4\u221a\n\u03bb(cid:48)\n\n)\n\n+\n\nwhere c1, c2 > 0 are constants.\n\u221a\nAll the proofs are provided in the supplementary material. The term proportional to (cid:107)\u02dax(cid:107)F /\ndue to the estimation error of \u03a3\u2217, while the term proportional to 2c1\u03c3|\u03c6| 1\n4 /\nnoise in the observed ratings.\n\n\u03b21 is\n\u03bb(cid:48) comes from the\n\n\u221a\n\n4 Controlled privacy for private users\n\nOur theoretical results already demonstrate that a relatively small number of public users with many\nratings suf\ufb01ces for a reasonable performance guarantee for both public and private users. Empirical\nresults (next section) support this claim. 
However, since public users enjoy no privacy guarantees,\nwe would like to limit the required number of such users by requesting private users to contribute in\na limited manner while maintaining speci\ufb01c notions of privacy.\nDe\ufb01nition 4.1. : Privacy preserving mechanism. Let M : Rm\u00d71 \u2192 Rm\u00d71 be a random mecha-\nnism that takes a rating vector r as input and outputs M(r) of the same dimension with jth element\nM(r)j. We say that M(r) is element-wise privacy preserving if Pr(M(r)j = z) = p(z) for\nj = 1, ..., m, and some \ufb01xed distribution p.\nFor example, a privacy preserving mechanism M(r) is element-wise private if its coordinates fol-\nlow the same marginal distribution such as uniform. Note that such a mechanism can still release\ninformation about how different ratings interact (co-vary) which is necessary for estimation.\nDiscrete values. Assume that each element in r and M(r) belongs to a discrete set S with |S| = K.\nA natural privacy constraint is to insist that the marginal distribution of M(r)j is uniform, i.e.,\nPr(M(r)j = z) = 1/K, for z \u2208 S. A trivial mechanism that satis\ufb01es the privacy constraint is to\nselect each value uniformly at random from S. In this case, the returned rating vector contributes\nnothing to the server model. Our goal is to design a mechanism that preserves useful 2nd order\ninformation.\nWe assume that a small number of public user pro\ufb01les are available, from which we can learn\nan initial model parameterized by (\u00b5, V ), where \u00b5 is the item mean vector and V is a low rank\ncomponent of \u03a3. The server provides each private user the pair (\u00b5, V ) and asks, once, for a response\nM(r). Here r is the user\u2019s full rating vector, completed (privately) with the help of the server model\n(\u00b5, V ).\nThe mechanism M(r) is assumed to be element-wise privacy preserving, thus releasing nothing\nabout a single element in isolation. What information should it carry? 
If each user i provided their\nfull rating vector r_i, the server could estimate \u03a3 according to\n\n$$\\frac{1}{\\sqrt{nm}} \\Big( \\sum_{i=1}^{n} (r_i - \\mu)(r_i - \\mu)^T \\Big)^{1/2}.$$\n\nThus, if M(r) preserves the second order statistics to the extent possible, the server could still obtain an\naccurate estimate of \u03a3.\nConsider a particular user and their completed rating vector r. Let P(x) = Pr(M(r) = x). We\nselect distribution P(x) by solving the following optimization problem geared towards preserving\ninteractions between the ratings under the uniform marginal constraint.\n\n$$\\min_{P} \\; \\mathbb{E}_{x \\sim P} \\|(x - \\mu)(x - \\mu)^T - (r - \\mu)(r - \\mu)^T\\|_F^2 \\quad \\text{s.t.} \\quad P(x_i = s) = 1/K, \\; \\forall i, \\; \\forall s \\in S \\qquad (8)$$\n\nwhere K = |S|. The exact solution is difficult to obtain because the number of distinct assignments\nof x is K^m. Instead, we consider an approximate solution. Let x^1, ..., x^K \u2208 R^{m\u00d71} be K different\nvectors such that, for each i, {x_i^1, x_i^2, ..., x_i^K} forms a permutation of S. If we choose x with\nPr(x = x^j) = 1/K, then the marginal distribution of each element is uniform, therefore maintaining\nelement-wise privacy. It remains to optimize the set x^1, ..., x^K to capture interactions between\nratings.\nWe use a greedy coordinate descent algorithm to optimize x^1, ..., x^K. For each coordinate i, we\nrandomly select a pair x^p and x^q, and switch x_i^p and x_i^q if the objective function in (8) is reduced.\nThe process is repeated a few times before we move on to the next coordinate. The algorithm can\nbe implemented efficiently because each operation deals only with a single coordinate.\nFinally, according to the mechanism, the private user selects one of x^j, j = 1, . . . , K, uniformly\nat random and sends the discrete vector back to the server. 
Since the resulting rating vectors from\nprivate users are noisy, the server decreases their weight relative to the information from public users\nas part of the overall M-step for estimating \u03a3.\nContinuous values. Assuming the rating values are continuous and unbounded, we require instead\nthat the returned rating vectors follow marginal distributions with a given mean and variance.\nSpecifically, E[M(r)_i] = 0 and Var[M(r)_i] = v, where v is a constant that remains to be determined.\nNote that, again, any specific element of the returned vector will not, in isolation, carry any\ninformation specific to the element.\nAs before, we search for the distribution P so as to minimize the L2 error of the second order\nstatistics under marginal constraints. For simplicity, denote r\u2032 = r \u2212 \u00b5 where r is the true completed\nrating vector, and u_i = M(r)_i. The objective is given by\n\n$$\\min_{P, v} \\; \\mathbb{E}_{u \\sim P} \\|u u^T - r' r'^{T}\\|_F^2 \\quad \\text{s.t.} \\quad \\mathbb{E}[u_i] = 0, \\; \\mathrm{Var}[u_i] = v, \\; \\forall i \\qquad (9)$$\n\nNote that the formulation does not directly constrain P to have identical marginals, only that the\nmeans and variances agree. However, the optimal solution does, as shown next.\nTheorem 4.2. Let z_i = sign(r\u2032_i) and h = (\u2211_{i=1}^m |r\u2032_i|)/m. The minimizing distribution of (9) is\ngiven by Pr(u = zh) = Pr(u = \u2212zh) = 1/2.\nWe leave the proof to the supplementary material. A few remarks are in order. The mechanism in this\ncase is a two component mixture distribution, placing a probability mass on sign(r\u2032)h (vectorized)\nand \u2212sign(r\u2032)h with equal probability. As a result, the server, knowing the algorithm that private\nusers follow, can reconstruct sign(r\u2032) up to an overall randomly chosen sign. Note also that the\nvalue of h is computed from the user\u2019s private rating vector and therefore releases some additional\ninformation about r\u2032 = r \u2212 \u00b5, albeit weakly. 
To remove this information altogether, we could use\nthe same h for all users and estimate it based on public users.\nThe privacy constraints will clearly have a negative impact on the prediction accuracy in comparison\nto having direct access to all the ratings. However, the goal is to improve accuracy beyond the public\nusers alone by obtaining limited information from private users. While improvements are possible,\nthe limited information surfaces in several ways. First, since private users provide no \ufb01rst order\ninformation, the estimation of mean rating values cannot be improved beyond public users. Second,\nthe sampling method we use to enforce element-wise privacy adds noise to the aggregate second\norder information from which V is constructed. Finally, the server can run the M-step with respect to\nthe private users\u2019 information only once, whereas the original EM algorithm could entertain different\ncompletions for user ratings iteratively. Nevertheless, as illustrated in the next section, the algorithm\ncan still achieve a good accuracy, improving with each additional private user.\n\n6\n\n\f5 Experiments\n\nWe perform experiments on the Movielens 10M dataset which contains 10 million ratings from\n69878 users on 10677 movies. The test set contains 10 ratings for each user. We begin by demon-\nstrating that indeed a few public users suf\ufb01ce for making accurate recommendations. Figure 1 left\nshows the test performance of both weighted (see [12]) and unweighted (uniform) trace norm regu-\nlarization as we add users. Here users with most ratings are added \ufb01rst.\n\nFigure 1: Left: Test RMSE as a function of the percentage of public users; Right: Performance\nchanges with different rating numbers.\n\nWith only 1% of public users added, the test RMSE of unweighted trace norm regularization is\n0.876 which is already close to the optimal prediction error. 
Note that the loss of weighted trace\nnorm regularization actually starts to go up when the number of users increases. The reason is that\nthe weighted trace norm penalizes less for users with few ratings. As a result, the resulting low\ndimensional subspace used for prediction is in\ufb02uenced more by users with few ratings.\nThe statistical convergence bound in section 3.1 involves both terms that decrease as a function of\nthe number of ratings N and the number of public users n. The two factors are usually coupled. It\nis interesting to see how they impact performance individually. Given a number of total ratings, we\ncompare two different methods of selecting public users. In the \ufb01rst method, users with most ratings\nare selected \ufb01rst, whereas the second method selects users uniformly at random. As a result, if we\nequalize the total number of ratings from each method, the second method selects a lot more users.\nFigure 1 Right shows that the second method achieves better performance. An interpretation, based\non the theory, is that the right side of error bound (7) decreases as the number of users increases.\nWe also show how performance improves based on controlled access to private user preferences.\nFirst, we take the top 100 users with the most ratings as the public users, and learn the initial\nprediction model from their ratings. To highlight possible performance gains, private users with\nmore ratings are selected \ufb01rst. The results remain close if we select private users uniformly.\nThe rating values are from 0.5 to 5 with totally 10 different discrete values. Following the privacy\nmechanism for discrete values, each private user generates ten different candidate vectors and returns\none of them uniformly at random. In the M-step, the weight for each private user is set to 1/2\ncompared to 1 for public users. 
During training, after processing w = 20 private users, we update the\nparameters (\u00b5, V) and re-complete the rating vectors of public users, making predictions for the next batch\nof private users more accurate. The privacy mechanism for continuous values is also tested under\nthe same setup. We denote the two privacy mechanisms as PMD and PMC, respectively.\n\n[Figure 1 plots. Left: Test RMSE vs. percentage of users (Uniform, Weighted). Right: Test RMSE vs. number of ratings (k) (Most ratings, Random).]\n\nFigure 2: Test RMSE as a function of private user numbers. PMC: the privacy mechanism for\ncontinuous values; PMD: the privacy mechanism for discrete values; Lap eps=1: DP with Laplace\nnoise, \u03b5 = 1; Lap eps=5: same as before except \u03b5 = 5; SSLP eps=5: sampling strategy described\nin [4] with DP parameter \u03b5 = 5; Exact 2nd order: with exact second order statistics from private\nusers (not a valid privacy mechanism); Full EM: EM without any privacy protection.\n\nWe compare five different scenarios. First, we use a standard DP method that adds Laplace noise to\nthe completed rating vector. Let \u03b5 be the DP parameter. Because the maximum difference between\nrating values is 4.5, the noise follows Lap(0, 4.5/\u03b5). As before, we give a smaller weight to the\nnoisy rating vectors, and this weight is determined by cross validation. Second, [5] proposed a notion of\n\u201clocal privacy\u201d in which differential privacy is guaranteed for each user separately. An optimal\nstrategy for a d-dimensional multinomial distribution in this case reduces the effective sample size from n to\nn\u03b5^2/d, where \u03b5 is the DP parameter. In our case the dimension corresponds to the number of items,\nmaking estimation challenging under DP constraints. 
We also compare to this method and denote it\nas SSLP (sampling strategy for local privacy).\nIn addition, to understand how our approximation to second order statistics affects the performance,\nwe also compare to the case where r\u2032a is given to the server directly, with a drawn from {\u22121, 1} with\nprobability 1/2 each. In this way, the server can obtain the exact second order statistics r\u2032r\u2032^T. Note that\nthis is not a valid privacy preserving mechanism. Finally, we compare to the case where the algorithm\ncan access private user rating vectors multiple times and update the parameters iteratively. Again,\nthis is not a valid mechanism but illustrates how much could be gained.\nFigure 2 shows the performance as a function of the number of private users. The standard Laplace\nnoise method performs reasonably well when \u03b5 = 5; however, the corresponding privacy guarantee\nis very weak. SSLP improves the accuracy mildly.\nIn contrast, with the privacy mechanism we defined in section 4, the test RMSE decreases significantly\nas more private users are added. If we use the exact second order information without the\nsampling method, the final test RMSE would be reduced by 0.07 compared to PMD. Lastly, full\nEM without privacy protection reduces the test RMSE by another 0.08. These performance gaps are\nexpected because there is an inherent trade-off between accuracy and privacy.\n\n6 Conclusion\n\nOur contributions in this paper are three-fold. First, we provide explicit guarantees for estimating\nitem features in matrix completion problems. Second, we show how the resulting estimates, if shared\nwith new users, can be used to predict their ratings depending on the degree of overlap between\ntheir private ratings and the relevant item subspace. The empirical results demonstrate that a\nsmall number of public users with a large number of ratings suffices for good performance. 
Third,\nwe introduce a new privacy mechanism for releasing 2nd order information needed for estimating\nitem features while maintaining 1st order deniability. The experiments show that this mechanism\nindeed performs well in comparison to other mechanisms. We believe that allowing different levels\nof privacy is an exciting research topic. An extension of our work would be to apply the privacy\nmechanism to the learning of graphical models in which 2nd or higher order information plays an\nimportant role.\n\n7 Acknowledgement\n\nThe work was partially supported by a Google Research Award and funding from Qualcomm Inc.\n\nReferences\n[1] M\u00e1rio S Alvim, Miguel E Andr\u00e9s, Konstantinos Chatzikokolakis, Pierpaolo Degano, and\nCatuscia Palamidessi. Differential privacy: on the trade-off between utility and information\nleakage. In Formal Aspects of Security and Trust, pages 39\u201354. Springer, 2012.\n\n[2] E. Candes and Y. Plan. Matrix completion with noise. In Proceedings of the IEEE, 2010.\n\n[3] J. Canny. Collaborative filtering with privacy via factor analysis. In SIGIR, 2002.\n\n[4] John Duchi, Martin J Wainwright, and Michael Jordan. Local privacy and minimax bounds:\nSharp rates for probability estimation. In Advances in Neural Information Processing Systems,\npages 1529\u20131537, 2013.\n\n[5] John C Duchi, Michael I Jordan, and Martin J Wainwright. Privacy aware learning. In NIPS,\npages 1439\u20131447, 2012.\n\n[6] C. Dwork. Differential privacy: A survey of results. In Theory and Applications of Models of\nComputation, 2008.\n\n[7] M. Jaggi and M. Sulovsk\u00fd. A simple algorithm for nuclear norm regularized problems. In\nICML, 2010.\n\n[8] R. Keshavan, A. Montanari, and Sewoong Oh. Matrix completion from noisy entries. JMLR,\n2010.\n\n[9] R. Mathias. 
Perturbation bounds for the polar decomposition. BIT Numerical Mathematics,\n1997.\n\n[10] F. McSherry and I. Mironov. Differentially private recommender systems: Building privacy\ninto the Netflix Prize contenders. In SIGKDD, 2009.\n\n[11] B. N. Miller, J. A. Konstan, and J. Riedl. Pocketlens: Toward a personal recommender system.\nACM Trans. Inf. Syst., 2004.\n\n[12] S. Negahban and M. J. Wainwright. Restricted strong convexity and weighted matrix completion:\noptimal bounds with noise. JMLR, 2012.\n\n[13] R. Salakhutdinov and N. Srebro. Collaborative filtering in a non-uniform world: Learning with\nthe weighted trace norm. In NIPS, 2010.\n\n[14] N. Srebro, J. Rennie, and T. Jaakkola. Maximum margin matrix factorization. In NIPS, 2004.\n\n[15] J. A. Tropp. User-friendly tail bounds for sums of random matrices. Found. Comput. Math.,\n2012.\n\n[16] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices.\narXiv:1011.3027.\n\n[17] Y. Xin and T. Jaakkola. Primal-dual methods for sparse constrained matrix completion. In\nAISTATS, 2012.\n", "award": [], "sourceid": 1368, "authors": [{"given_name": "Yu", "family_name": "Xin", "institution": "CSAIL MIT"}, {"given_name": "Tommi", "family_name": "Jaakkola", "institution": "Massachusetts Institute of Technology"}]}