{"title": "On the Power of Truncated SVD for General High-rank Matrix Estimation Problems", "book": "Advances in Neural Information Processing Systems", "page_first": 445, "page_last": 455, "abstract": "We show that given an estimate $\\widehat{\\mat A}$ that is close to a general high-rank positive semi-definite (PSD) matrix $\\mat A$ in spectral norm (i.e., $\\|\\widehat{\\mat A}-\\mat A\\|_2 \\leq \\delta$), the simple truncated Singular Value Decomposition of $\\widehat{\\mat A}$ produces a multiplicative approximation of $\\mat A$ in Frobenius norm. This observation leads to many interesting results on general high-rank matrix estimation problems: 1. High-rank matrix completion: we show that it is possible to recover a general high-rank matrix $\\mat A$ up to $(1+\\varepsilon)$ relative error in Frobenius norm from partial observations, with sample complexity independent of the spectral gap of $\\mat A$. 2. High-rank matrix denoising: we design an algorithm that recovers a matrix $\\mat A$ with relative error in Frobenius norm from its noise-perturbed observations, without assuming $\\mat A$ is exactly low-rank. 3. Low-dimensional estimation of high-dimensional covariance: given $N$ i.i.d.~samples of dimension $n$ from $\\mathcal N_n(\\mat 0,\\mat A)$, we show that it is possible to estimate the covariance matrix $\\mat A$ with relative error in Frobenius norm with $N \\approx n$, improving over classical covariance estimation results, which require $N \\approx n^2$.", "full_text": "On the Power of Truncated SVD for General High-rank Matrix Estimation Problems\n\nSimon S. 
Du\n\nCarnegie Mellon University\n\nssdu@cs.cmu.edu\n\nYining Wang\n\nCarnegie Mellon University\nyiningwa@cs.cmu.edu\n\nAarti Singh\n\nCarnegie Mellon University\naartisingh@cmu.edu\n\nAbstract\n\nWe show that given an estimate Â that is close to a general high-rank positive semi-definite (PSD) matrix A in spectral norm (i.e., ‖Â − A‖2 ≤ δ), the simple truncated Singular Value Decomposition of Â produces a multiplicative approximation of A in Frobenius norm. This observation leads to many interesting results on general high-rank matrix estimation problems:\n1. High-rank matrix completion: we show that it is possible to recover a general high-rank matrix A up to (1 + ε) relative error in Frobenius norm from partial observations, with sample complexity independent of the spectral gap of A.\n2. High-rank matrix denoising: we design an algorithm that recovers a matrix A with error in Frobenius norm from its noise-perturbed observations, without assuming A is exactly low-rank.\n3. Low-dimensional approximation of high-dimensional covariance: given N i.i.d. samples of dimension n from Nn(0, A), we show that it is possible to approximate the covariance matrix A with relative error in Frobenius norm with N ≈ n, improving over classical covariance estimation results, which require N ≈ n^2.\n\n1 Introduction\n\nLet A be an unknown general high-rank n × n PSD data matrix that one wishes to estimate. In many machine learning applications, though A is unknown, it is relatively easy to obtain a crude estimate Â that is close to A in spectral norm (i.e., ‖Â − A‖2 ≤ δ). 
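The estimator studied throughout is nothing more than the best rank-k approximation of the symmetric estimate Â. As an illustrative sketch (our own minimal numpy code, not the authors' implementation; the function name is ours), it can be computed from one eigendecomposition:

```python
import numpy as np

def truncated_svd(A_hat, k):
    """Best rank-k approximation (in Frobenius/spectral norm) of a
    symmetric matrix A_hat.

    For a symmetric matrix the singular values are the absolute values
    of the eigenvalues, so the top-k SVD coincides with keeping the k
    eigenpairs of largest magnitude.
    """
    vals, vecs = np.linalg.eigh(A_hat)   # eigenvalues in ascending order
    top = np.argsort(-np.abs(vals))[:k]  # indices of the k largest |eigenvalues|
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T
```

For a PSD input this keeps the k largest eigenvalues; the cost is dominated by the eigendecomposition, and for small k iterative eigensolvers achieve the near-linear running times discussed in Section 2.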
For example, in matrix completion a simple procedure that fills all unobserved entries with 0 and re-scales observed entries produces an estimate that is consistent in spectral norm (assuming the matrix satisfies a spikeness condition, a standard assumption in the matrix completion literature). In matrix de-noising, an observation that is corrupted by Gaussian noise is close to the underlying signal, because Gaussian noise is isotropic and has small spectral norm. In covariance estimation, the sample covariance in low-dimensional settings is close to the population covariance in spectral norm under mild conditions [Bunea and Xiao, 2015].\nHowever, in most such applications it is not sufficient to settle for a spectral norm approximation. For example, in recommendation systems (an application of matrix completion) the zero-filled re-scaled rating matrix is close to the ground truth in spectral norm, but it is an absurd estimator because most of the estimated ratings are zero. It is hence mandatory to require a more stringent measure of performance. One commonly used measure is the Frobenius norm of the estimation error ‖Â − A‖F, which ensures that (on average) the estimate is close to the ground truth in an element-wise sense. A spectral norm approximation Â is in general not a good estimate under Frobenius norm, because in high-rank scenarios ‖Â − A‖F can be √n times larger than ‖Â − A‖2.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nIn this paper, we show that in many cases a powerful multiplicative low-rank approximation in Frobenius norm can be obtained by applying a simple truncated SVD procedure to a crude, easy-to-find spectral norm approximation. In particular, given the spectral norm approximation condition ‖Â − A‖2 ≤ δ, the top-k SVD Âk of Â multiplicatively approximates A in Frobenius norm; that is, ‖Âk − A‖F ≤ C(k, δ, σk+1(A))·‖A − Ak‖F, where Ak is the best rank-k approximation of A in Frobenius and spectral norm. To our knowledge, the best existing result under the assumption ‖Â − A‖2 ≤ δ is due to Achlioptas and McSherry [2007], who showed that ‖Âk − A‖F ≤ ‖A − Ak‖F + √k·δ + 2k^{1/4}·√(δ‖Ak‖F), which depends on ‖Ak‖F and is not multiplicative in ‖A − Ak‖F.\nBelow we summarize applications in several matrix estimation problems.\n\nHigh-rank matrix completion Matrix completion is the problem of (approximately) recovering a data matrix from very few observed entries. It has wide applications in machine learning, especially in online recommendation systems. Most existing work on matrix completion assumes the data matrix is exactly low-rank [Candes and Recht, 2012, Sun and Luo, 2016, Jain et al., 2013]. Candes and Plan [2010] and Keshavan et al. [2010] studied the problem of recovering a low-rank matrix corrupted by stochastic noise; Chen et al. [2016] considered sparse column corruption. All of the aforementioned work assumes that the ground-truth data matrix is exactly low-rank, which is rarely true in practice. Negahban and Wainwright [2012] derived minimax rates of estimation error when the spectrum of the data matrix lies in an ℓq ball. Zhang et al. [2015] and Koltchinskii et al. [2011] derived oracle inequalities for general matrix completion; however, their error bounds have an additional O(√n) multiplicative factor. These results also require solving computationally expensive nuclear-norm penalized optimization problems, whereas our method only requires a single truncated singular value decomposition. Chatterjee et al. 
[2015] also used the truncated SVD estimator for matrix completion; however, their bound depends on the nuclear norm of the underlying matrix, which may be √n times larger than our result. Hardt and Wootters [2014] used a “soft-deflation” technique to remove the condition-number dependency in the sample complexity; however, their error bound for general high-rank matrix completion is additive and depends on the “consecutive” spectral gap (σk(A) − σk+1(A)), which can be small in practical settings [Balcan et al., 2016, Anderson et al., 2015]. Eriksson et al. [2012] considered high-rank matrix completion with additional union-of-subspace structures.\nIn this paper, we show that if the n × n data matrix A satisfies the µ0-spikeness condition,1 then for any ε ∈ (0, 1) the truncated SVD Âk of the zero-filled matrix satisfies ‖Âk − A‖F ≤ (1 + O(ε))·‖A − Ak‖F if the sample complexity is lower bounded by Ω(n·µ0^2·max{ε^{−4}, k^2}·‖A‖F^2·log n / σk+1(A)^2), which can be further simplified to Ω(µ0^2·max{ε^{−4}, k^2}·γk(A)^2·n·rs(A)·log n), where γk(A) = σ1(A)/σk+1(A) is the kth-order condition number and rs(A) = ‖A‖F^2/‖A‖2^2 ≤ rank(A) is the stable rank of A. Compared to existing work, our error bound is multiplicative, gap-free, and the estimator is computationally efficient.2\n\nHigh-rank matrix de-noising Let Â = A + E be a noisy observation of A, where E is a PSD Gaussian noise matrix with zero mean and ν^2/n variance on each entry. By simple concentration results we have ‖Â − A‖2 = ν with high probability; however, Â is in general not a good estimator of A in Frobenius norm when A is high-rank. 
Specifically, ‖Â − A‖F can be as large as √n·ν. Applying our main result, we show that if ν < σk+1(A) for some k ≪ n, then the top-k SVD Âk of Â satisfies ‖Âk − A‖F ≤ (1 + O(√(ν/σk+1(A))))·‖A − Ak‖F + √k·ν. This suggests a form of bias-variance decomposition, as a larger rank threshold k induces a smaller bias ‖A − Ak‖F but a larger variance k·ν^2. Our results generalize existing work on matrix de-noising [Donoho and Gavish, 2014, Donoho et al., 2013, Gavish and Donoho, 2014], which focuses primarily on exactly low-rank A.\n\n1 n‖A‖max ≤ µ0‖A‖F; see also Definition 2.1.\n2 We remark that our relative-error analysis does not, however, apply to exact rank-k matrices, where σk+1 = 0. This is because for an exact rank-k matrix a bound of the form (1 + O(ε))·‖A − Ak‖F requires exact recovery of A, which truncated SVD cannot achieve. On the other hand, in the case σk+1 = 0 a weaker additive-error bound is always applicable, as we show in Theorem 2.3.\n\nLow-rank estimation of high-dimensional covariance The (Gaussian) covariance estimation problem asks to estimate an n × n PSD covariance matrix A, either in spectral or Frobenius norm, from N i.i.d. samples X1, ···, XN ∼ N(0, A). The high-dimensional regime of covariance estimation, in which N ≈ n or even N ≪ n, has attracted enormous interest in the mathematical statistics literature [Cai et al., 2010, Cai and Zhou, 2012, Cai et al., 2013, 2016]. While most existing work focuses on sparse or banded covariance matrices, the setting where A has certain low-rank structure has seen rising interest recently [Bunea and Xiao, 2015, Kneip and Sarda, 2011]. 
In particular, Bunea and Xiao [2015] show that if n = O(N^β) for some β ≥ 0, then the sample covariance estimator Â = (1/N)·Σ_{i=1}^{N} XiXi⊤ satisfies\n\n‖Â − A‖F = OP(‖A‖2·re(A)·√(log N / N)),   (1)\n\nwhere re(A) = tr(A)/‖A‖2 ≤ rank(A) is the effective rank of A. For high-rank matrices where re(A) ≈ n, Eq. (1) requires N = Ω(n^2 log n) to approximate A consistently in Frobenius norm. In this paper we consider a reduced-rank estimator Âk and show that, if re(A)·max{ε^{−4}, k^2}·γk(A)^2·log N / N ≤ c for some small universal constant c > 0, then ‖Âk − A‖F admits a relative Frobenius-norm error bound (1 + O(ε))·‖A − Ak‖F with high probability. Our result allows reasonable approximation of A in Frobenius norm under the regime N = Ω(n·poly(k)·log n) if γk = O(poly(k)), which is significantly more flexible than N = Ω(n^2 log n), though the dependency on ε is worse than [Bunea and Xiao, 2015]. The error bound is also agnostic in nature, making no assumption on the actual or effective rank of A.\n\nNotations For an n × n PSD matrix A, denote A = UΣU⊤ as its eigenvalue decomposition, where U is an orthogonal matrix and Σ = diag(σ1, ···, σn) is a diagonal matrix, with eigenvalues sorted in descending order σ1 ≥ σ2 ≥ ··· ≥ σn ≥ 0. The spectral norm and Frobenius norm of A are defined as ‖A‖2 = σ1 and ‖A‖F = √(σ1^2 + ··· + σn^2), respectively. Suppose u1, ···, un are eigenvectors associated with σ1, ···, σn. 
Define Ak = Σ_{i=1}^{k} σi·ui·ui⊤ = Uk·Σk·Uk⊤, An−k = Σ_{i=k+1}^{n} σi·ui·ui⊤ = Un−k·Σn−k·Un−k⊤ and Am1:m2 = Σ_{i=m1+1}^{m2} σi·ui·ui⊤ = Um1:m2·Σm1:m2·Um1:m2⊤. For a tall matrix U ∈ R^{n×k}, we use U = Range(U) to denote the linear subspace spanned by the columns of U. For two linear subspaces U and V, we write W = U ⊕ V if U ∩ V = {0} and W = {u + v : u ∈ U, v ∈ V}. For a sequence of random variables {Xn}_{n=1}^{∞} and a real-valued function f : N → R, we say Xn = OP(f(n)) if for any ε > 0 there exist N ∈ N and C > 0 such that Pr[|Xn| ≥ C·|f(n)|] ≤ ε for all n ≥ N.\n\n2 Multiplicative Frobenius-norm Approximation and Applications\n\nWe first state our main result, which shows that truncated SVD on a weak estimator with small approximation error in spectral norm leads to a strong estimator with a multiplicative Frobenius-norm error bound. We remark that truncated SVD in general has time complexity\n\nO(min{n^2·k, nnz(Â) + n·poly(k)}),\n\nwhere nnz(Â) is the number of non-zero entries in Â, and the time complexity is at most linear in the matrix size when k is small. We refer readers to [Allen-Zhu and Li, 2016] for details.\nTheorem 2.1. Suppose A is an n × n PSD matrix with eigenvalues σ1(A) ≥ ··· ≥ σn(A) ≥ 0, and a symmetric matrix Â satisfies ‖Â − A‖2 ≤ δ = ε^2·σk+1(A) for some ε ∈ (0, 1/4]. Let Ak and Âk be the best rank-k approximations of A and Â. Then\n\n‖Âk − A‖F ≤ (1 + 32ε)·‖A − Ak‖F + 102√(2k)·ε^2·‖A − Ak‖2.   (2)\n\nRemark 2.1. Note that when ε = O(1/√k) we obtain a (1 + O(ε)) error bound.\nRemark 2.2. This theorem only studies PSD matrices. Using arguments similar to those in the proof, we believe similar results can be obtained for general asymmetric matrices as well.\n\nTo our knowledge, the best existing bound for ‖Âk − A‖F assuming ‖Â − A‖2 ≤ δ is due to Achlioptas and McSherry [2007], who showed that\n\n‖Âk − A‖F ≤ ‖A − Ak‖F + ‖(Â − A)k‖F + 2√(‖(Â − A)k‖F·‖Ak‖F) ≤ ‖A − Ak‖F + √k·δ + 2k^{1/4}·√(δ‖Ak‖F).   (3)\n\nCompared to Theorem 2.1, Eq. (3) is not relative because the third term 2k^{1/4}·√(δ‖Ak‖F) depends on the k largest eigenvalues of A, which could be much larger than the remainder term ‖A − Ak‖F. In contrast, Theorem 2.1, together with Remark 2.1, shows that ‖Âk − A‖F can be upper bounded by a small factor multiplied by the remainder term ‖A − Ak‖F.\nWe also provide a gap-dependent version.\nTheorem 2.2. Suppose A is an n × n PSD matrix with eigenvalues σ1(A) ≥ ··· ≥ σn(A) ≥ 0, and a symmetric matrix Â satisfies ‖Â − A‖2 ≤ δ = ε·(σk(A) − σk+1(A)) for some ε ∈ (0, 1/4]. Let Ak and Âk be the best rank-k approximations of A and Â. Then\n\n‖Âk − A‖F ≤ ‖A − Ak‖F + 102√(2k)·ε·(σk(A) − σk+1(A)).   (4)\n\nIf A is an exact rank-k matrix, Theorem 2.2 implies that truncated SVD gives an ε√(2k)·σk error approximation in Frobenius norm, which has been established by many previous works [Yi et al., 2016, Tu et al., 2015, Wang et al., 2016].\nBefore we proceed to the applications and proof of Theorem 2.1, we first list several examples of A with classical distributions of eigenvalues and discuss how Theorem 2.1 can be applied to obtain good Frobenius-norm approximations of A. We begin with the case where the eigenvalues of A decay at a polynomial rate (i.e., power law). Such matrices are ubiquitous in practice [Liu et al., 2015].\nCorollary 2.1 (Power-law spectral decay). Suppose ‖Â − A‖2 ≤ δ for some δ ∈ (0, 1/2] and σj(A) = j^{−β} for some β > 1/2. Set k = ⌊min{C1·δ^{−1/β}, n} − 1⌋. If k ≥ 1 then\n\n‖Âk − A‖F ≤ C′1 · max{δ^{(2β−1)/(2β)}, n^{−(2β−1)/(2β)}},\n\nwhere C1, C′1 > 0 are constants that only depend on β.\nWe remark that the assumption σj(A) = j^{−β} implies that the eigenvalues lie in an ℓq ball for q = 1/β; that is, Σ_{j=1}^{n} σj(A)^q = O(1). The error bound in Corollary 2.1 matches the minimax rate (derived by Negahban and Wainwright [2012]) for matrix completion when the spectrum is constrained in an ℓq ball, by replacing δ with √(n/N), where N is the number of observed entries.\nNext, we consider the case where the eigenvalues decay at a faster rate.\nCorollary 2.2 (Exponential spectral decay). Suppose ‖Â − A‖2 ≤ δ for some δ ∈ (0, e^{−16}) and σj(A) = exp{−cj} for some c > 0. 
Set k = ⌊min{c^{−1}·log(1/δ) − c^{−1}·log log(1/δ), n} − 1⌋. If k ≥ 1 then\n\n‖Âk − A‖F ≤ C′2 · max{δ·√(log(1/δ)^3), n^{1/2}·exp(−cn)},\n\nwhere C′2 > 0 is a constant that only depends on c.\nBoth corollaries are proved in the appendix. The error bounds in both Corollaries 2.1 and 2.2 are significantly better than that of the trivial estimate Â, which satisfies ‖Â − A‖F ≤ n^{1/2}·δ. We also remark that the bound in Corollary 2.1 cannot be obtained by a direct application of the weaker bound Eq. (3), which yields a δ^{(2β−1)/(4β)} bound.\nWe next state results that are consequences of Theorem 2.1 in several matrix estimation problems.\n\n2.1 High-rank Matrix Completion\n\nSuppose A is a high-rank n × n PSD matrix that satisfies the µ0-spikeness condition defined as follows:\nDefinition 2.1 (Spikeness condition). An n × n PSD matrix A satisfies the µ0-spikeness condition if n‖A‖max ≤ µ0‖A‖F, where ‖A‖max = max_{1≤i,j≤n} |Aij| is the max-norm of A.\nThe spikeness condition makes uniform sampling of matrix entries powerful in matrix completion problems. If A is exactly low-rank, the spikeness condition is implied by an upper bound on max_{1≤i≤n} ‖ei⊤Uk‖2, which is the standard incoherence assumption on the top-k subspace of A [Candes and Recht, 2012]. For general high-rank A, the spikeness condition is implied by a more restrictive incoherence condition that imposes upper bounds on max_{1≤i≤n} ‖ei⊤Un−k‖2 and ‖An−k‖max, which are the assumptions adopted in [Hardt and Wootters, 2014].\nSuppose Â is a symmetric re-scaled zero-filled matrix of observed entries. 
That is,\n\n[Â]ij = Aij/p with probability p, and [Â]ij = 0 with probability 1 − p,  for all 1 ≤ i ≤ j ≤ n.   (5)\n\nHere p ∈ (0, 1) is a parameter that controls the probability of observing a particular entry of A, corresponding to a sample complexity of O(n^2·p). Note that both Â and A are symmetric, so we only specify the upper triangle of Â. By a simple application of the matrix Bernstein inequality [Mackey et al., 2014], one can show that Â is close to A in spectral norm when A satisfies µ0-spikeness. Here we cite a lemma from [Hardt, 2014] to formally establish this observation:\nLemma 2.1 (Corollary of [Hardt, 2014], Lemma A.3). Under the model of Eq. (5) and the µ0-spikeness condition on A, for t ∈ (0, 1) it holds with probability at least 1 − t that\n\n‖Â − A‖2 ≤ O(max{√(µ0^2·‖A‖F^2·log(n/t)/(np)), µ0·‖A‖F·log(n/t)/(np)}).\n\nLet Âk be the best rank-k approximation of Â in Frobenius/spectral norm. Applying Theorems 2.1 and 2.2 we obtain the following result:\nTheorem 2.3. Fix t ∈ (0, 1). Then with probability 1 − t we have\n\n‖Âk − A‖F ≤ O(√k)·‖A − Ak‖F   if   p = Ω(µ0^2·k·‖A‖F^2·log(n/t)/(n·σk+1(A)^2)).\n\nFurthermore, for fixed ε ∈ (0, 1/4], with probability 1 − t we have\n\n‖Âk − A‖F ≤ (1 + O(ε))·‖A − Ak‖F   if   p = Ω(µ0^2·max{ε^{−4}, k^2}·‖A‖F^2·log(n/t)/(n·σk+1(A)^2)),\n\nand\n\n‖Âk − A‖F ≤ ‖A − Ak‖F + ε·(σk(A) − σk+1(A))   if   p = Ω(µ0^2·‖A‖F^2·log(n/t)/(n·ε^2·(σk(A) − σk+1(A))^2)).\n\nAs a remark, because µ0 ≥ 1 and ‖A‖F/σk+1(A) ≥ √k always hold, the sample complexity is lower bounded by Ω(nk log n), the typical sample complexity in noiseless matrix completion. In the case of high-rank A, the results in Theorem 2.3 are strongest when A has small stable rank rs(A) = ‖A‖F^2/‖A‖2^2 and the top-k condition number γk(A) = σ1(A)/σk+1(A) is not too large. For example, if A has stable rank rs(A) = r then ‖Âk − A‖F has an O(√k) multiplicative error bound with sample complexity Ω(µ0^2·γk(A)^2·nr·log n), or a (1 + O(ε)) relative error bound with sample complexity Ω(µ0^2·max{ε^{−4}, k^2}·γk(A)^2·nr·log n). Finally, when σk+1(A) is very small and the “gap” σk(A) − σk+1(A) is large, a weaker additive-error bound is applicable with sample complexity independent of σk+1(A)^{−1}.\nComparing with previous works, if the gap (1 − σk+1/σk) is of order ε, then the sample complexities of [Hardt, 2014], Theorem 1.2, and [Hardt and Wootters, 2014], Theorem 1, scale with 1/ε^7. 
Our result improves their results to the scaling of 1/ε^4 with a much simpler algorithm (truncated SVD). We refer the readers to [Gavish and Donoho, 2014] for a list of references that shows the ubiquitous application of matrix de-noising in scientific fields.\n\n2.2 High-rank matrix de-noising\n\nLet A be an n × n PSD signal matrix and E a symmetric random Gaussian matrix with zero mean and ν^2/n variance. That is, Eij i.i.d.∼ N(0, ν^2/n) for 1 ≤ i ≤ j ≤ n, and Eij = Eji. Define Â = A + E. The matrix de-noising problem is then to recover the signal matrix A from the noisy observation Â. It is well known, by concentration results for Gaussian random matrices, that ‖Â − A‖2 = ‖E‖2 = OP(ν). Let Âk be the best rank-k approximation of Â in Frobenius/spectral norm. Applying Theorem 2.1, we immediately have the following result:\nTheorem 2.4. There exists an absolute constant c > 0 such that, if ν < c·σk+1(A) for some 1 ≤ k < n, then with probability at least 0.8 we have that\n\n‖Âk − A‖F ≤ (1 + O(√(ν/σk+1(A))))·‖A − Ak‖F + O(√k·ν).   (6)\n\nEq. (6) can be understood from a classical bias-variance tradeoff perspective: the first term, (1 + O(√(ν/σk+1(A))))·‖A − Ak‖F, acts as a bias term, which decreases as we increase the cut-off rank k, corresponding to a more complicated model; on the other hand, the second term, O(√k·ν), acts as the (square root of the) variance, which does not depend on the signal A and increases with k.\n\n2.3 Low-rank estimation of high-dimensional covariance\n\nSuppose A is an n × n PSD matrix and X1, ···, XN are i.i.d. samples drawn from the multivariate Gaussian distribution Nn(0, A). The question is to estimate A from the samples X1, ···, XN. A common estimator is the sample covariance Â = (1/N)·Σ_{i=1}^{N} XiXi⊤. While in low-dimensional regimes (i.e., n fixed and N → ∞) the asymptotic efficiency of Â is obvious (cf. [Van der Vaart, 2000]), its statistical power in high-dimensional regimes where n and N are comparable is highly non-trivial. Below we cite results by Bunea and Xiao [2015] for the estimation error ‖Â − A‖ξ, ξ = 2/F, when n is not too large compared to N:\nLemma 2.2 (Bunea and Xiao [2015]). Suppose n = O(N^β) for some β ≥ 0 and let re(A) = tr(A)/‖A‖2 denote the effective rank of the covariance A. Then the sample covariance Â = (1/N)·Σ_{i=1}^{N} XiXi⊤ satisfies\n\n‖Â − A‖F = OP(‖A‖2·re(A)·√(log N/N))   (7)\n\nand\n\n‖Â − A‖2 = OP(‖A‖2·max{√(re(A)·log(Nn)/N), re(A)·log(Nn)/N}).   (8)\n\nLet Âk be the best rank-k approximation of Â in Frobenius/spectral norm. Applying Theorems 2.1 and 2.2 together with Eq. (8), we immediately arrive at the following theorem.\nTheorem 2.5. Fix ε ∈ (0, 1/4] and 1 ≤ k < n. Recall that re(A) = tr(A)/‖A‖2 and γk(A) = σ1(A)/σk+1(A). 
There exists a universal constant c > 0 such that, if\n\nre(A)·max{ε^{−4}, k^2}·γk(A)^2·log(N)/N ≤ c,\n\nthen with probability at least 0.8,\n\n‖Âk − A‖F ≤ (1 + O(ε))·‖A − Ak‖F;\n\nand if\n\nre(A)·k·‖A‖2^2·log(N)/(N·ε^2·(σk(A) − σk+1(A))^2) ≤ c,\n\nthen with probability at least 0.8,\n\n‖Âk − A‖F ≤ ‖A − Ak‖F + ε·(σk(A) − σk+1(A)).\n\nTheorem 2.5 shows that it is possible to obtain a reasonable Frobenius-norm approximation of A by truncated SVD in the asymptotic regime of N = Ω(re(A)·poly(k)·log N), which is much more flexible than Eq. (7), which requires N = Ω(re(A)^2·log N).\n\n3 Proof Sketch of Theorem 2.1\n\nIn this section we give a proof sketch of Theorem 2.1. The proof of Theorem 2.2 is similar and less challenging, so we defer it to the appendix. We defer proofs of technical lemmas to Section A.\nBecause both Âk and Ak are low-rank, ‖Âk − Ak‖F is upper bounded by an O(√k) factor of ‖Âk − Ak‖2. From the condition ‖Â − A‖2 ≤ δ, a straightforward approach to upper bounding ‖Âk − Ak‖2 is to consider the decomposition ‖Âk − Ak‖2 ≤ ‖Â − A‖2 + 2·‖UkUk⊤ − ÛkÛk⊤‖2·‖Âk‖2, where UkUk⊤ and ÛkÛk⊤ are projection operators onto the top-k eigenspaces of A and Â, respectively. Such a naive approach, however, has two major disadvantages. 
First, the upper bound depends on ‖Âk‖2, which is additive and may be much larger than ‖Â − A‖2. Perhaps more importantly, the quantity ‖UkUk⊤ − ÛkÛk⊤‖2 depends on the “consecutive” spectral gap (σk(A) − σk+1(A)), which could be very small for large matrices.\nThe key idea in the proof of Theorem 2.1 is to find an “envelope” m1 ≤ k ≤ m2 in the spectrum of A surrounding k, such that the eigenvalues within the envelope are relatively close. Define\n\nm1 = argmax_{0≤j≤k} {σj(A) ≥ (1 + 2ε)·σk+1(A)};\nm2 = argmax_{k≤j≤n} {σj(A) ≥ σk(A) − 2ε·σk+1(A)},\n\nwhere we let σ0(A) = ∞ for convenience. Let Um, Ûm be bases of the top m-dimensional linear subspaces of A and Â, respectively. Also denote Un−m and Ûn−m as bases of the orthogonal complements of Um and Ûm. By the asymmetric Davis-Kahan inequality (Lemma C.1) and Weyl's inequality we can obtain the following result.\nLemma 3.1. If ‖Â − A‖2 ≤ ε^2·σk+1(A) for ε ∈ (0, 1) then ‖Ûn−k⊤·Um1‖2, ‖Ûk⊤·Un−m2‖2 ≤ ε.\nLet Um1:m2 be the linear subspace of A associated with the eigenvalues σm1+1(A), ···, σm2(A). Intuitively, we choose a (k − m1)-dimensional linear subspace of Um1:m2 that is “most aligned” with the top-k subspace Ûk of Â. Formally, define\n\nW = argmax_{dim(W)=k−m1, W⊆Um1:m2} σk−m1(W⊤Ûk).\n\nW is then an n × (k − m1) matrix with orthonormal columns that corresponds to a basis of W. W is carefully constructed so that it is closely aligned with Ûk, yet still approximately lies in Uk. In particular, Lemma 
In particular, Lemma 3.2 shows that $\sin\angle(\mathcal{W}, \widehat{\mathcal{U}}_k) = \|\widehat{U}_{n-k}^\top W\|_2$ is upper bounded by $\varepsilon$.

Lemma 3.2. If $\|\widehat{A} - A\|_2 \leq \varepsilon^2 \sigma_{k+1}(A)$ for $\varepsilon \in (0, 1)$ then $\|\widehat{U}_{n-k}^\top W\|_2 \leq \varepsilon$.

Now define
$$\widetilde{A} = A_{m_1} + W W^\top A W W^\top.$$
We use $\widetilde{A}$ as the "reference matrix" because we can decompose $\|\widehat{A}_k - A\|_F$ as
$$\|\widehat{A}_k - A\|_F \leq \|A - \widetilde{A}\|_F + \|\widehat{A}_k - \widetilde{A}\|_F \leq \|A - \widetilde{A}\|_F + \sqrt{2k}\,\|\widehat{A}_k - \widetilde{A}\|_2 \quad (9)$$
and bound each term on the right-hand side separately. Here the last inequality holds because both $\widehat{A}_k$ and $\widetilde{A}$ have rank at most $k$. The following lemma bounds the first term.

Lemma 3.3. If $\|\widehat{A} - A\|_2 \leq \varepsilon^2 \sigma_{k+1}(A)$ for $\varepsilon \in (0, 1/4]$ then $\|A - \widetilde{A}\|_F \leq (1 + 32\varepsilon)\|A - A_k\|_F$.

The proof of this lemma relies on the Pythagorean theorem and the Poincaré separation theorem. Let $\mathcal{U}_{m_1:m_2}$ be the $(m_2 - m_1)$-dimensional linear subspace such that $\mathcal{U}_{m_2} = \mathcal{U}_{m_1} \oplus \mathcal{U}_{m_1:m_2}$. Define $A_{m_1:m_2} = U_{m_1:m_2} \Sigma_{m_1:m_2} U_{m_1:m_2}^\top$, where $\Sigma_{m_1:m_2} = \operatorname{diag}(\sigma_{m_1+1}(A), \cdots, \sigma_{m_2}(A))$ and $U_{m_1:m_2}$ is an orthonormal basis associated with $\mathcal{U}_{m_1:m_2}$. Applying the Pythagorean theorem (Lemma C.2), we can decompose
$$\|A - \widetilde{A}\|_F^2 = \|A - A_{m_2}\|_F^2 + \|A_{m_1:m_2}\|_F^2 - \|W W^\top A_{m_1:m_2} W W^\top\|_F^2.$$
Applying the Poincaré separation theorem (Lemma C.3) with $X = \Sigma_{m_1:m_2}$ and $P = U_{m_1:m_2}^\top W$, we have
$$\|W^\top A_{m_1:m_2} W\|_F^2 \geq \sum_{j=m_2-k+1}^{m_2-m_1} \sigma_j(A_{m_1:m_2})^2 = \sum_{j=m_1+m_2-k+1}^{m_2} \sigma_j(A)^2.$$
With some routine algebra we can prove Lemma 3.3.

To bound the second term of Eq. (9) we use the following lemma.

Lemma 3.4. If $\|\widehat{A} - A\|_2 \leq \varepsilon^2 \sigma_{k+1}(A)$ for $\varepsilon \in (0, 1/4]$ then $\|\widehat{A}_k - \widetilde{A}\|_2 \leq 102\varepsilon^2\|A - A_k\|_2$.

The proof of Lemma 3.4 relies on the low-rankness of $\widehat{A}_k$ and $\widetilde{A}$. Recall the definitions $\widetilde{\mathcal{U}} = \operatorname{Range}(\widetilde{A})$ and $\widetilde{\mathcal{U}}_\perp = \operatorname{Null}(\widetilde{A})$. Consider $\|v\|_2 = 1$ such that $v^\top(\widehat{A}_k - \widetilde{A})v = \|\widehat{A}_k - \widetilde{A}\|_2$. Because $v$ maximizes $v^\top(\widehat{A}_k - \widetilde{A})v$ over all unit-length vectors, it must lie in the range of $\widehat{A}_k - \widetilde{A}$, because otherwise the component outside the range would not contribute. Therefore, we can choose $v$ such that $v = v_1 + v_2$, where $v_1 \in \operatorname{Range}(\widehat{A}_k) = \widehat{\mathcal{U}}_k$ and $v_2 \in \operatorname{Range}(\widetilde{A}) = \widetilde{\mathcal{U}}$. Subsequently, we have that
$$v = \widehat{U}_k \widehat{U}_k^\top v + \widetilde{U}\widetilde{U}^\top \widehat{U}_{n-k} \widehat{U}_{n-k}^\top v \quad (10)$$
$$\phantom{v} = \widetilde{U}\widetilde{U}^\top v + \widehat{U}_k \widehat{U}_k^\top \widetilde{U}_\perp \widetilde{U}_\perp^\top v. \quad (11)$$
Consider the following decomposition:
$$\left|v^\top(\widehat{A}_k - \widetilde{A})v\right| \leq \left|v^\top(\widehat{A} - A)v\right| + \left|v^\top(\widehat{A}_k - \widehat{A})v\right| + \left|v^\top(A - \widetilde{A})v\right|.$$
The first term $|v^\top(\widehat{A} - A)v|$ is trivially upper bounded by $\|\widehat{A} - A\|_2 \leq \varepsilon^2\sigma_{k+1}(A)$. The second and third terms can be bounded by Weyl's inequality (Lemma C.4) and basic properties of $\widetilde{A}$ (Lemma A.3).
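The end-to-end phenomenon these lemmas establish — truncated SVD of a spectrally close estimate yields a near-multiplicative Frobenius-norm approximation — can also be checked numerically. The sketch below is our own illustration (the matrix size, rank, spectrum, and $\varepsilon$ are assumed parameters, not values from the paper): it perturbs a PSD matrix with power-law spectrum within the spectral-norm budget $\varepsilon^2\sigma_{k+1}(A)$ and compares the truncation error against the oracle error $\|A - A_k\|_F$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, eps = 200, 10, 0.25

# PSD matrix A = Q diag(sigma) Q^T with power-law spectrum sigma_j = 1/j.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.arange(1, n + 1)
A = (Q * sigma) @ Q.T

# Symmetric perturbation scaled so that ||A_hat - A||_2 = eps^2 * sigma_{k+1}.
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
E *= eps**2 * sigma[k] / np.linalg.norm(E, 2)  # sigma[k] is sigma_{k+1} (0-indexed)
A_hat = A + E

# Rank-k truncation of A_hat (for a symmetric matrix the eigendecomposition
# coincides with the SVD up to signs; the top-k eigenvalues here are positive).
w, V = np.linalg.eigh(A_hat)
top = np.argsort(w)[::-1][:k]
A_hat_k = (V[:, top] * w[top]) @ V[:, top].T

oracle = np.sqrt(np.sum(sigma[k:] ** 2))       # ||A - A_k||_F
achieved = np.linalg.norm(A_hat_k - A, "fro")  # ||A_hat_k - A||_F
ratio = achieved / oracle                      # >= 1 by Eckart-Young,
print(round(ratio, 2))                         # and close to 1 here
```

By the Eckart-Young theorem the ratio can never fall below 1, and the multiplicative guarantee says it stays within a $(1 + O(\varepsilon))$ factor of 1 despite the perturbation being comparable to the consecutive gaps near $\sigma_k(A)$.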
See Section A for details.

4 Discussion

We mention two potential directions to further extend the results of this paper.

4.1 Model selection for general high-rank matrices

The validity of Theorem 2.1 depends on the condition $\|\widehat{A} - A\|_2 \leq \varepsilon^2\sigma_{k+1}(A)$, which could be hard to verify if $\sigma_{k+1}(A)$ is unknown and difficult to estimate. Furthermore, for general high-rank matrices, the model selection problem of determining an appropriate (or even optimal) cut-off rank $k$ requires knowledge of the distribution of the entire spectrum of an unknown data matrix, which is even more challenging to obtain.
One potential approach is to impose a parametric pattern of decay on the eigenvalues (e.g., polynomial or exponential decay), and to estimate a small set of parameters (e.g., the degree of the polynomial) from the noisy observations $\widehat{A}$. Afterwards, the optimal cut-off rank $k$ could be determined by a theoretical analysis, similar to the examples in Corollaries 2.1 and 2.2. Another possibility is to use repeated sampling techniques such as the bootstrap in a stochastic problem (e.g., matrix de-noising) to estimate the "bias" term $\|A - A_k\|_F$ for different $k$, as the variance term $\sqrt{k}\nu$ is known or easy to estimate.

4.2 Minimax rates for polynomial spectral decay

Consider the class of PSD matrices whose eigenvalues follow a polynomial (power-law) decay: $\Theta(\beta, n) = \{A \in \mathbb{R}^{n \times n} : A \succeq 0, \sigma_j(A) = j^{-\beta}\}$. We are interested in the following minimax rates for completing or de-noising matrices in $\Theta(\beta, n)$:

Question 1 (Completion of $\Theta(\beta, n)$). Fix $n \in \mathbb{N}$, $p \in (0, 1)$ and define $N = pn^2$. For $M \in \Theta(\beta, n)$, let $\widehat{A}_{ij} = M_{ij}$ with probability $p$ and $\widehat{A}_{ij} = 0$ with probability $1 - p$.
Also let $\Lambda(\mu_0, n) = \{M \in \mathbb{R}^{n \times n} : n\|M\|_{\max} \leq \mu_0\|M\|_F\}$ be the class of all non-spiky matrices. Determine
$$R_1(\mu_0, \beta, n, N) := \inf_{\widehat{A} \mapsto \widehat{M}}\ \sup_{M \in \Theta(\beta, n) \cap \Lambda(\mu_0, n)} \mathbb{E}\|\widehat{M} - M\|_F^2.$$

Question 2 (De-noising of $\Theta(\beta, n)$). Fix $n \in \mathbb{N}$, $\nu > 0$ and let $\widehat{A} = M + (\nu/\sqrt{n})Z$, where $Z$ is a symmetric matrix with i.i.d. standard normal random variables on its upper triangle. Determine
$$R_2(\nu, \beta, n) := \inf_{\widehat{A} \mapsto \widehat{M}}\ \sup_{M \in \Theta(\beta, n)} \mathbb{E}\|\widehat{M} - M\|_F^2.$$

Compared to existing settings for matrix completion and de-noising, we believe $\Theta(\beta, n)$ is a more natural matrix class which allows for general high-rank matrices, but also imposes sufficient spectral decay conditions so that spectrum truncation algorithms result in significant benefits. Based on Corollary 2.1 and its matching lower bounds for a larger $\ell_p$ class [Negahban and Wainwright, 2012], we make the following conjecture:

Conjecture 4.1. For $\beta > 1/2$ and $\nu$ not too small, we conjecture that
$$R_1(\mu_0, \beta, n, N) \asymp C(\mu_0) \cdot \left(\frac{n}{N}\right)^{\frac{2\beta - 1}{2\beta}} \quad \text{and} \quad R_2(\nu, \beta, n) \asymp \left(\nu^2\right)^{\frac{2\beta - 1}{2\beta}},$$
where $C(\mu_0) > 0$ is a constant that depends only on $\mu_0$.

5 Acknowledgements

S.S.D. was supported by the ARPA-E Terra program. Y.W. and A.S. were supported by NSF CAREER grant IIS-1252412.

References

Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM, 54(2):9, 2007.

Zeyuan Allen-Zhu and Yuanzhi Li. Even faster SVD decomposition yet without agonizing pain. In
In\n\nAdvances in Neural Information Processing Systems, pages 974\u2013982, 2016.\n\nDavid Anderson, Simon Du, Michael Mahoney, Christopher Melgaard, Kunming Wu, and Ming Gu.\nSpectral gap error bounds for improving cur matrix decomposition and the nystr\u00f6m method. In\nArti\ufb01cial Intelligence and Statistics, pages 19\u201327, 2015.\n\nMaria Florina Balcan, Simon S Du, Yining Wang, and Adams Wei Yu. An improved gap-dependency\nanalysis of the noisy power method. In 29th Annual Conference on Learning Theory, pages\n284\u2013309, 2016.\n\nFlorentina Bunea and Luo Xiao. On the sample covariance matrix estimator of reduced effective rank\n\npopulation matrices, with applications to fpca. Bernoulli, 21(2):1200\u20131230, 2015.\n\nT Tony Cai and Harrison H Zhou. Optimal rates of convergence for sparse covariance matrix\n\nestimation. The Annals of Statistics, 40(5):2389\u20132420, 2012.\n\nT Tony Cai, Cun-Hui Zhang, and Harrison H Zhou. Optimal rates of convergence for covariance\n\nmatrix estimation. The Annals of Statistics, 38(4):2118\u20132144, 2010.\n\nT Tony Cai, Zhao Ren, and Harrison H Zhou. Optimal rates of convergence for estimating toeplitz\n\ncovariance matrices. Probability Theory and Related Fields, 156(1-2):101\u2013143, 2013.\n\nT Tony Cai, Zhao Ren, and Harrison H Zhou. Estimating structured high-dimensional covariance and\nprecision matrices: Optimal rates and adaptive estimation. Electronic Journal of Statistics, 10(1):\n1\u201359, 2016.\n\n9\n\n\fEmmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Commu-\n\nnications of the ACM, 55(6):111\u2013119, 2012.\n\nEmmanuel J Candes and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):\n\n925\u2013936, 2010.\n\nSourav Chatterjee et al. Matrix estimation by universal singular value thresholding. The Annals of\n\nStatistics, 43(1):177\u2013214, 2015.\n\nYudong Chen, Huan Xu, Constantine Caramanis, and Sujay Sanghavi. 
Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs. IEEE Transactions on Information Theory, 62(1):503-526, 2016.

David Donoho and Matan Gavish. Minimax risk of matrix denoising by singular value thresholding. The Annals of Statistics, 42(6):2413-2440, 2014.

David L Donoho, Matan Gavish, and Andrea Montanari. The phase transition of matrix recovery from Gaussian measurements matches the minimax MSE of matrix denoising. Proceedings of the National Academy of Sciences, 110(21):8405-8410, 2013.

Brian Eriksson, Laura Balzano, and Robert D Nowak. High-rank matrix completion. In AISTATS, pages 373-381, 2012.

Matan Gavish and David L Donoho. The optimal hard threshold for singular values is $4/\sqrt{3}$. IEEE Transactions on Information Theory, 60(8):5040-5053, 2014.

Moritz Hardt. Understanding alternating minimization for matrix completion. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 651-660. IEEE, 2014.

Moritz Hardt and Mary Wootters. Fast matrix completion without the condition number. In COLT, pages 638-678, 2014.

Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 665-674. ACM, 2013.

Raghunandan H Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980-2998, 2010.

Alois Kneip and Pascal Sarda. Factor models and variable selection in high-dimensional regression analysis. The Annals of Statistics, pages 2410-2447, 2011.

Vladimir Koltchinskii, Karim Lounici, and Alexandre B Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.
The Annals of Statistics, pages 2302-2329, 2011.

Ziqi Liu, Yu-Xiang Wang, and Alexander Smola. Fast differentially private matrix factorization. In Proceedings of the 9th ACM Conference on Recommender Systems, pages 171-178. ACM, 2015.

Lester Mackey, Michael I Jordan, Richard Y Chen, Brendan Farrell, and Joel A Tropp. Matrix concentration inequalities via the method of exchangeable pairs. The Annals of Probability, 42(3):906-945, 2014.

Sahand Negahban and Martin J Wainwright. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 13(1):1665-1697, 2012.

Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via non-convex factorization. IEEE Transactions on Information Theory, 62(11):6535-6579, 2016.

Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Benjamin Recht. Low-rank solutions of linear matrix equations via Procrustes flow. arXiv preprint arXiv:1507.03566, 2015.

Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge University Press, 2000.

Lingxiao Wang, Xiao Zhang, and Quanquan Gu. A unified computational and statistical framework for nonconvex low-rank matrix estimation. arXiv preprint arXiv:1610.05275, 2016.

Xinyang Yi, Dohyung Park, Yudong Chen, and Constantine Caramanis. Fast algorithms for robust PCA via gradient descent. In Advances in Neural Information Processing Systems, pages 4152-4160, 2016.

Lijun Zhang, Tianbao Yang, Rong Jin, and Zhi-Hua Zhou. Analysis of nuclear norm regularization for full-rank matrix completion.
arXiv preprint arXiv:1504.06817, 2015.