{"title": "Sparse and Low-Rank Tensor Decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 2548, "page_last": 2556, "abstract": "Motivated by the problem of robust factorization of a low-rank tensor, we study the question of sparse and low-rank tensor decomposition. We present an efficient computational algorithm that modifies Leurgans' algoirthm for tensor factorization. Our method relies on a reduction of the problem to sparse and low-rank matrix decomposition via the notion of tensor contraction. We use well-understood convex techniques for solving the reduced matrix sub-problem which then allows us to perform the full decomposition of the tensor. We delineate situations where the problem is recoverable and provide theoretical guarantees for our algorithm. We validate our algorithm with numerical experiments.", "full_text": "Sparse and Low-Rank Tensor Decomposition\n\nParikshit Shah\n\nparikshit@yahoo-inc.com\n\nNikhil Rao\n\nnikhilr@cs.utexas.edu\n\nGongguo Tang\n\ngtang@mines.edu\n\nAbstract\n\nMotivated by the problem of robust factorization of a low-rank tensor, we study\nthe question of sparse and low-rank tensor decomposition. We present an ef\ufb01cient\ncomputational algorithm that modi\ufb01es Leurgans\u2019 algoirthm for tensor factoriza-\ntion. Our method relies on a reduction of the problem to sparse and low-rank ma-\ntrix decomposition via the notion of tensor contraction. We use well-understood\nconvex techniques for solving the reduced matrix sub-problem which then allows\nus to perform the full decomposition of the tensor. We delineate situations where\nthe problem is recoverable and provide theoretical guarantees for our algorithm.\nWe validate our algorithm with numerical experiments.\n\n1\n\nIntroduction\n\nTensors are useful representational objects to model a variety of problems such as graphical models\nwith latent variables [1], audio classi\ufb01cation [20], psychometrics [8], and neuroscience [3]. 
One concrete example proposed in [1] involves topic modeling in an exchangeable bag-of-words model wherein, given a corpus of documents, one wishes to estimate parameters related to the different topics of the different documents (each document has a unique topic associated to it). By computing the empirical moments associated to (exchangeable) bi-grams and tri-grams of words in the documents, [1] shows that this problem reduces to that of a (low rank) tensor decomposition. A number of other machine learning tasks, such as Independent Component Analysis [11] and learning Gaussian mixtures [2], are reducible to that of tensor decomposition. While most tensor problems are computationally intractable [12], there has been renewed interest in developing tractable and principled approaches for the same [4, 5, 12, 15, 19, 21, 24-27].

In this paper we consider the problem of performing tensor decompositions when a subset of the entries of a low-rank tensor X are corrupted adversarially, so that the tensor observed is Z = X + Y, where Y is the corruption. One may view this problem as the tensor version of the sparse and low-rank matrix decomposition problem studied in [6, 9, 10, 13]. We develop an algorithm for performing such a decomposition and provide theoretical guarantees as to when such decomposition is possible. Our work draws on two sets of tools: (a) the line of work addressing the Robust PCA problem in the matrix case [6, 9], and (b) the application of Leurgans' algorithm for tensor decomposition and tensor inverse problems [4, 17, 24].

Our algorithm is computationally efficient and scalable; it relies on the key notion of tensor contraction, which effectively reduces a tensor problem of dimension n × n × n to four decomposition problems for matrices of size n × n.
One can then apply convex methods for sparse and low-rank matrix decomposition, followed by certain linear algebraic operations, to recover the constituent tensors. Our algorithm not only produces the correct decomposition of Z into X and Y, but also produces the low rank factorization of X. We are able to avoid tensor unfolding based approaches [14, 21, 26], which are expensive and would lead to solving convex problems that are larger by orders of magnitude; in the 3rd order case the unfolded matrix would be n^2 × n. Furthermore, our method is conceptually simple, both to implement and to analyze theoretically. Finally, our method is also modular: it can be extended to the higher order case as well as to settings where the corrupted tensor Z has missing entries, as described in Section 5.

1.1 Problem Setup

In this paper, vectors are denoted using lower case characters (e.g. x, y, a, b, etc.), matrices by upper-case characters (e.g. X, Y, etc.), and tensors by upper-case bold characters (e.g. X, T, A, etc.). We will work with tensors of third order (representationally to be thought of as three-way arrays), and the term mode refers to one of the axes of the tensor. A slice of a tensor refers to a two dimensional matrix generated from the tensor by varying indices along two modes while keeping the third mode fixed. For a tensor X we will refer to the indices of the ith mode-1 slice (i.e., the slice corresponding to the indices {i} × [n2] × [n3]) by S^(1)_i, where [n2] = {1, 2, . . . , n2} and [n3] is defined similarly. We denote the matrix corresponding to S^(1)_i by X^1_i.
Similarly, the indices of the kth mode-3 slice will be denoted by S^(3)_k and the matrix by X^3_k.

Given a tensor of interest X, consider its decomposition into rank one tensors

X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i,    (1)

where {u_i}_{i=1,...,r} ⊆ R^{n1}, {v_i}_{i=1,...,r} ⊆ R^{n2}, and {w_i}_{i=1,...,r} ⊆ R^{n3} are unit vectors. Here ⊗ denotes the tensor product, so that X ∈ R^{n1×n2×n3} is a tensor of order 3 and dimension n1 × n2 × n3. Without loss of generality, throughout this paper we assume that n1 ≤ n2 ≤ n3. We will present our results for third order tensors; analogous results for higher orders follow in a transparent manner. We will be dealing with low-rank tensors, i.e. those tensors with r ≤ n1. Tensors can have rank larger than the dimension (indeed r ≥ n3 is an interesting regime), but that case is far more challenging and is a topic left for future work.

Kruskal's Theorem [16] guarantees that tensors satisfying Assumption 1.1 below have a unique minimal decomposition into rank one terms of the form (1). The number of terms is called the (Kruskal) rank.

Assumption 1.1. {u_i}_{i=1,...,r} ⊆ R^{n1}, {v_i}_{i=1,...,r} ⊆ R^{n2}, and {w_i}_{i=1,...,r} ⊆ R^{n3} are sets of linearly independent vectors.

While rank decomposition of tensors in the worst case is known to be computationally intractable [12], it is known that the (mild) assumption stated in Assumption 1.1 above suffices for an algorithm known as Leurgans' algorithm [4, 18] to correctly identify the factors in this unique decomposition. In this paper, we will make this assumption about our tensor X throughout. This assumption may be viewed as a "genericity" or "smoothness" assumption [4].

In (1), r is the rank, λ_i ∈ R are scalars, and u_i ∈ R^{n1}, v_i ∈ R^{n2}, w_i ∈ R^{n3} are the tensor factors.
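As a concrete illustration, the decomposition (1) is a single einsum away in code; the following sketch (numpy-based, with small dimensions of our own choosing) builds a tensor from its factors:

```python
import numpy as np

def rank_r_tensor(lam, U, V, W):
    """Form X = sum_i lam[i] * u_i (x) v_i (x) w_i, as in eq. (1).
    U, V, W hold the factors u_i, v_i, w_i as columns."""
    return np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# Small example with r = 2. Taking orthonormal columns (via QR) gives
# linearly independent factors, so Assumption 1.1 holds.
rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((4, 2)))[0]
V = np.linalg.qr(rng.standard_normal((5, 2)))[0]
W = np.linalg.qr(rng.standard_normal((6, 2)))[0]
X = rank_r_tensor(np.array([2.0, 1.0]), U, V, W)
print(X.shape)  # (4, 5, 6)
```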
Let U ∈ R^{n1×r} denote the matrix whose columns are u_i, and correspondingly define V ∈ R^{n2×r} and W ∈ R^{n3×r}. Let Y ∈ R^{n1×n2×n3} be a sparse tensor, to be viewed as a "corruption" or adversarial noise added to X, so that one observes:

Z = X + Y.

The problem of interest is that of decomposition, i.e. recovering X and Y from Z.

For a tensor X, we define its mode-3 contraction with respect to a contraction vector a ∈ R^{n3}, denoted by X^3_a ∈ R^{n1×n2}, as the following matrix:

[X^3_a]_{ij} = Σ_{k=1}^{n3} X_{ijk} a_k,    (2)

so that the resulting n1 × n2 matrix is a weighted sum of the mode-3 slices of the tensor X. Under this notation, the kth mode-3 slice matrix X^3_k is a mode-3 contraction with respect to the kth canonical basis vector. We similarly define the mode-1 contraction with respect to a vector c ∈ R^{n1} as

[X^1_c]_{jk} = Σ_{i=1}^{n1} X_{ijk} c_i.    (3)

In the subsequent discussion we will also use the following notation: for a matrix M, ‖M‖ refers to the spectral norm, ‖M‖_* the nuclear norm, ‖M‖_1 := Σ_{i,j} |M_{ij}| the elementwise ℓ1 norm, and ‖M‖_∞ := max_{i,j} |M_{ij}| the elementwise ℓ∞ norm.

1.2 Incoherence

The problem of sparse and low-rank decomposition for matrices has been studied in [6, 9, 13, 22], and it is well understood that exact decomposition is not always possible. In order for the problem to be identifiable, two situations must be avoided: (a) the low-rank component X must not be sparse, and (b) the sparse component Y must not be low-rank.
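In code, the contractions (2) and (3) are one-line einsum calls; the following sketch (numpy, with helper names of our own choosing) mirrors the definitions:

```python
import numpy as np

def mode3_contraction(X, a):
    """[X^3_a]_{ij} = sum_k X_{ijk} a_k : a weighted sum of mode-3 slices."""
    return np.einsum('ijk,k->ij', X, a)

def mode1_contraction(X, c):
    """[X^1_c]_{jk} = sum_i X_{ijk} c_i."""
    return np.einsum('ijk,i->jk', X, c)

# Contracting with the k-th canonical basis vector returns the k-th slice.
Z = np.arange(24.0).reshape(2, 3, 4)
e0 = np.eye(4)[0]
assert np.allclose(mode3_contraction(Z, e0), Z[:, :, 0])
```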
In fact, something stronger is both necessary and sufficient: the tangent spaces of the low-rank matrix (with respect to the rank variety) and the sparse matrix (with respect to the variety of sparse matrices) must have a transverse intersection [9]. For the problem to be amenable to recovery using computationally tractable (convex) methods, somewhat stronger incoherence assumptions are standard in the matrix case [6, 7, 9]. We will make similar assumptions for the tensor case, which we now describe.

Given the decomposition (1) of X, we define the following subspaces of matrices:

T_{U,V} = { U Aᵀ + B Vᵀ : A ∈ R^{n2×r}, B ∈ R^{n1×r} },
T_{V,W} = { V Cᵀ + D Wᵀ : C ∈ R^{n3×r}, D ∈ R^{n2×r} }.    (4)

Thus T_{U,V} consists of sums of matrices whose column spaces are contained in span(U) and matrices whose row spaces are contained in span(V), and a similar description holds for T_{V,W} and the matrices V, W. If Q is a rank r matrix with column space span(U) and row space span(V), T_{U,V} is the tangent space at Q with respect to the variety of rank r matrices.

For a tensor Y, the support of Y refers to the indices corresponding to the non-zero entries of Y. Let Ω ⊆ [n1] × [n2] × [n3] denote the support of Y. Further, for a slice Y^3_i, let Ω^(3)_i ⊆ [n1] × [n2] denote the corresponding sparsity pattern of the slice Y^3_i (more generally, Ω^(k)_i is the sparsity pattern of the matrix resulting from the ith mode k slice). When a tensor contraction of Y is computed along mode k, the sparsity of the resulting matrix is the union of the sparsity patterns of each (matrix) slice, i.e. Ω^(k) = ∪_{i=1}^{nk} Ω^(k)_i. Let S(Ω^(k)) denote the set of (sparse) matrices with support Ω^(k). We define the following incoherence parameters:

ζ(U, V) := max_{M ∈ T_{U,V} : ‖M‖ ≤ 1} ‖M‖_∞,
ζ(V, W) := max_{M ∈ T_{V,W} : ‖M‖ ≤ 1} ‖M‖_∞,
μ(Ω^(k)) := max_{N ∈ S(Ω^(k)) : ‖N‖_∞ ≤ 1} ‖N‖.

The quantities ζ(U, V) and ζ(V, W) being small implies that for contractions of the tensor Z, all matrices in the tangent space of those contractions with respect to the variety of rank r matrices are "diffuse", i.e. do not have sparse elements [9]. Similarly, μ(Ω^(k)) being small implies that all matrices with the contracted sparsity pattern Ω^(k) are such that their spectrum is "diffuse", i.e. they do not have low rank. We will see specific settings where these forms of incoherence hold for tensors in Section 3.

2 Algorithm for Sparse and Low Rank Tensor Decomposition

We now introduce our algorithm to perform sparse and low rank tensor decompositions. We begin with a lemma.

Lemma 2.1. Let X ∈ R^{n1×n2×n3}, with n1 ≤ n2 ≤ n3, be a tensor of rank r ≤ n1. Then the rank of X^3_a is at most r. Similarly, the rank of X^1_c is at most r.

Proof. Consider a tensor X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i. The reader may verify in a straightforward manner that X^3_a enjoys the decomposition

X^3_a = Σ_{i=1}^{r} λ_i ⟨w_i, a⟩ u_i v_iᵀ.    (5)

The proof for the rank of X^1_c is analogous.

Note that while (5) is a matrix decomposition of the contraction, it is not a singular value decomposition (the components need not be orthogonal, for instance).
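Lemma 2.1 and the identity (5) are easy to check numerically; the following sketch (numpy, with arbitrary small dimensions of our own choosing) does so for a random rank-3 tensor:

```python
import numpy as np

# Verify X^3_a = sum_i lam_i <w_i, a> u_i v_i^T for a random rank-3 tensor.
rng = np.random.default_rng(1)
r, n1, n2, n3 = 3, 6, 7, 8
U = rng.standard_normal((n1, r))
V = rng.standard_normal((n2, r))
W = rng.standard_normal((n3, r))
lam = rng.standard_normal(r)
X = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

a = rng.standard_normal(n3)
Xa = np.einsum('ijk,k->ij', X, a)          # mode-3 contraction, eq. (2)
Xa_direct = (U * (lam * (W.T @ a))) @ V.T  # right-hand side of (5)
assert np.allclose(Xa, Xa_direct)
assert np.linalg.matrix_rank(Xa) <= r      # Lemma 2.1
```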
Recovering the factors requires an application of simultaneous diagonalization, which we describe next.

Lemma 2.2. [4, 18] Suppose we are given an order 3 tensor X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i of size n1 × n2 × n3 satisfying the conditions of Assumption 1.1. Suppose the contractions X^3_a and X^3_b are computed with respect to unit vectors a, b ∈ R^{n3} distributed independently and uniformly on the unit sphere S^{n3−1}, and consider the matrices M1 and M2 formed as:

M1 = X^3_a (X^3_b)†,    M2 = (X^3_b)† X^3_a.

Then the eigenvectors of M1 (corresponding to the non-zero eigenvalues) are {u_i}_{i=1,...,r}, and the eigenvectors of M2ᵀ are {v_i}_{i=1,...,r}.

Remark. Note that while the eigenvectors {u_i}, {v_j} are thus determined, a source of ambiguity remains: for a fixed ordering of {u_i} one needs to determine the order in which the {v_j} are to be arranged. This can be (generically) achieved by using the (common) eigenvalues of M1 and M2 for pairing (if the contractions X^3_a, X^3_b are computed with respect to random vectors a, b, the eigenvalues are distinct almost surely). Since the eigenvalues of M1, M2 are distinct, they can be used to pair the columns of U and V.

Lemma 2.2 is essentially a simultaneous diagonalization result [17] that facilitates tensor decomposition [4]. Given a tensor T, one can compute two contractions for mode 1 and apply simultaneous diagonalization as described in Lemma 2.2; this yields the factors v_i, w_i (up to sign and reordering). One can then repeat the same process with mode 3 contractions to obtain u_i, v_i. In the final step one can then obtain λ_i by solving a system of linear equations.
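The simultaneous-diagonalization step of Lemma 2.2 can be sketched as follows (numpy; the helper name and the eigenvalue-sorting convention used to pair the columns are our own illustrative choices):

```python
import numpy as np

def leurgans_mode3_factors(X, rng=None):
    """Recover {u_i} and {v_i} (up to sign/scale) from two random
    mode-3 contractions, following Lemma 2.2."""
    rng = np.random.default_rng(rng)
    a = rng.standard_normal(X.shape[2])
    b = rng.standard_normal(X.shape[2])
    Xa = np.einsum('ijk,k->ij', X, a)
    Xb = np.einsum('ijk,k->ij', X, b)
    M1 = Xa @ np.linalg.pinv(Xb)   # eigenvectors: u_i
    M2 = np.linalg.pinv(Xb) @ Xa   # eigenvectors of M2^T: v_i
    w1, P = np.linalg.eig(M1)
    w2, Q = np.linalg.eig(M2.T)
    # Keep non-zero eigenvalues; the (almost surely distinct) common
    # eigenvalues <w_i,a>/<w_i,b> of M1 and M2^T pair the columns.
    tol = 1e-8 * np.abs(w1).max()
    i1 = [i for i in np.argsort(w1.real) if abs(w1[i]) > tol]
    i2 = [i for i in np.argsort(w2.real) if abs(w2[i]) > tol]
    return P[:, i1].real, Q[:, i2].real
```

Repeating the same computation with mode-1 contractions yields the (v_i, w_i) pairs; matching the shared v_i factors then orders all three factor sets consistently.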
The full algorithm is described in Algorithm 2 in the supplementary material.

For a contraction Z^k_v of a tensor Z with respect to a vector v along mode k, consider solving the convex problem:

minimize_{X,Y}  ‖X‖_* + ν_k ‖Y‖_1    subject to  Z^k_v = X + Y.    (6)

Our algorithm, stated in Algorithm 1, proceeds as follows. Given a tensor Z = X + Y, we perform two random contractions (w.r.t. vectors a, b) of the tensor along mode 3 to obtain matrices Z^3_a, Z^3_b. Since Z is a sum of sparse and low-rank components, by Lemma 2.1 so are the matrices Z^3_a, Z^3_b. We thus use (6) to decompose them into constituent sparse and low-rank components X^3_a, Y^3_a and X^3_b, Y^3_b, which are the contractions of X and Y. We then use X^3_a, X^3_b and Lemma 2.2 to obtain the factors U, V. We perform the same operations along mode 1 to obtain factors V, W. In the last step, we solve for the scale factors λ_i (a system of linear equations).

Algorithm 2 in the supplementary material, which we adopt for our decomposition problem in Algorithm 1, essentially relies on the idea of simultaneous diagonalization of matrices sharing common row and column spaces [17]. In this paper we do not analyze the situation where random noise is added to all the entries, but only the sparse adversarial noise setting. We note, however, that the key algorithmic insight of using contractions to perform tensor recovery is numerically stable and robust with respect to noise, as has been studied in [4, 11, 17].

The parameters that need to be picked to implement our algorithm are the regularization coefficients ν1, ν3. In the theoretical guarantees we will see that these can be picked in a stable manner, and that a range of values guarantees exact decomposition when the suitable incoherence conditions hold.
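As an illustration, (6) can be solved with a standard ADMM splitting for sparse-plus-low-rank decomposition. The following is a minimal sketch, not the solver used in our experiments; the penalty rho and the fixed iteration count are arbitrary choices:

```python
import numpy as np

def decompose_sparse_lowrank(Z, nu, rho=1.0, iters=500):
    """ADMM sketch for: minimize ||X||_* + nu*||Y||_1  s.t.  Z = X + Y."""
    X = np.zeros_like(Z)
    Y = np.zeros_like(Z)
    L = np.zeros_like(Z)  # Lagrange multiplier for the constraint
    for _ in range(iters):
        # X-update: singular value thresholding at level 1/rho
        U, s, Vt = np.linalg.svd(Z - Y + L / rho, full_matrices=False)
        X = (U * np.maximum(s - 1.0 / rho, 0.0)) @ Vt
        # Y-update: entrywise soft-thresholding at level nu/rho
        R = Z - X + L / rho
        Y = np.sign(R) * np.maximum(np.abs(R) - nu / rho, 0.0)
        # dual ascent on the residual
        L += rho * (Z - X - Y)
    return X, Y
```

Applying a routine of this form to each contraction Z^k_v produces the matrices required in steps 4-5 and 8-9 of Algorithm 1.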
In practice these coefficients would need to be determined by a cross-validation method. Note also that under suitable random sparsity assumptions [6], the regularization coefficient may be picked to be the inverse of the square-root of the dimension.

2.1 Computational Complexity

The computational complexity of our algorithm is dominated by the complexity of performing the sparse and low-rank matrix decomposition of the contractions via (6). For simplicity, let us consider

Algorithm 1 Algorithm for sparse and low rank tensor decomposition
1: Input: Tensor Z, parameters ν1, ν3.
2: Generate contraction vectors a, b ∈ R^{n3} independently and uniformly distributed on the unit sphere.
3: Compute mode 3 contractions Z^3_a and Z^3_b.
4: Solve the convex problem (6) with v = a, k = 3 and regularization parameter ν3. Call the resulting solution matrices X^3_a, Y^3_a.
5: Solve the convex problem (6) with v = b, k = 3 and regularization parameter ν3. Call the resulting solution matrices X^3_b, Y^3_b.
6: Compute eigen-decompositions of M1 := X^3_a (X^3_b)† and M2 := (X^3_b)† X^3_a. Let U and V denote the matrices whose columns are the eigenvectors of M1 and M2ᵀ respectively, corresponding to the non-zero eigenvalues, in sorted order. (Let r be the (common) rank of M1 and M2.) The eigenvectors, thus arranged, are denoted {u_i}_{i=1,...,r} and {v_i}_{i=1,...,r}.
7: Generate contraction vectors c, d ∈ R^{n1} independently and uniformly distributed on the unit sphere.
8: Solve the convex problem (6) with v = c, k = 1 and regularization parameter ν1. Call the resulting solution matrices X^1_c, Y^1_c.
9: Solve the convex problem (6) with v = d, k = 1 and regularization parameter ν1. Call the resulting solution matrices X^1_d, Y^1_d.
10: Compute eigen-decompositions of M3 := X^1_c (X^1_d)† and M4 := (X^1_c)† X^1_d.
Let Ṽ and W̃ denote the matrices whose columns are the eigenvectors of M3 and M4ᵀ respectively, corresponding to the non-zero eigenvalues, in sorted order. (Let r be the (common) rank of M3 and M4.)
11: Simultaneously reorder the columns of Ṽ, W̃, also performing simultaneous sign reversals as necessary, so that the columns of V and Ṽ are equal; call the resulting matrix W, with columns {w_i}_{i=1,...,r}.
12: Solve for λ_i in the linear system

X^3_a = Σ_{i=1}^{r} λ_i u_i v_iᵀ ⟨w_i, a⟩.

13: Output: Decomposition X̂ := Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i, Ŷ := Z − X̂.

the case where the target tensor Z ∈ R^{n×n×n} has equal dimensions in different modes. Using a standard first order method, the solution of (6) has a per iteration complexity of O(n^3), and to achieve an accuracy of ε, O(1/ε) iterations are required [22]. Since only four such steps need be performed, the complexity of the method is O(n^3/ε), where ε is the accuracy to which (6) is solved. Another alternative is to reformulate (6) so that it is amenable to greedy atomic approaches [23], which yields an order of magnitude improvement. We note that in contrast, a tensor unfolding approach for this problem [14, 21, 26] results in the need to solve much larger convex programs. For instance, for Z ∈ R^{n×n×n}, the resulting flattened matrix would be of size n^2 × n and the resulting convex problem would then have a complexity of O(n^4/ε). For higher order tensors, the gap in computational complexity would increase by further orders of n.

2.2 Numerical Experiments

We now present numerical results to validate our approach. We perform experiments for tensors of size 50 × 50 × 50 (non-symmetric).
A tensor Z is generated as the sum of a low rank tensor X and a sparse tensor Y. The low-rank component is generated as follows: three sets of r unit vectors u_i, v_i, w_i ∈ R^50 are generated randomly, independently and uniformly distributed on the unit sphere; random positive scale factors λ_i (uniformly distributed on [0, 1]) are chosen, and the tensor X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i is formed. The tensor Y is generated by (Bernoulli) randomly sampling its entries with probability p. For each such p, we perform 10 trials and apply our algorithm. In all our experiments, the regularization parameter was picked to be ν = 1/√n. The optimization problem (6) is solved using CVX in MATLAB. We report success if the MSE is smaller than 10^{-5}, separately for both the X and Y components. We plot the empirical probability of success as a function of p in Fig. 1 (a), (b), for multiple values of the true rank r. In Fig. 1 (c), (d) we test the scalability of our method, by generating a random 300 × 300 × 300 tensor of rank 50 and corrupting it with a sparse tensor of varying sparsity level. We run 5 independent trials and see that for low levels of corruption, both the low rank and sparse components are accurately recovered by our method.

Figure 1: Recovery of the low rank and sparse components from our proposed methods. Panels (a) Low Rank Component and (b) Sparse Component show that the probability of recovery is high when both the rank and sparsity are low. Panels (c) Low Rank Component and (d) Sparse Component study the recovery error for a tensor of dimensions 300 × 300 × 300 and rank 50.

3 Main Results

We now present the main rigorous guarantees related to the performance of our algorithm. Due to space constraints, the proofs are deferred to the supplementary materials.

Theorem 3.1. Suppose Z = X + Y, where X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i has rank r ≤ n1 and the factors satisfy Assumption 1.1. Suppose Y has support Ω and the following condition is satisfied:

ζ(U, V) μ(Ω^(3)) ≤ 1/6,    ζ(V, W) μ(Ω^(1)) < 1/6.

Then Algorithm 1 succeeds in exactly recovering the component tensors, i.e. (X, Y) = (X̂, Ŷ), whenever the ν_k are picked so that

ν3 ∈ ( ζ(U,V) / (1 − 4 ζ(U,V) μ(Ω^(3))),  (1 − 3 ζ(U,V) μ(Ω^(3))) / μ(Ω^(3)) )

and

ν1 ∈ ( ζ(V,W) / (1 − 4 ζ(V,W) μ(Ω^(1))),  (1 − 3 ζ(V,W) μ(Ω^(1))) / μ(Ω^(1)) ).

Specifically, the choice ν3 = (3ζ(U,V))^p / (μ(Ω^(3)))^{1−p} and ν1 = (3ζ(V,W))^p / (μ(Ω^(1)))^{1−p} for any p ∈ [0, 1] in these respective intervals guarantees exact recovery.

For a matrix M, the degree of M, denoted by deg(M), is the maximum number of non-zeros in any row or column of M. For a tensor Y, we define the degree along mode k, denoted by deg_k(Y), to be the maximum number of non-zero entries in any row or column of a matrix supported on Ω^(k) (defined in Section 1.2). The degree of Y is denoted by deg(Y) := max_{k∈{1,2,3}} deg_k(Y).

Lemma 3.2. We have μ(Ω^(k)) ≤ deg(Y) for all k.

For a subspace S ⊆ R^n, let us define the incoherence of the subspace via the quantity β(S) below, where P_S denotes the projection operator onto S, e_i is a standard unit vector, and ‖·‖_2 is the Euclidean norm of a vector.
Let us define:

β(S) := max_i ‖P_S e_i‖_2,

inc(X) := max{β(span(U)), β(span(V)), β(span(W))},
inc_3(X) := max{β(span(U)), β(span(V))},
inc_1(X) := max{β(span(V)), β(span(W))}.

Note that inc(X) < 1, always. For many random ensembles of interest, the incoherence scales gracefully with the dimension n, i.e. inc(X) ≤ K √(max{r, log n}/n).

Lemma 3.3. We have ζ(U, V) ≤ 2 inc(X) and ζ(V, W) ≤ 2 inc(X).

Corollary 3.4. Let Z = X + Y, with X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i of rank r ≤ n1, where the factors satisfy Assumption 1.1 and X has incoherence inc(X). Suppose Y is sparse and has degree deg(Y). If the condition

inc(X) deg(Y) < 1/12

holds, then Algorithm 1 successfully recovers the true solution, i.e. (X, Y) = (X̂, Ŷ), when the parameters satisfy

ν3 ∈ ( 2 inc_3(X) / (1 − 8 deg_3(Y) inc_3(X)),  (1 − 6 deg_3(Y) inc_3(X)) / deg_3(Y) ),
ν1 ∈ ( 2 inc_1(X) / (1 − 8 deg_1(Y) inc_1(X)),  (1 − 6 deg_1(Y) inc_1(X)) / deg_1(Y) ).

Specifically, a choice of ν3 = (6 inc_3(X))^p / (2 deg_3(Y))^{1−p}, ν1 = (6 inc_1(X))^p / (2 deg_1(Y))^{1−p} for any p ∈ [0, 1] is a valid choice that guarantees exact recovery.

Remark. Note that Corollary 3.4 presents a deterministic guarantee on the recoverability of a sparse corruption of a low rank tensor, and can be viewed as a tensor extension of [9, Corollary 3].

We now consider, for the sake of simplicity, tensors of uniform dimension, i.e.
X, Y, Z ∈ R^{n×n×n}. We show that when the low-rank and sparse components are suitably random, the approach outlined in Algorithm 1 achieves exact recovery.

We define the random sparsity model to be one where each entry of the tensor Y is non-zero independently and with identical probability ρ. We make no assumption about the magnitude of the entries of Y, only that its non-zero entries are thus sampled.

Lemma 3.5. Let X = Σ_{i=1}^{r} λ_i u_i ⊗ v_i ⊗ w_i, where u_i, v_i, w_i ∈ R^n are uniformly randomly distributed on the unit sphere S^{n−1}. Then the incoherence of the tensor X satisfies

inc(X) ≤ c1 √(max{r, log n}/n)

with probability exceeding 1 − c2 n^{−3} log n for some constants c1, c2.

Lemma 3.6. Suppose the entries of Y are sampled according to the random sparsity model, with ρ = O((n^{3/2} max(log n, r))^{−1}). Then the tensor Y satisfies deg(Y) ≤ √n / (12 c1 max(log n, r)) with probability exceeding 1 − exp(−c3 √n / (2 max(log n, r))) for some constant c3 > 0.

Corollary 3.7. Let Z = X + Y, where X is low rank with random factors as per the conditions of Lemma 3.5 and Y is sparse with random support as per the conditions in Lemma 3.6. Provided r ∼ o(n^{1/2}), Algorithm 1 successfully recovers the correct decomposition, i.e. (X̂, Ŷ) = (X, Y), with probability exceeding 1 − n^{−α} for some α > 0.

Remarks. 1) Under this sampling model, the cardinality of the support of Y is allowed to be as large as m = O(n^{3/2} log^{−1} n) when the rank r is constant (independent of n).
2) We could equivalently have looked at a uniformly random sampling model, i.e. one where a support set of size m is chosen uniformly at random from the set of all possible support sets of cardinality at most m, and our results for exact recovery would have gone through.
This follows from the equivalence principle for successful recovery between Bernoulli sampling and uniform sampling; see [6, Appendix 7.1].
3) Note that for the random sparsity ensemble, [6] shows that a choice of ν = 1/√n ensures exact recovery (an additional condition regarding the magnitudes of the factors is needed, however). By extension, the same choice can be shown to work for our setting.

4 Extensions

The approach described in Algorithm 1 and its analysis are quite modular and can be adapted to various settings to account for different forms of measurements and robustness models. We do not present an analysis of these situations due to space constraints, but outline how these extensions follow from the current development in a straightforward manner.
1) Higher Order Tensors: Algorithm 1 can be extended naturally to the higher order setting. Recall that in the third order case, one needs to recover two contractions along the third mode to discover factors U, V and then two contractions along the first mode to discover factors V, W. For an order K tensor of the form Z ∈ R^{n1×...×nK} which is the sum of a low rank component X = Σ_{i=1}^{r} λ_i ⊗_{l=1}^{K} u_i^(l) and a sparse component Y, one needs to compute higher order contractions of Z along K − 1 different modes. For each of these K − 1 modes the resulting contraction is the sum of a sparse and low-rank matrix, and thus pairs of matrix problems of the form (6) reveal the sparse and low-rank components of the contractions. The low-rank factors can then be recovered via application of Lemma 2.2, and the full decomposition can thus be recovered.
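The repeated contractions needed in the higher-order setting can be written generically; the following sketch (numpy, with a helper name of our own choosing) contracts an order-K tensor down to a matrix by contracting all modes except a chosen pair:

```python
import numpy as np

def contract_to_matrix(Z, vectors, keep):
    """Contract every mode of Z not in `keep` (a pair of mode indices)
    against the corresponding vector in `vectors`, leaving a matrix."""
    modes = [m for m in range(Z.ndim) if m not in keep]
    out = Z
    for m in sorted(modes, reverse=True):  # contract from the back so that
        out = np.tensordot(out, vectors[m], axes=([m], [0]))  # mode indices stay valid
    return out

# Order-4 example: contracting modes 2 and 3 with basis vectors picks a slice.
Z = np.arange(120.0).reshape(2, 3, 4, 5)
vs = {2: np.eye(4)[1], 3: np.eye(5)[2]}
M = contract_to_matrix(Z, vs, keep=(0, 1))
assert np.allclose(M, Z[:, :, 1, 2])
```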
The same guarantees as in Theorem 3.1 and Corollary 3.4 hold verbatim (the notions of incoherence inc(X) and degree deg(Y) of tensors need only be extended to the higher order case in the natural way).
2) Block sparsity: Situations where entire slices of the tensor are corrupted may happen in recommender systems with adversarial ratings [10]. A natural approach in this case is to use a convex relaxation of the form

minimize_{M1,M2}  ν_k ‖M1‖_* + ‖M2‖_{1,2}    subject to  Z^k_v = M1 + M2

in place of (6) in Algorithm 1. In the above, ‖M‖_{1,2} := Σ_i ‖M_i‖_2, where M_i is the ith column of M. Since exact recovery of the block-sparse and low-rank components of the contractions is guaranteed via this relaxation under suitable assumptions [10], the algorithm would inherit the associated provable guarantees.
3) Tensor completion: In applications such as recommendation systems, it may be desirable to perform tensor completion in the presence of sparse corruptions. In [24], an adaptation of Leurgans' algorithm was presented for performing completion from measurements restricted to only four slices of the tensor with near-optimal sample complexity (under suitable genericity assumptions about the tensor). We note that it is straightforward to blend Algorithm 1 with this method to achieve completion with sparse corruptions.
Recalling that Z = X + Y and therefore Z^3_k = X^3_k + Y^3_k (i.e. the kth mode 3 slice of Z is a sum of sparse and low rank slices of X and Y), if only a subset of elements of Z^3_k (say P_Λ(Z^3_k)) is observed for some index set Λ, we can replace (6) in Algorithm 1 with

minimize_{M1,M2}  ν_k ‖M1‖_* + ‖M2‖_1    subject to  P_Λ(Z^k_v) = P_Λ(M1 + M2).

Under suitable incoherence assumptions [6, Theorem 1.2], the above will achieve exact recovery of the slices. Once four slices are accurately recovered, one can then use Leurgans' algorithm to recover the full tensor [24, Theorem 3.6]. Indeed the above idea can be extended more generally to the concept of deconvolving a sum of sparse and low-rank tensors from separable measurements [24].
4) Non-convex approaches: A basic primitive for sparse and low-rank tensor decomposition used in this paper is that of using (6) for matrix decomposition. More efficient non-convex approaches such as the ones described in [22] may be used instead to speed up Algorithm 1. These alternative nonconvex methods [22] require O(rn^2) steps per iteration and O(log(1/ε)) iterations, resulting in a total complexity of O(rn^2 log(1/ε)) for solving the decomposition of the contractions to an accuracy of ε.

References
[1] A. ANANDKUMAR, R. GE, D. HSU, AND S. M. KAKADE, A tensor approach to learning mixed membership community models, The Journal of Machine Learning Research, 15 (2014), pp. 2239-2312.
[2] A. ANANDKUMAR, R. GE, D. HSU, S. M. KAKADE, AND M. TELGARSKY, Tensor decompositions for learning latent variable models, Tech. Rep. 1, 2014.
[3] C. BECKMANN AND S. SMITH, Tensorial extensions of independent component analysis for multisubject FMRI analysis, NeuroImage, 25 (2005), pp. 294-311.
[4] A. BHASKARA, M.
CHARIKAR, A. MOITRA, AND A. VIJAYARAGHAVAN, Smoothed analysis of tensor decompositions, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, ACM, 2014, pp. 594-603.
[5] S. BHOJANAPALLI AND S. SANGHAVI, A new sampling technique for tensors, arXiv preprint arXiv:1502.05023, 2015.
[6] E. J. CANDÈS, X. LI, Y. MA, AND J. WRIGHT, Robust principal component analysis?, Journal of the ACM, 58 (2011), pp. 11-37.
[7] E. J. CANDÈS AND B. RECHT, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9 (2009), pp. 717-772.
[8] R. B. CATTELL, Parallel proportional profiles and other principles for determining the choice of factors by rotation, Psychometrika, 9 (1944), pp. 267-283.
[9] V. CHANDRASEKARAN, S. SANGHAVI, P. A. PARRILO, AND A. S. WILLSKY, Rank-sparsity incoherence for matrix decomposition, SIAM Journal on Optimization, 21 (2011), pp. 572-596.
[10] Y. CHEN, H. XU, C. CARAMANIS, AND S. SANGHAVI, Robust matrix completion and corrupted columns, in Proceedings of the 28th International Conference on Machine Learning (ICML-11), L. Getoor and T. Scheffer, eds., New York, NY, USA, 2011, ACM, pp. 873-880.
[11] N. GOYAL, S. VEMPALA, AND Y. XIAO, Fourier PCA and robust tensor decomposition, in Proceedings of the 46th Annual ACM Symposium on Theory of Computing, ACM, 2014, pp. 584-593.
[12] C. J. HILLAR AND L.-H. LIM, Most tensor problems are NP-hard, Journal of the ACM, 60 (2013), pp. 45:1-45:39.
[13] D. HSU, S. KAKADE, AND T. ZHANG, Robust matrix decomposition with sparse corruptions, IEEE Transactions on Information Theory, 57 (2011), pp. 7221-7234.
[14] B. HUANG, C. MU, D. GOLDFARB, AND J. WRIGHT, Provable models for robust low-rank tensor completion, Pacific Journal of Optimization, 11 (2015), pp. 339-364.
[15] A. KRISHNAMURTHY AND A.
SINGH, Low-rank matrix and tensor completion via adaptive sampling, in Advances in Neural Information Processing Systems, 2013.
[16] J. B. KRUSKAL, Three-way arrays: Rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra and its Applications, 18 (1977).
[17] V. KULESHOV, A. CHAGANTY, AND P. LIANG, Tensor factorization via matrix factorization, arXiv.org, 2015.
[18] S. LEURGANS, R. ROSS, AND R. ABEL, A decomposition for three-way arrays, SIAM Journal on Matrix Analysis and Applications, 14 (1993), pp. 1064-1083.
[19] Q. LI, A. PRATER, L. SHEN, AND G. TANG, Overcomplete tensor decomposition via convex optimization, in IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), Cancun, Mexico, Dec. 2015.
[20] N. MESGARANI, M. SLANEY, AND S. A. SHAMMA, Discrimination of speech from non-speech based on multiscale spectro-temporal modulations, IEEE Transactions on Audio, Speech and Language Processing, 14 (2006), pp. 920-930.
[21] C. MU, B. HUANG, J. WRIGHT, AND D. GOLDFARB, Square deal: Lower bounds and improved relaxations for tensor recovery, preprint arXiv:1307.5870, 2013.
[22] P. NETRAPALLI, U. NIRANJAN, S. SANGHAVI, A. ANANDKUMAR, AND P. JAIN, Non-convex robust PCA, in Advances in Neural Information Processing Systems, 2014.
[23] N. RAO, P. SHAH, AND S. WRIGHT, Forward-backward greedy algorithms for signal demixing, in Signals, Systems and Computers, 2013 Asilomar Conference on, IEEE, 2014.
[24] P. SHAH, N. RAO, AND G. TANG, Optimal low-rank tensor recovery from separable measurements: Four contractions suffice, arXiv.org, 2015.
[25] G. TANG AND P. SHAH, Guaranteed tensor decomposition: A moment approach, International Conference on Machine Learning (ICML 2015), 2015, pp. 1491-1500.
[26] R. TOMIOKA, K. HAYASHI, AND H.
KASHIMA, Estimation of low-rank tensors via convex optimization, preprint arXiv:1010.0789, 2011.
[27] M. YUAN AND C.-H. ZHANG, On tensor completion via nuclear norm minimization, preprint arXiv:1405.1773, 2014.