{"title": "High-Rank Matrix Completion and Clustering under Self-Expressive Models", "book": "Advances in Neural Information Processing Systems", "page_first": 73, "page_last": 81, "abstract": "We propose efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data that lie in a union of low-dimensional subspaces. We cast the problem as finding a completion of the data matrix so that each point can be reconstructed as a linear or affine combination of a few data points. Since the problem is NP-hard, we propose a lifting framework and reformulate the problem as a group-sparse recovery of each incomplete data point in a dictionary built using incomplete data, subject to rank-one constraints. To solve the problem efficiently, we propose a rank pursuit algorithm and a convex relaxation. The solution of our algorithms recover missing entries and provides a similarity matrix for clustering. Our algorithms can deal with both low-rank and high-rank matrices, does not suffer from initialization, does not need to know dimensions of subspaces and can work with a small number of data points. By extensive experiments on synthetic data and real problems of video motion segmentation and completion of motion capture data, we show that when the data matrix is low-rank, our algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices, our method significantly outperforms existing algorithms.", "full_text": "High-Rank Matrix Completion and Clustering\n\nunder Self-Expressive Models\n\nE. 
Elhamifar*

College of Computer and Information Science
Northeastern University
Boston, MA 02115
eelhami@ccs.neu.edu

Abstract

We propose efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data that lie in a union of low-dimensional subspaces. We cast the problem as finding a completion of the data matrix so that each point can be reconstructed as a linear or affine combination of a few data points. Since the problem is NP-hard, we propose a lifting framework and reformulate the problem as a group-sparse recovery of each incomplete data point in a dictionary built using incomplete data, subject to rank-one constraints. To solve the problem efficiently, we propose a rank pursuit algorithm and a convex relaxation. The solutions of our algorithms recover missing entries and provide a similarity matrix for clustering. Our algorithms can deal with both low-rank and high-rank matrices, do not suffer from initialization, do not need to know the dimensions of the subspaces, and can work with a small number of data points. By extensive experiments on synthetic data and real problems of video motion segmentation and completion of motion capture data, we show that when the data matrix is low-rank, our algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices, our method significantly outperforms existing algorithms.

1 Introduction

High-dimensional data, which are ubiquitous in computer vision, image processing, bioinformatics and social networks, often lie in low-dimensional subspaces corresponding to the different categories they belong to [1, 2, 3, 4, 5, 6]. 
Clustering and finding low-dimensional representations of data are important unsupervised learning problems with numerous applications, including data compression and visualization, image/video/customer segmentation, collaborative filtering and more.

A major challenge in real problems is dealing with missing entries in the data, due to sensor failure, ad-hoc data collection, or partial knowledge of relationships in a dataset. For instance, in estimating object motions in videos, the tracking algorithm may lose track of features in some video frames [7]; in the image inpainting problem, the intensity values of some pixels are missing due to sensor failure [8]; or in recommender systems, each user provides ratings for a limited number of products [9].

Prior Work. Existing algorithms that deal with missing entries in high-dimensional data can be divided into two main categories. The first group of algorithms assumes that the data lie in a single low-dimensional subspace. Probabilistic PCA (PPCA) [10] and Factor Analysis (FA) [11] optimize a non-convex function using Expectation Maximization (EM), estimating the low-dimensional model parameters and the missing entries of the data in an iterative framework. However, their performance depends on initialization and degrades as the dimension of the subspace or the percentage of missing entries increases.

*E. Elhamifar is an Assistant Professor in the College of Computer and Information Science, Northeastern University.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Low-rank matrix completion algorithms, such as [12, 13, 14, 15, 16, 17], recover missing entries by minimizing the convex surrogate of the rank, i.e., the nuclear norm, of the complete data matrix. 
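For intuition, single-subspace nuclear-norm completion is commonly solved by an iterative singular-value-thresholding scheme; the following is a minimal sketch in that spirit, not any of the cited implementations, and the threshold `tau`, step size, and iteration count are illustrative choices.

```python
import numpy as np

def svt_complete(M_obs, mask, tau=150.0, step=1.2, iters=500):
    """Singular-value-thresholding sketch of nuclear-norm completion:
    repeatedly shrink the singular values of a dual iterate and take a
    gradient step on the observed entries. `mask` is a boolean array of
    observed positions; M_obs has zeros at the unobserved positions."""
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        Y += step * np.where(mask, M_obs - X, 0.0)       # fit observed entries
    return X

# Usage: recover a 30x30 rank-2 matrix from roughly 60% of its entries.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
mask = rng.random(M.shape) < 0.6
M_hat = svt_complete(np.where(mask, M, 0.0), mask)
```

When the matrix is genuinely low-rank and the observed positions are spread at random, a few hundred such iterations suffice on small problems; the high-rank union-of-subspaces regime discussed next is exactly where this model breaks down.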
When the underlying subspace is incoherent with the standard basis vectors and the locations of the missing entries are spread uniformly at random, these methods are guaranteed to recover the missing entries.
The second group of algorithms addresses the more general and challenging scenario where the data lie in a union of low-dimensional subspaces. The goals in this case are to recover the missing entries and to cluster the data according to the subspaces. Since a union of low-dimensional subspaces is often high/full-rank, methods in the first category are not effective. Mixture of Probabilistic PCA (MPPCA) [18, 19], Mixture of Factor Analyzers (MFA) [20] and K-GROUSE [21] address clustering and completion of multi-subspace data, yet suffer from dependence on initialization and perform poorly as the dimension/number of subspaces or the percentage of missing entries increases. On the other hand, [22] requires a polynomial number of data points in the ambient space dimension, which often cannot be met in high-dimensional datasets. Building on the unpublished abstract in [23], a clustering algorithm using expectation completion on the data kernel matrix was proposed in [24]. However, the algorithm only addresses clustering, and the resulting non-convex optimization is dealt with using the heuristic approach of shifting the eigenvalues of the Hessian to nonnegative values. [25] assumes that the observed matrix corresponds to applying a Lipschitz, monotonic function to a low-rank matrix. While an important generalization of the low-rank regime, [25] cannot cover the case of multiple subspaces.

Paper Contributions. 
In this paper, we propose an efficient algorithm for the problem of simultaneous completion and clustering of incomplete data lying in a union of low-dimensional subspaces. Building on the Sparse Subspace Clustering (SSC) algorithm [26], we cast the problem as finding a completion of the data so that each complete point can be efficiently reconstructed using a few complete points from the same subspace. Since the formulation is non-convex and, in general, NP-hard, we propose a lifting scheme, where we cast the problem as finding a group-sparse representation of each incomplete data point in a modified dictionary, subject to a set of rank-one constraints. In our formulation, the coefficients in the groups correspond to pairwise similarities and missing entries of the data. More specifically, our group-sparse recovery formulation finds a few incomplete data points that well reconstruct a given point and, at the same time, completes the selected data points in a globally consistent fashion. Our framework has several advantages over the state of the art:

– Unlike algorithms such as [22] that require a polynomial number of points in the ambient-space dimension, our framework needs roughly as many points as the subspace dimension, not the ambient dimension. In addition, we do not need to know the dimensions of the subspaces a priori.

– While two-stage methods such as [24], which first obtain a similarity graph for clustering and then apply low-rank matrix completion to each cluster, fail when subspaces intersect or clustering fails, our method simultaneously recovers missing entries and builds a similarity matrix for clustering; hence, each goal benefits from the other. 
Moreover, in scenarios where a hard clustering does not exist, we can still recover missing entries.

– While we motivate and present our algorithm in the context of clustering and completion of multi-subspace data, our framework can address any task that relies on the self-expressiveness property of the data, e.g., column subset selection in the presence of missing data.

– By experiments on synthetic and real data, we show that our algorithm performs on par with or better than low-rank matrix completion methods when the data matrix is low-rank, while it significantly outperforms state-of-the-art clustering and completion algorithms when the data matrix is high-rank.

2 Problem Statement

Assume we have $L$ subspaces $\{\mathcal{S}_\ell\}_{\ell=1}^{L}$ of dimensions $\{d_\ell\}_{\ell=1}^{L}$ in an $n$-dimensional ambient space, $\mathbb{R}^n$. Let $\{y_j\}_{j=1}^{N}$ denote a set of $N$ data points lying in the union of the subspaces, where we observe only some entries of each $y_j \triangleq [y_{1j}\; y_{2j}\; \dots\; y_{nj}]^\top$. Assume that we do not know a priori the bases for the subspaces, nor do we know which data points belong to which subspace. Given the incomplete data points, our goal is to recover the missing entries and cluster the data into their underlying subspaces.

To set the notation, let $\Omega_j \subseteq \{1,\dots,n\}$ and $\Omega_j^c$ denote, respectively, the indices of the observed and missing entries of $y_j$. Let $U_{\Omega_j} \in \mathbb{R}^{n\times|\Omega_j|}$ be the submatrix of the standard basis whose columns are indexed by $\Omega_j$. We denote by $P_{\Omega_j} \in \mathbb{R}^{n\times n}$ the projection matrix onto the subspace spanned by $U_{\Omega_j}$, i.e., $P_{\Omega_j} \triangleq U_{\Omega_j} U_{\Omega_j}^\top$. Hence, $x_j \triangleq U_{\Omega_j^c}^\top y_j \in \mathbb{R}^{|\Omega_j^c|}$ corresponds to the vector of missing entries of $y_j$. We denote by $\bar{y}_j$ an $n$-dimensional vector whose $i$-th coordinate is $y_{ij}$ for $i \in \Omega_j$ and is zero for $i \in \Omega_j^c$, i.e., $\bar{y}_j \triangleq P_{\Omega_j} y_j \in \mathbb{R}^n$. 
We can write each $y_j$ as the sum of two orthogonal vectors with observed and unobserved entries, i.e.,

$$y_j = P_{\Omega_j} y_j + P_{\Omega_j^c} y_j = \bar{y}_j + U_{\Omega_j^c} x_j. \quad (1)$$

Finally, we denote by $Y \in \mathbb{R}^{n\times N}$ and $\bar{Y} \in \mathbb{R}^{n\times N}$ the matrices whose columns are the complete data points $\{y_j\}_{j=1}^{N}$ and the zero-filled data $\{\bar{y}_j\}_{j=1}^{N}$, respectively.

To address completion and clustering of multi-subspace data, we propose a unified framework to simultaneously recover missing entries and learn a similarity graph for clustering. To do so, we build on the SSC algorithm [26, 4], which we review next.

3 Sparse Subspace Clustering Review

The sparse subspace clustering (SSC) algorithm [26, 4] addresses the problem of clustering complete multi-subspace data. It relies on the observation that, in a high-dimensional ambient space, while there are many ways that each data point $y_j$ can be reconstructed using the entire dataset, a sparse representation selects a few data points from the underlying subspace of $y_j$, since each point in $\mathcal{S}_\ell$ can be represented using $d_\ell$ data points, in general directions, from $\mathcal{S}_\ell$. This motivates solving²

$$\min_{\{c_{1j},\dots,c_{Nj}\}} \sum_{i=1}^{N} |c_{ij}| \ \ \text{s.t.} \ \ \sum_{i=1}^{N} c_{ij} y_i = 0, \ c_{jj} = -1, \quad (2)$$

where the constraints express that each $y_j$ should be written as a combination of the other points. 
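For complete data, a Lasso-style relaxation of (2) can be solved for each point with plain proximal-gradient (ISTA) steps; the sketch below is only an illustration of the self-expressiveness idea, not the paper's solver: the weight `lam` and iteration count are illustrative, and the constraint $c_{jj} = -1$ is handled equivalently by moving $y_j$ to the residual and forbidding the point from using itself.

```python
import numpy as np

def self_expressive_codes(Y, lam=0.1, iters=500):
    """For each column y_j of Y, solve a Lasso version of (2),
        min_c 0.5 * ||y_j - Y c||_2^2 + lam * ||c||_1   s.t.  c_j = 0,
    by proximal gradient (ISTA)."""
    N = Y.shape[1]
    C = np.zeros((N, N))
    L = np.linalg.norm(Y, 2) ** 2              # Lipschitz constant of the smooth part
    for j in range(N):
        c = np.zeros(N)
        for _ in range(iters):
            c -= (Y.T @ (Y @ c - Y[:, j])) / L                      # gradient step
            c = np.sign(c) * np.maximum(np.abs(c) - lam / L, 0.0)   # soft-threshold
            c[j] = 0.0                         # a point may not use itself
        C[:, j] = c
    return C

# Usage: two 2-dimensional subspaces in R^20; the similarity
# W = |C| + |C|^T should concentrate inside the two 15-point blocks.
rng = np.random.default_rng(1)
B1, B2 = rng.standard_normal((20, 2)), rng.standard_normal((20, 2))
Y = np.hstack([B1 @ rng.standard_normal((2, 15)), B2 @ rng.standard_normal((2, 15))])
Y = Y / np.linalg.norm(Y, axis=0)              # unit-norm columns
C = self_expressive_codes(Y)
W = np.abs(C) + np.abs(C).T
```

The resulting coefficient matrix is (approximately) supported on same-subspace points, which is precisely what makes the similarity graph described next block-diagonal up to noise.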
To infer the clustering, one builds a similarity graph using the sparse coefficients, connecting nodes $i$ and $j$ of the graph, representing $y_i$ and $y_j$ respectively, with an edge of weight $w_{ij} = |c_{ij}| + |c_{ji}|$. The clustering of the data is then obtained by applying spectral clustering [27] to the similarity graph.
While [4, 26, 28] show that, under appropriate conditions on subspace angles and data distribution, (2) is guaranteed to recover the desired representations, the algorithm requires complete data points.

3.1 Naive Extensions of SSC to Deal with Missing Entries

In the presence of missing entries, the $\ell_1$-minimization in (2) becomes non-convex, since the coefficients and a subset of the data entries are both unknown. A naive approach is to solve (2) using the zero-filled data points, $\{\bar{y}_i\}_{i=1}^{N}$, to perform clustering and then apply standard matrix completion to each cluster. However, the drawback of this approach is that not only does it ignore the known locations of the missing entries, but the zero-filled data also no longer lie in the original subspaces, and deviate further from the subspaces as the percentage of missing entries increases. Hence, a sparse representation does not necessarily find points from the same subspace and spectral clustering fails.
An alternative approach to deal with incomplete data is to use standard low-rank matrix completion algorithms to recover the missing values and then apply SSC to cluster the data into subspaces. While this
While this\napproach works when the union of subspaces is low-rank, its effectiveness diminishes as the number\nof subspaces or their dimensions increases and the data matrix becomes high/full-rank.\n\n4 Sparse Subspace Clustering and Completion via Lifting\n\nIn this section, we propose an algorithm to recover missing entries and build a similarity graph for\nclustering, given observations {yij; i \u2208 \u2126j}N\n\n2(cid:96)1 is the convex surrogate of the cardinality function,(cid:80)N\n\nj=1 for N data points lying in a union of subspaces.\ni=1 I(|cij|), where I(\u00b7) is the indicator function.\n\n3\n\n\f4.1 SSC\u2013Lifting Formulation\nTo address the problem, we start from the SSC observation that, given complete data {yj}N\nsolution of\n\nj=1, the\n\nN(cid:88)\n\nN(cid:88)\n\nmin{cij}\n\nN(cid:88)\n\nI(|cij|) s. t.\n\ncijyi = 0, cjj = \u22121, \u2200j\n\n(3)\n\nj=1\n\ni=1\n\ni=1\n\nideally \ufb01nds a representation of each yj as a linear combination of a few data points that lie in the\nsame subspace as of yj. I(\u00b7) denotes the indicator function, which is zero when its argument is zero\nand is one otherwise. Notice that, using (1), we can write each yi as\n\nxi =(cid:2)\u00afyi U \u2126c\n\ni\n\n(cid:3)(cid:20) 1\n\n(cid:21)\n\nxi\n\nyi = \u00afyi + U \u2126c\n\ni\n\n,\n\n(4)\n\nwhere \u00afyi is the i-th data point whose missing entries are \ufb01lled with zeros and xi is the vector\ncontaining missing entries of yi. Thus, substituting (4) in the optimization (3), we would like to solve\n\n(cid:21)\n\n(cid:3)(cid:20) cij\n\n(cid:2)\u00afyi U \u2126c\n\nN(cid:88)\ni|+1 are given and known while vectors(cid:2)cij\n\n= 0, cjj = \u22121, \u2200j.\n\ncijxi\n\ni=1\n\ni\n\n(5)\n\n(cid:3)(cid:62) \u2208\n\ni=1\n\nj=1\n\nmin\n\n{cij},{xi}\n\nN(cid:88)\n\nI(|cij|) s. 
t.\n\n(cid:3) \u2208 Rn\u00d7|\u2126c\n\nN(cid:88)\nNotice that matrices(cid:2)\u00afyi U \u2126c\ncij is the same as the number of nonzero blocks(cid:2)cij\n(cid:2)cij\nN(cid:88)\n\n(cid:3)(cid:62)\nN(cid:88)\n\nN(cid:88)\n\ncijx(cid:62)\n\n(cid:33)\n\nj\n\ni\n\ns. t.\n\nmin\n\n{cij},{xi}\n\nI\n\nj=1\n\ni=1\n\ni|+1 are unknown. In fact, the optimization (5) has two sources of non-convexity: the (cid:96)0-norm in\n\ncijx(cid:62)\nR|\u2126c\nthe objective function and the product of unknown variables {cij} and {xi} in the constraint.\nTo pave the way for an ef\ufb01cient algorithm, \ufb01rst we use the fact that the number of nonzero coef\ufb01cients\n, since cij is nonzero if and only if\n\ncijx(cid:62)\n\ni\n\nis nonzero. Thus, we can write (5) as the equivalent group-sparse optimization\n\n(cid:3)(cid:62)\n(cid:3)(cid:20) cij\n\ncijxi\n\n(cid:21)\n\n= 0, cjj = \u22121, \u2200j,\n\n(6)\n\nwhere (cid:107) \u00b7 (cid:107)p denotes the (cid:96)p-norm for p > 0. Next, to deal with the non-convexity of the product of\ncij and xi, we use the fact that for each i \u2208 {1, . . . , N}, the matrix\n\n\u00b7\u00b7\u00b7\nciN\n\u00b7\u00b7\u00b7 ciN xi\n\n[ci1 \u00b7\u00b7\u00b7 ciN ] ,\n\n(7)\n\ncijxi\n\n(cid:32)(cid:13)(cid:13)(cid:13)(cid:13)(cid:20) cij\n(cid:21)(cid:13)(cid:13)(cid:13)(cid:13)p\n(cid:20) ci1\n\nAi (cid:44)\n\nci1xi\n\ni\n\ni\n\n(cid:2)\u00afyi U \u2126c\n(cid:21)\n(cid:20) 1\n\n=\n\nxi\n\ni=1\n\n(cid:21)\n\nis of rank one, since it can be written as the outer product of two vectors. This motivates to use a\nlifting scheme where we de\ufb01ne new optimization variables\n\u03b1ij (cid:44) cijxi \u2208 R|\u2126c\ni|,\n\n(8)\n\nand consider the group-sparse optimization program\n\nN(cid:88)\n\nN(cid:88)\n\nj=1\n\ni=1\n\n(cid:33)\n\n(cid:32)(cid:13)(cid:13)(cid:13)(cid:13)(cid:20) cij\n\n\u03b1ij\n\n(cid:21)(cid:13)(cid:13)(cid:13)(cid:13)p\n\nI\n\n(cid:2)\u00afyi U \u2126c\n\ni\n\nN(cid:88)\n\ni=1\n\ns. 
t.\n\nmin\n\n{cij},{\u03b1ij}\ncjj =\u22121,\u2200j\n\n(cid:21)\n\n(cid:3)(cid:20) cij\n\n\u03b1ij\n\n(cid:18)(cid:20) ci1 \u00b7\u00b7\u00b7 ciN\n\n\u03b1i1 \u00b7\u00b7\u00b7 \u03b1iN\n\n(cid:21)(cid:19)\n\n= 0, rk\n\n= 1,\u2200i, j,\n\n(9)\n\nwhere we have replaced cijxi with \u03b1ij and have introduced rank-one constraints. In fact, we show\nthat one can recover the solution of (5) using (9) and vice versa.\nProposition 1 Given a solution {cij} and {\u03b1ij} of (9), by computing xi\u2019s via the factorization in\n(7), {cij} and {xi} is a solution of (5). Also, given a solution {cij} and {xi} of (5), {cij} and\n{\u03b1ij (cid:44) cijxi} would be a solution of (9).\nNotice that, we have transferred the non-convexity of the product cijxi in (5) into a set of non-convex\nrank-one constraints in (9). However, as we will see next, (9) admits an ef\ufb01cient convex relaxation.\n\n4\n\n\f4.2 Relaxations and Extensions\n\nThe optimization program in (9) is, in general, NP-hard, due to the mixed (cid:96)0/(cid:96)p-norm in the objective\nfunction. It is non-convex due to both mixed (cid:96)0/(cid:96)p-norm and rank-one constraints. To solve (9), we\n\ufb01rst take the convex surrogate of the objective function, which corresponds to an (cid:96)1/(cid:96)p-norm [29, 30],\nwhere we drop the indicator function and, for p \u2208 {2,\u221e}, solve\n\nN(cid:88)\n\nN(cid:88)\n\nj=1\n\ni=1\n\n(cid:13)(cid:13)(cid:13)(cid:13)(cid:20) cij\n\n\u03b1ij\n\n(cid:21)(cid:13)(cid:13)(cid:13)(cid:13)p\n\n(cid:32) N(cid:88)\n\nN(cid:88)\n\n+\n\n\u03c1\n\nj=1\n\ni=1\n\n(cid:2)\u00afyi U \u2126c\n\ni\n\n(cid:21)(cid:33)\n\n(cid:3)(cid:20) cij\n\n\u03b1ij\n\n(cid:18)(cid:20) ci1 \u00b7\u00b7\u00b7 ciN\n\n\u03b1i1 \u00b7\u00b7\u00b7 \u03b1iN\n\n(cid:21)(cid:19)\n\n= 1,\u2200i.\n\ns. t. 
rk\n\nmin\n\n\u03bb\n{cij ,\u03b1ij}\n{cjj =\u22121}\n\n(10)\nThe nonnegative parameter \u03bb is a regularization parameter and the function \u03c1(\u00b7) \u2208 {\u03c1e(\u00b7), \u03c1a(\u00b7)}\nenforces whether the reconstruction of each point should be exact or approximate, where\n\n(cid:26)+\u221e if u (cid:54)= 0\n\n0\n\nif u = 0\n\n\u03c1e(u) (cid:44)\n\n,\n\n\u03c1a(u) (cid:44) 1\n2\n\n(cid:107)u(cid:107)2\n2.\n\n(11)\n\n\u03b1i1 \u00b7\u00b7\u00b7 \u03b1iN\n\n(cid:13)(cid:13)(cid:13)(cid:13)(cid:20) ci1 \u00b7\u00b7\u00b7 ciN\n(cid:80)N\n(cid:80)N\n\n\u02c6xi =\n\nMore speci\ufb01cally, when dealing with missing entries from noise-free data, which perfectly lie in\nmultiple subspaces, we enforce exact reconstruction by selecting \u03c1(\u00b7) = \u03c1e(\u00b7). On the other hand,\nwhen dealing with real data where observed entries are corrupted by noise, exact reconstruction is\ninfeasible or comes at the price of losing the sparsity of the solution, which is undesired. Thus, to\ndeal with noisy incomplete data, we consider approximate reconstruction by selecting \u03c1(\u00b7) = \u03c1a(\u00b7).\nNotice that the objective function of (10) is convex for p \u2265 1, while the rank-one constraints are\nnon-convex. We can obtain a local solution, by solving (10) with an Alternating Direction Method of\nMultipliers (ADMM) framework using projection onto the set of rank-one matrices.\nTo obtain a convex algorithm, we use a nuclear-norm3 relaxation [12, 14, 15] for the rank-one\nconstraints, where we replace rank(Ai) = 1 with (cid:107)Ai(cid:107)\u2217 \u2264 \u03c4, for \u03c4 > 0. In addition, to reduce\nthe number of constraints and the complexity of the problem, we choose to bring the nuclear norm\nconstraints into the objective function using a Lagrange multiple \u03b3 > 0. 
Hence, we propose to solve\n\nN(cid:88)\n\nN(cid:88)\n\n(cid:13)(cid:13)(cid:13)(cid:13)(cid:20) cij\n\n(cid:21)(cid:13)(cid:13)(cid:13)(cid:13)p\n\nN(cid:88)\n\n+ \u03b3\n\n(cid:32) N(cid:88)\n\nN(cid:88)\n\n+\n\n\u03c1\n\n(cid:2)\u00afyi U \u2126c\n\ni\n\n(cid:21)(cid:33)\n\n(cid:3)(cid:20) cij\n\n(cid:21)(cid:13)(cid:13)(cid:13)(cid:13)\u2217\n\nmin\n\n\u03bb\n\ni=1\n\nj=1\n\n\u03b1ij\n\n{cij ,\u03b1ij}\n{cjj =\u22121}\nwhich is convex for p \u2265 1 and can be solved ef\ufb01ciently using convex solvers. Finally, using the\nsolution of (10), we recover missing entries by \ufb01nding the best rank-one factorization of each block\nAi as in (7), which results in4\n\n\u03b1ij\n\nj=1\n\ni=1\n\ni=1\n\n,\n\n(12)\n\nj=1 cij\u03b1ij\n\n(13)\nIn addition, we use the coef\ufb01cients {cij} to build a similarity graph with weights wij = |cij| + |cji|\n(cid:80)N\nand obtain clustering of data using graph partitioning. It is important to note that we do not need to\nknow dimensions of subspaces a priori, since (10) automatically selects the appropriate number of\ni=1 |cij| instead\n\ndata points from each subspace. Also, it is worth metioning that we can use(cid:80)N\n\nj=1 c2\nij\n\nj=1\n\n.\n\nof the group-sparsity term in (10) and (12).\n\ni = \u2205, the rank-\nRemark 1 Notice that when all entries of all data points are observed, i.e., \u2126c\none constraints in (9) are trivially satis\ufb01ed. Hence, (10) and (12) with \u03b3 = 0 reduce to the (cid:96)1-\nminimization of SSC. In other words, our framework is a generalization of SSC, which simultaneously\n\ufb01nds similarities and missing entries for incomplete data.\n\nTable 1 shows the stable rank5 [31] of blocks Ai of the solution for the synthetic dataset explained in\nthe experiments in Section 5. 
As the results show, the penalized optimization successfully recovers close-to-rank-one solutions for practical values of $\gamma$ and $\lambda$.

³The nuclear norm of $A$, denoted by $\|A\|_*$, is the sum of its singular values, i.e., $\|A\|_* = \sum_i \sigma_i(A)$.
⁴The denominator is always nonzero since $c_{ii} = -1$ for all $i$.
⁵The stable rank of $B$ is defined as $\sum_i \sigma_i^2 / \max_i \sigma_i^2$, where the $\sigma_i$'s are the singular values of $B$.

Table 1: Average stable rank of the matrices $A_i$ for high-rank data, $n = 100$, $L = 12$, $d = 10$, $N = 600$, with $\rho = 0.4$, explained in Section 5. Notice that the rank of $A_i$ is close to one and, as $\gamma$ increases, it gets closer to one.

             γ = 0.001        γ = 0.01         γ = 0.1
λ = 0.01     1.015 ± 0.005    1.009 ± 0.005    1.004 ± 0.002
λ = 0.1      1.021 ± 0.007    1.011 ± 0.006    1.006 ± 0.003

Figure 1: Subset selection and completion via lifting on the Olivetti face dataset. Top: faces from the dataset with missing entries. Bottom: solution of our method on the dataset. We successfully recover missing entries and, at the same time, select a subset of faces as representatives.

Notice that the mixed $\ell_1/\ell_p$-norm in the objective function of (10) and (12) promotes selecting a few nonzero coefficient blocks $[c_{ij}\ \alpha_{ij}^\top]^\top$. In other words, we find a representation of each incomplete data point using a few other incomplete data points while, at the same time, finding the missing entries of the selected data points. 
On the other hand, the rank constraints on the sub-blocks of the solution ensure that the recovered missing entries are globally consistent, i.e., if a data point takes part in the reconstruction of multiple points, the associated missing entries in each representation are the same.

Remark 2 Our lifting framework can also deal with missing entries in other tasks that rely on the self-expressiveness property, i.e., $y_j = \sum_{i=1}^{N} c_{ij} y_i$. Figure 1 shows the results of the extension of our method to column subset selection [32, 33] with missing entries. In fact, simultaneously selecting a few data points that well reconstruct the entire dataset and recovering missing entries can be cast as a modification of (10) or (12), where we modify the first term in the objective function in order to select a few nonzero blocks, $A_i$.

5 Experiments

We study the performance of our algorithm for completion and clustering of synthetic and real data. We implement (10) and (12) with $\sum_{i=1}^{N} |c_{ij}|$ instead of the group-sparsity term using the ADMM framework [34, 35]. Unless stated otherwise, we set $\lambda = 0.01$ and $\gamma = 0.1$. However, the results are stable for $\lambda \in [0.005, 0.05]$ and $\gamma \in [0.01, 0.5]$.
We compare our algorithm, SSC-Lifting, with MFA [20], K-Subspaces with Missing Entries (KSub-M) [21], Low-Rank Matrix Completion [13] followed by SSC (LRMC+SSC) or LSA [36] (LRMC+LSA), and SSC using Column-wise Expectation Completion (SSC-CEC) [24]. It is worth mentioning that in all experiments, we found that the performance of SSC-CEC is slightly better than SSC using zero-filled data. In addition, as reported in [21], KSub-M generally outperforms the high-rank matrix completion algorithm in [22], since the latter requires a very large number of samples, which becomes impractical in high-dimensional problems. 
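Given any solution $\{c_{ij}\}, \{\alpha_{ij}\}$, the post-processing in (13) and the similarity construction $w_{ij} = |c_{ij}| + |c_{ji}|$ amount to a few lines. In the sketch below, `C` and the per-point blocks `alpha[i]` are a hypothetical array layout for the variables (not prescribed by the paper), and the toy input is an exactly rank-one lifted solution $\alpha_{ij} = c_{ij} x_i$.

```python
import numpy as np

def postprocess(C, alpha):
    """Recover missing entries via (13), x_i = sum_j c_ij*alpha_ij / sum_j c_ij^2,
    and build the similarity W with w_ij = |c_ij| + |c_ji|.
    Layout (illustrative): C[i, j] = c_ij; alpha[i][:, j] = alpha_ij, the
    missing-entry block of point i in the representation of point j."""
    N = C.shape[0]
    # The denominator is nonzero because c_ii = -1 for all i.
    x_hat = [alpha[i] @ C[i, :] / np.sum(C[i, :] ** 2) for i in range(N)]
    W = np.abs(C) + np.abs(C).T
    return x_hat, W

# Toy check: with alpha_ij = c_ij * x_i, formula (13) returns x_i exactly.
rng = np.random.default_rng(3)
N = 6
C = rng.standard_normal((N, N))
np.fill_diagonal(C, -1.0)
x_true = [rng.standard_normal(4) for _ in range(N)]
alpha = [np.outer(x_true[i], C[i, :]) for i in range(N)]
x_hat, W = postprocess(C, alpha)
```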
We compute

$$\text{Clustering Error} = \frac{\#\ \text{Misclassified points}}{\#\ \text{All points}}, \qquad \text{Completion Error} = \frac{\|\hat{Y} - Y\|_F}{\|Y\|_F}, \quad (14)$$

where $Y$ and $\hat{Y}$ denote, respectively, the true and recovered matrices, and $\|\cdot\|_F$ is the Frobenius norm.

5.1 Synthetic Experiments

In this section, we evaluate the performance of different algorithms on synthetic data. We generate $L$ random $d$-dimensional subspaces in $\mathbb{R}^n$ and draw $N_g$ data points, at random, from each subspace. We consider two scenarios: 1) a low-rank data matrix whose columns lie in a union of low-dimensional subspaces; 2) a high-rank data matrix whose columns lie in a union of low-dimensional subspaces. Unless stated otherwise, for low-rank matrices, we set $L = 3$ and $d = 5$, hence $Ld = 15 < n = 100$, while for high-rank matrices, we set $L = 12$ and $d = 10$, hence $Ld = 120 > n = 100$.

Completion Performance. We generate missing entries by selecting a fraction $\rho$ of the entries of the data matrix uniformly at random and dropping their values.

Figure 2: Completion errors of different algorithms as a function of $\rho$. Left: low-rank matrices. Middle left: high-rank matrices. Middle right: effect of the ambient space dimension, $n$. Right: effect of the number of data points in each subspace, $N_g$, for low-rank (solid lines) and high-rank (dashed lines) matrices.

The left and middle left plots in Figure 2 show the completion errors of different algorithms for low-rank and high-rank matrices, respectively, as a function of the fraction of missing entries, $\rho$. Notice that in both cases, MFA and KSub-M have high errors, which rapidly increase as $\rho$ increases, due to dependence on initialization and getting trapped in local optima. In both cases, SSC-Lifting outperforms all methods across all values of $\rho$. 
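The two error metrics in (14) can be computed in a few lines; the brute-force alignment of predicted cluster labels to ground-truth labels is an implementation detail the paper does not specify, shown here for small numbers of clusters.

```python
import numpy as np
from itertools import permutations

def completion_error(Y_hat, Y):
    """Relative Frobenius-norm completion error from (14)."""
    return np.linalg.norm(Y_hat - Y) / np.linalg.norm(Y)

def clustering_error(labels_hat, labels, n_clusters=2):
    """Fraction of misclassified points from (14), minimized over all
    relabelings of the predicted clusters (brute force over permutations)."""
    labels_hat, labels = np.asarray(labels_hat), np.asarray(labels)
    best = min(
        np.mean(np.array([p[k] for k in labels_hat]) != labels)
        for p in permutations(range(n_clusters))
    )
    return float(best)
```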
Specifically, in the low-rank regime, while LRMC and SSC-Lifting have almost zero error for $\rho \leq 0.35$, the performance of LRMC quickly degrades for larger $\rho$'s, while SSC-Lifting performs well for $\rho \leq 0.6$. On the other hand, the performance of LRMC significantly degrades in the high-rank case, with a large gap to SSC-Lifting, which performs well for $\rho < 0.45$. The middle right plot in Figure 2 demonstrates the effect of the ambient space dimension, $n$, for $L = 7$, $d = 5$, $N_g = 100$ and $\rho = 0.3$. Notice that the errors of MFA and KSub-M increase as $n$ increases, due to the larger number of local optima. LRMC has a large error for small values of $n$, where $n$ is smaller than or close to $Ld$, i.e., the high-rank regime. As $n$ increases and the matrices become low-rank, the error decreases. Notice that SSC-Lifting has a low error for $n \geq 40$, demonstrating its effectiveness in handling both low-rank and high-rank matrices. Finally, the right plot in Figure 2 demonstrates the effect of the number of points, $N_g$, for low- and high-rank matrices with $\rho = 0.5$. We do not show the results of MFA and KSub-M, since they have large errors for all $N_g$. Notice that for all values of $N_g$, SSC-Lifting obtains smaller errors than LRMC, verifying the effectiveness of the sparsity principle for completing the data.

Clustering Performance. Next, we compare the clustering performance. To better study the effect of missing entries, we generate them by selecting a fraction $\delta$ of the data points and, for each selected data point, dropping the values of a fraction $\rho$ of its entries, both uniformly at random. We vary $\delta$ in $[0.1, 1.0]$ and $\rho$ in $[0.1, 0.9]$ and, for each pair $(\rho, \delta)$, record the average clustering and completion errors over 20 trials, each with different random subspaces and data points. 
Figure 3 shows the clustering errors of different algorithms for low-rank (top row) and high-rank (bottom row) data matrices (completion errors are provided in the supplementary materials). In both cases, MFA performs poorly, due to local optima. While LRMC+SSC, SSC-CEC and SSC-Lifting perform similarly for low-rank matrices, SSC-Lifting performs best among all methods for high-rank matrices. In particular, when the percentage of missing entries, $\rho$, is more than 70%, SSC-Lifting performs significantly better than the other algorithms. It is important to notice that for small values of $(\rho, \delta)$, since the completion errors via SSC-Lifting and LRMC are sufficiently small, the recovered matrices are noisy versions of the original matrices. As a result, the Lasso-type optimizations of SSC and SSC-Lifting succeed in recovering subspace-sparse representations, leading to zero clustering errors. In the high-rank case, SSC-CEC has a higher clustering error than LRMC and SSC-Lifting, which is due to the fact that it relies on the heuristic of shifting the eigenvalues of the kernel matrix to non-negative values.

5.2 Real Experiments on Motion Segmentation

We consider the problem of motion segmentation [37, 38] with missing entries on the Hopkins 155 dataset, with 155 sequences of 2 and 3 motions. Since the dataset consists of complete feature trajectories (incomplete trajectories were removed manually to form the dataset), we select a fraction $\rho$ of the feature points across all frames uniformly at random and remove their $x$-$y$ coordinate values. The left plot in Figure 4 shows the clustering error bars of different algorithms on the dataset as a function of $\rho$. Notice that in all cases, MFA and SSC-CEC have large errors, due to, respectively, dependence on initialization and the heuristic convex reformulation. On the other hand, LRMC+SSC and SSC-Lifting perform well, achieving less than 5% error for all values of $\rho$. 
This comes from the fact that the sequences have at most $L = 3$ motions and the dimension of each motion subspace is at most $d = 4$, hence $Ld \leq 12 \ll 2F$, where $F$ is the number of video frames. Since the data matrix is low-rank and LRMC succeeds, SSC and our method achieve roughly the same errors for different values of $\rho$.

Figure 3: Clustering errors for low-rank matrices (top row) with $L = 3$, $d = 5$, $n = 100$ and high-rank matrices (bottom row) with $L = 12$, $d = 10$, $n = 100$ as a function of $(\rho, \delta)$, where $\delta$ is the fraction of data with missing entries (vertical axis) and $\rho$ is the fraction of missing entries in each affected point (horizontal axis). Left to right: MFA, SSC-CEC, LRMC+SSC and SSC-Lifting.

Figure 4: Left: Clustering error bars of MFA, LRMC+LSA, LRMC+SSC, SSC-CEC and SSC-Lifting as a function of the fraction of missing entries, $\rho$. Middle: Singular values of CMU Mocap data reveal that each activity lies in a low-dimensional subspace. Right: Average completion errors of MFA, LRMC and SSC-Lifting on the CMU Mocap dataset as a function of $\rho$. Solid lines correspond to $\delta = 0.5$, i.e., 50% of the data have missing entries, while dashed lines correspond to $\delta = 1$, i.e., all data have missing entries.

5.3 Real Experiments on Motion Capture Data

We consider completion of time-series trajectories from motion capture sensors, where a trajectory consists of different human activities, such as running, jumping, squatting, etc. 
We use the CMU Mocap dataset, where each data point corresponds to measurements from n sensors at a particular time instant. Since the transition from one activity to another happens gradually, we do not consider clustering. However, as the middle plot in Figure 4 shows, excluding the transition time periods, the data from each activity lie in a low-dimensional subspace. Since there are typically L ≈ 7 activities, each having dimension d ≈ 8, and there are n = 42 sensors, the data matrix is full-rank, as Ld ≈ 56 > n = 42. To evaluate the performance of different algorithms, we select a δ ∈ {0.5, 1.0} fraction of data points and remove a ρ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7} fraction of the entries of each selected point, both uniformly at random. The right plot in Figure 4 shows the completion errors of different algorithms as a function of ρ for δ ∈ {0.5, 1.0}. Notice that, unlike the previous experiment, since the data matrix is high-rank, LRMC has a large completion error, similar to the synthetic experiments. On the other hand, the SSC-Lifting error is less than 0.1 for ρ = 0.1 and less than 0.55 for ρ = 0.7. In all cases, the performance for δ = 1 degrades with respect to δ = 0.5. Lastly, it is important to notice that MFA performs slightly better than LRMC, demonstrating the importance of modeling the data by a union of low-dimensional subspaces. However, getting trapped in local optima does not allow MFA to take full advantage of this model, as opposed to SSC-Lifting.

6 Conclusions

We proposed efficient algorithms, based on lifting, for simultaneous clustering and completion of incomplete multi-subspace data. By extensive experiments on synthetic and real data, we showed that for low-rank data matrices, our algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices, it significantly outperforms existing algorithms.
Theoretical guarantees for the proposed method and scaling the algorithm to large data are the subject of our ongoing research.

References

[1] R. Basri and D. Jacobs, "Lambertian reflectance and linear subspaces," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, 2003.
[2] T. Hastie and P. Simard, "Metrics and models for handwritten character recognition," Statistical Science, 1998.
[3] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography," International Journal of Computer Vision, vol. 9, 1992.
[4] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
[5] G. Chen and G. Lerman, "Spectral curvature clustering (SCC)," International Journal of Computer Vision, vol. 81, 2009.
[6] A. Zhang, N. Fawaz, S. Ioannidis, and A. Montanari, "Guess who rated this movie: Identifying users through subspace clustering," in Uncertainty in Artificial Intelligence (UAI), 2012.
[7] R. Vidal, R. Tron, and R. Hartley, "Multiframe motion segmentation with missing data using PowerFactorization and GPCA," International Journal of Computer Vision, vol. 79, 2008.
[8] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, "Online dictionary learning for sparse coding," in International Conference on Machine Learning (ICML), 2009.
[9] D. Park, J. Neeman, J. Zhang, S. Sanghavi, and I. S. Dhillon, "Preference completion: Large-scale collaborative ranking from pairwise comparisons," in International Conference on Machine Learning (ICML), 2015.
[10] M. Tipping and C. Bishop, "Probabilistic principal component analysis," Journal of the Royal Statistical Society, vol. 61, 1999.
[11] M. Knott and D. Bartholomew, Latent Variable Models and Factor Analysis. London: Edward Arnold, 1999.
[12] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, 2008.
[13] E. J. Candès and Y. Plan, "Matrix completion with noise," Proceedings of the IEEE, 2009.
[14] R. Keshavan, A. Montanari, and S. Oh, "Matrix completion from noisy entries," IEEE Transactions on Information Theory, 2010.
[15] Y. Chen, H. Xu, C. Caramanis, and S. Sanghavi, "Robust matrix completion with corrupted columns," in International Conference on Machine Learning (ICML), 2011.
[16] S. Bhojanapalli and P. Jain, "Universal matrix completion," in International Conference on Machine Learning (ICML), 2013.
[17] K. Y. Chiang, C. J. Hsieh, and I. S. Dhillon, "Matrix completion with noisy side information," in Neural Information Processing Systems (NIPS), 2015.
[18] M. Tipping and C. Bishop, "Mixtures of probabilistic principal component analyzers," Neural Computation, vol. 11, 1999.
[19] A. Gruber and Y. Weiss, "Multibody factorization with uncertainty and missing data using the EM algorithm," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.
[20] Z. Ghahramani and G. E. Hinton, "The EM algorithm for mixtures of factor analyzers," Technical Report CRG-TR-96-1, Dept. of Computer Science, Univ. of Toronto, 1996.
[21] L. Balzano, A. Szlam, B. Recht, and R. Nowak, "K-subspaces with missing data," in IEEE Statistical Signal Processing Workshop, 2012.
[22] B. Eriksson, L. Balzano, and R. Nowak, "High rank matrix completion," in International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
[23] E. J. Candès, L. Mackey, and M. Soltanolkotabi, "From robust subspace clustering to full-rank matrix completion," unpublished abstract, 2014.
[24] C. Yang, D. Robinson, and R. Vidal, "Sparse subspace clustering with missing entries," in International Conference on Machine Learning (ICML), 2015.
[25] R. Ganti, L. Balzano, and R. Willett, "Matrix completion under monotonic single index models," in Neural Information Processing Systems (NIPS), 2015.
[26] E. Elhamifar and R. Vidal, "Sparse subspace clustering," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[27] A. Ng, Y. Weiss, and M. Jordan, "On spectral clustering: Analysis and an algorithm," in Neural Information Processing Systems (NIPS), 2001.
[28] M. Soltanolkotabi, E. Elhamifar, and E. J. Candès, "Robust subspace clustering," Annals of Statistics, 2014.
[29] B. Zhao, G. Rocha, and B. Yu, "The composite absolute penalties family for grouped and hierarchical selection," The Annals of Statistics, vol. 37, 2009.
[30] R. Jenatton, J. Y. Audibert, and F. Bach, "Structured variable selection with sparsity-inducing norms," Journal of Machine Learning Research, vol. 12, 2011.
[31] J. Tropp, "Column subset selection, matrix factorization, and eigenvalue optimization," in ACM-SIAM Symposium on Discrete Algorithms (SODA), 2009.
[32] E. Elhamifar, G. Sapiro, and S. S. Sastry, "Dissimilarity-based sparse subset selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[33] E. Elhamifar, G. Sapiro, and R. Vidal, "See all by looking at a few: Sparse modeling for finding representative objects," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[34] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, 2010.
[35] D. Gabay and B. Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite-element approximations," Computers & Mathematics with Applications, vol. 2, 1976.
[36] J. Yan and M. Pollefeys, "A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate," in European Conference on Computer Vision (ECCV), 2006.
[37] J. Costeira and T. Kanade, "A multibody factorization method for independently moving objects," International Journal of Computer Vision, vol. 29, 1998.
[38] K. Kanatani, "Motion segmentation by subspace separation and model selection," in IEEE International Conference on Computer Vision (ICCV), vol. 2, 2001.