{"title": "Greedy Subspace Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 2753, "page_last": 2761, "abstract": "We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets to estimate the subspaces. As the geometric structure of the clusters (linear subspaces) forbids proper performance of general distance based approaches such as K-means, many model-specific methods have been proposed. In this paper, we provide new simple and efficient algorithms for this problem. Our statistical analysis shows that the algorithms are guaranteed exact (perfect) clustering performance under certain conditions on the number of points and the affinity between subspaces. These conditions are weaker than those considered in the standard statistical literature. Experimental results on synthetic data generated from the standard unions of subspaces model demonstrate our theory. We also show that our algorithm performs competitively against state-of-the-art algorithms on real-world applications such as motion segmentation and face clustering, with much simpler implementation and lower computational cost.", "full_text": "Greedy Subspace Clustering\n\nDohyung Park\nDept. of Electrical and Computer Engineering\nThe University of Texas at Austin\ndhpark@utexas.edu\n\nConstantine Caramanis\nDept. of Electrical and Computer Engineering\nThe University of Texas at Austin\nconstantine@utexas.edu\n\nSujay Sanghavi\nDept. of Electrical and Computer Engineering\nThe University of Texas at Austin\nsanghavi@mail.utexas.edu\n\nAbstract\n\nWe consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. 
To this end, one first identifies sets of points close to the same subspace and uses the sets to estimate the subspaces. As the geometric structure of the clusters (linear subspaces) forbids proper performance of general distance based approaches such as K-means, many model-specific methods have been proposed. In this paper, we provide new simple and efficient algorithms for this problem. Our statistical analysis shows that the algorithms are guaranteed exact (perfect) clustering performance under certain conditions on the number of points and the affinity between subspaces. These conditions are weaker than those considered in the standard statistical literature. Experimental results on synthetic data generated from the standard unions of subspaces model demonstrate our theory. We also show that our algorithm performs competitively against state-of-the-art algorithms on real-world applications such as motion segmentation and face clustering, with much simpler implementation and lower computational cost.\n\n1 Introduction\n\nSubspace clustering is a classic problem where one is given points in a high-dimensional ambient space and would like to approximate them by a union of lower-dimensional linear subspaces. In particular, each subspace contains a subset of the points. This problem is hard because one needs to jointly find the subspaces and the points corresponding to each; the data we are given are unlabeled.\n\nThe unions of subspaces model naturally arises in settings where data from multiple latent phenomena are mixed together and need to be separated. 
Applications of subspace clustering include motion segmentation [23], face clustering [8], gene expression analysis [10], and system identification [22]. In these applications, data points with the same label (e.g., face images of a person under varying illumination conditions, feature points of a moving rigid object in a video sequence) lie on a low-dimensional subspace, and the mixed dataset can be modeled by unions of subspaces. For a detailed description of the applications, we refer the reader to the reviews [10, 20] and references therein.\n\nThere is now a sizable literature on empirical methods for this particular problem and some statistical analysis as well. Many recently proposed methods, which perform remarkably well and have theoretical guarantees on their performance, can be characterized as involving two steps: (a) finding a “neighborhood” for each data point, and (b) finding the subspaces and/or clustering the points given these neighborhoods. 
Here, neighbors of a point are other points that the algorithm estimates to lie on the same subspace as the point (and not necessarily just closest in Euclidean distance).\n\nAlgorithm | What is guaranteed | Subspace condition | Conditions for: fully random model | Conditions for: semi-random model\nSSC [4, 16] | Correct neighborhoods | None | $d/p = O(\\log(n/d) / \\log(nL))$ | $\\max \\mathrm{aff} = O(\\sqrt{\\log(n/d)} / \\log(nL))$\nLRR [14] | Exact clustering | No intersection | - | -\nSSC-OMP [3] | Correct neighborhoods | No intersection | - | -\nTSC [6, 7] | Exact clustering | None | $d/p = O(1 / \\log(nL))$ | $\\max \\mathrm{aff} = O(1 / \\log(nL))$\nLRSSC [24] | Correct neighborhoods | None | $d/p = O(1 / \\log(nL))$ | -\nNSN+GSR | Exact clustering | None | $d/p = O(\\log n / \\log(ndL))$ | $\\max \\mathrm{aff} = O(\\sqrt{\\log n / ((\\log dL) \\cdot \\log(ndL))})$\nNSN+Spectral | Exact clustering | None | $d/p = O(\\log n / \\log(ndL))$ | -\n\nTable 1: Subspace clustering algorithms with theoretical guarantees. LRR and SSC-OMP have only deterministic guarantees, not statistical ones. In the two standard statistical models, there are $n$ data points on each of $L$ $d$-dimensional subspaces in $\\mathbb{R}^p$. For the definition of $\\max \\mathrm{aff}$, we refer the reader to Section 3.1.\n\nOur contributions: In this paper we devise new algorithms for each of the two steps above: (a) we develop a new method, Nearest Subspace Neighbor (NSN), to determine a neighborhood set for each point, and (b) a new method, Greedy Subspace Recovery (GSR), to recover subspaces from given neighborhoods. Each of these two methods can be used in conjunction with other methods for the corresponding other step; however, in this paper we focus on two algorithms that use NSN followed by GSR and Spectral clustering, respectively. 
Our main result is establishing statistical guarantees for exact clustering with general subspace conditions, in the standard models considered in recent analytical literature on subspace clustering. Our condition for exact recovery is weaker than the conditions of other existing algorithms that only guarantee correct neighborhoods¹, which do not always lead to correct clustering. We provide numerical results which demonstrate our theory. We also show that for real-world applications our algorithm performs competitively against state-of-the-art algorithms at a much lower computational cost. Moreover, our algorithms are much simpler to implement.\n\n1.1 Related work\n\nThe problem was first formulated in the data mining community [10]. Most of the related work in this field assumes that an underlying subspace is parallel to some canonical axes. Subspace clustering for unions of arbitrary subspaces is considered mostly in the machine learning and the computer vision communities [20]. Most of the results from those communities are based on empirical justification. They provided algorithms derived from theoretical intuition and showed that they perform well empirically on practical datasets. To name a few, GPCA [21], Spectral curvature clustering (SCC) [2], and many iterative methods [1, 19, 26] show good empirical performance for subspace clustering. However, they lack theoretical analysis that guarantees exact clustering.\n\nAs described above, several algorithms with a common structure have recently been proposed with both theoretical guarantees and remarkable empirical performance. Elhamifar and Vidal [4] proposed an algorithm called Sparse Subspace Clustering (SSC), which uses $\\ell_1$-minimization for neighborhood construction. They proved that if the subspaces have no intersection², SSC always finds a correct neighborhood matrix. 
Later, Soltanolkotabi and Candes [16] provided a statistical guarantee of the algorithm for subspaces with intersection. Dyer et al. [3] proposed another algorithm called SSC-OMP, which uses Orthogonal Matching Pursuit (OMP) instead of $\\ell_1$-minimization in SSC. Another algorithm called Low-Rank Representation (LRR), which uses nuclear norm minimization, was proposed by Liu et al. [14]. Wang et al. [24] proposed a hybrid algorithm, Low-Rank and Sparse Subspace Clustering (LRSSC), which involves both the $\\ell_1$-norm and the nuclear norm. Heckel and Bölcskei [6] presented Thresholding based Subspace Clustering (TSC), which constructs neighborhoods based on the inner products between data points. All of these algorithms use spectral clustering for the clustering step.\n\nThe analysis in those papers focuses on neither exact recovery of the subspaces nor exact clustering in general subspace conditions. SSC, SSC-OMP, and LRSSC only guarantee correct neighborhoods, which do not always lead to exact clustering. LRR guarantees exact clustering only when the subspaces have no intersections. In this paper, we provide novel algorithms that guarantee exact clustering in general subspace conditions. While we were preparing this manuscript, it was proved that TSC guarantees exact clustering under certain conditions [7], but the conditions are stricter than ours (see Table 1).\n\n¹ By correct neighborhood, we mean that for each point every neighbor point lies on the same subspace.\n² By no intersection between subspaces, we mean that they share only the null point.\n\n1.2 Notation\n\nThere is a set of $N$ data points in $\\mathbb{R}^p$, denoted by $\\mathcal{Y} = \\{y_1, \\ldots, y_N\\}$. The data points lie on or near a union of $L$ subspaces $\\mathcal{D} = \\cup_{i=1}^{L} \\mathcal{D}_i$. Each subspace $\\mathcal{D}_i$ is of dimension $d_i$, which is smaller than $p$. For each point $y_j$, $w_j$ denotes the index of the nearest subspace. Let $N_i$ denote the number of points whose nearest subspace is $\\mathcal{D}_i$, i.e., $N_i = \\sum_{j=1}^{N} \\mathbb{I}_{w_j = i}$. 
Throughout this paper, sets and subspaces are denoted by calligraphic letters. Matrices and key parameters are denoted by letters in upper case, and vectors and scalars are denoted by letters in lower case. We frequently denote the set of $n$ indices by $[n] = \\{1, 2, \\ldots, n\\}$. As usual, $\\mathrm{span}\\{\\cdot\\}$ denotes a subspace spanned by a set of vectors. For example, $\\mathrm{span}\\{v_1, \\ldots, v_n\\} = \\{v : v = \\sum_{i=1}^{n} \\alpha_i v_i, \\ \\alpha_1, \\ldots, \\alpha_n \\in \\mathbb{R}\\}$. $\\mathrm{Proj}_{\\mathcal{U}} y$ is defined as the projection of $y$ onto subspace $\\mathcal{U}$. That is, $\\mathrm{Proj}_{\\mathcal{U}} y = \\arg\\min_{u \\in \\mathcal{U}} \\|y - u\\|_2$. $\\mathbb{I}\\{\\cdot\\}$ denotes the indicator function, which is one if the statement is true and zero otherwise. Finally, $\\oplus$ denotes the direct sum.\n\n2 Algorithms\n\nWe propose two algorithms for subspace clustering as follows.\n\n• NSN+GSR: Run Nearest Subspace Neighbor (NSN) to construct a neighborhood matrix $W \\in \\{0, 1\\}^{N \\times N}$, and then run Greedy Subspace Recovery (GSR) for $W$.\n• NSN+Spectral: Run Nearest Subspace Neighbor (NSN) to construct a neighborhood matrix $W \\in \\{0, 1\\}^{N \\times N}$, and then run spectral clustering for $Z = W + W^\\top$.\n\n2.1 Nearest Subspace Neighbor (NSN)\n\nNSN approaches the problem of finding neighbor points most likely to be on the same subspace in a greedy fashion. At first, given a point $y$ without any other knowledge, the one single point that is most likely to be a neighbor of $y$ is the point nearest to the line $\\mathrm{span}\\{y\\}$. In the following steps, if we have found a few correct neighbor points (lying on the same true subspace) and have no other knowledge about the true subspace and the rest of the points, then the most potentially correct point is the one closest to the subspace spanned by the correct neighbors we have. This motivates us to propose NSN, described in the following.\n\nAlgorithm 1 Nearest Subspace Neighbor (NSN)\nInput: A set of $N$ samples $\\mathcal{Y} = \\{y_1, \\ldots, y_N\\}$, the number of required neighbors $K$, maximum subspace dimension $k_{\\max}$\nOutput: A neighborhood matrix $W \\in \\{0, 1\\}^{N \\times N}$\n$y_i \\leftarrow y_i / \\|y_i\\|_2, \\ \\forall i \\in [N]$  ▷ Normalize magnitudes\nfor $i = 1, \\ldots, N$ do  ▷ Run NSN for each data point\n    $\\mathcal{I}_i \\leftarrow \\{i\\}$\n    for $k = 1, \\ldots, K$ do  ▷ Iteratively add the closest point to the current subspace\n        if $k \\le k_{\\max}$ then\n            $\\mathcal{U} \\leftarrow \\mathrm{span}\\{y_j : j \\in \\mathcal{I}_i\\}$\n        end if\n        $j^* \\leftarrow \\arg\\max_{j \\in [N] \\setminus \\mathcal{I}_i} \\|\\mathrm{Proj}_{\\mathcal{U}} y_j\\|_2$\n        $\\mathcal{I}_i \\leftarrow \\mathcal{I}_i \\cup \\{j^*\\}$\n    end for\n    $W_{ij} \\leftarrow \\mathbb{I}_{j \\in \\mathcal{I}_i \\text{ or } y_j \\in \\mathcal{U}}, \\ \\forall j \\in [N]$  ▷ Construct the neighborhood matrix\nend for\n\nNSN collects $K$ neighbors sequentially for each point. At each step $k$, a $k$-dimensional subspace $\\mathcal{U}$ spanned by the point and its $k - 1$ neighbors is constructed, and the point closest to the subspace is newly collected. After $k \\ge k_{\\max}$, the subspace $\\mathcal{U}$ constructed at the $k_{\\max}$th step is used for collecting neighbors. At last, if there are more points lying on $\\mathcal{U}$, they are also counted as neighbors. The subspace $\\mathcal{U}$ can be stored in the form of a matrix $U \\in \\mathbb{R}^{p \\times \\dim(\\mathcal{U})}$ whose columns form an orthonormal basis of $\\mathcal{U}$. Then $\\|\\mathrm{Proj}_{\\mathcal{U}} y_j\\|_2$ can be computed easily because it is equal to $\\|U^\\top y_j\\|_2$. While a naive implementation requires $O(K^2 p N^2)$ computational cost, this can be reduced to $O(K p N^2)$, and the faster implementation is described in Section A.1. We note that this computational cost is much lower than that of the convex optimization based methods (e.g., SSC [4] and LRR [14]), which solve a convex program with $N^2$ variables and $pN$ constraints.\n\nNSN for subspace clustering shares the same philosophy with Orthogonal Matching Pursuit (OMP) for sparse recovery in the sense that it incrementally picks the point (dictionary element) that is the most likely to be correct, assuming that the algorithms have found the correct ones. 
In subspace clustering, that point is the one closest to the subspace spanned by the currently selected points, while in sparse recovery it is the one closest to the residual of linear regression by the selected points. In the sparse recovery literature, the performance of OMP is shown to be comparable to that of Basis Pursuit ($\\ell_1$-minimization) both theoretically and empirically [18, 11]. One of the contributions of this work is to show that this high-level intuition is indeed borne out, provably, as we show that NSN also performs well in collecting neighbors lying on the same subspace.\n\n2.2 Greedy Subspace Recovery (GSR)\n\nSuppose that NSN has found correct neighbors for a data point. How can we check if they are indeed correct, that is, lying on the same true subspace? One natural way is to count the number of points close to the subspace spanned by the neighbors. If they span one of the true subspaces, then many other points will be lying on the span. If they do not span any true subspaces, few points will be close to it. This fact motivates us to use a greedy algorithm to recover the subspaces. Using the neighborhood constructed by NSN (or some other algorithm), we recover the $L$ subspaces. If there is a neighborhood set containing only the points on the same subspace for each subspace, the algorithm successfully recovers the unions of the true subspaces exactly.\n\nAlgorithm 2 Greedy Subspace Recovery (GSR)\nInput: $N$ points $\\mathcal{Y} = \\{y_1, \\ldots, y_N\\}$, a neighborhood matrix $W \\in \\{0, 1\\}^{N \\times N}$, error bound $\\epsilon$\nOutput: Estimated subspaces $\\hat{\\mathcal{D}} = \\cup_{l=1}^{L} \\hat{\\mathcal{D}}_l$, estimated labels $\\hat{w}_1, \\ldots, \\hat{w}_N$\n$y_i \\leftarrow y_i / \\|y_i\\|_2, \\ \\forall i \\in [N]$  ▷ Normalize magnitudes\n$\\mathcal{W}_i \\leftarrow \\text{Top-}d\\{y_j : W_{ij} = 1\\}, \\ \\forall i \\in [N]$  ▷ Estimate a subspace using the neighbors for each point\n$\\mathcal{I} \\leftarrow [N]$\nwhile $\\mathcal{I} \\neq \\emptyset$ do  ▷ Iteratively pick the best subspace estimates\n    $i^* \\leftarrow \\arg\\max_{i \\in \\mathcal{I}} \\sum_{j=1}^{N} \\mathbb{I}\\{\\|\\mathrm{Proj}_{\\mathcal{W}_i} y_j\\|_2 \\ge 1 - \\epsilon\\}$\n    $\\hat{\\mathcal{D}}_l \\leftarrow \\hat{\\mathcal{W}}_{i^*}$\n    $\\mathcal{I} \\leftarrow \\mathcal{I} \\setminus \\{j : \\|\\mathrm{Proj}_{\\mathcal{W}_{i^*}} y_j\\|_2 \\ge 1 - \\epsilon\\}$\nend while\n$\\hat{w}_i \\leftarrow \\arg\\max_{l \\in [L]} \\|\\mathrm{Proj}_{\\hat{\\mathcal{D}}_l} y_i\\|_2, \\ \\forall i \\in [N]$  ▷ Label the points using the subspace estimates\n\nRecall that the matrix $W$ encodes the neighborhoods of the points, so that $W_{ij} = 1$ if point $j$ is a neighbor of point $i$. $\\text{Top-}d\\{y_j : W_{ij} = 1\\}$ denotes the $d$-dimensional principal subspace of the set of vectors $\\{y_j : W_{ij} = 1\\}$. This can be obtained by taking the first $d$ left singular vectors of the matrix whose columns are the vectors in the set. If there are only $d$ vectors in the set, Gram-Schmidt orthogonalization will give us the subspace. As in NSN, it is efficient to store a subspace $\\mathcal{W}_i$ in the form of its orthonormal basis because we can easily compute the norm of a projection onto the subspace.\n\nTesting a candidate subspace by counting the number of near points has already been considered in the subspace clustering literature. In [25], the authors proposed to run RANdom SAmple Consensus (RANSAC) iteratively. RANSAC randomly selects a few points and checks if there are many other points near the subspace spanned by the collected points. Instead of randomly choosing sample points, GSR receives some candidate subspaces (in the form of sets of points) from NSN (or possibly some other algorithm) and selects subspaces in a greedy way as specified in the algorithm above.\n\n3 Theoretical results\n\nWe analyze our algorithms in two standard noiseless models. The main theorems present sufficient conditions under which the algorithms cluster the points exactly with high probability. For simplicity of analysis, we assume that every subspace is of the same dimension, and the number of data points on each subspace is the same, i.e., $d \\triangleq d_1 = \\cdots = d_L$, $n \\triangleq N_1 = \\cdots = N_L$. We assume that $d$ is known to the algorithm. 
Nonetheless, our analysis can extend to the general case.\n\n3.1 Statistical models\n\nWe consider two models which have been used in the subspace clustering literature:\n\n• Fully random model: The subspaces are drawn iid uniformly at random, and the points are also iid randomly generated.\n• Semi-random model: The subspaces are arbitrarily determined, but the points are iid randomly generated.\n\nLet $D_i \\in \\mathbb{R}^{p \\times d}$, $i \\in [L]$, be a matrix whose columns form an orthonormal basis of $\\mathcal{D}_i$. An important measure that we use in the analysis is the affinity between two subspaces, defined as\n\n$$\\mathrm{aff}(i, j) \\triangleq \\frac{\\|D_i^\\top D_j\\|_F}{\\sqrt{d}} = \\sqrt{\\frac{\\sum_{k=1}^{d} \\cos^2 \\theta_k^{i,j}}{d}} \\in [0, 1],$$\n\nwhere $\\theta_k^{i,j}$ is the $k$th principal angle between $\\mathcal{D}_i$ and $\\mathcal{D}_j$. Two subspaces $\\mathcal{D}_i$ and $\\mathcal{D}_j$ are identical if and only if $\\mathrm{aff}(i, j) = 1$. If $\\mathrm{aff}(i, j) = 0$, every vector on $\\mathcal{D}_i$ is orthogonal to every vector on $\\mathcal{D}_j$. We also define the maximum affinity as\n\n$$\\max \\mathrm{aff} \\triangleq \\max_{i, j \\in [L], \\, i \\neq j} \\mathrm{aff}(i, j) \\in [0, 1].$$\n\nThere are $N = nL$ points, and there are $n$ points exactly lying on each subspace. We assume that each data point $y_i$ is drawn iid uniformly at random from $\\mathbb{S}^{p-1} \\cap \\mathcal{D}_{w_i}$, where $\\mathbb{S}^{p-1}$ is the unit sphere in $\\mathbb{R}^p$. Equivalently,\n\n$$y_i = D_{w_i} x_i, \\quad x_i \\sim \\mathrm{Unif}(\\mathbb{S}^{d-1}), \\quad \\forall i \\in [N].$$\n\nAs the points are generated randomly on their corresponding subspaces, there are no points lying on an intersection of two subspaces, almost surely. This implies that with probability one the points are clustered correctly provided that the true subspaces are recovered exactly.\n\n3.2 Main theorems\n\nThe first theorem gives a statistical guarantee for the fully random model.\n\nTheorem 1 Suppose $L$ $d$-dimensional subspaces and $n$ points on each subspace are generated in the fully random model with $n$ polynomial in $d$. There are constants $C_1, C_2 > 0$ such that if\n\n$$\\frac{n}{d} > C_1 \\left(\\log \\frac{ne}{d\\delta}\\right)^2, \\quad \\frac{d}{p} < \\frac{C_2 \\log n}{\\log(ndL\\delta^{-1})}, \\qquad (1)$$\n\nthen with probability at least $1 - \\frac{3L\\delta}{1 - \\delta}$, NSN+GSR³ clusters the points exactly. Also, there are other constants $C_1', C_2' > 0$ such that if (1) with $C_1$ and $C_2$ replaced by $C_1'$ and $C_2'$ holds, then NSN+Spectral⁴ clusters the points exactly with probability at least $1 - \\frac{3L\\delta}{1 - \\delta}$. Here, $e$ is the exponential constant.\n\n³ NSN with $K = k_{\\max} = d$ followed by GSR with arbitrarily small $\\epsilon$.\n⁴ NSN with $K = k_{\\max} = d$.\n\nOur sufficient conditions for exact clustering explain when subspace clustering becomes easy or difficult, and they are consistent with our intuition. For NSN to find correct neighbors, the points on the same subspace should be numerous enough that they look like they lie on a subspace. This condition is spelled out in the first inequality of (1). We note that the condition holds even when $n/d$ is a constant, i.e., $n$ is linear in $d$. The second inequality implies that the dimension of the subspaces should not be too high for subspaces to be distinguishable. If $d$ is high, the random subspaces are more likely to be close to each other, and hence they become more difficult to distinguish. However, as $n$ increases, the points become dense on the subspaces, and hence it becomes easier to identify different subspaces.\n\nLet us compare our result with the conditions required for success in the fully random model in the existing literature. In [16], it is required for SSC to have correct neighborhoods that $n$ should be superlinear in $d$ when $d/p$ is fixed. In [6, 24], the conditions on $d/p$ become worse as we have more points. On the other hand, our algorithms are guaranteed exact clustering of the points, and the sufficient condition is order-wise at least as good as the conditions for correct neighborhoods by the existing algorithms (see Table 1). 
Moreover, exact clustering is guaranteed even when $n$ is linear in $d$ and $d/p$ is fixed.\n\nFor the semi-random model, we have the following general theorem.\n\nTheorem 2 Suppose $L$ $d$-dimensional subspaces are arbitrarily chosen, and $n$ points on each subspace are generated in the semi-random model with $n$ polynomial in $d$. There are constants $C_1, C_2 > 0$ such that if\n\n$$\\frac{n}{d} > C_1 \\left(\\log \\frac{ne}{d\\delta}\\right)^2, \\quad \\max \\mathrm{aff} < \\sqrt{\\frac{C_2 \\log n}{\\log(dL\\delta^{-1}) \\cdot \\log(ndL\\delta^{-1})}}, \\qquad (2)$$\n\nthen with probability at least $1 - \\frac{3L\\delta}{1 - \\delta}$, NSN+GSR⁵ clusters the points exactly.\n\n⁵ NSN with $K = d - 1$ and $k_{\\max} = \\lceil 2 \\log d \\rceil$ followed by GSR with arbitrarily small $\\epsilon$.\n\nIn the semi-random model, the sufficient condition does not depend on the ambient dimension $p$. When the affinities between subspaces are fixed, and the points are exactly lying on the subspaces, the difficulty of the problem does not depend on the ambient dimension. It rather depends on $\\max \\mathrm{aff}$, which measures how close the subspaces are. As they become closer to each other, it becomes more difficult to distinguish the subspaces. The second inequality of (2) explains this intuition. The inequality also shows that if we have more data points, it becomes easier to identify different subspaces.\n\nCompared with other algorithms, NSN+GSR is guaranteed exact clustering, and more importantly, the condition on $\\max \\mathrm{aff}$ improves as $n$ grows. This remark is consistent with the practical performance of the algorithm, which improves as the number of data points increases, while the existing guarantees of other algorithms are not. In [16], correct neighborhoods in SSC are guaranteed if $\\max \\mathrm{aff} = O(\\sqrt{\\log(n/d)} / \\log(nL))$. In [6], exact clustering of TSC is guaranteed if $\\max \\mathrm{aff} = O(1 / \\log(nL))$. However, these algorithms perform empirically better as the number of data points increases.\n\n4 Experimental results\n\nIn this section, we empirically compare our algorithms with the existing algorithms in terms of clustering performance and computational time (on a single desktop). For NSN, we used the fast implementation described in Section A.1. The compared algorithms are K-means, K-flats⁶, SSC, LRR, SCC, TSC⁷, and SSC-OMP⁸. The numbers of replicates in K-means, K-flats, and the K-means used in the spectral clustering are all fixed to 10.\n\n⁶ K-flats is similar to K-means. At each iteration, it computes top-$d$ principal subspaces of the points with the same label, and then labels every point based on its distances to those subspaces.\n⁷ The MATLAB codes for SSC, LRR, SCC, and TSC are obtained from http://www.cis.jhu.edu/~ehsan/code.htm, https://sites.google.com/site/guangcanliu, http://www.math.duke.edu/~glchen/scc.html, and http://www.nari.ee.ethz.ch/commth/research/downloads/sc.html, respectively.\n⁸ For each data point, OMP constructs a neighborhood by regressing the point on the other points up to $10^{-4}$ accuracy.\n\n[Figure 1 image. Panels: SSC, SSC-OMP, LRR, TSC, NSN+Spectral, NSN+GSR. Horizontal axis: number of points per dimension for each subspace ($n/d$), 2 to 10. Vertical axis: ambient dimension ($p$), 5 to 50. Color scale: 0 to 1.]\n\nFigure 1: CE of algorithms on 5 random $d$-dimensional subspaces and $n$ random points on each subspace. The figure shows CE for different values of $n/d$ and ambient dimension $p$. $d/p$ is fixed to be 3/5. Brighter cells represent that fewer data points are clustered incorrectly.\n\n[Figure 2 image. Panels: $\\ell_1$-minimization (SSC), OMP (SSC-OMP), Nuclear norm min. (LRR), Nearest neighbor (TSC), NSN. Horizontal axis: number of points per dimension for each subspace ($n/d$), 2 to 10. Vertical axis: ambient dimension ($p$), 5 to 50. Color scale: 0 to 1.]\n\nFigure 2: NSE for the same model parameters as those in Figure 1. Brighter cells represent that more data points have all correct neighbors.\n\n[Figure 3 image. Left panel: 100-dim ambient space, five 10-dim subspaces; time (sec) versus number of data points per subspace ($n$). Right panel: 100-dim ambient space, 10-dim subspaces, 20 points/subspace; time (sec) versus number of subspaces ($L$). Curves: $\\ell_1$-minimization (SSC), OMP (SSC-OMP), Nuclear norm min. (LRR), Thresholding (TSC), NSN.]\n\nFigure 3: Average computational time of the neighborhood selection algorithms\n\nThe algorithms are compared in terms of Clustering error (CE) and Neighborhood selection error (NSE), defined as\n\n$$\\mathrm{CE} = \\min_{\\pi \\in \\Pi_L} \\frac{1}{N} \\sum_{i=1}^{N} \\mathbb{I}(w_i \\neq \\pi(\\hat{w}_i)), \\qquad \\mathrm{NSE} = \\frac{1}{N} \\sum_{i=1}^{N} \\mathbb{I}(\\exists j : W_{ij} \\neq 0, \\ w_i \\neq w_j),$$\n\nwhere $\\Pi_L$ is the permutation space of $[L]$. CE is the proportion of incorrectly labeled data points. Since clustering is invariant up to permutation of label indices, the error is equal to the minimum disagreement over the permutation of label indices. 
NSE measures the proportion of the points which do not have all correct neighbors.⁹\n\n⁹ For the neighborhood matrices from SSC, LRR, and SSC-OMP, the $d$ points with the maximum weights are regarded as neighbors for each point. For TSC, the $d$ nearest neighbors are collected for each point.\n\n4.1 Synthetic data\n\nWe compare the performances on synthetic data generated from the fully random model. In $\\mathbb{R}^p$, five $d$-dimensional subspaces are generated uniformly at random. Then for each subspace $n$ unit-norm points are generated iid uniformly at random on the subspace. To see the agreement with the theoretical result, we ran the algorithms under fixed $d/p$ and varied $n$ and $d$. We set $d/p = 3/5$ so that each pair of subspaces has intersection. Figures 1 and 2 show CE and NSE, respectively. Each error value is averaged over 100 trials. Figure 1 indicates that our algorithm clusters the data points better than the other algorithms. As predicted in the theorems, the clustering performance improves as the number of points increases. However, it also improves as the dimension of subspaces grows, in contrast to the theoretical analysis. We believe that this is because our analysis on GSR is not tight. In Figure 2, we can see that more data points obtain correct neighbors as $n$ increases or $d$ decreases, which conforms to the theoretical analysis.\n\nWe also compare the computational time of the neighborhood selection algorithms for different numbers of subspaces and data points. As shown in Figure 3, the greedy algorithms (OMP, Thresholding, and NSN) are significantly more scalable than the convex optimization based algorithms ($\\ell_1$-minimization and nuclear norm minimization).\n\nL | Metric | K-means | K-flats | SSC | LRR | SCC | SSC-OMP(8) | TSC(10) | NSN+Spectral(5)\n2 | Mean CE (%) | 19.80 | 13.62 | 1.52 | 2.13 | 2.06 | 16.92 | 18.44 | 3.62\n2 | Median CE (%) | 17.92 | 10.65 | 0.00 | 0.00 | 0.00 | 12.77 | 16.92 | 0.00\n2 | Avg. Time (sec) | - | 0.80 | 3.03 | 3.42 | 1.28 | 0.50 | 0.50 | 0.25\n3 | Mean CE (%) | 26.10 | 14.07 | 4.40 | 4.03 | 6.37 | 27.96 | 28.58 | 8.28\n3 | Median CE (%) | 20.48 | 14.18 | 0.56 | 1.43 | 0.21 | 30.98 | 29.67 | 2.76\n3 | Avg. Time (sec) | - | 1.89 | 5.39 | 4.05 | 2.16 | 0.82 | 1.15 | 0.51\n\nTable 2: CE and computational time of algorithms on Hopkins155 dataset. $L$ is the number of clusters (motions). The numbers in the parentheses represent the number of neighbors for each point collected in the corresponding algorithms.\n\nL | Metric | K-means | K-flats | SSC | SSC-OMP | TSC | NSN+Spectral\n2 | Mean CE (%) | 45.98 | 37.62 | 1.77 | 4.45 | 11.84 | 1.71\n2 | Median CE (%) | 47.66 | 39.06 | 0.00 | 1.17 | 1.56 | 0.78\n2 | Avg. Time (sec) | - | 15.78 | 37.72 | 0.45 | 0.33 | 0.78\n3 | Mean CE (%) | 62.55 | 45.81 | 5.77 | 6.35 | 20.02 | 3.63\n3 | Median CE (%) | 63.54 | 47.92 | 1.56 | 2.86 | 15.62 | 3.12\n3 | Avg. Time (sec) | - | 27.91 | 49.45 | 0.76 | 0.60 | 3.37\n5 | Mean CE (%) | 73.77 | 55.51 | 4.79 | 8.93 | 11.90 | 5.81\n5 | Median CE (%) | 74.06 | 56.25 | 2.97 | 5.00 | 33.91 | 4.69\n5 | Avg. Time (sec) | - | 52.90 | 74.91 | 1.41 | 1.17 | 5.62\n10 | Mean CE (%) | 82.68 | 62.72 | 9.43 | 15.32 | 39.48 | 9.82\n10 | Median CE (%) | 82.97 | 62.89 | 8.75 | 17.11 | 39.45 | 9.06\n10 | Avg. Time (sec) | - | 134.0 | 157.5 | 5.26 | 3.17 | 14.73\n\nTable 3: CE and computational time of algorithms on Extended Yale B dataset. For each number of clusters (faces) $L$, the algorithms ran over 100 random subsets drawn from the overall 38 clusters.\n\n4.2 Real-world data: motion segmentation and face clustering\n\nWe compare our algorithm with the existing ones in the applications of motion segmentation and face clustering. For the motion segmentation, we used the Hopkins155 dataset [17], which contains 155 video sequences of 2 or 3 motions. For the face clustering, we used the Extended Yale B dataset with cropped images from [5, 13]. The dataset contains 64 images for each of 38 individuals in frontal view and different illumination conditions. 
To compare with the existing algorithms, we used the set of $48 \\times 42$ resized raw images provided by the authors of [4]. The parameters of the existing algorithms were set as provided in their source codes.¹⁰ Tables 2 and 3 show CE and average computational time.¹¹ We can see that NSN+Spectral performs competitively with the methods with the lowest errors, but is much faster. Compared to the other greedy neighborhood construction based algorithms, SSC-OMP and TSC, our algorithm performs significantly better.\n\n¹⁰ As SSC-OMP and TSC do not have proposed parameter values for motion segmentation, we found the numbers minimizing the mean CE. The numbers are given in the table.\n¹¹ The LRR code provided by the author did not perform properly with the face clustering dataset that we used. We did not run NSN+GSR since the data points are not well distributed in their corresponding subspaces.\n\nAcknowledgments\n\nThe authors would like to acknowledge NSF grants 1302435, 0954059, 1017525, 1056028 and DTRA grant HDTRA1-13-1-0024 for supporting this research. This research was also partially supported by the U.S. Department of Transportation through the Data-Supported Transportation Operations and Planning (D-STOP) Tier 1 University Transportation Center.\n\nReferences\n\n[1] P. S. Bradley and O. L. Mangasarian. K-plane clustering. Journal of Global Optimization, 16(1):23–32, 2000.\n[2] G. Chen and G. Lerman. Spectral curvature clustering. International Journal of Computer Vision, 81(3):317–330, 2009.\n[3] E. L. Dyer, A. C. Sankaranarayanan, and R. G. Baraniuk. Greedy feature selection for subspace clustering. The Journal of Machine Learning Research (JMLR), 14(1):2487–2517, 2013.\n[4] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(11):2765–2781, 2013.\n[5] A. S. Georghiades, P. N. Belhumeur, and D. 
J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intelligence, 23(6):643\u2013660, 2001.\n\n[6] R. Heckel and H. B\u00f6lcskei. Subspace clustering via thresholding and spectral clustering. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013.\n\n[7] R. Heckel and H. B\u00f6lcskei. Robust subspace clustering via thresholding. arXiv preprint arXiv:1307.4891v2, 2014.\n\n[8] J. Ho, M.-H. Yang, J. Lim, K.-C. Lee, and D. Kriegman. Clustering appearances of objects under varying illumination conditions. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2003.\n\n[9] T. Inglot. Inequalities for quantiles of the chi-square distribution. Probability and Mathematical Statistics, 30(2):339\u2013351, 2010.\n\n[10] H.-P. Kriegel, P. Kr\u00f6ger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Transactions on Knowledge Discovery from Data (TKDD), 3(1):1, 2009.\n\n[11] S. Kunis and H. Rauhut. Random sampling of sparse trigonometric polynomials, ii. orthogonal matching pursuit versus basis pursuit. Foundations of Computational Mathematics, 8(6):737\u2013763, 2008.\n\n[12] M. Ledoux. The concentration of measure phenomenon, volume 89. AMS Bookstore, 2005.\n\n[13] K. C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27(5):684\u2013698, 2005.\n\n[14] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(1):171\u2013184, 2013.\n\n[15] V. D. Milman and G. Schechtman. Asymptotic Theory of Finite Dimensional Normed Spaces: Isoperimetric Inequalities in Riemannian Manifolds. 
Lecture Notes in Mathematics. Springer, 1986.\n\n[16] M. Soltanolkotabi and E. J. Candes. A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4):2195\u20132238, 2012.\n\n[17] R. Tron and R. Vidal. A benchmark for the comparison of 3-d motion segmentation algorithms. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2007.\n\n[18] J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. Information Theory, IEEE Transactions on, 53(12):4655\u20134666, 2007.\n\n[19] P. Tseng. Nearest q-flat to m points. Journal of Optimization Theory and Applications, 105(1):249\u2013252, 2000.\n\n[20] R. Vidal. Subspace clustering. Signal Processing Magazine, IEEE, 28(2):52\u201368, 2011.\n\n[21] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2003.\n\n[22] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. An algebraic geometric approach to the identification of a class of linear hybrid systems. In Decision and Control, 2003. Proceedings. 42nd IEEE Conference on, volume 1, pages 167\u2013172. IEEE, 2003.\n\n[23] R. Vidal, R. Tron, and R. Hartley. Multiframe motion segmentation with missing data using power factorization and GPCA. International Journal of Computer Vision, 79(1):85\u2013105, 2008.\n\n[24] Y.-X. Wang, H. Xu, and C. Leng. Provable subspace clustering: When LRR meets SSC. In Advances in Neural Information Processing Systems (NIPS), December 2013.\n\n[25] A. Y. Yang, S. R. Rao, and Y. Ma. Robust statistical estimation and segmentation of multiple subspaces. In IEEE conference on Computer Vision and Pattern Recognition (CVPR), 2006.\n\n[26] T. Zhang, A. Szlam, Y. Wang, and G. Lerman. Hybrid linear modeling via local best-fit flats. 
International Journal of Computer Vision, 100(3):217\u2013240, 2012.", "award": [], "sourceid": 1423, "authors": [{"given_name": "Dohyung", "family_name": "Park", "institution": "UT Austin"}, {"given_name": "Constantine", "family_name": "Caramanis", "institution": "UT Austin"}, {"given_name": "Sujay", "family_name": "Sanghavi", "institution": "UT Austin"}]}