{"title": "The Ordered Residual Kernel for Robust Motion Subspace Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 333, "page_last": 341, "abstract": "We present a novel and highly effective approach for multi-body motion segmentation. Drawing inspiration from robust statistical model fitting, we estimate putative subspace hypotheses from the data. However, instead of ranking them we encapsulate the hypotheses in a novel Mercer kernel which elicits the potential of two point trajectories to have emerged from the same subspace. The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other state-of-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency.", "full_text": "The Ordered Residual Kernel for\n\nRobust Motion Subspace Clustering\n\nTat-Jun Chin, Hanzi Wang and David Suter\n\nSchool of Computer Science\n\nThe University of Adelaide, South Australia\n\n{tjchin, hwang, dsuter}@cs.adelaide.edu.au\n\nAbstract\n\nWe present a novel and highly effective approach for multi-body motion segmen-\ntation. Drawing inspiration from robust statistical model \ufb01tting, we estimate pu-\ntative subspace hypotheses from the data. However, instead of ranking them we\nencapsulate the hypotheses in a novel Mercer kernel which elicits the potential of\ntwo point trajectories to have emerged from the same subspace. 
The kernel permits the application of well-established statistical learning methods for effective outlier rejection, automatic recovery of the number of motions and accurate segmentation of the point trajectories. The method operates well under severe outliers arising from spurious trajectories or mistracks. Detailed experiments on a recent benchmark dataset (Hopkins 155) show that our method is superior to other state-of-the-art approaches in terms of recovering the number of motions, segmentation accuracy, robustness against gross outliers and computational efficiency.

1 Introduction

Multi-body motion segmentation concerns the separation of motions arising from multiple moving objects in a video sequence. The input data is usually a set of points on the surface of the objects which are tracked throughout the video sequence. Motion segmentation can serve as a useful pre-processing step for many computer vision applications. In recent years the case of rigid (i.e. non-articulated) objects for which the motions could be semi-dependent on each other has received much attention [18, 14, 19, 21, 22, 17]. Under this domain the affine projection model is usually adopted. Such a model implies that the point trajectories from a particular motion lie on a linear subspace of dimension at most four, and trajectories from different motions lie on distinct subspaces. Thus multi-body motion segmentation is reduced to the problem of subspace segmentation or clustering.

To realize practical algorithms, motion segmentation approaches should possess four desirable attributes: (1) Accuracy in classifying the point trajectories to the motions they respectively belong to. This is crucial for success in the subsequent vision applications, e.g. object recognition, 3D reconstruction. (2) Robustness against inlier noise (e.g. slight localization error) and gross outliers (e.g. 
mistracks, spurious trajectories), since getting imperfect data is almost always unavoidable in practical circumstances. (3) Ability to automatically deduce the number of motions in the data. This is pivotal to accomplish fully automated vision applications. (4) Computational efficiency. This is integral for the processing of video sequences, which usually comprise large amounts of data.

Recent work on multi-body motion segmentation can roughly be divided into algebraic or factorization methods [3, 19, 20], statistical methods [17, 7, 14, 6, 10] and clustering methods [22, 21, 5]. Notable approaches include Generalized PCA (GPCA) [19, 20], an algebraic method based on the idea that one can fit a union of m subspaces with a set of polynomials of degree m. Statistical methods often employ concepts such as random hypothesis generation [4, 17], Expectation-Maximization [14, 6] and geometric model selection [7, 8]. Clustering based methods [22, 21, 5] are also gaining attention due to their effectiveness. They usually include a dimensionality reduction step (e.g. manifold learning [5]) followed by a clustering of the point trajectories (e.g. via spectral clustering in [21]).

1 This work was supported by the Australian Research Council (ARC) under the project DP0878801.

A recent benchmark [18] indicated that Local Subspace Affinity (LSA) [21] gave the best performance in terms of classification accuracy, although their result was subsequently surpassed by [5, 10]. However, we argue that most of the previous approaches do not simultaneously fulfil the qualities desirable of motion segmentation algorithms. Most notably, although some of the approaches have the means to estimate the number of motions, they are generally unreliable in this respect and require manual input of this parameter. In fact this prior knowledge was given to all the methods compared in [18]2. Secondly, most of the methods (e.g. 
[19, 5]) do not explicitly deal with outliers. They will almost always break down when given corrupted data. These deficiencies reduce the usefulness of available motion segmentation algorithms in practical circumstances.

In this paper we attempt to bridge the gap between experimental performance and practical usability. Our previous work [2] indicates that robust multi-structure model fitting can be achieved effectively with statistical learning. Here we extend this concept to motion subspace clustering. Drawing inspiration from robust statistical model fitting [4], we estimate random hypotheses of motion subspaces in the data. However, instead of ranking these hypotheses we encapsulate them in a novel Mercer kernel. The kernel can function reliably despite overwhelming sampling imbalance, and it permits the application of non-linear dimensionality reduction techniques to effectively identify and reject outlying trajectories. This is then followed by Kernel PCA [11] to maximize the separation between groups and spectral clustering [13] to recover the number of motions and clustering. Experiments on the Hopkins 155 benchmark dataset [18] show that our method is superior to other approaches in terms of the qualities described above, including computational efficiency.

1.1 Brief review of affine model multi-body motion segmentation

Let \{t_{fp} \in \mathbb{R}^2\}, f = 1,\ldots,F, p = 1,\ldots,P, be the set of 2D coordinates of P trajectories tracked across F frames. In multi-body motion segmentation the t_{fp}'s correspond to points on the surface of rigid objects which are moving. The goal is to separate the trajectories into groups corresponding to the motion they belong to. In other words, if we arrange the coordinates in the following data matrix

T = \begin{bmatrix} t_{11} & \cdots & t_{1P} \\ \vdots & \ddots & \vdots \\ t_{F1} & \cdots & t_{FP} \end{bmatrix} \in \mathbb{R}^{2F \times P},   (1)

the goal is to find the permutation \Gamma \in \mathbb{R}^{P \times P} such that the columns of T \cdot \Gamma are arranged according to the respective motions they belong to. It turns out that under affine projection [1, 16] trajectories from the same motion lie on a distinct subspace in \mathbb{R}^{2F}, and each of these motion subspaces is of dimension 2, 3 or 4. Thus motion segmentation can be accomplished via clustering subspaces in \mathbb{R}^{2F}. See [1, 16] for more details. Realistically actual motion sequences might contain trajectories which do not correspond to valid objects or motions. These trajectories behave as outliers in the data and, if not taken into account, can be seriously detrimental to subspace clustering algorithms.

2 The Ordered Residual Kernel (ORK)

First, we take a statistical model fitting point of view to motion segmentation. Let \{x_i\}, i = 1,\ldots,N, be the set of N samples on which we want to perform model fitting. We randomly draw p-subsets from the data and use them to fit hypotheses of the model, where p is the number of parameters that define the model. In motion segmentation, the x_i's are the columns of matrix T, and p = 4 since the model is a four-dimensional subspace3. Assume that M of such random hypotheses are drawn. For each data point x_i compute its absolute residual set r^i = \{r^i_1, \ldots, r^i_M\} as measured to the M hypotheses. For motion segmentation, the residual is the orthogonal distance to a hypothesis subspace.

2 As confirmed through private contact with the authors of [18].
3 Ideally we should also consider degenerate motions with subspace dimensions 2 or 3, but previous work [18] using RANSAC [4] and our results suggest this is not a pressing issue for the Hopkins 155 dataset.
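In code, this sampling-and-residual step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the QR-based subspace fit and the array layout are our own, assuming the 2F x P data matrix T of Eq. (1).

```python
import numpy as np

def sample_hypotheses(T, M=1000, p=4, rng=None):
    """Draw M putative subspace hypotheses: each is an orthonormal basis
    fitted to a random p-subset of the trajectories (p = 4 for motion
    subspaces)."""
    rng = np.random.default_rng(rng)
    bases = []
    for _ in range(M):
        idx = rng.choice(T.shape[1], size=p, replace=False)
        # Orthonormal basis of the span of the p sampled trajectories.
        Q, _ = np.linalg.qr(T[:, idx])
        bases.append(Q)
    return bases

def residuals(T, bases):
    """Absolute residuals r^i_j: orthogonal distance of each trajectory
    (column of T) to each hypothesis subspace."""
    R = np.empty((T.shape[1], len(bases)))
    for j, Q in enumerate(bases):
        proj = Q @ (Q.T @ T)              # orthogonal projection onto subspace
        R[:, j] = np.linalg.norm(T - proj, axis=0)
    return R
```

If all trajectories truly lie on a single four-dimensional subspace, every residual in `R` is (numerically) zero, since any generic 4-subset spans that subspace.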
We sort the elements in r^i to obtain the sorted residual set \tilde{r}^i = \{r^i_{\lambda^i_1}, \ldots, r^i_{\lambda^i_M}\}, where the permutation \{\lambda^i_1, \ldots, \lambda^i_M\} is obtained such that r^i_{\lambda^i_1} \le \cdots \le r^i_{\lambda^i_M}. Define the following

\tilde{\theta}_i := \{\lambda^i_1, \ldots, \lambda^i_M\}   (2)

as the sorted hypothesis set of point x_i, i.e. \tilde{\theta}_i depicts the order in which x_i becomes the inlier of the M hypotheses as a fictitious inlier threshold is increased from 0 to \infty. We define the Ordered Residual Kernel (ORK) between two data points as

k_{\tilde{r}}(x_{i_1}, x_{i_2}) := \frac{1}{Z} \sum_{t=1}^{M/h} z_t \cdot k^t_{\cap}(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}),   (3)

where z_t = 1/t are the harmonic series and Z = \sum_{t=1}^{M/h} z_t is the (M/h)-th harmonic number. Without loss of generality assume that M is wholly divisible by h. Step size h is used to obtain the Difference of Intersection Kernel (DOIK)

k^t_{\cap}(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) := \frac{1}{h} \left( |\tilde{\theta}_{i_1}^{1:\alpha_t} \cap \tilde{\theta}_{i_2}^{1:\alpha_t}| - |\tilde{\theta}_{i_1}^{1:\alpha_{t-1}} \cap \tilde{\theta}_{i_2}^{1:\alpha_{t-1}}| \right)   (4)

where \alpha_t = t \cdot h and \alpha_{t-1} = (t-1) \cdot h. Symbol \tilde{\theta}_i^{a:b} indicates the set formed by the a-th to the b-th elements of \tilde{\theta}_i. Since the contents of the sorted hypothesis sets are merely permutations of \{1 \ldots M\}, i.e. there are no repeating elements,

0 \le k_{\tilde{r}}(x_{i_1}, x_{i_2}) \le 1.   (5)

Note that k_{\tilde{r}} is independent of the type of model to be fitted, thus it is applicable to generic statistical model fitting problems. However, we concentrate on motion subspaces in this paper.

Let \tau be a fictitious inlier threshold. 
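The kernel of Eqs. (2)-(4) can be evaluated directly from the sorted hypothesis sets. A minimal sketch of our own (illustrative names, hypotheses indexed from 0, and h assumed to divide M):

```python
import numpy as np

def ork(theta_i, theta_j, h):
    """Ordered Residual Kernel between two sorted hypothesis sets
    (each a permutation of {0, ..., M-1}, hypotheses ordered by
    increasing residual)."""
    M = len(theta_i)
    steps = M // h                      # number of DOIK steps, M/h
    z = 1.0 / np.arange(1, steps + 1)   # harmonic weights z_t = 1/t
    Z = z.sum()                         # (M/h)-th harmonic number
    k, prev = 0.0, 0
    si, sj = set(), set()
    for t in range(1, steps + 1):
        si.update(theta_i[(t - 1) * h : t * h])   # top alpha_t of point i
        sj.update(theta_j[(t - 1) * h : t * h])   # top alpha_t of point j
        inter = len(si & sj)
        k += z[t - 1] * (inter - prev) / h        # weighted DOIK term, Eq. (4)
        prev = inter
    return k / Z
```

Identical orderings give exactly 1, and any pair of permutations stays in [0, 1], consistent with Eq. (5); e.g. `ork(np.argsort(R[i]), np.argsort(R[j]), h)` for a residual matrix `R`.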
The kernel k_{\tilde{r}} captures the intuition that, if \tau is low, two points arising from the same subspace will have high normalized intersection since they share many common hypotheses which correspond to that subspace. If \tau is high, implausible hypotheses fitted on outliers start to dominate and decrease the normalized intersection. Step size h allows us to quantify the rate of change of intersection as \tau is increased from 0 to \infty, and since z_t is decreasing, k_{\tilde{r}} will evaluate to a high value for two points from the same subspace. In contrast, k_{\tilde{r}} is always low for points not from the same subspace or that are outliers.

Proof of satisfying Mercer's condition. Let D be a fixed domain, and P(D) be the power set of D, i.e. the set of all subsets of D. Let S \subseteq P(D), and p, q \in S. If \mu is a measure on D, then

k_{\cap}(p, q) = \mu(p \cap q),   (6)

called the intersection kernel, is provably a valid Mercer kernel [12]. The DOIK can be rewritten as

k^t_{\cap}(\tilde{\theta}_{i_1}, \tilde{\theta}_{i_2}) = \frac{1}{h} \big( |\tilde{\theta}_{i_1}^{(\alpha_{t-1}+1):\alpha_t} \cap \tilde{\theta}_{i_2}^{(\alpha_{t-1}+1):\alpha_t}| + |\tilde{\theta}_{i_1}^{1:\alpha_{t-1}} \cap \tilde{\theta}_{i_2}^{(\alpha_{t-1}+1):\alpha_t}| + |\tilde{\theta}_{i_1}^{(\alpha_{t-1}+1):\alpha_t} \cap \tilde{\theta}_{i_2}^{1:\alpha_{t-1}}| \big).   (7)

If we let D = \{1 \ldots M\} be the set of all possible hypothesis indices and \mu be uniform on D, each term in Eq. (7) is simply an intersection kernel multiplied by |D|/h. Since multiplying a kernel with a positive constant and adding two kernels respectively produce valid Mercer kernels [12], the DOIK and ORK are also valid Mercer kernels. •

Parameter h in k_{\tilde{r}} depends on the number of random hypotheses M, i.e. step size h can be set as a ratio of M. The value of M can be determined based on the size of the p-subset and the size of the data N (e.g. 
[23, 15]), and thus h is not contingent on knowledge of the true inlier noise scale or threshold. Moreover, our experiments in Sec. 4 show that segmentation performance is relatively insensitive to the settings of h and M.

2.1 Performance under sampling imbalance

Methods based on random sampling (e.g. RANSAC [4]) are usually affected by unbalanced datasets. The probability of simultaneously retrieving p inliers from a particular structure is tiny if points from that structure represent only a small minority in the data. In an unbalanced dataset the "pure" p-subsets in the M randomly drawn samples will be dominated by points from the majority structure in the data. This is a pronounced problem in motion sequences, since there is usually a background "object" whose point trajectories form a large majority in the data. In fact, for motion sequences from the Hopkins 155 dataset [18] with typically about 300 points per sequence, M has to be raised to about 20,000 before a pure p-subset from the non-background objects is sampled.

However, ORK can function reliably despite serious sampling imbalance. This is because points from the same subspace are roughly equidistant to the sampled hypotheses in their vicinity, even though these hypotheses might not pass through that subspace. Moreover, since z_t in Eq. (3) is decreasing, only residuals/hypotheses in the vicinity of a point are heavily weighted in the intersection. Fig. 1(a) illustrates this condition. Results in Sec. 4 show that ORK excelled even with M = 1,000.

(a) Data in \mathbb{R}^{2F}. (b) Data in RKHS F_{k_{\tilde{r}}}.

Figure 1: (a) ORK under sampling imbalance. 
(b) Data in RKHS induced by ORK.

3 Multi-Body Motion Segmentation using ORK

In this section, we describe how ORK is used for multi-body motion segmentation.

3.1 Outlier rejection via non-linear dimensionality reduction

Denote by F_{k_{\tilde{r}}} the Reproducing Kernel Hilbert Space (RKHS) induced by k_{\tilde{r}}. Let matrix A = [\phi(x_1) \ldots \phi(x_N)] contain the input data after it is mapped to F_{k_{\tilde{r}}}. The kernel matrix K = A^T A is computed using the kernel function k_{\tilde{r}} as

K_{p,q} = \langle \phi(x_p), \phi(x_q) \rangle = k_{\tilde{r}}(x_p, x_q),  p, q \in \{1 \ldots N\}.   (8)

Since k_{\tilde{r}} is a valid Mercer kernel, K is guaranteed to be positive semi-definite [12]. Let K = Q \Delta Q^T be the eigenvalue decomposition (EVD) of K. Then the rank-n Kernel Singular Value Decomposition (Kernel SVD) [12] of A is

A^n = [A Q^n (\Delta^n)^{-1/2}] [(\Delta^n)^{1/2}] [(Q^n)^T] \equiv U^n \Sigma^n (V^n)^T.   (9)

Via the Matlab notation, Q^n = Q_{:,1:n} and \Delta^n = \Delta_{1:n,1:n}. The left singular vectors U^n form an orthonormal basis for the n-dimensional principal subspace of the whole dataset in F_{k_{\tilde{r}}}. Projecting the data onto the principal subspace yields

B = [A Q^n (\Delta^n)^{-1/2}]^T A = (\Delta^n)^{1/2} (Q^n)^T,   (10)

where B = [b_1 \ldots b_N] \in \mathbb{R}^{n \times N} is the reduced dimension version of A. Directions of the principal subspace are dominated by inlier points, since k_{\tilde{r}} evaluates to a high value generally for them, but always to a low value for gross outliers. Moreover the kernel ensures that points from the same subspace are mapped to the same cluster and vice versa. Fig. 1(b) illustrates this condition.

Fig. 2(a)(left) shows the first frame of sequence "cars10" from the Hopkins 155 dataset [18] with 100 false trajectories of Brownian motion added to the original data (297 points). The corresponding RKHS norm histogram for n = 3 is displayed in Fig. 2(b). 
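In code, the projection of Eqs. (8)-(10) amounts to an eigen-decomposition of the kernel matrix K; the following is a sketch of our own (function name and ordering convention are illustrative), not the authors' implementation:

```python
import numpy as np

def principal_projection(K, n):
    """Project kernel-mapped data onto its rank-n principal subspace.

    K is the N x N kernel matrix; returns the n x N coordinate matrix
    B = (Delta^n)^{1/2} (Q^n)^T of Eq. (10)."""
    # EVD K = Q diag(delta) Q^T; reorder eigenvalues to descending.
    delta, Q = np.linalg.eigh(K)
    order = np.argsort(delta)[::-1]
    delta, Q = delta[order], Q[:, order]
    # Clip tiny negative eigenvalues from round-off before the square root.
    B = np.sqrt(np.maximum(delta[:n], 0.0))[:, None] * Q[:, :n].T
    return B
```

The per-point norms `np.linalg.norm(principal_projection(K, 3), axis=0)` are the quantities histogrammed in Fig. 2(b): gross outliers tend to have small norms, so thresholding the norms separates the two modes.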
The existence of two distinct modes, corresponding respectively to inliers and outliers, is evident. We exploit this observation for outlier rejection by discarding data with low norms in the principal subspace.

Figure 2: Demonstration of outlier rejection on sequence "cars10" from Hopkins 155. (a) (left) Before and (right) after outlier removal; blue dots are inliers while red dots are added outliers. (b) Histogram of vector norms in the principal subspace, showing a distinct outlier mode and inlier mode.

The cut-off threshold \psi can be determined by analyzing the shape of the distribution. For instance we can fit a 1D Gaussian Mixture Model (GMM) with two components and set \psi as the point of equal Mahalanobis distance between the two components. However, our experimentation shows that an effective threshold can be obtained by simply setting \psi as the average value of all the norms, i.e.

\psi = \frac{1}{N} \sum_{i=1}^{N} \|b_i\|.   (11)

This method was applied uniformly on all the sequences in our experiments in Sec. 4. Fig. 2(a)(right) shows an actual result of the method on Fig. 2(a)(left).

3.2 Recovering the number of motions and subspace clustering

After outlier rejection, we further take advantage of the mapping induced by ORK for recovering the number of motions and subspace clustering. On the remaining data, we perform Kernel PCA [11] to seek the principal components which maximize the variance of the data in the RKHS, as Fig. 1(b) illustrates. Let \{y_i\}, i = 1,\ldots,N', be the N'-point subset of the input data that remains after outlier removal, where N' < N. Denote by C = [\phi(y_1) \ldots 
\phi(y_{N'})] the data matrix after mapping the data to F_{k_{\tilde{r}}}, and by symbol \tilde{C} the result of adjusting C with the empirical mean of \{\phi(y_1), \ldots, \phi(y_{N'})\}. The centered kernel matrix \tilde{K}' = \tilde{C}^T \tilde{C} [11] can be obtained as

\tilde{K}' = \nu^T K' \nu,  \nu = \left[ I_{N'} - \frac{1}{N'} 1_{N',N'} \right],   (12)

where K' = C^T C is the uncentered kernel matrix, and I_s and 1_{s,s} are respectively the s \times s identity matrix and a matrix of ones. If \tilde{K}' = R \Omega R^T is the EVD of \tilde{K}', then we obtain the first-m kernel principal components P^m of C as the first-m left singular vectors of \tilde{C}, i.e.

P^m = \tilde{C} R^m (\Omega^m)^{-1/2},   (13)

where R^m = R_{:,1:m} and \Omega^m = \Omega_{1:m,1:m}; see Eq. (9). Projecting the data on the principal components yields

D = [d_1 \ldots d_{N'}] = (\Omega^m)^{1/2} (R^m)^T,   (14)

where D \in \mathbb{R}^{m \times N'}. The affine subspace span(P^m) maximizes the spread of the centered data in the RKHS, and the projection D offers an effective representation for clustering. Fig. 3(a) shows the Kernel PCA projection results for m = 3 on the sequence in Fig. 2(a).

The number of clusters in D is recovered via spectral clustering. More specifically we apply the Normalized Cut (Ncut) [13] algorithm. A fully connected graph is first derived from the data, where its weighted adjacency matrix W \in \mathbb{R}^{N' \times N'} is obtained as

W_{p,q} = \exp(-\|d_p - d_q\|^2 / 2\delta^2),   (15)

and \delta is taken as the average nearest neighbour distance in the Euclidean sense among the vectors in D. The Laplacian matrix [13] is then derived from W and eigendecomposed. 
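A sketch of Eqs. (12)-(15) plus the cluster-count step follows. It is our own simplified illustration: for brevity it uses the unnormalized graph Laplacian to count near-zero eigenvalues, whereas the paper applies Ncut's normalized variant.

```python
import numpy as np

def kernel_pca_coords(K, m):
    """Centered Kernel PCA (Eqs. (12)-(14)): project onto the first m
    kernel principal components. K is the uncentered N' x N' kernel."""
    N = K.shape[0]
    nu = np.eye(N) - np.ones((N, N)) / N        # centering matrix, Eq. (12)
    Kc = nu.T @ K @ nu                          # centered kernel matrix
    omega, R = np.linalg.eigh(Kc)
    order = np.argsort(omega)[::-1]
    omega, R = omega[order], R[:, order]
    return np.sqrt(np.maximum(omega[:m], 0.0))[:, None] * R[:, :m].T  # m x N'

def count_clusters(D, tol=1e-6):
    """Estimate the number of motions from the near-zero eigenvalues of
    the (unnormalized) Laplacian of the Gaussian-affinity graph."""
    d2 = ((D[:, :, None] - D[:, None, :]) ** 2).sum(axis=0)   # pairwise sq. dists
    off = d2 + np.diag(np.full(D.shape[1], np.inf))           # mask self-distances
    delta = np.sqrt(off.min(axis=1)).mean()     # avg nearest-neighbour distance
    W = np.exp(-d2 / (2 * delta ** 2))          # affinity, Eq. (15)
    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian
    evals = np.linalg.eigvalsh(L)
    return int((evals < tol * evals.max()).sum())
```

For well-separated groups the Laplacian has one (near-)zero eigenvalue per group, which is the quantity the Ncut step reads off.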
Under Ncut, the number of clusters is revealed as the number of eigenvalues of the Laplacian that are zero or numerically insignificant. With this knowledge, a subsequent k-means step is then performed to cluster the points. Fig. 3(b) shows W for the input data in Fig. 2(a)(left) after outlier removal. It can be seen that strong affinity exists between points from the same cluster, thus allowing accurate clustering. Figs. 3(a) and 3(c) illustrate the final clustering result for the data in Fig. 2(a)(left).

Figure 3: Actual results on the motion sequence in Fig. 2(a)(left). (a) Kernel PCA and Ncut results. (b) W matrix. (c) Final result for "cars10".

There are several reasons why spectral clustering under our framework is more successful than previous methods. Firstly, we perform an effective outlier rejection step that removes bad trajectories that can potentially mislead the clustering. Secondly, the mapping induced by ORK deliberately separates the trajectories based on their cluster membership. Finally, we perform Kernel PCA to maximize the variance of the data. Effectively this also improves the separation of clusters, thus facilitating an accurate recovery of the number of clusters and also the subsequent segmentation. This distinguishes our work from previous clustering based methods [21, 5] which tend to operate without maximizing the between-class scatter. Results in Sec. 4 validate our claims.

4 Results

Henceforth we indicate the proposed method as "ORK". We leverage a recently published benchmark on affine model motion segmentation [18] as a basis of comparison. 
The benchmark was evaluated on the Hopkins 155 dataset4, which contains 155 sequences with tracked point trajectories. A total of 120 sequences have two motions while 35 have three motions. The sequences contain degenerate and non-degenerate motions, independent and partially dependent motions, articulated motions, nonrigid motions etc. In terms of video content three categories exist: checkerboard sequences, traffic sequences (moving cars, trucks) and articulated motions (moving faces, people).

4.1 Details on benchmarking

Four major algorithms were compared in [18]: Generalized PCA (GPCA) [19], Local Subspace Affinity (LSA) [21], Multi-Stage Learning (MSL) [14] and RANSAC [17]. Here we extend the benchmark with newly reported results from Locally Linear Manifold Clustering (LLMC) [5] and Agglomerative Lossy Compression (ALC) [10, 9]. We also compare our method against Kanatani and Matsunaga's [8] algorithm (henceforth, the "KM" method) in estimating the number of independent motions in the video sequences. Note that KM per se does not perform motion segmentation. For the sake of objective comparisons we use only implementations available publicly5.

Following [18], motion segmentation performance is evaluated in terms of the labelling error of the point trajectories, where each point in a sequence has a ground truth label, i.e.

classification error = (number of mislabeled points) / (total number of points).   (16)

Unlike [18], we also emphasize the ability of the methods in recovering the number of motions. However, although the methods compared in [18] (except RANSAC) theoretically have the means to

4 Available at http://www.vision.jhu.edu/data/hopkins155/.
5 For MSL and KM, see http://www.suri.cs.okayama-u.ac.jp/e-program-separate.html/. 
For GPCA, LSA and RANSAC, refer to the url for the Hopkins 155 dataset.

do so, their estimation of the number of motions is generally unreliable and the benchmark results in [18] were obtained by revealing the actual number of motions to the algorithms. A similar initialization exists in [5, 10] where the results were obtained by giving LLMC and ALC this knowledge a priori (for LLMC, this was given at least to the variant LLMC 4m during dimensionality reduction [5], where m is the true number of motions). In the following subsections, where variants exist for the compared algorithms we use results from the best performing variant.

In the following the number of random hypotheses M and step size h for ORK are fixed at 1000 and 300 respectively, and unlike the others, ORK is not given knowledge of the number of motions.

4.2 Data without gross outliers

We apply ORK on the Hopkins 155 dataset. Since ORK uses random sampling we repeat it 100 times for each sequence and average the results. Table 1 lists the obtained classification error alongside those of previously proposed methods. ORK (column 9) gives comparable results to the other methods for sequences with 2 motions (mean = 7.83%, median = 0.41%). For sequences with 3 motions, ORK (mean = 12.62%, median = 4.75%) outperforms GPCA and RANSAC, but is slightly less accurate than the others. 
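The error metric of Eq. (16) is evaluated after matching the estimated group labels to the ground-truth labels. A brute-force sketch of our own (illustrative names; it assumes the number of estimated groups does not exceed the number of true groups):

```python
import numpy as np
from itertools import permutations

def classification_error(labels, truth):
    """Eq. (16): fraction of mislabeled points, minimized over all
    one-to-one relabellings of the estimated groups."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    ids = np.unique(labels)
    best = len(labels)
    # Try every assignment of estimated group ids to true group ids.
    for perm in permutations(np.unique(truth), len(ids)):
        mapped = np.empty_like(labels)
        for k, g in zip(ids, perm):
            mapped[labels == k] = g
        best = min(best, int((mapped != truth).sum()))
    return best / len(labels)
```

For example, a segmentation that merely swaps the two group labels scores an error of 0.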
However, bear in mind that unlike the other methods ORK is not given prior knowledge of the true number of motions and has to estimate this independently.

Table 1: Classification error (%) on Hopkins 155 sequences. REF represents the reference/control method which operates based on knowledge of ground truth segmentation; refer to [18] for details.

Column   | 1    | 2     | 3    | 4    | 5      | 6    | 8    | 9     | 10
Method   | REF  | GPCA  | LSA  | MSL  | RANSAC | LLMC | ALC  | ORK   | ORK*
Sequences with 2 motions:
Mean     | 2.03 | 4.59  | 3.45 | 4.14 | 5.56   | 3.62 | 3.03 | 7.83  | 1.27
Median   | 0.00 | 0.38  | 0.59 | 0.00 | 1.18   | 0.00 | 0.00 | 0.41  | 0.00
Sequences with 3 motions:
Mean     | 5.08 | 28.66 | 9.73 | 8.23 | 22.94  | 8.85 | 6.26 | 12.62 | 2.09
Median   | 2.40 | 28.26 | 2.33 | 1.76 | 22.03  | 3.19 | 1.02 | 4.75  | 0.05

We also separately investigate the accuracy of ORK in estimating the number of motions, and compare it against KM [8] which was proposed for this purpose. Note that such an experiment was not attempted in [18] since approaches compared therein generally do not perform reliably in estimating the number of motions. The results in Table 2 (columns 1–2) show that for sequences with two motions, KM (80.83%) outperforms ORK (67.37%) by ≈ 15 percentage points. However, for sequences with three motions, ORK (49.66%) vastly outperforms KM (14.29%), more than tripling its accuracy. The overall accuracy of KM (65.81%) is slightly better than ORK (63.37%), but this is mostly because sequences with two motions form the majority in the dataset (120 out of 155). 
This leads us to conclude that ORK is actually the superior method here.

Table 2: Accuracy in determining the number of motions in a sequence. Note that in the experiment with outliers (columns 3–4), KM returns a constant number of 3 motions for all sequences.

Dataset    | Hopkins 155     | Hopkins 155 + Outliers
Column     | 1      | 2      | 3       | 4
Method     | KM     | ORK    | KM      | ORK
2 motions  | 80.83% | 67.37% | 00.00%  | 47.58%
3 motions  | 14.29% | 49.66% | 100.00% | 50.00%
Overall    | 65.81% | 63.37% | 22.58%  | 48.13%

We re-evaluate the performance of ORK by considering only results on sequences where the number of motions is estimated correctly by ORK (about 98 sequences, i.e. 63.37% of the dataset). The results are tabulated under ORK* (column 10) in Table 1. It can be seen that when ORK estimates the number of motions correctly, it is significantly more accurate than the other methods.

Finally, we compare the speed of the methods in Table 3. ORK was implemented and run in Matlab on a Dual Core Pentium 3.00GHz machine with 4GB of main memory (this is much less powerful than the 8 Core Xeon 3.66GHz with 32GB memory used in [18] for the other methods in Table 3). The results show that ORK is comparable to LSA, much faster than MSL and ALC, but slower than GPCA and RANSAC. Timing results of LLMC are not available in the literature.

Table 3: Average computation time on Hopkins 155 sequences.

Method     | GPCA  | LSA     | MSL    | RANSAC | ALC     | ORK
2 motions  | 324ms | 7.584s  | 11h 4m | 175ms  | 10m 32s | 4.249s
3 motions  | 738ms | 15.956s | 1d 23h | 258ms  | 10m 32s | 8.479s

4.3 Data with gross outliers

We next examine the ability of the proposed method in dealing with gross outliers in motion data. For each sequence in Hopkins 155, we add 100 gross outliers by creating trajectories corresponding to mistracks or spuriously occurring points. 
These are created by randomly initializing 100 locations in the first frame and allowing them to drift throughout the sequence according to Brownian motion. The corrupted sequences are then subjected to the algorithms for motion segmentation. Since only ORK is capable of rejecting outliers, the classification error of Eq. (16) is evaluated on the inlier points only. The results in Table 4 illustrate that ORK (column 4) is the most accurate method by a large margin. Despite being given the true number of motions a priori, GPCA, LSA and RANSAC are unable to provide satisfactory segmentation results.

Table 4: Classification error (%) on Hopkins 155 sequences with 100 gross outliers per sequence.

Column   | 1     | 2     | 3      | 4     | 5
Method   | GPCA  | LSA   | RANSAC | ORK   | ORK*
Sequences with 2 motions:
Mean     | 28.66 | 24.25 | 30.64  | 16.50 | 1.62
Median   | 30.96 | 26.51 | 32.36  | 10.54 | 0.00
Sequences with 3 motions:
Mean     | 40.61 | 30.94 | 42.24  | 19.99 | 2.68
Median   | 41.30 | 27.68 | 43.43  | 8.49  | 0.09

In terms of estimating the number of motions, as shown in column 4 of Table 2 the overall accuracy of ORK is reduced to 48.13%. This is contributed mainly by the deterioration in accuracy on sequences with two motions (47.58%), although the accuracy on sequences with three motions is maintained (50.00%). 
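The outlier-generation protocol just described (random first-frame locations drifting by Brownian motion) can be simulated as follows; this is our own sketch, and the random-walk step size and uniform initialization range are assumptions not specified in the text:

```python
import numpy as np

def add_brownian_outliers(T, n_out=100, step=1.0, rng=None):
    """Append gross-outlier trajectories to the 2F x P data matrix T:
    random locations in the first frame drifting by Brownian motion."""
    rng = np.random.default_rng(rng)
    F = T.shape[0] // 2
    lo, hi = T.min(), T.max()                   # rough image extent (assumed)
    start = rng.uniform(lo, hi, size=(1, 2, n_out))
    steps = rng.normal(0.0, step, size=(F, 2, n_out))
    steps[0] = 0.0                              # frame 1 stays at the start point
    traj = start + np.cumsum(steps, axis=0)     # F x 2 x n_out random walks
    return np.hstack([T, traj.reshape(2 * F, n_out)])
```

The returned matrix keeps the original P trajectories in the first P columns, with the n_out synthetic outliers appended.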
This is not a surprising result, since sequences with three motions generally have more (inlying) point trajectories than sequences with two motions, thus the outlier rates for sequences with three motions are lower (recall that a fixed number of 100 false trajectories are added). On the other hand, the KM method (column 3) is completely overwhelmed by the outliers: for all the sequences with outliers it returned a constant "3" as the number of motions.

We again re-evaluate ORK by considering results from sequences (now with gross outliers) where the number of motions is correctly estimated (about 75 sequences, i.e. 48.13% of the dataset). The results tabulated under ORK* (column 5) in Table 4 show that the proposed method can accurately segment the point trajectories without being influenced by the gross outliers.

5 Conclusions

In this paper we propose a novel and highly effective approach for multi-body motion segmentation. Our idea is based on encapsulating random hypotheses in a novel Mercer kernel and statistical learning. We evaluated our method on the Hopkins 155 dataset, with results showing that the idea is superior to other state-of-the-art approaches. It is by far the most accurate in terms of estimating the number of motions, and it excels in segmentation accuracy despite lacking prior knowledge of the number of motions. The proposed idea is also highly robust towards outliers in the input data.

Acknowledgements. We are grateful to the authors of [18], especially René Vidal, for discussions and insights which have been immensely helpful.

References

[1] T. Boult and L. Brown. Factorization-based segmentation of motions. In IEEE Workshop on Motion Understanding, 1991.

[2] T.-J. Chin, H. Wang, and D. Suter. Robust fitting of multiple structures: The statistical learning approach. In ICCV, 2009.

[3] J. Costeira and T. Kanade. 
A multibody factorization method for independently moving objects. IJCV, 29(3):159–179, 1998.

[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Comm. of the ACM, 24:381–395, 1981.

[5] A. Goh and R. Vidal. Segmenting motions of different types by unsupervised manifold clustering. In CVPR, 2007.

[6] A. Gruber and Y. Weiss. Multibody factorization with uncertainty and missing data using the EM algorithm. In CVPR, 2004.

[7] K. Kanatani. Motion segmentation by subspace separation and model selection. In ICCV, 2001.

[8] K. Kanatani and C. Matsunaga. Estimating the number of independent motions for multibody segmentation. In ACCV, 2002.

[9] Y. Ma, H. Derksen, W. Hong, and J. Wright. Segmentation of multivariate mixed data via lossy coding and compression. TPAMI, 29(9):1546–1562, 2007.

[10] S. Rao, R. Tron, Y. Ma, and R. Vidal. Motion segmentation via robust subspace separation in the presence of outlying, incomplete, or corrupted trajectories. In CVPR, 2008.

[11] B. Schölkopf, A. Smola, and K. R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299–1319, 1998.

[12] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge University Press, 2004.

[13] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888–905, 2000.

[14] Y. Sugaya and K. Kanatani. Geometric structure of degeneracy for multi-body motion segmentation. In Workshop on Statistical Methods in Video Processing, 2004.

[15] R. Toldo and A. Fusiello. Robust multiple structures estimation with J-Linkage. In ECCV, 2008.

[16] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography. IJCV, 9(2):137–154, 1992.

[17] P. Torr. Geometric motion segmentation and model selection. 
Phil. Trans. Royal Society of London, 356(1740):1321–1340, 1998.

[18] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.

[19] R. Vidal and R. Hartley. Motion segmentation with missing data by PowerFactorization and Generalized PCA. In CVPR, 2004.

[20] R. Vidal, Y. Ma, and S. Sastry. Generalized Principal Component Analysis (GPCA). TPAMI, 27(12):1–15, 2005.

[21] J. Yan and M. Pollefeys. A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, 2006.

[22] L. Zelnik-Manor and M. Irani. Degeneracies, dependencies and their implications on multi-body and multi-sequence factorization. In CVPR, 2003.

[23] W. Zhang and J. Kosecká. Nonparametric estimation of multiple structures with outliers. In Dynamical Vision, ICCV 2005 and ECCV 2006 Workshops, 2006.
", "award": [], "sourceid": 504, "authors": [{"given_name": "Tat-jun", "family_name": "Chin", "institution": null}, {"given_name": "Hanzi", "family_name": "Wang", "institution": null}, {"given_name": "David", "family_name": "Suter", "institution": null}]}