{"title": "A Domain Decomposition Method for Fast Manifold Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 1625, "page_last": 1632, "abstract": null, "full_text": "A Domain Decomposition Method for Fast Manifold Learning\n\nZhenyue Zhang Department of Mathematics Zhejiang University, Yuquan Campus, Hangzhou, 310027, P. R. China zyzhang@zju.edu.cn\n\nHongyuan Zha Department of Computer Science Pennsylvania State University University Park, PA 16802 zha@cse.psu.edu\n\nAbstract\nWe propose a fast manifold learning algorithm based on the methodology of domain decomposition. Starting with the set of sample points partitioned into two subdomains, we develop the solution of the interface problem that can glue the embeddings on the two subdomains into an embedding on the whole domain. We provide a detailed analysis to assess the errors produced by the gluing process using matrix perturbation theory. Numerical examples are given to illustrate the efficiency and effectiveness of the proposed methods.\n\n1\n\nIntroduction\n\nThe setting of manifold learning we consider is the following. We are given a parameterized manifold of dimension d defined by a mapping f : Rm , where d < m, and open and connected in Rd . We assume the manifold is well-behaved, it is smooth and contains no self-intersections etc. Suppose we have a set of points x1 , , xN , sampled possibly with noise from the manifold, i.e., xi = f (i ) + i, i = 1, . . . , N , (1.1)\n\nwhere i's represent noise. The goal of manifold learning is to recover the parameters i 's and/or the mapping f () from the sample points xi 's [2, 6, 9, 12]. The general framework of manifold learning methods involves imposing a connectivity structure such as a k -nearestneighbor graph on the set of sample points and then turn the embedding problem into the solution of an eigenvalue problem. 
Usually constructing the graph dominates the computational cost of a manifold learning algorithm, but for large data sets the computational cost of the eigenvalue problem can be substantial as well. The focus of this paper is to explore the methodology of domain decomposition for developing fast algorithms for manifold learning. Domain decomposition is by now a well-established field in scientific computing and has been successfully applied in many science and engineering fields in connection with numerical solutions of partial differential equations. One class of domain decomposition methods partitions the solution domain into subdomains, solves the problem on each subdomain, and glues the partial solutions on the subdomains by solving an interface problem [7, 10]. This is the general approach we will follow in this paper. In particular, in section 3, we consider the case where the given set of sample points x_1, ..., x_N is partitioned into two subdomains. On each of the subdomains, we can use a manifold learning method such as LLE [6], LTSA [12], or any other manifold learning method to construct an embedding for the subdomain in question. We then formulate the interface problem, the solution of which allows us to combine the embeddings on the two subdomains to obtain an embedding over the whole domain. However, it is not always feasible to carry out the procedure described above. In section 2, we give necessary and sufficient conditions under which the embedding on the whole domain can be constructed from the embeddings on the subdomains. In section 4, we analyze the errors produced by the gluing process using matrix perturbation theory. In section 5, we briefly mention how the partitioning of the set of sample points into subdomains can be accomplished by graph partitioning algorithms. Section 6 is devoted to numerical experiments.\n\nNOTATION. 
We use e to denote a column vector of all 1's, the dimension of which should be clear from the context. N(·) and R(·) denote the null space and range space of a matrix, respectively. For an index set I = [i_1, ..., i_k], A(:, I) denotes the submatrix of A consisting of the columns of A with indices in I, with a similar definition for the rows of a matrix. We use ‖·‖ to denote the spectral norm of a matrix.\n\n2 A Basic Theorem\n\nLet X = [x_1, ..., x_N] with x_i = f(τ_i) + ε_i, i = 1, ..., N. Assume that the whole sample domain X is divided into two subdomains X_1 = {x_i | i ∈ I_1} and X_2 = {x_i | i ∈ I_2}. Here I_1 and I_2 denote the index sets such that I_1 ∪ I_2 = {1, ..., N} and I_1 ∩ I_2 is not empty. Suppose we have obtained the two low-dimensional embeddings T_1 and T_2 of the subdomains X_1 and X_2, respectively. The domain decomposition method attempts to recover the overall embedding T = {τ_1, ..., τ_N} from the embeddings T_1 and T_2 on the subdomains. In general, the recovered sub-embedding T_j, j = 1, 2, may not be exactly the subset {τ_i | i ∈ I_j} of T. For example, it is often the case that the recovered embedding T_j is approximately affinely equal to {τ_i | i ∈ I_j}, i.e., up to certain approximation errors, there is an affine transformation such that T_j = {F_j τ_i + c_j | i ∈ I_j}, where F_j is a nonsingular matrix and c_j a column vector. Thus a domain decomposition method for manifold learning should be invariant to affine transformations of the embeddings T_j obtained from the subdomains. In that case, we can assume that T_j is just the subset of T, i.e., T_j = {τ_i | i ∈ I_j}. With a slight abuse of notation, we also denote by T and T_j the matrices of the column vectors in the sets T and T_j; for example, we write T = [τ_1, ..., τ_N]. Let Φ_j be an orthogonal projector with N(Φ_j) = span([e, T_j^T]). Then T_j can be recovered by computing the eigenvectors of Φ_j corresponding to its zero eigenvalues. To recover the whole T we need to construct a matrix Φ with N(Φ) = span([e, T^T]) [11]. 
To this end, for each T_j, let Φ_j = Q_j Q_j^T ∈ R^{N_j × N_j}, where Q_j is an orthonormal basis matrix of N([e, T_j^T]^T) and N_j is the column-size of T_j. To construct the matrix Φ, let S_j ∈ R^{N × N_j} be the 0-1 selection matrix defined as S_j = I_N(:, I_j), where I_N is the identity matrix of order N. Let Φ̂_j = S_j Φ_j S_j^T. We then simply take Φ = Φ̂_1 + Φ̂_2, or more flexibly, Φ = w_1 Φ̂_1 + w_2 Φ̂_2, where w_1 and w_2 are weights: w_i > 0 and w_1 + w_2 = 1. Obviously ‖Φ‖ ≤ 1 since ‖Φ_j‖ = 1. The following theorem gives the necessary and sufficient conditions under which the null space of Φ is just span([e, T^T]). (In the theorem, we only require the Φ_j to be positive semidefinite.)\n\nTheorem 2.1 Let Φ_1, Φ_2 be two positive semidefinite matrices such that N(Φ_i) = span([e, T_i^T]), i = 1, 2, and T_0 = T_1 ∩ T_2. Assume that [e, T_1^T] and [e, T_2^T] are of full column-rank. Then N(Φ) = span([e, T^T]) if and only if [e, T_0^T] is of full column-rank.\n\nProof. We first prove the necessity by contradiction. Assume that [e, T_0^T] is not of full column-rank. Then there is y ≠ 0 such that [e, T_0^T]y = 0, while [e, T^T(:, I_2)]y ≠ 0 because [e, T^T(:, I_2)] = [e, T_2^T] is of full column-rank. Denote by I_1^c the complement of I_1, i.e., the index set of the i's which do not belong to I_1. Now we construct a vector x as\n\nx(I_1) = [e, T_1^T]y, x(I_1^c) = 0.\n\nClearly x(I_2) = 0 and hence x ∈ N(Φ). By the condition N(Φ) = span([e, T^T]), we can write x in the form x = [e, T^T]z for a column vector z. In particular, x(I_1) = [e, T_1^T]z. Note that we also have x(I_1) = [e, T_1^T]y by definition. This implies that z = y because [e, T_1^T] is of full column-rank. Therefore,\n\n[e, T^T(:, I_1^c)]y = [e, T^T(:, I_1^c)]z = x(I_1^c) = 0.\n\nUsing this together with [e, T_0^T]y = 0 we obtain [e, T^T(:, I_2)]y = 0, a contradiction.\n\nNow we prove the sufficiency. Let Q be a basis matrix of N(Φ). We have\n\nw_1 Q(I_1, :)^T Φ_1 Q(I_1, :) + w_2 Q(I_2, :)^T Φ_2 Q(I_2, :) = Q^T Φ Q = 0,\n\nwhich implies Φ_i Q(I_i, :) = 0, i = 1, 2, because each Φ_i is positive semidefinite. So, for some matrices G_i,\n\nQ(I_i, :) = [e, T_i^T]G_i, i = 1, 2. 
(2.2)\n\nTaking the overlap part Q(I_0, :) of Q with the two different representations,\n\nQ(I_0, :) = [e, T_i(:, I_0)^T]G_i = [e, T_0^T]G_i,\n\nwe obtain [e, T_0^T](G_1 − G_2) = 0. So G_1 = G_2 because [e, T_0^T] is of full column-rank, giving rise to Q = [e, T^T]G_1, i.e., N(Φ) ⊆ span([e, T^T]). Together with the obvious inclusion span([e, T^T]) ⊆ N(Φ), it follows that N(Φ) = span([e, T^T]).\n\nThe above result states that when the overlap is large enough that [e, T_0^T] is of full column-rank (which is generically true when T_0 contains d + 1 points or more), the embedding over the whole domain can be recovered from the embeddings over the two subdomains. However, to follow Theorem 2.1 directly, it seems that we would need to compute the null space of Φ. In the next section, we show that this can be done much more cheaply by considering an interface problem of much smaller dimension.\n\n3 Computing the Null Space of Φ\n\nIn this section, we formulate the interface problem and show how to solve it to glue the embeddings from the two subdomains into an embedding over the whole domain. To simplify notation, we re-denote by T the actual embedding over the whole domain and by T_j the subsets of T corresponding to the subdomains. We then use T̃_j to denote the affinely transformed versions of T_j obtained by LTSA, for example, i.e., T̃_j = c_j e^T + F_j T_j, where c_j is a constant column vector in R^d and F_j is a nonsingular matrix. Denote by T̃_{0j} the overlapping part of T̃_j corresponding to I_0 = I_1 ∩ I_2, as in the proof of Theorem 2.1. Considering the overlapping parts, there are c_j and F_j (the inverse transformations) such that\n\nc_1 e^T + F_1 T̃_{01} = T_{01} = T_{02} = c_2 e^T + F_2 T̃_{02}, (3.3)\n\nor equivalently,\n\n[[e, T̃_{01}^T], −[e, T̃_{02}^T]] [(c_1, F_1)^T; (c_2, F_2)^T] = 0.\n\nTherefore, if we take an orthonormal basis G of the null space of [[e, T̃_{01}^T], −[e, T̃_{02}^T]] and partition G = [G_1^T, G_2^T]^T conformally, then [e, T̃_{01}^T]G_1 = [e, T̃_{02}^T]G_2. Let A_j = G_j^T [e, T̃_j^T]^T, j = 1, 2, and define the matrix A such that A(:, I_j) = A_j; A is well defined on the overlap because of the equality above. Then, since Φ_j A_j^T = 0,\n\nΦ A^T = S_1 Φ_1 S_1^T A^T + S_2 Φ_2 S_2^T A^T = S_1 Φ_1 A_1^T + S_2 Φ_2 A_2^T = 0,\n\nso A^T is a basis of N(Φ). Therefore, we can use A^T to recover the global embedding T.\n\nA simpler alternative is to use a one-sided affine transformation, i.e., fix one of the T̃_i and affinely transform the other; the affine map is obtained by fixing one of the T̃_{0i} and transforming the other. For example, we can determine c and F such that\n\nT̃_{01} = c e^T + F T̃_{02}, (3.4)\n\nand transform T̃_2 to T̂_2 = c e^T + F T̃_2. Clearly, for the overlapping part, T̂_{02} = T̃_{01}. Then we can construct a larger matrix T by T(:, I_1) = T̃_1, T(:, I_2) = c e^T + F T̃_2. One can readily verify that T^T is a basis matrix of N(Φ).\n\nIn the noisy case, a least squares formulation is needed. For example, for the simultaneous affine transformation, we take G = [G_1^T, G_2^T]^T to be an orthonormal matrix in R^{2(d+1) × (d+1)} such that\n\n‖[e, T̃_{01}^T]G_1 − [e, T̃_{02}^T]G_2‖ = min.\n\nIt is known that the minimizing G is given by the right singular vector matrix corresponding to the d + 1 smallest singular values of W = [[e, T̃_{01}^T], −[e, T̃_{02}^T]], and that the residual is ‖[e, T̃_{01}^T]G_1 − [e, T̃_{02}^T]G_2‖ = σ_{d+2}(W). For the one-sided approach (3.4), [c, F] can be obtained as a solution of the least squares problem\n\nmin_{c,F} ‖T̃_{01} − (c e^T + F T̃_{02})‖ = min_F ‖(T̃_{01} − t_{01} e^T) − F(T̃_{02} − t_{02} e^T)‖,\n\nwhere t_{0j} is the column mean of T̃_{0j}. The minimum is achieved at F = (T̃_{01} − t_{01} e^T)(T̃_{02} − t_{02} e^T)^+, c = t_{01} − F t_{02}. Clearly, the residual now reads\n\nmin_{c,F} ‖T̃_{01} − (c e^T + F T̃_{02})‖ = ‖(T̃_{01} − t_{01} e^T)(I − (T̃_{02} − t_{02} e^T)^+ (T̃_{02} − t_{02} e^T))‖.\n\nNotice that the overlapping parts of the two affinely transformed subsets are not exactly equal to each other in the noisy case. There are several possible choices for setting A(:, I_0) or T(:, I_0). For example, one choice is to set T(:, I_0) to a convex combination of the transformed overlaps,\n\nT(:, I_0) = α T̃_{01} + (1 − α) T̂_{02},\n\nwith α = 1/2, for example. 
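The one-sided gluing (3.4) can be sketched in a few lines of NumPy. This is a minimal illustration under our own naming (the function `glue_one_sided`, the 0-based index handling, and the weight `alpha` are our assumptions, not the authors' code), shown in the noise-free setting where the pseudoinverse solution recovers the affine map exactly:

```python
import numpy as np

def glue_one_sided(T1, I1, T2, I2, alpha=0.5):
    """Glue two d x |I_j| subdomain embeddings by the one-sided affine map (3.4).

    T1, T2 are coordinate matrices for the subdomains; I1, I2 are the (0-based)
    index sets of their points in the full sample set. The overlap I0 = I1 n I2
    determines (c, F) minimizing ||T01 - (c e^T + F T02)||_F; T2 is then mapped
    by (c, F) and the overlap is set to a convex combination with weight alpha.
    """
    I0 = np.intersect1d(I1, I2)
    pos1 = {j: k for k, j in enumerate(I1)}
    pos2 = {j: k for k, j in enumerate(I2)}
    T01 = T1[:, [pos1[j] for j in I0]]      # overlap as seen from domain 1
    T02 = T2[:, [pos2[j] for j in I0]]      # overlap as seen from domain 2
    t01 = T01.mean(axis=1, keepdims=True)   # column means t_{01}, t_{02}
    t02 = T02.mean(axis=1, keepdims=True)
    # F = (T01 - t01 e^T)(T02 - t02 e^T)^+ and c = t01 - F t02
    F = (T01 - t01) @ np.linalg.pinv(T02 - t02)
    c = t01 - F @ t02
    T2_hat = c + F @ T2                     # affinely transformed second patch
    d = T1.shape[0]
    N = max(int(I1.max()), int(I2.max())) + 1
    T = np.zeros((d, N))
    T[:, I1] = T1
    T[:, I2] = T2_hat
    # convex combination of the two copies on the overlap
    T[:, I0] = alpha * T01 + (1 - alpha) * T2_hat[:, [pos2[j] for j in I0]]
    return T
```

With noisy embeddings the same code applies unchanged; the two transformed copies of the overlap then differ, and the convex combination with alpha = 1/2 simply averages them as discussed above.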
We summarize the discussion above in the following two algorithms for gluing the two subdomain embeddings T̃_1 and T̃_2.\n\nAlgorithm I. [Simultaneous affine transformation]\n\n1. Compute the right singular vector matrix G corresponding to the d + 1 smallest singular values of [[e, T̃_{01}^T], −[e, T̃_{02}^T]].\n\n2. Partition G = [G_1^T, G_2^T]^T, set A_i = G_i^T [e, T̃_i^T]^T, i = 1, 2, and set\n\nA(:, I_1\\I_0) = A_{11}, A(:, I_0) = α A_{01} + (1 − α) A_{02}, A(:, I_2\\I_0) = A_{12},\n\nwhere A_{0j} is the overlap part of A_j and A_{1j} is A_j with A_{0j} deleted.\n\n3. Compute the column mean a of A and an orthonormal basis U of N(a^T).\n\n4. Set T = U^T A.\n\nAlgorithm II. [One-sided affine transformation]\n\n1. Solve the least squares problem min_W ‖T̃_{01} − W[e, T̃_{02}^T]^T‖_F.\n\n2. Affinely transform T̃_2 to T̂_2 = W[e, T̃_2^T]^T.\n\n3. Set the global coordinate matrix T by\n\nT(:, I_1\\I_0) = T̃_{11}, T(:, I_0) = α T̃_{01} + (1 − α) T̂_{02}, T(:, I_2\\I_0) = T̂_{12},\n\nwhere T̃_{11} is T̃_1 with the overlap part T̃_{01} deleted and T̂_{12} is T̂_2 with the overlap part T̂_{02} deleted.\n\n4 Error Analysis\n\nAs mentioned before, the computation of the T̃_j, j = 1, 2, by a manifold learning algorithm such as LTSA involves errors. In this section, we assess the impact of those errors on the accuracy of the gluing process. Two issues are considered in the error analysis. One is the perturbation analysis of N(Φ) when the computation of the Φ_i is subject to error; in this case, N(Φ) is approximated by the (d + 1)-dimensional eigenspace Ṽ of an approximation Φ̃ associated with its smallest eigenvalues (Theorem 4.1). The other issue is the error estimation for Ṽ when a basis matrix of Ṽ is approximately constructed from the affinely transformed local embeddings as described in section 3 (Theorem 4.2). Because of the space limit, we do not present the details of the proofs of these results. The distance between two linear subspaces X and Y is defined by dist(X, Y) = ‖P_X − P_Y‖, where P_X and P_Y are the orthogonal projections onto X and Y, respectively. Let E_i = Φ̃_i − Φ_i, where Φ_i and Φ̃_i are the orthogonal projectors whose null spaces are span([e, T_i^T]) and span([e, T̃_i^T]), respectively. 
Clearly, if Φ = w_1 S_1 Φ_1 S_1^T + w_2 S_2 Φ_2 S_2^T and Φ̃ = w_1 S_1 Φ̃_1 S_1^T + w_2 S_2 Φ̃_2 S_2^T, then, since ‖E_i‖ = dist(span([e, T̃_i^T]), span([e, T_i^T])), we have\n\nε ≡ ‖Φ̃ − Φ‖ ≤ w_1 ‖E_1‖ + w_2 ‖E_2‖.\n\nTheorem 4.1 Let λ be the smallest nonzero eigenvalue of Φ and Ṽ the subspace spanned by the eigenvectors of Φ̃ corresponding to its d + 1 smallest eigenvalues. If ε < λ/4 and 4√2 ε(λ − ε + 2ε²) < (λ − 2ε)³, then\n\ndist(Ṽ, N(Φ)) ≤ ε / √((λ/2 − ε)² + ε²).\n\nTheorem 4.2 Let Φ̃ and Ṽ be as defined in Theorem 4.1, and let A be the matrix computed by the simultaneous affine transformation (Algorithm I in section 3). Let σ_i(·) denote the i-th largest singular value of a matrix, and denote\n\nδ = σ_{d+2}([[e, T̃_{01}^T], −[e, T̃_{02}^T]]), γ = √2 σ_min(A).\n\nIf ε < λ/4, then\n\ndist(Ṽ, span(A^T)) ≤ (1/γ) (δ + ε/2) / (λ/2 − ε)².\n\nFrom Theorems 4.1 and 4.2 we conclude directly that\n\ndist(span(A^T), N(Φ)) ≤ (1/γ) (δ + ε/2) / (λ/2 − ε)² + 2ε / √((λ − 2ε)² + 4ε²).\n\n5 Partitioning the Domains\n\nTo apply the domain decomposition methods, we need to partition the given set of data points into several domains, making use of the k-nearest-neighbor graph imposed on the data points. This reduces the problem to a graph partitioning problem, and many techniques such as spectral graph partitioning and METIS [3, 5] can be used. In our experiments, we have used a particularly simple approach: we use the reverse Cuthill-McKee method [4] to order the vertices of the k-NN graph and then partition the vertices into domains (for details see Test 2 in the next section). Once we have partitioned the whole domain into multiple overlapping subdomains, we can use the following two approaches to glue them together.\n\nSuccessive gluing. Here we glue the subdomains one by one as follows. Initially set T^(1) = T_1 and I^(1) = I_1; then, for k = 2, ..., K, glue the patch T_k to T^(k−1) to obtain the larger patch T^(k). The index set of T^(k) is given by I^(k) = I^(k−1) ∪ I_k. Clearly the overlapping set of T^(k−1) and T_k is I_0^(k) = I^(k−1) ∩ I_k.\n\nRecursive gluing. 
Here, at the leaf level, we divide the subdomains into several pairs, say (T^(0)_{2i−1}, T^(0)_{2i}), i = 1, 2, .... We then glue each pair into a larger subdomain T^(1)_i and continue. The recursive gluing method is obviously parallelizable.\n\n6 Numerical Experiments\n\nIn this section we report numerical experiments with the proposed domain decomposition methods for manifold learning. The efficiency and effectiveness of the methods clearly depend on the accuracy of the computed embeddings for the subdomains, the sizes of the subdomains, and the sizes of the overlaps between the subdomains.\n\nTest 1. Our first test data set is sampled from a Swiss-roll as follows:\n\nx_i = [t_i cos(t_i), h_i, t_i sin(t_i)]^T, i = 1, ..., N = 2000, (6.5)\n\nwhere t_i and h_i are uniformly randomly chosen in the intervals [3π/2, 9π/2] and [0, 21], respectively. Let s_i be the arc length of the corresponding spiral curve [t cos(t), t sin(t)]^T from t_0 = 3π/2 to t_i, and let s_max = max_i s_i. To compare the CPU time of the domain decomposition methods, we simply partition the s-interval [0, s_max] into k_s subintervals (a_{i−1}, a_i] of equal length and also partition the h-interval into k_h subintervals (b_{j−1}, b_j]. Let D_{ij} = (a_{i−1}, a_i] × (b_{j−1}, b_j] and let S_{ij}(r) be the ball centered at (a_i, b_j) with radius r. We set the subdomains to X_{ij} = {x_k | (s_k, h_k) ∈ D_{ij} ∪ S_{ij}(r)}. Clearly r determines the size of the overlap of X_{ij} with X_{i+1,j}, X_{i,j+1}, and X_{i+1,j+1}. The submatrices X_{ij} are ordered as X_{1,1}, X_{1,2}, ..., X_{1,k_h}, X_{2,1}, ... and denoted by X_k, k = 1, ..., K = k_s k_h. We first compute the K local 2-D embeddings T_1, ..., T_K by applying LTSA to the sample data sets X_k of the subdomains. Those local coordinate embeddings T_k are then aligned by the successive one-sided affine transformation algorithm, adding one subdomain T_k at a time. 
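The sampling (6.5) and the overlapping (s, h)-grid partition of Test 1 can be sketched as follows. This is our own minimal NumPy sketch: the names `swiss_roll` and `subdomains` are assumptions, the arc length uses the closed form ∫√(1+t²) dt = (t√(1+t²) + arcsinh t)/2, and we take each overlap ball S_ij(r) to be centered at the cell corner (a_i, b_j), which is one reading of the description above:

```python
import numpy as np

def swiss_roll(N=2000, seed=0):
    """Sample the Swiss-roll (6.5): x_i = [t_i cos t_i, h_i, t_i sin t_i]^T."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, N)   # t in [3*pi/2, 9*pi/2]
    h = rng.uniform(0.0, 21.0, N)                  # h in [0, 21]
    X = np.stack([t * np.cos(t), h, t * np.sin(t)])
    # arc length of [t cos t, t sin t] from t0 = 3*pi/2: the speed is
    # sqrt(1 + t^2), whose antiderivative is (t*sqrt(1+t^2) + arcsinh(t))/2
    arclen = lambda u: 0.5 * (u * np.sqrt(1.0 + u * u) + np.arcsinh(u))
    s = arclen(t) - arclen(1.5 * np.pi)
    return X, s, h

def subdomains(s, h, ks, kh, r=5.0):
    """Overlapping subdomains X_ij = {k : (s_k, h_k) in D_ij U S_ij(r)}."""
    a = np.linspace(0.0, s.max(), ks + 1)   # s-interval breakpoints a_i
    b = np.linspace(0.0, h.max(), kh + 1)   # h-interval breakpoints b_j
    doms = []
    for i in range(ks):
        for j in range(kh):
            in_cell = (a[i] <= s) & (s <= a[i + 1]) & \
                      (b[j] <= h) & (h <= b[j + 1])
            in_ball = (s - a[i + 1]) ** 2 + (h - b[j + 1]) ** 2 <= r * r
            doms.append(np.flatnonzero(in_cell | in_ball))
    return doms
```

Each index list in `doms` would then be fed to LTSA (or another local method) and the resulting patches glued successively or recursively as described above.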
Table 1 lists the total CPU time of the successive domain decomposition algorithm, including the time for computing the embeddings {T_k} for the subdomains, for different parameters k_s and k_h with the parameter r = 5. In Table 2, we list the CPU time of the recursive gluing approach, taking into account the parallel procedure. As a comparison, the CPU time of LTSA applied to the whole data set is 6.23 seconds.\n\nTable 1: CPU time (seconds) of the successive domain decomposition algorithm.\n\n          k_h = 2      3      4      5      6\nk_s = 3      1.89   1.70   1.64   1.61   1.64\n      4      1.67   1.67   1.61   1.70   1.77\n      5      1.66   1.59   1.67   1.78   1.86\n      6      1.63   1.66   1.75   1.89   2.09\n      7      1.59   1.70   1.84   2.02   2.23\n      8      1.58   1.80   1.94   2.22   2.44\n      9      1.63   1.83   2.06   2.31   2.66\n     10      1.63   1.86   2.38   2.56   2.94\n\nTable 2: CPU time (seconds) of the parallel recursive domain decomposition.\n\n          k_h = 2      3      4      5      6\nk_s = 3      0.52   0.34   0.27   0.19   0.17\n      4      0.53   0.23   0.20   0.17   0.13\n      5      0.31   0.17   0.19   0.17   0.14\n      6      0.25   0.19   0.16   0.13   0.14\n      7      0.20   0.16   0.14   0.14   0.11\n      8      0.20   0.17   0.16   0.14   0.14\n      9      0.19   0.16   0.14   0.14   0.14\n     10      0.19   0.16   0.17   0.19   0.13\n\nTest 2. The symmetric reverse Cuthill-McKee permutation (symrcm) is an algorithm for ordering the rows and columns of a symmetric sparse matrix [4]. It tends to move the nonzero elements of the sparse matrix towards the main diagonal. We apply Matlab's symrcm to the adjacency matrix of the k-nearest-neighbor graph of the data points to reorder them. Denote by X the reordered data set. We then partition the whole sample set into K = 16 subsets X_i = X(:, s_i : e_i) with s_i = max{1, (i − 1)m − 20}, e_i = min{im + 20, N}, and m = N/K = 125. It is known that the arc-length and height parameters (s_i, h_i) of Test 1 give an isometric parametrization of the Swiss-roll surface (6.5). We have shown that, within the errors made in computing the local embeddings, LTSA can recover the isometric parametrization up to an affine transformation [11]. 
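The ordering-and-split scheme of Test 2 can be sketched without MATLAB. Below is a simplified Cuthill-McKee-style BFS ordering (a rough stand-in for symrcm, not a faithful reimplementation) together with the overlapping block split s_i = max{1, (i−1)m − 20}, e_i = min{im + 20, N}, written 0-based; the function names are our own:

```python
from collections import deque

def rcm_order(adj):
    """Reverse Cuthill-McKee-style ordering: BFS from low-degree vertices,
    visiting neighbors in order of increasing degree, then reverse."""
    n = len(adj)
    deg = [len(nb) for nb in adj]
    visited = [False] * n
    order = []
    for start in sorted(range(n), key=deg.__getitem__):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u], key=deg.__getitem__):
                if not visited[v]:
                    visited[v] = True
                    queue.append(v)
    return order[::-1]

def overlapping_blocks(order, K, pad=20):
    """Split the reordered indices into K consecutive blocks, each extended
    by `pad` indices on both sides so that neighboring blocks overlap."""
    N = len(order)
    m = N // K
    return [[order[j] for j in range(max(0, i * m - pad),
                                     min((i + 1) * m + pad, N))]
            for i in range(K)]
```

Because consecutive indices in a (reverse) Cuthill-McKee ordering tend to be graph neighbors, consecutive blocks of the reordered points form geometrically coherent, overlapping subdomains, which is exactly what the gluing algorithms need.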
We denote by T̃^(k) = c e^T + F T^(k) the optimal approximation to T(:, I^(k)) within affine transformations,\n\n‖T(:, I^(k)) − T̃^(k)‖_F = min_{c,F} ‖T(:, I^(k)) − (c e^T + F T^(k))‖_F.\n\nWe denote by ε_k the average of the relative errors,\n\nε_k = (1/|I^(k)|) Σ_{i ∈ I^(k)} ‖T(:, i) − T̃^(k)(:, i)‖_2 / ‖T(:, i)‖_2.\n\nIn the left panel of Figure 1 we plot the initial embedding errors for the subdomains (blue bars), the error of LTSA applied to the whole data set (red bar), and the errors ε_k of the successive gluing (red line). The successive gluing method gives an embedding with acceptable accuracy compared with the accuracy obtained by applying LTSA to the whole data set. As shown by the error analysis, the errors in successive gluing increase when the initial errors for the subdomains increase. To show this more clearly, we also plot the ε_k for the recursive gluing method in the right panel of Figure 1.\n\n[Figure 1 appears here: two panels plotting relative errors (on a 10^-3 scale) against k = 1, ..., 16; the left panel shows the successive alignment errors together with the subdomain and whole-domain errors, and the right panel shows the errors at the recursion levels (roots 1-4) together with the subdomain and whole-domain errors.]\n\nFigure 1: Relative errors for the successive (left) and recursive (right) approaches.\n\nAcknowledgment. The work of the first author was supported in part by NSFC (project 60372033), the Special Funds for Major State Basic Research Projects (project G19990328), and NSF grant CCF-0305879. The work of the second author was supported in part by NSF grants DMS-0311800 and CCF-0430349.\n\nReferences\n\n[1] M. Brand. Charting a manifold. Advances in Neural Information Processing Systems 15, MIT Press, 2003.\n\n[2] D. Donoho and C. Grimes. Hessian eigenmaps: new tools for nonlinear dimensionality reduction. Proceedings of the National Academy of Sciences, 100:5591-5596, 2003.\n\n[3] M. Fiedler. 
A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czech. Math. J., 25:619-637, 1975.\n\n[4] A. George and J. W. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice Hall, 1981.\n\n[5] METIS. http://www-users.cs.umn.edu/karypis/metis/.\n\n[6] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.\n\n[7] B. Smith, P. Bjorstad and W. Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.\n\n[8] G. W. Stewart and J. G. Sun. Matrix Perturbation Theory. Academic Press, New York, 1990.\n\n[9] J. Tenenbaum, V. de Silva and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.\n\n[10] A. Toselli and O. Widlund. Domain Decomposition Methods - Algorithms and Theory. Springer, 2004.\n\n[11] H. Zha and Z. Zhang. Spectral analysis of alignment in manifold learning. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.\n\n[12] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Scientific Computing, 26:313-338, 2005.", "award": [], "sourceid": 2818, "authors": [{"given_name": "Zhenyue", "family_name": "Zhang", "institution": null}, {"given_name": "Hongyuan", "family_name": "Zha", "institution": null}]}