{"title": "Selecting the independent coordinates of manifolds with large aspect ratios", "book": "Advances in Neural Information Processing Systems", "page_first": 1088, "page_last": 1097, "abstract": "Many manifold embedding algorithms fail apparently when the data manifold has a large aspect ratio (such as a long, thin strip). Here, we formulate success and failure in terms of finding a smooth embedding, showing also that the problem is pervasive and more complex than previously recognized. Mathematically, success is possible under very broad conditions, provided that embedding is done by carefully selected eigenfunctions of the Laplace-Beltrami operator $\\Delta_\\M$. Hence, we propose a bicriterial Independent Eigencoordinate Selection (IES) algorithm that selects smooth embeddings with few eigenvectors. The algorithm is grounded in theory, has low computational overhead, and is successful on synthetic and large real data.", "full_text": "Selecting the independent coordinates of manifolds\n\nwith large aspect ratios\n\nDepartment of Electrical & Computer Engineering\n\nYu-Chia Chen\n\nUniversity of Washington\n\nSeattle, WA 98195\nyuchaz@uw.edu\n\nMarina Meil\u02d8a\n\nDepartment of Statistics\nUniversity of Washington\n\nSeattle, WA 98195\nmmp2@uw.edu\n\nAbstract\n\nMany manifold embedding algorithms fail apparently when the data manifold has\na large aspect ratio (such as a long, thin strip). Here, we formulate success and\nfailure in terms of \ufb01nding a smooth embedding, showing also that the problem is\npervasive and more complex than previously recognized. Mathematically, success\nis possible under very broad conditions, provided that embedding is done by care-\nfully selected eigenfunctions of the Laplace-Beltrami operator M. Hence, we\npropose a bicriterial Independent Eigencoordinate Selection (IES) algorithm that\nselects smooth embeddings with few eigenvectors. 
The algorithm is grounded in theory, has low computational overhead, and is successful on synthetic and large real data.

1 Motivation

We study a well-documented deficiency of manifold learning algorithms. Namely, as shown in [GZKR08], algorithms such as Laplacian Eigenmaps (LE), Local Tangent Space Alignment (LTSA), Hessian Eigenmaps (HLLE), and Diffusion Maps (DM) fail spectacularly when the data has a large aspect ratio, that is, when it extends much more in one geodesic direction than in others. This problem, illustrated by the strip in Figure 1, was studied in [GZKR08] from a linear algebraic perspective; [GZKR08] show that, especially when noise is present, the problem is pervasive.

In the present paper, we revisit the problem from a differential geometric perspective. First, we define failure not as distortion, but as a drop in the rank of the mapping $\phi$ represented by the embedding algorithm. In other words, the algorithm fails when the map $\phi$ is not invertible, or, equivalently, when $\dim \phi(\mathcal{M}) < \dim \mathcal{M} = d$, where $\mathcal{M}$ represents the idealized data manifold and $\dim$ denotes the intrinsic dimension. Figure 1 demonstrates that the problem is fixed by choosing the eigenvectors with care. We call this problem the Independent Eigencoordinate Selection (IES) problem, and we formulate it and explain its challenges in Section 3.

Our second main contribution (Section 4) is to design a bicriterial method that selects, from a set of coordinate functions $\phi_1, \ldots, \phi_m$, a subset $S$ of small size that provides a smooth full-dimensional embedding of the data. The IES problem requires searching over a combinatorial number of sets. We show (Section 4) how to drastically reduce the computational burden per set for our algorithm. Third, we analyze the proposed criterion in the asymptotic limit (Section 5). Finally (Section 6), we show examples of successful selection on real and synthetic data.
The experiments also demonstrate that users of manifold learning for other than toy data must be aware of the IES problem and must have tools for handling it. A notations table, proofs, a library of hard examples, and extra experiments and analyses are in Supplements A–H; Figure/Table/Equation references with prefix S are in the Supplement.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

2 Background on manifold learning

Manifold learning (ML) and intrinsic geometry Suppose we observe data $X \in \mathbb{R}^{n\times D}$, with data points denoted by $x_i \in \mathbb{R}^D$ for all $i \in [n]$, that are sampled from a smooth¹ $d$-dimensional submanifold $\mathcal{M} \subset \mathbb{R}^D$. Manifold Learning algorithms map $x_i, i \in [n]$ to $y_i = \phi(x_i) \in \mathbb{R}^s$, where $d \leq s \ll D$, thus reducing the dimension of the data $X$ while preserving (some of) its properties. Here we present the LE/DM algorithm, but our results can be applied to other ML methods with slight modification. The DM [CL06, NLCK06] algorithm embeds the data by solving the minimum eigen-problem of the renormalized graph Laplacian [CL06] matrix $L$. The desired $m$ dimensional embedding coordinates are obtained from the second to $(m+1)$-th principal eigenvectors of the graph Laplacian $L$, with $\lambda_0 = 0 < \lambda_1 \leq \ldots \leq \lambda_m$, i.e., $y_i = (\phi_1(x_i), \ldots, \phi_m(x_i))$ (see also Supplement B).

To analyze ML algorithms, it is useful to consider the limit of the mapping $\phi$ when the data is the entire manifold $\mathcal{M}$. We denote this limit also by $\phi$, and its image by $\phi(\mathcal{M}) \subset \mathbb{R}^m$. For standard algorithms such as LE/DM, it is known that this limit exists [CL06, BN07, HAvL05, HAvL07, THJ10]. One of the fundamental requirements of ML is to preserve the neighborhood relations in the original data. In mathematical terms, we require that $\phi: \mathcal{M} \to \phi(\mathcal{M})$ is a smooth embedding, i.e., that $\phi$ is a smooth function (i.e., does not break existing neighborhood relations) whose Jacobian $D\phi(x)$ is full rank $d$ at each $x \in \mathcal{M}$ (i.e.
does not create new neighborhood relations).

The pushforward Riemannian metric A smooth $\phi$ does not typically preserve geometric quantities such as distances along curves in $\mathcal{M}$. These concepts are captured by Riemannian geometry, and we additionally assume that $(\mathcal{M}, g)$ is a Riemannian manifold, with the metric $g$ induced from $\mathbb{R}^D$. One can always associate with $\phi(\mathcal{M})$ a Riemannian metric $g^*$, called the pushforward Riemannian metric [Lee03], which preserves the geometry of $(\mathcal{M}, g)$; $g^*$ is defined by

$\langle u, v\rangle_{g^*(x)} = \langle D\phi^{-1}(x)u,\; D\phi^{-1}(x)v\rangle_{g(x)} \quad \text{for all } u, v \in T_{\phi(x)}\phi(\mathcal{M}).$   (1)

In the above, $T_x\mathcal{M}$, $T_{\phi(x)}\phi(\mathcal{M})$ are tangent subspaces, $D\phi^{-1}(x)$ maps vectors from $T_{\phi(x)}\phi(\mathcal{M})$ to $T_x\mathcal{M}$, and $\langle\cdot,\cdot\rangle$ is the Euclidean scalar product. For each $\phi(x_i)$, the associated pushforward Riemannian metric, expressed in the coordinates of $\mathbb{R}^m$, is a symmetric, semi-positive definite $m \times m$ matrix $G(i)$ of rank $d$. The scalar product $\langle u, v\rangle_{g^*(x_i)}$ takes the form $u^\top G(i) v$. Given an embedding $Y = \phi(X)$, $G(i)$ can be estimated by Algorithm 1 (RMETRIC) of [PM13]. RMETRIC also returns the co-metric $H(i)$, which is the pseudo-inverse of the metric $G(i)$, and its Singular Value Decomposition $\Sigma(i)$, $U(i) \in \mathbb{R}^{m\times d}$. The latter represents an orthogonal basis of $T_{\phi(x)}(\phi(\mathcal{M}))$.

Algorithm 1: RMETRIC
Input: Embedding $Y \in \mathbb{R}^{n\times m}$, Laplacian $L$, intrinsic dimension $d$
1  for all $y_i \in Y$, $k = 1 \to m$, $l = 1 \to m$ do
2    $[\tilde H(i)]_{kl} = \sum_{j\neq i} L_{ij}(y_{jl} - y_{il})(y_{jk} - y_{ik})$
3  end
4  for $i = 1 \to n$ do
5    $U(i), \Sigma(i) \leftarrow$ REDUCEDRANKSVD$(\tilde H(i), d)$
6    $H(i) = U(i)\Sigma(i)U(i)^\top$
7    $G(i) = U(i)\Sigma^{-1}(i)U(i)^\top$
8  end
Return: $G(i), H(i) \in \mathbb{R}^{m\times m}$, $U(i) \in \mathbb{R}^{m\times d}$, $\Sigma(i) \in \mathbb{R}^{d\times d}$, for $i \in [n]$

3 IES problem, related work, and challenges

An example Consider a continuous two dimensional strip with width $W$, height $H$, and aspect ratio $W/H \geq 1$, parametrized by coordinates $w \in [0, W]$, $h \in [0, H]$.
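Algorithm 1 (RMETRIC) above can be sketched in a few lines of NumPy. This is a minimal illustration under our own naming and array-layout assumptions, not the reference implementation of [PM13]:

```python
import numpy as np

def rmetric(Y, L, d):
    """Minimal sketch of Algorithm 1 (RMETRIC).

    Y : (n, m) embedding coordinates; L : (n, n) graph Laplacian;
    d : intrinsic dimension. Returns G, H of shape (n, m, m) and U of (n, m, d).
    """
    n, m = Y.shape
    G = np.zeros((n, m, m)); H = np.zeros((n, m, m)); U = np.zeros((n, m, d))
    for i in range(n):
        # [H~(i)]_{kl} = sum_{j != i} L_ij (y_jl - y_il)(y_jk - y_ik)
        diff = np.delete(Y - Y[i], i, axis=0)      # rows y_j - y_i, j != i
        w = np.delete(L[i], i)
        Ht = (diff * w[:, None]).T @ diff          # m x m dual-metric estimate
        # reduced-rank SVD: keep only the top-d spectrum
        u, s, _ = np.linalg.svd(Ht)
        u, s = u[:, :d], s[:d]
        U[i] = u
        H[i] = u @ np.diag(s) @ u.T                # co-metric H(i), rank d
        G[i] = u @ np.diag(1.0 / s) @ u.T          # metric G(i), pseudo-inverse of H(i)
    return G, H, U
```

The per-point co-metric is rank $d$ by construction, which is what the selection criterion of Section 4 exploits.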
The eigenvalues and eigenfunctions of the Laplace-Beltrami operator with Neumann boundary conditions [Str07] are

$\lambda_{k_1,k_2} = \left(\frac{k_1\pi}{W}\right)^2 + \left(\frac{k_2\pi}{H}\right)^2, \quad \text{respectively} \quad \phi_{k_1,k_2}(w, h) = \cos\frac{k_1\pi w}{W}\,\cos\frac{k_2\pi h}{H}.$

Eigenfunctions $\phi_{1,0}, \phi_{0,1}$ are in bijection with the $w, h$ coordinates (and give a full rank embedding), while the mapping by $\phi_{1,0}, \phi_{2,0}$ provides no extra information regarding the second dimension $h$ of the underlying manifold (and is rank 1). Theoretically, one can choose as coordinates eigenfunctions indexed by $(k_1, 0), (0, k_2)$, but, in practice, $k_1$ and $k_2$ are usually unknown, as the eigenvalues are indexed by their rank $\lambda_0 = 0 < \lambda_1 \leq \lambda_2 \leq \cdots$. For a two dimensional strip, it is known [Str07] that $\phi_{1,0}$ always corresponds to $\lambda_1$ and $\phi_{0,1}$ corresponds to $\lambda_{\lceil W/H\rceil}$. Therefore, when $W/H > 2$, the mapping of the strip to $\mathbb{R}^2$ by $\phi_1, \phi_2$ is low rank, while the mapping by $\phi_1, \phi_{\lceil W/H\rceil}$ is full rank. Note that other mappings of rank 2 exist, e.g., $\phi_1, \phi_{\lceil W/H\rceil+2}$ ($k_1 = k_2 = 1$ in Figure 1b). These embeddings reflect progressively higher frequencies, as the corresponding eigenvalues grow larger.

¹In this paper, a smooth function or manifold will be assumed to be of class at least $C^3$.

Figure 1: (a) Eigenfunction $\phi_{1,0}$ versus $\phi_{2,0}$ (curve) or $\phi_{0,1}$ (two dimensional manifold). (b) Eigenfunction $\phi_{1,0}$ versus $\phi_{1,1}$. All three manifolds are colored by the parameterization $h$.

Prior work [GZKR08] is the first work to give the IES problem a rigorous analysis. Their paper focuses on rectangles, and the failure illustrated in Figure 1a is defined as obtaining a mapping $Y = \phi(X)$ that is not affinely equivalent to the original data. They call this the Price of Normalization and explain it in terms of the variances along $w$ and $h$.
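The claim that $\phi_{0,1}$ appears only at rank $\lceil W/H\rceil$ in the sorted spectrum is easy to verify numerically from the analytic eigenvalues of the strip; a small sketch (the mode truncation ranges are arbitrary choices of ours):

```python
import numpy as np

W, H = 4.5, 1.0   # aspect ratio W/H = 4.5 > 2

# analytic Neumann spectrum of the strip:
# lambda_{k1,k2} = (k1*pi/W)**2 + (k2*pi/H)**2
modes = [(k1, k2) for k1 in range(20) for k2 in range(6) if (k1, k2) != (0, 0)]
lam = {(k1, k2): (k1 * np.pi / W) ** 2 + (k2 * np.pi / H) ** 2 for k1, k2 in modes}
ranked = sorted(modes, key=lam.get)

# the first ceil(W/H) - 1 nonzero eigenfunctions all have k2 = 0, i.e. they only
# parametrize the long direction w; phi_{0,1} shows up exactly at rank ceil(W/H)
print(ranked[:5])   # [(1, 0), (2, 0), (3, 0), (4, 0), (0, 1)]
```

So the pair $\phi_1, \phi_2$ is rank deficient here, while $\phi_1, \phi_5$ recovers both directions.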
[DTCK18] is the first to frame the failure in terms of the rank of $\phi_S = \{\phi_k : k \in S \subseteq [m]\}$, calling it the repeated eigendirection problem. They propose a heuristic, LLRCOORDSEARCH, based on the observation that if $\phi_k$ is a repeated eigendirection of $\phi_1, \cdots, \phi_{k-1}$, one can fit $\phi_k$ with local linear regression on the predictors $\phi_{[k-1]}$ with low leave-one-out error $r_k$. A sequential algorithm [BM17] with an unpredictability constraint in the eigenproblem has also been proposed. Under their framework, the $k$-th coordinate $\phi_k$ is obtained from the top eigenvector of the modified kernel matrix $\tilde K_k$, which is constructed from the original kernel $K$ and $\phi_1, \cdots, \phi_{k-1}$.

Existence of solution Before trying to find an algorithmic solution to the IES problem, we ask the question whether this is even possible, in the smooth manifold setting. Positive answers are given in [Por16], which proves that isometric embeddings by DM with finite $m$ are possible, and more recently in [Bat14], which proves that any closed, connected Riemannian manifold $\mathcal{M}$ can be smoothly embedded by its Laplacian eigenfunctions $\phi_{[m]}$ into $\mathbb{R}^m$ for some $m$, which depends only on the intrinsic dimension $d$ of $\mathcal{M}$, the volume of $\mathcal{M}$, and lower bounds for the injectivity radius and Ricci curvature. The example in Figure 1a demonstrates that, typically, not all $m$ eigenfunctions are needed, i.e., there exists a set $S \subset [m]$ so that $\phi_S$ is also a smooth embedding. We follow [DTCK18] in calling such a set $S$ independent. It is not known how to find an independent $S$ analytically for a given $\mathcal{M}$, except in special cases such as the strip. In this paper, we propose a finite sample and algorithmic solution, and we support it with asymptotic theoretical analysis.

The IES Problem We are given data $X$, and the output of an embedding algorithm (DM for simplicity) $Y = \phi(X) = [\phi_1, \cdots, \phi_m] \in \mathbb{R}^{n\times m}$.
We assume that $X$ is sampled from a $d$-dimensional manifold $\mathcal{M}$, with known $d$, and that $m$ is sufficiently large so that $\phi(\mathcal{M})$ is a smooth embedding. Further, we assume that there is a set $S \subseteq [m]$, with $|S| = s \leq m$, so that $\phi_S$ is also a smooth embedding of $\mathcal{M}$. We propose to find such a set $S$ so that the rank of $\phi_S$ is $d$ on $\mathcal{M}$ and $\phi_S$ varies as slowly as possible.

Challenges (1) Numerically, and on a finite sample, distinguishing between a full rank mapping and a rank-defective one is imprecise. Therefore, we substitute for rank the volume of a unit parallelogram in $T_{\phi(x_i)}\phi(\mathcal{M})$. (2) Since $\phi$ is not an isometry, we must separate the local distortions introduced by $\phi$ from the estimated rank of $\phi$ at $x$. (3) Finding the optimal balance between the above desired properties. (4) In [Bat14] it is strongly suggested that $s$, the number of eigenfunctions needed, may exceed the Whitney embedding dimension ($\leq 2d$), and that this number may depend on the injectivity radius, aspect ratio, and so on. Supplement G shows an example of a flat 2-manifold, the strip with cavity, for which $s > 2$. In this paper, we assume that $s$ and $m$ are given and focus on selecting $S$ with $|S| = s$; for completeness, in Supplement G we present a heuristic to select $s$.

(Global) functional dependencies, knots and crossings Before we proceed, we describe three different ways a mapping $\phi(\mathcal{M})$ can fail to be invertible. The first, the (global) functional dependency, is the case when rank $D\phi < d$ on an open subset of $\mathcal{M}$, or on all of $\mathcal{M}$ (yellow curve in Figure 1a); this is the case most widely recognized in the literature (e.g., [GZKR08, DTCK18]). The knot is the case when rank $D\phi < d$ at an isolated point (Figure 1b). Third, the crossing (Figure S8 in Supplement H) is the case when $\phi : \mathcal{M} \to \phi(\mathcal{M})$ is not invertible at $x$, but $\mathcal{M}$ can be covered with open sets $U$ such that the restriction $\phi : U \to \phi(U)$ has full rank $d$. Combinations of these three exemplary cases can occur.
The criteria and approach we define are based on the (surrogate) rank of $\phi$, therefore they will not rule out all crossings. We leave the problem of crossings in manifold embeddings to future work, as we believe that it requires an entirely separate approach (based, e.g., on the injectivity radius or on density in the co-tangent bundle, rather than on differential structure).

4 Criteria and algorithm

A geometric criterion We start with the main idea in evaluating the quality of a subset $S$ of coordinate functions. At each data point $i$, we consider the orthogonal basis $U(i) \in \mathbb{R}^{m\times d}$ of the $d$ dimensional tangent subspace $T_{\phi(x_i)}\phi(\mathcal{M})$. The projection of the columns of $U(i)$ onto the subspace $T_{\phi(x_i)}\phi_S(\mathcal{M})$ is $U(i)[S, :] \equiv U_S(i)$. The following Lemma connects $U_S(i)$ and the co-metric $H_S(i)$ defined by $\phi_S$ with the full $H(i)$.

Lemma 1. Let $H(i) = U(i)\Sigma(i)U(i)^\top$ be the co-metric defined by the embedding $\phi$, $S \subseteq [m]$, and let $H_S(i)$ and $U_S(i)$ be defined as above. Then $H_S(i) = U_S(i)\Sigma(i)U_S(i)^\top = H(i)[S, S]$.

The proof is straightforward and left to the reader. Note that Lemma 1 is responsible for the efficiency of the search over sets $S$, given that the push-forward co-metric $H_S$ can be readily obtained as a submatrix of $H$. Denote by $u^S_k(i)$ the $k$-th column of $U_S(i)$. We further normalize each $u^S_k$ to length 1 and define the normalized projected volume

$\mathrm{Vol}_{\mathrm{norm}}(S, i) = \frac{\sqrt{\det\left(U_S(i)^\top U_S(i)\right)}}{\prod_{k=1}^{d} \|u^S_k(i)\|_2}.$

Conceptually, $\mathrm{Vol}_{\mathrm{norm}}(S, i)$ is the volume spanned by a (non-orthonormal) "basis" of unit vectors in $T_{\phi_S(x_i)}\phi_S(\mathcal{M})$; $\mathrm{Vol}_{\mathrm{norm}}(S, i) = 1$ when $U_S(i)$ is orthogonal, and it is 0 when rank $H_S(i) < d$. In Figure 1a, $\mathrm{Vol}_{\mathrm{norm}}(\{1, 2\}, i)$ with $\phi_{\{1,2\}} = \{\phi_{1,0}, \phi_{2,0}\}$ is close to zero, since the projections of the two tangent vectors are parallel to the yellow curve; however, $\mathrm{Vol}_{\mathrm{norm}}(\{1, \lceil W/H\rceil\}, i)$ is almost 1, because the projections of the tangent vectors $U(i)$ will be (approximately) orthogonal.
Hence, $\mathrm{Vol}_{\mathrm{norm}}(S, i)$ away from 0 indicates a non-singular $\phi_S$ at $i$, and we use the average of $\log \mathrm{Vol}_{\mathrm{norm}}(S, i)$, which penalizes values near 0 highly, as the rank quality $R(S)$ of $\phi_S$. Higher frequency $\phi_S$ maps with high $R(S)$ may exist, being either smooth, such as the embeddings of the strip mentioned previously, or containing knots involving only a small fraction of points, such as $\phi_{1,0}, \phi_{1,1}$ in Figure 1b. To choose the lowest frequency, slowest varying smooth map, a regularization term consisting of the eigenvalues $\lambda_k$, $k \in S$, of the graph Laplacian $L$ is added, obtaining the criterion

$\mathcal{L}(S; \zeta) = \underbrace{\frac{1}{n}\sum_{i=1}^{n} \log\sqrt{\det\left(U_S(i)^\top U_S(i)\right)}}_{R_1(S) = \frac{1}{n}\sum_{i=1}^n R_1(S;i)} \;-\; \underbrace{\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{d} \log \|u^S_k(i)\|_2}_{R_2(S) = \frac{1}{n}\sum_{i=1}^n R_2(S;i)} \;-\; \zeta \sum_{k\in S} \lambda_k$   (2)

Search algorithm With this criterion, the IES problem turns into a subset selection problem parametrized by $\zeta$:

$S^*(\zeta) = \operatorname*{argmax}_{S\subseteq[m];\,|S|=s;\,1\in S} \mathcal{L}(S; \zeta)$   (3)

Note that we force the first coordinate $\phi_1$ to always be chosen, since this coordinate cannot be functionally dependent on previous ones, and, in the case of DM, it also has the lowest frequency. Note also that $R_1$ and $R_2$ are both submodular set functions (proof in Supplement C.3). For large $s$ and $d$, algorithms for optimizing over the difference of submodular functions can be used (e.g., see [IB12]). For the experiments in this paper, we have $m = 20$ and $d, s = 2 \sim 4$, which enables us to use exhaustive search to handle (3). The exact search algorithm is summarized in Algorithm 2 (INDEIGENSEARCH). A greedy variant is also proposed and analyzed in Supplement D. Note that one might be able to search in the continuous space of all $s$-projections. We conjecture that the objective function (2) will be a difference of convex functions and leave the details as future work.²

Algorithm 2: INDEIGENSEARCH
Input: Data $X$, bandwidth $\varepsilon$, intrinsic dimension $d$, embedding dimension $s$, regularizer $\zeta$
1  $Y \in \mathbb{R}^{n\times m}$, $L$, $\lambda \in \mathbb{R}^m \leftarrow$ DIFFMAP$(X, \varepsilon)$
2  $U(1), \cdots, U(n) \leftarrow$ RMETRIC$(Y, L, d)$
3  for $S \in \{S' \subseteq [m] : |S'| = s, 1 \in S'\}$ do
4    $R_1(S) \leftarrow 0$; $R_2(S) \leftarrow 0$
5    for $i = 1, \cdots, n$ do
6      $U_S(i) \leftarrow U(i)[S, :]$
7      $R_1(S)$ += $\frac{1}{2n} \log \det\left(U_S(i)^\top U_S(i)\right)$
8      $R_2(S)$ += $\frac{1}{n} \sum_{k=1}^d \log \|u^S_k(i)\|_2$
9    end
10   $\mathcal{L}(S; \zeta) = R_1(S) - R_2(S) - \zeta \sum_{k\in S}\lambda_k$
11 end
12 $S^* = \operatorname{argmax}_S \mathcal{L}(S; \zeta)$
Return: Independent eigencoordinate set $S^*$

Regularization path and choosing $\zeta$ According to (2), the optimal subset $S^*$ depends on the parameter $\zeta$. The regularization path $\ell(\zeta) = \max_{S\subseteq[m];|S|=s;1\in S} \mathcal{L}(S; \zeta)$ is the upper envelope of multiple lines (each corresponding to a set $S$) with slopes $-\sum_{k\in S}\lambda_k$ and intercepts $R(S)$. The larger $\zeta$ is, the more the lower frequency subset penalty prevails, and for sufficiently large $\zeta$ the algorithm will output $[s]$. In the supervised learning framework, the regularization parameters are often chosen by cross validation. Here we propose a second criterion, which effectively limits how much $R(S)$ may be ignored, or, alternatively, bounds $\zeta$ by a data dependent quantity.
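Putting the pieces together, the exhaustive search of Algorithm 2 reduces to a loop over subsets containing $\phi_1$. A compact sketch under our own assumptions (0-based indices, tangent bases stacked as an `(n, m, d)` array, and `slogdet` standing in for the log-determinant of step 7):

```python
import itertools
import numpy as np

def ind_eigen_search(U, lam, s, zeta):
    """Sketch of Algorithm 2 (INDEIGENSEARCH), exhaustive over subsets.

    U    : (n, m, d) tangent bases from RMETRIC
    lam  : (m,) Laplacian eigenvalues
    s    : embedding dimension; zeta : regularization strength
    Returns the best subset (as a tuple of indices) and its criterion value.
    """
    n, m, d = U.shape
    best, best_val = None, -np.inf
    # coordinate phi_1 (index 0) is always included
    for rest in itertools.combinations(range(1, m), s - 1):
        S = (0,) + rest
        US = U[:, S, :]                              # (n, s, d) row subsets
        gram = np.einsum('nsk,nsl->nkl', US, US)     # U_S(i)^T U_S(i)
        sign, logdet = np.linalg.slogdet(gram)
        R1 = 0.5 * np.mean(np.where(sign > 0, logdet, -np.inf))
        R2 = np.mean(np.sum(np.log(np.linalg.norm(US, axis=1)), axis=1))
        val = R1 - R2 - zeta * sum(lam[k] for k in S)
        if val > best_val:
            best, best_val = S, val
    return best, best_val
```

Rank-deficient subsets get a criterion of $-\infty$ (singular Gram matrix), so they can never be selected, which mirrors the role of $\log \mathrm{Vol}_{\mathrm{norm}}$ in equation (2).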
Define the leave-one-out regret of point $i$ as follows:

$D(S, i) = R(S^i_*; [n]\setminus\{i\}) - R(S; [n]\setminus\{i\}), \quad \text{with } S^i_* = \operatorname*{argmax}_{S\subseteq[m];\,|S|=s;\,1\in S} R(S; i)$   (4)

In the above, we denote $R(S; T) = \frac{1}{|T|}\sum_{i\in T} \left[R_1(S; i) - R_2(S; i)\right]$ for some subset $T \subseteq [n]$. The quantity $D(S, i)$ in (4) measures the gain in $R$ if all the other points $[n]\setminus\{i\}$ choose the optimal subset $S^i_*$. If the regret $D(S, i)$ is larger than zero, it indicates that the alternative choice $S^i_*$ might be better than the original choice $S$. Note that the mean value over all $i$, i.e., $\frac{1}{n}\sum_i D(S, i)$, also depends on the variability of the optimal choices $S^i_*$ across points $i$; therefore, it might fail to favor an $S$ that is optimal for most $i \in [n]$. Instead, we propose to inspect the distribution of $D(S, i)$, and to remove the sets $S$ whose $\alpha$-th percentile is larger than zero (e.g., $\alpha = 75\%$), recursively from $\zeta = \infty$ in decreasing order. Namely, the chosen set is $S^* = S^*(\zeta')$ with $\zeta' = \max\{\zeta : \mathrm{PERCENTILE}(\{D(S^*(\zeta), i)\}_{i=1}^n, \alpha) \leq 0\}$. The optimal $\zeta^*$ value is simply chosen to be the midpoint of all the $\zeta$'s that output the set $S^*$, i.e., $\zeta^* = \frac{1}{2}(\zeta' + \zeta'')$, where $\zeta'' = \min\{\zeta : S^*(\zeta) = S^*(\zeta')\}$. The procedure REGUPARAMSEARCH is summarized in Algorithm S5.

5 R as Kullback-Leibler divergence

In this section we analyze $R$ in its population version, and show that it is reminiscent of a Kullback-Leibler divergence between unnormalized measures on $\phi_S(\mathcal{M})$. The population version of the regularization term takes the form of a well-known smoothness penalty on the embedding coordinates $\phi_S$. Proofs of the theorems can be found in Supplement C.

Volume element and the Riemannian metric Consider a Riemannian manifold $(\mathcal{M}, g)$ mapped by a smooth embedding $\phi_S$ into $(\phi_S(\mathcal{M}), g^*_S)$, $\phi_S : \mathcal{M} \to \mathbb{R}^s$, where $g^*_S$ is the push-forward metric defined in (1).
A Riemannian metric $g$ induces a Riemannian measure on $\mathcal{M}$, with volume element $\sqrt{\det g}$. Denote now by $\mu_{\mathcal{M}}$, respectively $\mu_{\phi_S(\mathcal{M})}$, the Riemannian measures corresponding to the metrics induced on $\mathcal{M}$, $\phi_S(\mathcal{M})$ by the ambient spaces $\mathbb{R}^D$, $\mathbb{R}^s$; let $g$ be the former metric.

Lemma 2. Let $S$, $\phi$, $\phi_S$, $H_S(x)$, $U_S(x)$, $\Sigma(x)$ be defined as in Section 4 and Lemma 1. For simplicity, we denote by $H_S(y) \equiv H_S(\phi_S^{-1}(y))$, and similarly for $U_S(y)$, $\Sigma(y)$. Assume that $\phi_S$ is a smooth embedding. Then, for any measurable function $f : \mathcal{M} \to \mathbb{R}$,

$\int_{\mathcal{M}} f(x)\, d\mu_{\mathcal{M}}(x) = \int_{\phi_S(\mathcal{M})} f(\phi_S^{-1}(y))\, j_S(y)\, d\mu_{\phi_S(\mathcal{M})}(y),$   (5)

with

$j_S(y) = 1 / \mathrm{Vol}\left(U_S(y)\Sigma^{1/2}(y)\right).$   (6)

Asymptotic limit of R We now study the first term of our criterion in the limit of infinite sample size. We make the following assumptions.

Assumption 1. The manifold $\mathcal{M}$ is compact of class $C^3$, and there exists a set $S$, with $|S| = s$, so that $\phi_S$ is a smooth embedding of $\mathcal{M}$ in $\mathbb{R}^s$.
Assumption 2. The data are sampled from a distribution on $\mathcal{M}$ continuous with respect to $\mu_{\mathcal{M}}$, whose density is denoted by $p$.
Assumption 3. The estimate of $H_S$ in Algorithm 1 computed w.r.t. the embedding $\phi_S$ is consistent.

We know from [Bat14] that Assumption 1 is satisfied for the DM/LE embedding. The remaining assumptions are minimal requirements ensuring that the limits of our quantities exist. Now consider the setting of Section 3, in which we have a larger set of eigenfunctions $[m]$, so that $[m]$ contains the set $S$ of Assumption 1. Denote by $\tilde j_S(y) = \left(\prod_{k=1}^d \|u^S_k(y)\|\, \sigma_k(y)^{1/2}\right)^{-1}$ a new volume element, where $\sigma_k = [\Sigma]_{kk}$.

Theorem 3 (Limit of R). Under Assumptions 1–3,

$\lim_{n\to\infty} \frac{1}{n}\sum_{i} \left(R_1(S; i) - R_2(S; i)\right) = R(S, \mathcal{M}),$   (7)

and

$R(S, \mathcal{M}) = -\int_{\phi_S(\mathcal{M})} \ln\frac{j_S(y)}{\tilde j_S(y)}\; j_S(y)\, p(\phi_S^{-1}(y))\, d\mu_{\phi_S(\mathcal{M})}(y) \;\overset{\mathrm{def}}{=}\; -D\left(p\, j_S \,\|\, p\, \tilde j_S\right).$   (8)

The expression $D(\cdot\|\cdot)$ represents a Kullback-Leibler divergence.

²We thank the anonymous reviewer who made this suggestion.
Note that $j_S \geq \tilde j_S$, which implies that $D$ is always positive, and that the measures defined by $p\, j_S$, $p\, \tilde j_S$ normalize to different values. By definition, local injectivity is related to the volume element $j$. Intuitively, $p\, j_S$ is the observation, and $p\, \tilde j_S$, where $\tilde j_S$ is the minimum attainable for $j_S$, is the model; the objective itself is looking for a view $\phi_S$ of the data that agrees with the model.

It is known that $\lambda_k$, the $k$-th eigenvalue of the Laplacian, converges under certain technical conditions [BN07] to an eigenvalue of the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$, and that

$\lambda_k(\mathcal{M}) = \langle \phi_k, \Delta_{\mathcal{M}}\phi_k\rangle = \int_{\mathcal{M}} \|\operatorname{grad} \phi_k(x)\|_2^2\, d\mu_{\mathcal{M}}(x).$   (9)

Hence, a smaller value of the regularization term encourages the use of slowly varying coordinate functions, as measured by the squared norms of their gradients, as in equation (9). Hence, under Assumptions 1, 2, 3, $\mathcal{L}$ converges to

$\mathcal{L}(S, \mathcal{M}) = -D\left(p\, j_S \,\|\, p\, \tilde j_S\right) - \frac{\zeta}{\lambda_1(\mathcal{M})} \sum_{k\in S} \lambda_k(\mathcal{M}).$   (10)

Since eigenvalues scale with the volume of $\mathcal{M}$, the rescaling of $\zeta$ in comparison with equation (2) makes the $\zeta$ above dimensionless.

6 Experiments

We demonstrate the proposed algorithm on three synthetic datasets, one where the minimum embedding dimension $s$ equals $d$ (D1, long strip), and two (D7, high torus, and D13, three torus) where $s > d$. The complete list of synthetic manifolds (transformations of 2 dimensional strips, 3 dimensional cubes, two and three tori, etc.) investigated can be found in Supplement H and Table S2. The examples have (i) aspect ratio of at least 4, (ii) points sampled non-uniformly from the underlying manifold $\mathcal{M}$, and (iii) Gaussian noise added. The sample size of the synthetic datasets is $n = 10{,}000$ unless otherwise stated. Additionally, we analyze several real datasets from chemistry and astronomy. All embeddings are computed with the DM algorithm, which outputs $m = 20$ eigenvectors. Hence, we examine 171 sets for $s = 3$ and 969 sets for $s = 4$.
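The subset counts quoted above follow directly from forcing $\phi_1$ into $S$: with $m = 20$ eigenvectors, one chooses the remaining $s - 1$ coordinates from the other 19. A quick check:

```python
from math import comb

m = 20                            # eigenvectors output by DM
for s in (3, 4):
    n_sets = comb(m - 1, s - 1)   # phi_1 is always included
    print(s, n_sets)              # prints "3 171" then "4 969"
```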
No more than 2 to 5 of these sets appear on the regularization path. Detailed experimental results are in Table S3. In this section, we show the original dataset $X$; the embedding $\phi_{S^*}$, with $S^*$ selected by INDEIGENSEARCH and $\zeta^*$ from REGUPARAMSEARCH; and the maximizer sets on the regularization path, with box plots of $D(S, i)$, as discussed in Section 4. The $\alpha$ threshold for REGUPARAMSEARCH is set to 75%. The kernel bandwidth $\varepsilon$ for the synthetic datasets is chosen manually. For real datasets, $\varepsilon$ is optimized as in [JMM17]. All the experiments are replicated more than 5 times, and the outputs are similar because of the large sample size $n$.

Figure 2: Experimental results for the synthetic datasets. Rows correspond to the datasets D1, D7, D13 (please refer to Table S2); columns show the original data $X$, the embedding $\phi_{S^*}$, and the regularization path. The optimal subset $S^*$ is selected by INDEIGENSEARCH.

Figure 3: First row: Chloromethane dataset; (a) embedding $\phi_{[3]}$, (b) heat map of $\mathcal{L}(\{1, i, j\})$, (c) and (d) embeddings with the top two ranked subsets $S_1 = \{1, 4, 6\}$ and $S_2 = \{1, 5, 7\}$, colored by the distances between C and two different Cl⁻, respectively. Second row: SDSS dataset in (e), (f); (g), (h) show the example where LLR failed. (e) and (f) are embeddings of $\phi_{\{1,2\}}$ (suboptimal set, $\mathcal{L}(\{1, 2\}) = -1.24$) and $\phi_{\{1,3\}}$ (maximizer of $\mathcal{L}$, $\mathcal{L}(\{1, 3\}) = -0.39$), respectively; (g) subset $\{1, 2, 5\}$ chosen by LLR; (h) LOO error $r_k$ versus coordinate index $k$.

Synthetic manifolds The results for the synthetic manifolds are in Figure 2. (i) Manifold with $s = d$. The first synthetic dataset we considered, D1, is a two dimensional strip with aspect ratio $W/H = 2\pi$. The left panel of the top row shows the scatter plot of this dataset. From the theoretical analysis in Section 3, the coordinate set that corresponds to the slowest varying unique eigendirections is $S = \{1, \lceil W/H\rceil\} = \{1, 7\}$.
The middle panel, with $S^* = \{1, 7\}$ selected by INDEIGENSEARCH with $\zeta$ chosen by REGUPARAMSEARCH, confirms this. The right panel shows the box plot of $\{D(S, i)\}_{i=1}^n$. According to the proposed procedure, we eliminate $S' = \{1, 2\}$, since $D(S', i) \geq 0$ for almost all the points. (ii) Manifold with $s > d$. The second dataset, D7, is displayed in the left panel of the second row. Due to the mechanism we used to generate the data, the resulting torus is non-uniformly distributed along the $z$ axis. The middle panel shows the embedding of the optimal coordinate set $S^* = \{1, 4, 5\}$ selected by INDEIGENSEARCH. Note that the middle region (in red) is indeed a two dimensional narrow tube when zoomed in. The right panel indicates that both $\{1, 2, 3\}$ and $\{1, 2, 4\}$ (median is around zero) should be removed. The optimal regularization parameter is $\zeta^* \approx 7$. The result for the third dataset, D13, the three torus, is in the third row of the figure. To conserve space, we display only the projections onto the penultimate and last coordinates of the original data $X$ and of the embedding $\phi_{S^*}$ (which are $\{\phi_5, \phi_{10}\}$), colored by $\alpha_1$ of (S15), in the left and middle panels. The full combinations of coordinates can be found in Figure S5. The right panel implies one should eliminate the sets $\{1, 2, 3, 4\}$ and $\{1, 2, 3, 5\}$, since both of them have more than 75% of the points such that $D(S, i) \geq 0$. The first remaining subset is $\{1, 2, 5, 10\}$, which yields an optimal regularization parameter $\zeta^* \approx 5$.

Molecular dynamics dataset [FTP16] The dataset has size $n \approx 30{,}000$ and ambient dimension $D = 40$, with the intrinsic dimension estimated as $\hat d = 2$ (see Supplement H.1 for details). The embedding with coordinate set $S = [3]$ is shown in Figure 3a.
The first three eigenvectors parameterize the same directions, which yields a one dimensional manifold in the figure. The top view ($S = [2]$) of the figure is a u-shaped structure similar to the yellow curve in Figure 1a. The heat map of $\mathcal{L}(\{1, i, j\})$ for different combinations of coordinates in Figure 3b confirms that $\mathcal{L}$ for $S = [3]$ is low, and that $\phi_1$, $\phi_2$ and $\phi_3$ give a low rank mapping. The heat map also shows high $\mathcal{L}$ values for $S_1 = \{1, 4, 6\}$ and $S_2 = \{1, 5, 7\}$, which correspond to the top two ranked subsets. The embeddings with $S_1$, $S_2$ are in Figures 3c and 3d, respectively. In this case, we obtain two optimal sets $S$ due to the data symmetry.

Galaxy spectra from the Sloan Digital Sky Survey (SDSS)³ [AAMA+09], preprocessed as in [MMVZ16]. We display a sample of $n = 50{,}000$ points from the first 0.3 million points, which correspond to closer galaxies. Figures 3e and 3f show that the first two coordinates are almost dependent; the embedding with $S^* = \{1, 3\}$ is selected by INDEIGENSEARCH with $d = 2$. Both plots are colored by the blue spectrum magnitude, which is correlated with the number of young stars in the galaxy, showing that this galaxy property varies smoothly and non-linearly with $\phi_1, \phi_3$, but is not smooth w.r.t. $\phi_1, \phi_2$.

Comparison with [DTCK18] The LLRCOORDSEARCH method outputs candidate coordinates similar to those of our proposed algorithm most of the time (see Table S3). However, the results differ for the high torus, as in Figure 3. Figure 3h shows the leave-one-out (LOO) error $r_k$ versus the coordinates. The coordinate set chosen by LLRCOORDSEARCH was $S = \{1, 2, 5\}$, as in Figure 3g. The embedding is clearly shown to be suboptimal, for it failed to capture the cavity within the torus. This is because the algorithm searches in a sequential fashion; the noise eigenvector $\phi_2$ in this example appears before the signal eigenvectors, e.g., $\phi_4$ and $\phi_5$.

Additional experiments with real data are shown in Table 1.
Not surprisingly, for most real data sets we examined, the independent coordinates are not the first $s$. They also show that the algorithm scales well and is robust to the noise present in real data.

Table 1: Results for other real datasets. Columns from left to right are sample size $n$, ambient dimension of the data $D$, average degree of the neighbor graph $\mathrm{deg}_{\mathrm{avg}}$, $(s, d)$ and runtime for IES, and the chosen set $S^*$, respectively. The last three datasets are from [CTS+17].

Dataset         |       n |    D | deg_avg | (s, d) | t (sec) | S*
SDSS (full)     | 298,511 | 3750 |  144.91 | (2, 2) |  106.05 | (1, 3)
Aspirin         | 211,762 |  244 |  101.03 | (4, 3) |   85.11 | (1, 2, 3, 7)
Ethanol         | 555,092 |  102 |  107.27 | (3, 2) |  233.16 | (1, 2, 4)
Malondialdehyde | 993,237 |   96 |  106.51 | (3, 2) |  459.53 | (1, 2, 3)

The asymptotic runtime of LLRCOORDSEARCH has a quadratic dependency on $n$, while that of our algorithm is linear in $n$. Details of the runtime analysis are in Supplement F. LLRCOORDSEARCH was too slow to be tested on the four larger datasets (see also Figure S1).

³The Sloan Digital Sky Survey data can be downloaded from https://www.sdss.org

7 Conclusion

Algorithms that use eigenvectors, such as DM, are among the most promising and well studied in ML. It has been known since [GZKR08] that when the aspect ratio of a low dimensional manifold exceeds a threshold, the choice of eigenvectors becomes non-trivial, and that this threshold can be as low as 2. Our experimental results confirm the need to augment ML algorithms with IES methods in order to successfully apply ML to real world problems. Surprisingly, the IES problem has received little attention in the ML literature, to the extent that the difficulty and complexity of the problem have not been recognized.
Our paper advances the state of the art by (i) introducing for the first time a differential geometric definition of the problem, (ii) highlighting geometric factors, such as the injectivity radius, that, in addition to the aspect ratio, influence the number of eigenfunctions needed for a smooth embedding, and (iii) constructing selection criteria based on intrinsic manifold quantities, which (iv) have analyzable asymptotic limits, (v) can be computed efficiently, and (vi) are robust to the noise present in real scientific data. The library of hard synthetic examples we constructed will be made available along with the Python software implementation of our algorithms.

Acknowledgements

The authors acknowledge partial support from the U.S. Department of Energy, Solar Energy Technology Office award DE-EE0008563 and from the NSF DMS PD 08-1269 and NSF IIS-0313339 awards. They are grateful to the Tkatchenko and Pfaendtner labs, and in particular to Stefan Chmiela and Chris Fu, for providing the molecular dynamics data and for many hours of brainstorming and advice.

References

[AAMA+09] Kevork N Abazajian, Jennifer K Adelman-McCarthy, Marcel A Agüeros, Sahar S Allam, Carlos Allende Prieto, Deokkeun An, Kurt SJ Anderson, Scott F Anderson, James Annis, Neta A Bahcall, et al. The seventh data release of the Sloan Digital Sky Survey. The Astrophysical Journal Supplement Series, 182(2):543, 2009.

[Bat14] Jonathan Bates. The embedding dimension of Laplacian eigenfunction maps. Applied and Computational Harmonic Analysis, 37(3):516-530, 2014.

[BM17] Yochai Blau and Tomer Michaeli. Non-redundant spectral dimensionality reduction. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 256-271. Springer, 2017.

[BN07] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. In B. Schölkopf, J. C. Platt, and T.
Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 129-136. MIT Press, 2007.

[CL06] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5-30, 2006.

[CTS+17] Stefan Chmiela, Alexandre Tkatchenko, Huziel E Sauceda, Igor Poltavsky, Kristof T Schütt, and Klaus-Robert Müller. Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5):e1603015, 2017.

[Dry16] Ian L. Dryden. Statistical shape analysis: with applications in R. Wiley Series in Probability and Statistics. Wiley, Chichester, West Sussex, England, 2nd edition, 2016.

[DS13] Sanjoy Dasgupta and Kaushik Sinha. Randomized partition trees for exact nearest neighbor search. In Conference on Learning Theory, pages 317-337, 2013.

[DTCK18] Carmeline J Dsilva, Ronen Talmon, Ronald R Coifman, and Ioannis G Kevrekidis. Parsimonious representation of nonlinear dynamical systems through manifold learning: A chemotaxis case study. Applied and Computational Harmonic Analysis, 44(3):759-773, 2018.

[FTP16] Kelly L. Fleming, Pratyush Tiwary, and Jim Pfaendtner. New approach for investigating reaction dynamics and rates with ab initio calculations. Journal of Physical Chemistry A, 120(2):299-305, 2016.

[GZKR08] Yair Goldberg, Alon Zakai, Dan Kushnir, and Yaacov Ritov. Manifold learning: The price of normalization. Journal of Machine Learning Research, 9(Aug):1909-1939, 2008.

[Har98] David A Harville. Matrix algebra from a statistician's perspective, 1998.

[HAvL05] Matthias Hein, Jean-Yves Audibert, and Ulrike von Luxburg.
From graphs to manifolds - weak and strong pointwise consistency of graph Laplacians. In Learning Theory, 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 27-30, 2005, Proceedings, pages 470-485, 2005.

[HAvL07] Matthias Hein, Jean-Yves Audibert, and Ulrike von Luxburg. Graph Laplacians and their convergence on random neighborhood graphs. Journal of Machine Learning Research, 8:1325-1368, 2007.

[HHJ90] Roger A Horn and Charles R Johnson. Matrix analysis. Cambridge University Press, 1990.

[IB12] Rishabh Iyer and Jeff Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, UAI'12, pages 407-417, Arlington, Virginia, United States, 2012. AUAI Press.

[JMM17] Dominique Joncas, Marina Meila, and James McQueen. Improved graph Laplacian via geometric self-consistency. In Advances in Neural Information Processing Systems, pages 4457-4466, 2017.

[LB05] Elizaveta Levina and Peter J Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in Neural Information Processing Systems, pages 777-784, 2005.

[Lee03] John M. Lee. Introduction to smooth manifolds, 2003.

[MMVZ16] James McQueen, Marina Meilă, Jacob VanderPlas, and Zhongyue Zhang. Megaman: Scalable manifold learning in Python. Journal of Machine Learning Research, 17(148):1-5, 2016.

[NLCK06] Boaz Nadler, Stephane Lafon, Ronald Coifman, and Ioannis Kevrekidis. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 955-962, Cambridge, MA, 2006. MIT Press.

[NWF78] George L Nemhauser, Laurence A Wolsey, and Marshall L Fisher.
An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265-294, 1978.

[PM13] D. Perrault-Joncas and M. Meila. Non-linear dimensionality reduction: Riemannian metric estimation and the problem of geometric discovery. ArXiv e-prints, May 2013.

[Por16] Jacobus W Portegies. Embeddings of Riemannian manifolds with heat kernels and eigenfunctions. Communications on Pure and Applied Mathematics, 69(3):478-518, 2016.

[Str07] Walter A Strauss. Partial differential equations: An introduction. Wiley, 2007.

[THJ10] Daniel Ting, Ling Huang, and Michael I. Jordan. An analysis of the convergence of graph Laplacians. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 1079-1086, 2010.