{"title": "Global Versus Local Methods in Nonlinear Dimensionality Reduction", "book": "Advances in Neural Information Processing Systems", "page_first": 721, "page_last": 728, "abstract": null, "full_text": "Global Versus Local Methods in Nonlinear Dimensionality Reduction\n\nVin de Silva\n\nDepartment of Mathematics,\n\nStanford University,\nStanford, CA 94305\n\nsilva@math.stanford.edu\n\nJoshua B. Tenenbaum\n\nDepartment of Brain and Cognitive Sciences,\n\nMassachusetts Institute of Technology,\n\nCambridge, MA 02139\njbt@ai.mit.edu\n\nAbstract\n\nRecently proposed algorithms for nonlinear dimensionality reduction fall broadly into two categories which have different advantages and disadvantages: global (Isomap [1]), and local (Locally Linear Embedding [2], Laplacian Eigenmaps [3]). We present two variants of Isomap which combine the advantages of the global approach with what have previously been exclusive advantages of local methods: computational sparsity and the ability to invert conformal maps.\n\n1 Introduction\n\nIn this paper we discuss the problem of nonlinear dimensionality reduction (NLDR): the task of recovering meaningful low-dimensional structures hidden in high-dimensional data. An example might be a set of pixel images of an individual\u2019s face observed under different pose and lighting conditions; the task is to identify the underlying variables (pose angles, direction of light, etc.) given only the high-dimensional pixel image data. In many cases of interest, the observed data are found to lie on an embedded submanifold of the high-dimensional space. The degrees of freedom along this submanifold correspond to the underlying variables. 
In this form, the NLDR problem is known as \u201cmanifold learning\u201d.\n\nClassical techniques for manifold learning, such as principal components analysis (PCA) or multidimensional scaling (MDS), are designed to operate when the submanifold is embedded linearly, or almost linearly, in the observation space. More generally there is a wider class of techniques, involving iterative optimization procedures, by which unsatisfactory linear representations obtained by PCA or MDS may be \u201cimproved\u201d towards more successful nonlinear representations of the data. These techniques include GTM [4], self-organising maps [5] and others [6,7]. However, such algorithms often fail when nonlinear structure cannot simply be regarded as a perturbation from a linear approximation, as in the Swiss roll of Figure 3. In such cases, iterative approaches tend to get stuck at locally optimal solutions that may grossly misrepresent the true geometry of the situation.\n\nRecently, several entirely new approaches have been devised to address this problem. These methods combine the advantages of PCA and MDS\u2014computational efficiency; few free parameters; non-iterative global optimisation of a natural cost function\u2014with the ability to recover the intrinsic geometric structure of a broad class of nonlinear data manifolds.\n\nThese algorithms come in two flavors: local and global. Local approaches (LLE [2], Laplacian Eigenmaps [3]) attempt to preserve the local geometry of the data; essentially, they seek to map nearby points on the manifold to nearby points in the low-dimensional representation. 
Global approaches (Isomap [1]) attempt to preserve geometry at all scales, mapping nearby points on the manifold to nearby points in low-dimensional space, and faraway points to faraway points.\n\nThe principal advantages of the global approach are that it tends to give a more faithful representation of the data\u2019s global structure, and that its metric-preserving properties are better understood theoretically. The local approaches have two principal advantages: (1) computational efficiency: they involve only sparse matrix computations which may yield a polynomial speedup; (2) representational capacity: they may give useful results on a broader range of manifolds, whose local geometry is close to Euclidean, but whose global geometry may not be.\n\nIn this paper we show how the global geometric approach, as implemented in Isomap, can be extended in both of these directions. The results are computational efficiency and representational capacity equal to or in excess of existing local approaches (LLE, Laplacian Eigenmaps), but with the greater stability and theoretical tractability of the global approach. Conformal Isomap (or C-Isomap) is an extension of Isomap which is capable of learning the structure of certain curved manifolds. This extension comes at the cost of making a uniform sampling assumption about the data. Landmark Isomap (or L-Isomap) is a technique for approximating a large global computation in Isomap by a much smaller set of calculations. Most of the work focuses on a small subset of the data, called the landmark points.\n\nThe remainder of the paper is in two sections. In Section 2, we describe a perspective on manifold learning in which C-Isomap appears as the natural generalisation of Isomap. 
In Section 3 we derive L-Isomap from a landmark version of classical MDS.\n\n2 Isomap for conformal embeddings\n\n2.1 Manifold learning and geometric invariants\n\nWe can view the problem of manifold learning as an attempt to invert a generative model for a set of observations. Let $Y$ be a $d$-dimensional domain contained in the Euclidean space $\\mathbb{R}^d$, and let $f : Y \\to \\mathbb{R}^N$ be a smooth embedding, for some $N > d$. The object of manifold learning is to recover $Y$ and $f$ based on a given set $\\{x_i\\}$ of observed data in $\\mathbb{R}^N$. The observed data arise as follows. Hidden data $\\{y_i\\}$ are generated randomly in $Y$, and are then mapped by $f$ to become the observed data, so $\\{x_i = f(y_i)\\}$.\n\nThe problem as stated is ill-posed: some restriction is needed on $f$ if we are to relate the observed geometry of the data to the structure of the hidden variables $\\{y_i\\}$ and $Y$ itself. We will discuss two possibilities. The first is that $f$ is an isometric embedding in the sense of Riemannian geometry; so $f$ preserves infinitesimal lengths and angles. The second possibility is that $f$ is a conformal embedding; it preserves angles but not lengths. Equivalently, at every point $y \\in Y$ there is a scalar $s(y) > 0$ such that infinitesimal vectors at $y$ get magnified in length by a factor $s(y)$. The class of conformal embeddings includes all isometric embeddings as well as many other families of maps, including stereographic projections and the Mercator projection.\n\nOne approach to solving a manifold learning problem is to identify which aspects of the geometry of $Y$ are invariant under the mapping $f$. For example, if $f$ is an isometric embedding then by definition infinitesimal distances are preserved. But more is true. The length of a path in $Y$ is defined by integrating the infinitesimal distance metric along the path. The same is true in $f(Y)$, so $f$ preserves path lengths. If $y$ and $z$ are two points in $Y$, then the shortest path between $y$ and $z$ lying inside $Y$ is the same length as the shortest path between $f(y)$ and $f(z)$ along $f(Y)$. Thus geodesic distances are preserved. The conclusion is that $Y$ is isometric with $f(Y)$, regarded as metric spaces under geodesic distance. Isomap exploits this idea by constructing the geodesic metric for $f(Y)$ approximately as a matrix, using the observed data alone.\n\nTo solve the conformal embedding problem, we need to identify an observable geometric invariant of conformal maps. Since conformal maps are locally isometric up to a scale factor $s(y)$, it is natural to try to estimate $s(y)$ at each point $f(y)$ in the observed data. By rescaling, we can then restore the original metric structure of the data and proceed as in Isomap. 
We can do this by noting that a conformal map $f$ rescales local volumes in $Y$ by a factor $s(y)^d$. Hence if the hidden data are sampled uniformly in $Y$, the local density of the observed data will be $1/s(y)^d$. It follows that the conformal factor $s(y)$ can be estimated in terms of the observed local data density, provided that the original sampling is uniform. C-Isomap implements a version of this idea which is independent of the dimension $d$.\n\nThis uniform sampling assumption may appear to be a severe restriction, but we believe it reflects a necessary tradeoff in dealing with a larger class of maps. Moreover, as we illustrate below, our algorithm appears in practice to be robust to moderate violations of this assumption.\n\n2.2 The Isomap and C-Isomap algorithms\n\nThere are three stages to Isomap [1]:\n\n1. Determine a neighbourhood graph $G$ of the observed data $\\{x_i\\}$ in a suitable way. For example, $G$ might contain the edge $x_i x_j$ iff $x_j$ is one of the $k$ nearest neighbours of $x_i$ (and vice versa). Alternatively, $G$ might contain the edge $x_i x_j$ iff $|x_i - x_j| < \\epsilon$, for some $\\epsilon$.\n\n2. Compute shortest paths in the graph for all pairs of data points. Each edge $x_i x_j$ in the graph is weighted by its Euclidean length $|x_i - x_j|$, or by some other useful metric.\n\n3. Apply MDS to the resulting shortest-path distance matrix $D$ to find a new embedding of the data in Euclidean space, approximating $Y$.\n\nThe premise is that local metric information (in this case, lengths of edges $x_i x_j$ in the neighbourhood graph) is regarded as a trustworthy guide to the local metric structure in the original (latent) space. The shortest-paths computation then gives an estimate of the global metric structure, which can be fed into MDS to produce the required embedding.\n\nIt is known that Step 2 converges on the true geodesic structure of the manifold given sufficient data, and thus Isomap yields a faithful low-dimensional Euclidean embedding whenever the function $f$ is an isometry. More precisely, we have (see [8]):\n\nTheorem. Let $Y$ be sampled from a bounded convex region in $\\mathbb{R}^d$, with respect to a density function $\\alpha$. Let $f$ be a $C^2$-smooth isometric embedding of that region in $\\mathbb{R}^N$. Given $\\lambda, \\mu > 0$, for a suitable choice of neighbourhood size parameter $\\epsilon$ or $k$, we have\n\n$$1 - \\lambda \\le \\frac{\\text{recovered distance}}{\\text{original distance}} \\le 1 + \\lambda$$\n\nwith probability at least $1 - \\mu$, provided that the sample size is sufficiently large. [The formula is taken to hold for all pairs of points simultaneously.]\n\nC-Isomap is a simple variation on Isomap. Specifically, we use the $k$-neighbours method in Step 1, and replace Step 2 with the following:\n\n2a. Compute shortest paths in the graph for all pairs of data points. 
Each edge $x_i x_j$ in the graph is weighted by $|x_i - x_j| / \\sqrt{M(i) M(j)}$. Here $M(i)$ is the mean distance of $x_i$ to its $k$ nearest neighbours.\n\nUsing similar arguments to those in [8], one can prove a convergence theorem for C-Isomap. The exact formula for the weights is not critical in the asymptotic analysis. The point is that the rescaling factor $1/\\sqrt{M(i) M(j)}$ is an asymptotically accurate approximation to the conformal scaling factor in the neighbourhood of $x_i$ and $x_j$.\n\nTheorem. Let $Y$ be sampled uniformly from a bounded convex region in $\\mathbb{R}^d$. Let $f$ be a $C^2$-smooth conformal embedding of that region in $\\mathbb{R}^N$. Given $\\lambda, \\mu > 0$, for a suitable choice of neighbourhood size parameter $k$, we have\n\n$$1 - \\lambda \\le \\frac{\\text{recovered distance}}{\\text{original distance}} \\le 1 + \\lambda$$\n\nwith probability at least $1 - \\mu$, provided that the sample size is sufficiently large.\n\nIt is possible but unpleasant to find explicit lower bounds for the sample size. Qualitatively, we expect to require a larger sample size for C-Isomap since it depends on two approximations\u2014local data density and geodesic distance\u2014rather than one. In the special case where the conformal embedding is actually an isometry, it is therefore preferable to use Isomap rather than C-Isomap. This is borne out in practice.\n\n2.3 Examples\n\nWe ran C-Isomap, Isomap, MDS and LLE on three \u201cfishbowl\u201d examples with different data distributions, as well as a more realistic simulated data set. We refer to Figure 1.\n\nFishbowls: These three datasets differ only in the probability density used to generate the points. 
For the conformal fishbowl (column 1), 2000 points were generated uniformly at random in a circular disk $Y$ and then projected stereographically (hence conformally mapped) onto a sphere. Note the high concentration of points near the rim. There is no metrically faithful way of embedding a curved fishbowl inside a Euclidean plane, so classical MDS and Isomap cannot succeed. As predicted, C-Isomap does recover the original disk structure of $Y$ (as does LLE). Contrast with the uniform fishbowl (column 2), with data points sampled using a uniform measure on the fishbowl itself. In this situation C-Isomap behaves like Isomap, since the rescaling factor is approximately constant; hence it is unable to find a topologically faithful 2-dimensional representation. The offset fishbowl (column 3) is a perturbed version of the conformal fishbowl; points are sampled in $Y$ using a shallow Gaussian offset from center, then stereographically projected onto a sphere. Although the theoretical conditions for perfect recovery are not met, C-Isomap is robust enough to find a topologically correct embedding. LLE, in contrast, produces topological errors and metric distortion in both cases where the data are not uniformly sampled in $Y$ (columns 2 and 3).\n\nFace images: Artificial images of a face were rendered as $128 \\times 128$ pixel images and rasterized into 16384-dimensional vectors. The images varied randomly and independently in two parameters: left-right pose angle $\\alpha$ and distance from camera $d$. There is a natural family of conformal transformations for this data manifold, if we ignore perspective distortions in the closest images: namely $d \\mapsto \\lambda d$, for $\\lambda > 0$, which has the effect of shrinking or magnifying the apparent size of images by a constant factor. Sampling uniformly in $\\alpha$ and in $\\log d$ gives a data set approximately satisfying the required conditions for C-Isomap. We generated 2000 face images in this way, spanning the range indicated by Figure 2. All four algorithms returned a two-dimensional embedding of the data. As expected, C-Isomap returns the cleanest embedding, separating the two degrees of freedom reliably along the horizontal and vertical axes. Isomap returns an embedding which narrows predictably as the face gets further away. The LLE embedding is highly distorted.\n\n[Figure 1 panels. Columns: conformal fishbowl, uniform fishbowl, offset fishbowl, face images. Rows: MDS; Isomap, k = 15; C-Isomap, k = 15; LLE, k = 15.]\n\nFigure 1: Four dimensionality reduction algorithms (MDS, Isomap, C-Isomap, and LLE) are applied to three versions of a toy \u201cfishbowl\u201d dataset, and to a more complex data manifold of face images.\n\nFigure 2: A set of 2000 face images was randomly generated, varying independently in two parameters: distance and left-right pose. The four extreme cases are shown.\n\n3 Isomap with landmark points\n\nThe Isomap algorithm has two computational bottlenecks. The first is calculating the $N \\times N$ shortest-paths distance matrix $D_N$. Using Floyd\u2019s algorithm this is $O(N^3)$; this can be improved to $O(kN^2 \\log N)$ by implementing Dijkstra\u2019s algorithm with Fibonacci heaps ($k$ is the neighbourhood size). The second bottleneck is the MDS eigenvalue calculation, which involves a full $N \\times N$ matrix and has complexity $O(N^3)$. In contrast, the eigenvalue computations in LLE and Laplacian Eigenmaps are sparse (hence considerably cheaper).\n\nL-Isomap addresses both of these inefficiencies at once. We designate $n$ of the data points to be landmark points, where $n \\ll N$. Instead of computing $D_N$, we compute the $n \\times N$ matrix $D_{n,N}$ of distances from each data point to the landmark points only. Using a new procedure LMDS (Landmark MDS), we find a Euclidean embedding of the data using $D_{n,N}$ instead of $D_N$. This leads to enormous savings when $n$ is much less than $N$, since $D_{n,N}$ can be computed using Dijkstra in $O(knN \\log N)$ time, and LMDS runs in $O(n^2 N)$.\n\nLMDS is feasible precisely because we expect the data to have a low-dimensional embedding. The first step is to apply classical MDS to the landmark points only, embedding them faithfully in $\\mathbb{R}^l$. Each remaining point $x$ can now be located in $\\mathbb{R}^l$ by using its known distances from the landmark points as constraints. This is analogous to the Global Positioning System technique of using a finite number of distance readings to identify a geographic location. If $n \\ge l + 1$ and the landmarks are in general position, then there are enough constraints to locate $x$ uniquely. The landmark points may be chosen randomly, with $n$ taken to be sufficiently larger than the minimum $l + 1$ to ensure stability.\n\n3.1 The Landmark MDS algorithm\n\nLMDS begins by applying classical MDS [9,10] to the landmarks-only distance matrix $D_n$. We recall the procedure. The first step is to construct an \u201cinner-product\u201d matrix $B_n = -H_n \\Delta_n H_n / 2$; here $\\Delta_n$ is the matrix of squared distances and $H_n$ is the \u201ccentering\u201d matrix defined by the formula $(H_n)_{ij} = \\delta_{ij} - 1/n$. Next find the eigenvalues and eigenvectors of $B_n$. Write $\\lambda_i$ for the positive eigenvalues (labelled so that $\\lambda_1 \\ge \\lambda_2 \\ge \\cdots \\ge \\lambda_p$), and $\\vec{v}_i$ for the corresponding eigenvectors (written as column vectors); non-positive eigenvalues are ignored. Then for $l \\le p$ the required optimal $l$-dimensional embedding vectors are given as the columns of the matrix:\n\n$$L = \\begin{bmatrix} \\sqrt{\\lambda_1} \\vec{v}_1^T \\\\\\\\ \\sqrt{\\lambda_2} \\vec{v}_2^T \\\\\\\\ \\vdots \\\\\\\\ \\sqrt{\\lambda_l} \\vec{v}_l^T \\end{bmatrix}$$\n\nThe embedded data are automatically mean-centered with principal components aligned with the axes, most significant first. If $B_n$ has no negative eigenvalues, then the $p$-dimensional embedding is perfect; otherwise there is no exact Euclidean embedding.\n\nThe second stage of LMDS is to embed the remaining points in $\\mathbb{R}^l$. Let $\\Delta_x$ denote the column vector of squared distances between a data point $x$ and the landmark points. The embedding vector $\\vec{x}$ is related linearly to $\\Delta_x$ by the formula:\n\n$$\\vec{x} = \\frac{1}{2} L^{\\#} (\\bar{\\Delta}_n - \\Delta_x)$$\n\nwhere $\\bar{\\Delta}_n$ is the column mean of $\\Delta_n$ and $L^{\\#}$ is the pseudoinverse transpose of $L$:\n\n$$L^{\\#} = \\begin{bmatrix} \\vec{v}_1^T / \\sqrt{\\lambda_1} \\\\\\\\ \\vec{v}_2^T / \\sqrt{\\lambda_2} \\\\\\\\ \\vdots \\\\\\\\ \\vec{v}_l^T / \\sqrt{\\lambda_l} \\end{bmatrix}$$\n\n[Figure 3 panels. Original points; Swiss roll embedding; L-Isomap, k = 8, with 20, 10, 4 and 3 landmarks; LLE with k = 18, 14, 10 and 6.]\n\nFigure 3: L-Isomap is stable over a wide range of values for the sparseness parameter $n$ (the number of landmarks). Results from LLE are shown for comparison.\n\nThe final (optional) stage is to use PCA to realign the data with the coordinate axes. A full discussion of LMDS will appear in [11]. We note two results here:\n\n1. If $x$ is a landmark point, then the embedding given by LMDS is consistent with the original MDS embedding.\n\n2. If the distance matrix $D_{n,N}$ can be represented exactly by a Euclidean configuration in $\\mathbb{R}^l$, and if the landmarks are chosen so that their affine span in that configuration is $l$-dimensional (i.e. in general position), then LMDS will recover the configuration exactly, up to rotation and translation.\n\nA good way to satisfy the affine span condition is to pick $l + 1$ landmarks randomly, plus a few extra for stability. This is important for Isomap, where the distances are inherently slightly noisy. The robustness of LMDS to noise depends on the matrix norm $\\|L^{\\#}\\| = 1/\\sqrt{\\lambda_l}$. If $\\lambda_l$ is very small, then all the landmarks lie close to a hyperplane and LMDS performs poorly with noisy data. In practice, choosing a few extra landmark points gives satisfactory results.\n\n3.2 Example\n\nFigure 3 shows the results of testing L-Isomap on a Swiss roll data set. 2000 points were generated uniformly in a rectangle (top left) and mapped into a Swiss roll configuration in $\\mathbb{R}^3$. Ordinary Isomap recovers the rectangular structure correctly provided that the neighbourhood parameter is not too large (in this case $k = 8$ works). The tests show that this performance is not significantly degraded when L-Isomap is used. For each $n$, we chose $n$ landmark points at random; even down to 4 landmarks the embedding closely approximates the (non-landmark) Isomap embedding. The configuration of three landmarks was chosen especially to illustrate the affine distortion that may arise if the landmarks lie close to a subspace (in this case, a line). For three landmarks chosen at random, results are generally much better.\n\nIn contrast, LLE is unstable under changes in its sparseness parameter $k$ (neighbourhood size). To be fair, $k$ is principally a topological parameter and only incidentally a sparseness parameter for LLE. In L-Isomap, these two roles are separately fulfilled by $k$ and $n$.\n\n4 Conclusion\n\nLocal approaches to nonlinear dimensionality reduction such as LLE or Laplacian Eigenmaps have two principal advantages over a global approach such as Isomap: they tolerate a certain amount of curvature and they lead naturally to a sparse eigenvalue problem. However, neither curvature tolerance nor computational sparsity is explicitly part of the formulation of the local approaches; these features emerge as byproducts of the goal of trying to preserve only the data\u2019s local geometric structure. Because they are not explicit goals but only convenient byproducts, they are not in fact reliable features of the local approach. The conformal invariance of LLE can fail in sometimes surprising ways, and the computational sparsity is not tunable independently of the topological sparsity of the manifold. In contrast, we have presented two extensions to Isomap that are explicitly designed to remove a well-characterized form of curvature and to exploit the computational sparsity intrinsic to low-dimensional manifolds. Both extensions are amenable to algorithmic analysis, with provable conditions under which they return accurate results; and they have been tested successfully on challenging data sets.\n\nAcknowledgments\n\nThis work was supported in part by NSF grant DMS-0101364, and grants from Schlumberger, MERL and the DARPA Human ID program. The authors wish to thank Thomas Vetter for providing the range and texture maps for the synthetic face; and Lauren Schmidt for her help in rendering the actual images using Curious Labs\u2019 \u201cPoser\u201d software.\n\nReferences\n\n[1] Tenenbaum, J.B., de Silva, V. 
& Langford, J.C. (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319\u20132323.\n\n[2] Roweis, S. & Saul, L. (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290: 2323\u20132326.\n\n[3] Belkin, M. & Niyogi, P. (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In T.G. Dietterich, S. Becker and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14. MIT Press.\n\n[4] Bishop, C., Svensen, M. & Williams, C. (1998) GTM: The generative topographic mapping. Neural Computation 10(1).\n\n[5] Kohonen, T. (1984) Self Organisation and Associative Memory. Springer-Verlag, Berlin.\n\n[6] Bregler, C. & Omohundro, S.M. (1995) Nonlinear image interpolation using manifold learning. In G. Tesauro, D.S. Touretzky & T.K. Leen (eds.), Advances in Neural Information Processing Systems 7: 973\u2013980. MIT Press.\n\n[7] DeMers, D. & Cottrell, G. (1993) Non-linear dimensionality reduction. In S. Hanson, J. Cowan & L. Giles (eds.), Advances in Neural Information Processing Systems 5: 580\u2013590. Morgan-Kaufmann.\n\n[8] Bernstein, M., de Silva, V., Langford, J.C. & Tenenbaum, J.B. (December 2000) Graph approximations to geodesics on embedded manifolds. Preprint may be downloaded at the URL: http://isomap.stanford.edu/BdSLT.pdf\n\n[9] Torgerson, W.S. (1958) Theory and Methods of Scaling. Wiley, New York.\n\n[10] Cox, T.F. & Cox, M.A.A. (1994) Multidimensional Scaling. Chapman & Hall, London.\n\n[11] de Silva, V. & Tenenbaum, J.B. (in preparation) Sparse multidimensional scaling using landmark points.\n", "award": [], "sourceid": 2141, "authors": [{"given_name": "Vin", "family_name": "Silva", "institution": null}, {"given_name": "Joshua", "family_name": "Tenenbaum", "institution": null}]}