{"title": "Convergence of Laplacian Eigenmaps", "book": "Advances in Neural Information Processing Systems", "page_first": 129, "page_last": 136, "abstract": null, "full_text": "Convergence of Laplacian Eigenmaps

Mikhail Belkin
Department of Computer Science
Ohio State University
Columbus, OH 43210
mbelkin@cse.ohio-state.edu

Partha Niyogi
Department of Computer Science
The University of Chicago
Hyde Park, Chicago, IL 60637
niyogi@cs.uchicago.edu

Abstract

Geometrically based methods for various tasks of machine learning have attracted considerable attention over the last few years. In this paper we show convergence of eigenvectors of the point cloud Laplacian to the eigenfunctions of the Laplace-Beltrami operator on the underlying manifold, thus establishing the first convergence results for a spectral dimensionality reduction algorithm in the manifold setting.

1 Introduction

The last several years have seen significant activity in geometrically motivated approaches to data analysis and machine learning. The unifying premise behind these methods is the assumption that many types of high-dimensional natural data lie on or near a low-dimensional manifold. Collectively this class of learning algorithms is often referred to as manifold learning algorithms. Some recent manifold algorithms include Isomap [14] and Locally Linear Embedding (LLE) [13].

In this paper we provide a theoretical analysis of the Laplacian Eigenmaps algorithm introduced in [2], a framework based on eigenvectors of the graph Laplacian associated to point-cloud data. More specifically, we prove that under certain conditions, eigenvectors of the graph Laplacian converge to eigenfunctions of the Laplace-Beltrami operator on the underlying manifold. We note that in mathematics the manifold Laplacian is a classical object of differential geometry with a rich tradition of inquiry.
It is one of the key objects associated to a general differentiable Riemannian manifold. Indeed, several recent manifold learning algorithms are closely related to the Laplacian. The eigenfunctions of the Laplacian are also eigenfunctions of heat diffusions, which is the point of view explored by Coifman and colleagues at Yale University in a series of recent papers on data analysis (e.g., [6]). The Hessian Eigenmaps approach, which uses eigenfunctions of the Hessian operator for data representation, was proposed by Donoho and Grimes in [7]; note that the Laplacian is the trace of the Hessian. Finally, as observed in [2], the cost function that is minimized to obtain the embedding of LLE is an approximation to the squared Laplacian.

In the manifold learning setting, the underlying manifold is usually unknown. Therefore functional maps from the manifold need to be estimated using point cloud data. The common approximation strategy in these methods is to construct an adjacency graph associated to a point cloud. The underlying intuition is that since the graph is a proxy for the manifold, inference based on the structure of the graph corresponds to the desired inference based on the geometric structure of the manifold. Theoretical results to justify this intuition have been developed over the last few years. Building on recent results on functional approximation of the Laplace-Beltrami operator using heat kernels, and on results on consistency of eigenfunctions for empirical approximations of such operators, we show convergence of the Laplacian Eigenmaps algorithm. We note that in order to prove convergence of a spectral method, one needs to demonstrate convergence of the empirical eigenvalues and eigenfunctions. To our knowledge this is the first complete convergence proof for a spectral manifold learning method.

1.1 Prior and Related Work

This paper relies on results obtained in [3, 1] for functional convergence of operators.
It turns out, however, that considerably more careful analysis is required to ensure spectral convergence, which is necessary to guarantee convergence of the corresponding algorithms. To the best of our knowledge previous results are not sufficient to guarantee convergence for any spectral method in the manifold setting.

Lafon in [10] generalized pointwise convergence results from [1] to the important case of an arbitrary probability distribution on the manifold. We also note [4], where a similar result is shown for the case of a domain in R^n. Those results were further generalized and presented with an empirical pointwise convergence theorem for the manifold case in [9]. We observe that the arguments in this paper are likely to allow one to use these results to show convergence of eigenfunctions for a wide class of probability distributions on the manifold. Empirical convergence of spectral clustering for a fixed kernel parameter t was analyzed in [11] and is used in this paper. However the geometric case requires t \to 0. The results in this paper as well as in [3, 1] are for the case of a uniform probability distribution on the manifold. Recently [8] provided a deeper probabilistic analysis of that case.

Finally we point out that while the analogies between the geometry of manifolds and the geometry of graphs are well known in spectral graph theory and in certain areas of differential geometry (see, e.g., [5]), the exact nature of that parallel is usually not made precise.

2 Main Result

The main result of this paper is to show convergence of eigenvectors of the graph Laplacian associated to a point cloud dataset to eigenfunctions of the Laplace-Beltrami operator when the data is sampled from a uniform probability distribution on an embedded manifold. In what follows we will assume that the manifold M is a compact infinitely differentiable Riemannian submanifold of R^N without boundary.
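Before the formal definitions, the empirical object whose eigenvectors are studied can be sketched in a few lines (a minimal numpy illustration of the construction, not the authors' code; the bandwidth t = 0.05, the circle data, and the unnormalized Laplacian below are choices of this sketch):

```python
import numpy as np

def point_cloud_laplacian(X, t):
    """Unnormalized graph Laplacian of a Gaussian-weight graph on the points X (n x N)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-sq / (4 * t))                            # Gaussian (heat-kernel) weights
    np.fill_diagonal(W, 0.0)                             # no self-loops
    return np.diag(W.sum(axis=1)) - W                    # L = D - W

# Points sampled uniformly from a circle, a 1-dimensional manifold embedded in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 400)
X = np.c_[np.cos(theta), np.sin(theta)]

L = point_cloud_laplacian(X, t=0.05)
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
embedding = vecs[:, 1:3]         # Laplacian Eigenmaps: skip the constant eigenvector
```

The question analyzed in this paper is in what sense the bottom eigenvectors of such a matrix recover the eigenfunctions of the manifold Laplacian as n grows and t shrinks.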
Recall now that the Laplace-Beltrami operator \Delta on M is a differential operator \Delta : C^2 \to L^2 defined as

\Delta f = -\mathrm{div}(\nabla f)

where \nabla f is the gradient vector field and div denotes divergence. \Delta is a positive semi-definite self-adjoint operator and has a discrete spectrum on a compact manifold. We will generally denote its ith smallest eigenvalue by \lambda_i and the corresponding eigenfunction by e_i. See [12] for a thorough introduction to the subject.

We define the operator L^t : L^2(M) \to L^2(M) as follows (\mu is the standard measure):

L^t(f)(p) = (4\pi t)^{-\frac{k+2}{2}} \left( \int_M e^{-\frac{\|p-q\|^2}{4t}} f(p)\, d\mu_q - \int_M e^{-\frac{\|p-q\|^2}{4t}} f(q)\, d\mu_q \right)

If x_i are the data points, the corresponding empirical version is given by

\hat{L}^t_n(f)(p) = \frac{(4\pi t)^{-\frac{k+2}{2}}}{n} \left( \sum_i e^{-\frac{\|p-x_i\|^2}{4t}} f(p) - \sum_i e^{-\frac{\|p-x_i\|^2}{4t}} f(x_i) \right)

The operator \hat{L}^t_n is (the extension of) the point cloud Laplacian that forms the basis of the Laplacian Eigenmaps algorithm for manifold learning. It is easy to see that it acts by matrix multiplication on functions restricted to the point cloud, with the matrix being the corresponding graph Laplacian. We will assume that the x_i are sampled i.i.d. from M according to the uniform distribution.

Our main theorem shows that there is a way to choose a sequence t_n, such that the eigenfunctions of the empirical operators \hat{L}^{t_n}_n converge to the eigenfunctions of the Laplace-Beltrami operator \Delta in probability.

Theorem 2.1 Let \lambda^t_{n,i} be the ith eigenvalue of \hat{L}^t_n and e^t_{n,i} be the corresponding eigenfunction (which, for each fixed i, will be shown to exist for t sufficiently small). Let \lambda_i and e_i be the corresponding eigenvalue and eigenfunction of \Delta respectively.
Then there exists a sequence t_n \to 0, such that

\lim_{n\to\infty} \lambda^{t_n}_{n,i} = \lambda_i

\lim_{n\to\infty} \| e^{t_n}_{n,i}(x) - e_i(x) \|_2 = 0

where the limits are taken in probability.

3 Overview of the Proof

The proof of the main theorem consists of two main parts. One is spectral convergence of the functional approximation L^t to \Delta as t \to 0, and the other is spectral convergence of the empirical approximation \hat{L}^t_n to L^t as the number of data points n tends to infinity. These two types of convergence are then put together to obtain the main Theorem 2.1.

Part 1. The more difficult part of the proof is to show convergence of eigenvalues and eigenfunctions of the functional approximation L^t to those of \Delta as t \to 0. To demonstrate convergence we will take a different functional approximation \frac{1-H^t}{t} of \Delta, where H^t is the heat operator. While \frac{1-H^t}{t} does not converge uniformly to \Delta, they share an eigenbasis, and for each fixed i the ith eigenvalue of \frac{1-H^t}{t} converges to the ith eigenvalue of \Delta. We will then consider the operator R^t = \frac{1-H^t}{t} - L^t. A careful analysis of this operator, which constitutes the bulk of the paper, shows that R^t is a small relatively bounded perturbation of \frac{1-H^t}{t}, in the sense that for any function f we have \frac{\|R^t f\|_2}{\|\frac{1-H^t}{t} f\|_2} \ll 1 as t \to 0. This will imply spectral convergence and lead to the following

Theorem 3.1 Let \lambda_i, \lambda^t_i, e_i, e^t_i be the ith smallest eigenvalues and the corresponding eigenfunctions of \Delta and L^t respectively. Then

\lim_{t\to 0} |\lambda_i - \lambda^t_i| = 0

\lim_{t\to 0} \| e_i - e^t_i \|_2 = 0

Part 2. The second part is to show that the eigenfunctions of the empirical operator \hat{L}^t_n converge to eigenfunctions of L^t as n \to \infty in probability.
That result follows readily from the previous work in [11] together with the analysis of the essential spectrum of L^t. The following theorem is obtained:

Theorem 3.2 For a fixed sufficiently small t, let \lambda^t_{n,i} and \lambda^t_i be the ith eigenvalues of \hat{L}^t_n and L^t respectively. Let e^t_{n,i} and e^t_i be the corresponding eigenfunctions. Then

\lim_{n\to\infty} \lambda^t_{n,i} = \lambda^t_i

\lim_{n\to\infty} \| e^t_{n,i}(x) - e^t_i(x) \|_2 = 0

assuming that \lambda^t_i \le \frac{1}{2t}. The convergence is almost sure.

Observe that this implies convergence for any fixed i as soon as t is sufficiently small.

Symbolically these two theorems can be represented by the top line of the following diagram:

Eig \hat{L}^t_n  --(n \to \infty, probabilistic)-->  Eig L^t  --(t \to 0, deterministic)-->  Eig \Delta

with a bottom arrow Eig \hat{L}^{t_n}_n --(n \to \infty, t_n \to 0)--> Eig \Delta going directly from the empirical spectrum to that of \Delta.

After demonstrating the two types of convergence in the top line of the diagram, a simple argument shows that a sequence t_n can be chosen to guarantee convergence as in the final Theorem 2.1, which provides the bottom arrow.

4 Spectral Convergence of Functional Approximations

4.1 Main Objects and the Outline of the Proof

Let M be a compact smooth k-dimensional manifold smoothly embedded in R^N, with the induced Riemannian structure and the corresponding induced measure \mu. As above, we define the operator L^t : L^2(M) \to L^2(M) as follows:

L^t(f)(x) = (4\pi t)^{-\frac{k+2}{2}} \left( \int_M e^{-\frac{\|x-y\|^2}{4t}} f(x)\, d\mu_y - \int_M e^{-\frac{\|x-y\|^2}{4t}} f(y)\, d\mu_y \right)

As shown in previous work, this operator serves as a functional approximation to the Laplace-Beltrami operator on M. The purpose of this paper is to extend the previous results to the eigenvalues and eigenfunctions, which turns out to require some careful estimates.

We start by reviewing certain properties of the Laplace-Beltrami operator and its connection to the heat equation.
Recall that the heat equation on the manifold M is given by

\frac{\partial h(x,t)}{\partial t} = -\Delta h(x,t)

where h(x, t) is the heat at time t at point x. Let f(x) = h(x, 0) be the initial heat distribution. We observe that, from the definition of the derivative,

\Delta f = \lim_{t\to 0} \frac{1}{t} \left( f(x) - h(x, t) \right)

It is well known (e.g., [12]) that the solution to the heat equation at time t can be written as

H^t f(x) := h(x, t) = \int_M H^t(x, y) f(y)\, d\mu_y

Here H^t is the heat operator and H^t(x, y) is the heat kernel of M. It is also well known that the heat operator can be written as H^t = e^{-t\Delta}. We immediately see that \Delta = \lim_{t\to 0} \frac{1-H^t}{t}, and that eigenfunctions of H^t, and hence eigenfunctions of \frac{1-H^t}{t}, coincide with eigenfunctions of the Laplace operator. The ith eigenvalue of \frac{1-H^t}{t} is equal to \frac{1-e^{-t\lambda_i}}{t}, where \lambda_i as usual is the ith eigenvalue of \Delta.

It is easy to observe that once the heat kernel H^t(x, y) is known, finding the Laplace operator poses no difficulty:

\Delta f = \lim_{t\to 0} \frac{1}{t} \left( f(x) - \int_M H^t(x, y) f(y)\, d\mu_y \right) = \lim_{t\to 0} \left( \frac{1-H^t}{t} \right) f     (1)

Reconstructing the Laplacian from a point cloud is possible because of the fundamental fact that the manifold heat kernel H^t(x, y) can be approximated by the ambient space Gaussian; hence L^t is an approximation to \frac{1-H^t}{t} and can be shown to converge, for a fixed f, to \Delta f. This pointwise operator convergence is discussed in [10, 3, 1].

To obtain convergence of eigenfunctions, however, one typically needs the stronger uniform convergence. If A_n is a sequence of operators, we say that A_n \to A uniformly in L^2 if \sup_{\|f\|_2=1} \|A_n f - A f\|_2 \to 0.
This is sufficient for convergence of eigenfunctions and other spectral properties.

It turns out that this type of convergence does not hold for the functional approximation L^t as t \to 0, which presents a serious technical obstruction to proving convergence of spectral properties. To see that L^t does not converge uniformly to \Delta, observe that while \frac{1-H^t}{t} converges to \Delta for each fixed function f, even this convergence is not uniform. Indeed, for a small t, we can always choose a sufficiently large \lambda_i \gg 1/t and the corresponding eigenfunction e_i of \Delta, such that

\left\| \left( \frac{1-H^t}{t} - \Delta \right) e_i \right\|_2 = \left| \frac{1}{t}\left(1 - e^{-t\lambda_i}\right) - \lambda_i \right| \approx \left| \frac{1}{t} - \lambda_i \right| \gg 1

Since L^t is an approximation to \frac{1-H^t}{t}, uniform convergence cannot be expected and the standard perturbation theory techniques do not apply. To overcome this obstacle we need the two following key ingredients:

Observation 1. Eigenfunctions of \frac{1-H^t}{t} coincide with eigenfunctions of \Delta.

Observation 2. L^t is a small relatively bounded perturbation of \frac{1-H^t}{t}.

While the first of these observations is immediate, the second is the technical core of this work.
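The pointwise-but-not-uniform behavior of the eigenvalue formula (1 - e^{-t\lambda})/t can be seen numerically (a quick sketch of our own, not part of the argument): for a fixed eigenvalue the error vanishes as t \to 0, but for eigenvalues of order 1/t or larger it blows up.

```python
import math

def eig_approx_error(lam, t):
    """Error of the heat-semigroup eigenvalue (1 - e^{-t*lam}) / t against lam."""
    return abs((1.0 - math.exp(-t * lam)) / t - lam)

t = 1e-4
# For a fixed, moderate eigenvalue the approximation is excellent
# (the error is roughly t * lam^2 / 2):
small = eig_approx_error(10.0, t)
# ...but for lam >> 1/t the error is of order lam - 1/t, so the
# convergence cannot be uniform over the whole spectrum:
large = eig_approx_error(1e6, t)
```

This is exactly the obstruction the relative-boundedness argument below is designed to circumvent.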
The relative boundedness of the perturbation will imply convergence of eigenfunctions of L^t to those of \frac{1-H^t}{t} and hence, by Observation 1, to eigenfunctions of \Delta.

We now define the perturbation operator

R^t = \frac{1-H^t}{t} - L^t

The relative boundedness of the self-adjoint perturbation operator R^t is formalized as follows:

Theorem 4.1 For any 0 < \epsilon < \frac{2}{k+2} there exists a constant C, such that for all t sufficiently small

\frac{|\langle R^t f, f \rangle|}{\langle \frac{1-H^t}{t} f, f \rangle} \le C \max\left( t^{\frac{2}{k+2}-\epsilon},\ t^{\frac{k+2}{2}\epsilon} \right)

In particular

\lim_{t\to 0}\ \sup_{\|f\|_2=1} \frac{\langle R^t f, f \rangle}{\langle \frac{1-H^t}{t} f, f \rangle} = 0

and hence R^t is dominated by \frac{1-H^t}{t} on L^2 as t tends to 0.

This result implies that for small values of t, the bottom eigenvalues and eigenfunctions of L^t are close to those of \frac{1-H^t}{t}, which in turn implies convergence. To establish this result, we will need two key estimates on the size of the perturbation R^t in two different norms.

Proposition 4.2 Let f \in L^2. There exists C \in R, such that for all sufficiently small values of t

\|R^t f\|_2 \le C \|f\|_2

Proposition 4.3 Let f \in H^{\frac{k}{2}+1}, where H^{\frac{k}{2}+1} is a Sobolev space. Then there is C \in R, such that for all sufficiently small values of t

\|R^t f\|_2 \le C \sqrt{t}\, \|f\|_{H^{\frac{k}{2}+1}}

In what follows we give the proof of Theorem 4.1 assuming the two Propositions above. The proof of the Propositions requires technical estimates of the heat kernel and can be found in the longer version of the paper.

4.2 Proof of Theorem 4.1

Lemma 4.4 Let e be an eigenfunction of \Delta with eigenvalue \lambda. Then for some universal constant C

\|e\|_{H^{\frac{k}{2}+1}} \le C \lambda^{\frac{k+2}{4}}     (2)

The details can be found in the long version.
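As a sanity check of the exponent in Lemma 4.4, consider the simplest possible example (ours, not from the paper): the unit circle S^1, where k = 1, the eigenfunctions are e_n(x) = e^{inx} and \lambda_n = n^2.

```latex
% On S^1 (k = 1) the Sobolev norm of order k/2 + 1 = 3/2 of e_n(x) = e^{inx} is
\|e_n\|_{H^{3/2}}^2 \;=\; (1 + n^2)^{3/2}\,\|e_n\|_2^2 \;\asymp\; n^3,
% so that
\|e_n\|_{H^{3/2}} \;\asymp\; n^{3/2} \;=\; (n^2)^{3/4} \;=\; \lambda_n^{\frac{k+2}{4}},
% matching the exponent (k+2)/4 in inequality (2).
```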
Now we can proceed with the

Proof: [Theorem 4.1]

Let e_i(x) be the ith eigenfunction of \Delta and let \lambda_i be the corresponding eigenvalue. Recall that the e_i form an orthonormal basis of L^2(M). Thus any function f \in L^2(M) can be written uniquely as f(x) = \sum_{i=0}^{\infty} a_i e_i(x), where \sum a_i^2 < \infty. For technical reasons we will assume that all our functions are perpendicular to the constant function and that the lowest eigenvalue is nonzero.

Recall also that

H^t f = \exp(-t\Delta) f,\quad H^t e_i = \exp(-t\lambda_i) e_i,\quad \frac{1-H^t}{t} e_i = \frac{1-e^{-\lambda_i t}}{t} e_i     (3)

Now let us fix t and consider the function \phi(x) = \frac{1-e^{-xt}}{t} for positive x. It is easy to check that \phi is a concave and increasing function of x. Put x_0 = 1/\sqrt{t}. We have:

\phi(0) = 0,\quad \phi(x_0) = \frac{1-e^{-\sqrt{t}}}{t},\quad \frac{\phi(x_0)}{x_0} = \frac{1-e^{-\sqrt{t}}}{\sqrt{t}}

Splitting the positive real line into the two intervals [0, x_0] and [x_0, \infty) and using concavity and monotonicity we observe that

\phi(x) \ge \min\left( \frac{1-e^{-\sqrt{t}}}{\sqrt{t}}\, x,\ \frac{1-e^{-\sqrt{t}}}{t} \right)

Note that \lim_{t\to 0} \frac{1-e^{-\sqrt{t}}}{\sqrt{t}} = 1. Therefore for t sufficiently small

\phi(x) \ge \min\left( \frac{1}{2} x,\ \frac{1}{2\sqrt{t}} \right)

Thus

\left\langle \frac{1-H^t}{t} e_i, e_i \right\rangle = \frac{1-e^{-\lambda_i t}}{t} \ge \frac{1}{2} \min\left( \lambda_i,\ \frac{1}{\sqrt{t}} \right)     (4)

Now take f \in L^2, f(x) = \sum_{i=1}^{\infty} a_i e_i(x). Without loss of generality we can assume that \|f\|_2 = 1. Taking \alpha > 0, we split f as a sum of f_1 and f_2 as follows:

f_1 = \sum_{\lambda_i \le \alpha} a_i e_i,\quad f_2 = \sum_{\lambda_i > \alpha} a_i e_i

It is clear that f = f_1 + f_2 and, since f_1 and f_2 are orthogonal, \|f\|_2^2 = \|f_1\|_2^2 + \|f_2\|_2^2.
We will now deal separately with f_1 and with f_2.

From the inequality (4) above, we observe that

\left\langle \frac{1-H^t}{t} f, f \right\rangle \ge \frac{1}{2} \lambda_1

On the other hand, from the inequality (2), we see that if e_i is a basis element present in the basis expansion of f_1, then

\|e_i\|_{H^{\frac{k}{2}+1}} \le C \alpha^{\frac{k+2}{4}}

Since \Delta acts by rescaling basis elements, we have \|f_1\|_{H^{\frac{k}{2}+1}} \le C \alpha^{\frac{k+2}{4}}. Therefore by Proposition 4.3, for t sufficiently small and some constant C',

\|R^t f_1\|_2 \le C' \sqrt{t}\, \alpha^{\frac{k+2}{4}}     (5)

Hence we see that

\frac{\|R^t f_1\|_2}{\langle \frac{1-H^t}{t} f, f \rangle} \le \frac{2C'}{\lambda_1} \sqrt{t}\, \alpha^{\frac{k+2}{4}}     (6)

Consider now the second summand f_2. Recalling that f_2 only has basis components with eigenvalues greater than \alpha and using the inequality (4), we see that

\left\langle \frac{1-H^t}{t} f, f \right\rangle \ge \left\langle \frac{1-H^t}{t} f_2, f_2 \right\rangle \ge \frac{1}{2} \min\left( \alpha,\ \frac{1}{\sqrt{t}} \right) \|f_2\|_2^2     (7)

On the other hand, by Proposition 4.2,

\|R^t f_2\|_2 \le C_1 \|f_2\|_2     (8)

Thus

\frac{|\langle R^t f_2, f_2 \rangle|}{\langle \frac{1-H^t}{t} f, f \rangle} \le \frac{\|R^t f_2\|_2 \|f_2\|_2}{\langle \frac{1-H^t}{t} f_2, f_2 \rangle} \le C'_1 \max\left( \frac{1}{\alpha},\ \sqrt{t} \right)     (9)

Finally, collecting inequalities (6) and (9), we see that

\frac{|\langle R^t f, f \rangle|}{\langle \frac{1-H^t}{t} f, f \rangle} \le C \left( \max\left( \frac{1}{\alpha},\ \sqrt{t} \right) + \sqrt{t}\, \alpha^{\frac{k+2}{4}} \right)     (10)

where C is a constant independent of t and \alpha. Choosing \alpha = t^{-\frac{2}{k+2}+\epsilon}, where 0 < \epsilon < \frac{2}{k+2}, yields the desired result. \square

5 Spectral Convergence of Empirical Approximation

Proposition 5.1 For t sufficiently small

SpecEss(L^t) \subset \left( \frac{1}{2} t^{-1},\ \infty \right)

where SpecEss denotes the essential spectrum of the operator.

Proof: As noted before, L^t f is the difference of a multiplication operator and a
compact operator:

L^t f(p) = g(p) f(p) - K f     (11)

where

g(p) = (4\pi t)^{-\frac{k+2}{2}} \int_M e^{-\frac{\|p-q\|^2}{4t}}\, d\mu_q

and K f is a convolution with a Gaussian. As noted in [11], it is a fact of basic perturbation theory that SpecEss(L^t) = \mathrm{rg}\, g, where \mathrm{rg}\, g is the range of the function g : M \to R. To estimate \mathrm{rg}\, g observe first that

\lim_{t\to 0}\, (4\pi t)^{-\frac{k}{2}} \int_M e^{-\frac{\|p-q\|^2}{4t}}\, d\mu_q = 1

We thus see that for t sufficiently small

(4\pi t)^{-\frac{k}{2}} \int_M e^{-\frac{\|p-y\|^2}{4t}}\, d\mu_y > \frac{1}{2}

and hence g(p) > \frac{1}{2} t^{-1}. \square

Lemma 5.2 Let e^t be an eigenfunction of L^t with L^t e^t = \lambda^t e^t, \lambda^t < \frac{1}{2} t^{-1}. Then e^t \in C^\infty.

We see that Theorem 3.2 follows easily:

Proof: [Theorem 3.2] By Proposition 5.1 we see that the part of the spectrum of L^t between 0 and \frac{1}{2} t^{-1} is discrete. It is a standard fact of functional analysis that such points are eigenvalues and the corresponding eigenspaces are of finite dimension. Consider now \lambda^t_i \in [0, \frac{1}{2} t^{-1}] and the corresponding eigenfunction e^t_i. The theorem then follows from Theorem 23 and Proposition 25 in [11], which show convergence of spectral properties for the empirical operators. \square

6 Main Theorem

We are finally in a position to prove the main Theorem 2.1.

Proof: [Theorem 2.1] From Theorems 3.2 and 3.1 we obtain the following convergence results:

Eig \hat{L}^t_n  --(n \to \infty)-->  Eig L^t  --(t \to 0)-->  Eig \Delta

where the first convergence is almost sure for \lambda_i \le \frac{1}{2} t^{-1}. Given any i \in N and any \epsilon > 0, we can choose t' < \frac{1}{2}\lambda_i^{-1}, such that for all t < t' we have \|e_i - e^t_i\|_2 < \frac{\epsilon}{2}. On the other hand, using the first arrow, we see that

\lim_{n\to\infty} P\left\{ \|e^t_{n,i} - e^t_i\|_2 \ge \frac{\epsilon}{2} \right\} = 0

Thus for any p > 0 and for each such t there exists an N, such that P\{\|e^t_{n,i} - e_i\|_2 > \epsilon\} < p. Inverting this relationship, we see that for any N and for any probability p(N) there exists a t_N, such that

\forall n > N\quad P\{\|e^{t_N}_{n,i} - e_i\|_2 > \epsilon\} < p(N)

Making p(N) tend to zero, we obtain convergence in probability. \square

References

[1] M. Belkin, Problems of Learning on Manifolds, Ph.D. Dissertation, The University of Chicago, 2003.

[2] M. Belkin, P. Niyogi, Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, NIPS 2001.

[3] M. Belkin, P. Niyogi, Towards a Theoretical Foundation for Laplacian-Based Manifold Methods, COLT 2005.

[4] O. Bousquet, O. Chapelle, M. Hein, Measure Based Regularization, NIPS 2003.

[5] F. R. K. Chung, Spectral Graph Theory, Regional Conference Series in Mathematics, number 92, 1997.

[6] R. R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F. Warner and S. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data, submitted to the Proceedings of the National Academy of Sciences, 2004.

[7] D. L. Donoho, C. E. Grimes, Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, PNAS, vol. 100, pp. 5591-5596.

[8] E. Gine, V. Koltchinskii, Empirical Graph Laplacian Approximation of Laplace-Beltrami Operators: Large Sample Results, preprint.

[9] M. Hein, J.-Y. Audibert, U.
von Luxburg, From Graphs to Manifolds – Weak and Strong Pointwise Consistency of Graph Laplacians, COLT 2005.

[10] S. Lafon, Diffusion Maps and Geometric Harmonics, Ph.D. Thesis, Yale University, 2004.

[11] U. von Luxburg, M. Belkin, O. Bousquet, Consistency of Spectral Clustering, Max Planck Institute for Biological Cybernetics Technical Report TR 134, 2004.

[12] S. Rosenberg, The Laplacian on a Riemannian Manifold, Cambridge Univ. Press, 1997.

[13] S. T. Roweis, L. K. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, vol. 290, 2000.

[14] J. B. Tenenbaum, V. de Silva, J. C. Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction, Science, vol. 290, 2000.
", "award": [], "sourceid": 2989, "authors": [{"given_name": "Mikhail", "family_name": "Belkin", "institution": null}, {"given_name": "Partha", "family_name": "Niyogi", "institution": null}]}