{"title": "Random Projections for Manifold Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 641, "page_last": 648, "abstract": "We propose a novel method for {\\em linear} dimensionality reduction of manifold modeled data. First, we show that with a small number $M$ of {\\em random projections} of sample points in $\\reals^N$ belonging to an unknown $K$-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number random projections required is linear in $K$ and logarithmic in $N$, meaning that $K<M\\ll N$. To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.", "full_text": "Random Projections for Manifold Learning\n\nChinmay Hegde\nECE Department\nRice University\nch3@rice.edu\n\nMichael B. Wakin\nEECS Department\n\nUniversity of Michigan\n\nwakin@eecs.umich.edu\n\nRichard G. Baraniuk\n\nECE Department\nRice University\n\nrichb@rice.edu\n\nAbstract\n\nWe propose a novel method for linear dimensionality reduction of manifold mod-\neled data. First, we show that with a small number M of random projections of\nsample points in RN belonging to an unknown K-dimensional Euclidean mani-\nfold, the intrinsic dimension (ID) of the sample set can be estimated to high accu-\nracy. Second, we rigorously prove that using only this set of random projections,\nwe can estimate the structure of the underlying manifold. In both cases, the num-\nber of random projections required is linear in K and logarithmic in N , meaning\nthat K < M \u226a N . To handle practical situations, we develop a greedy algorithm\nto estimate the smallest size of the projection space required to perform manifold\nlearning. Our method is particularly relevant in distributed sensing systems and\nleads to signi\ufb01cant potential savings in data acquisition, storage and transmission\ncosts.\n\n1 Introduction\n\nRecently, we have witnessed a tremendous increase in the sizes of data sets generated and processed\nby acquisition and computing systems. As the volume of the data increases, memory and processing\nrequirements need to correspondingly increase at the same rapid pace, and this is often prohibitively\nexpensive. Consequently, there has been considerable interest in the task of effective modeling of\nhigh-dimensional observed data and information; such models must capture the structure of the\ninformation content in a concise manner.\n\nA powerful data model for many applications is the geometric notion of a low-dimensional man-\nifold. Data that possesses merely K \u201cintrinsic\u201d degrees of freedom can be assumed to lie on a\nK-dimensional manifold in the high-dimensional ambient space. Once the manifold model is iden-\nti\ufb01ed, any point on it can be represented using essentially K pieces of information. 
Thus, algorithms in this vein of dimensionality reduction attempt to learn the structure of the manifold given high-dimensional training data.

While most conventional manifold learning algorithms are adaptive (i.e., data dependent) and nonlinear (i.e., involve construction of a nonlinear mapping), a linear, nonadaptive manifold dimensionality reduction technique has recently been introduced that employs random projections [1]. Consider a K-dimensional manifold M in the ambient space R^N and its projection onto a random subspace of dimension M = CK log(N); note that K < M ≪ N. The result of [1] is that the pairwise metric structure of sample points from M is preserved with high accuracy under projection from R^N to R^M.

Figure 1: Manifold learning using random projections. (a) Input data consisting of 1000 images of a shifted disk, each of size N = 64 × 64 = 4096. (b) True θ1 and θ2 values of the sampled data. (c, d) Isomap embedding learned from (c) the original data in R^N, and (d) a randomly projected version of the data into R^M with M = 15.

This result has far-reaching implications. Prototypical devices that directly and inexpensively acquire random projections of certain types of data (signals, images, etc.) have been developed [2, 3]; these devices are hardware realizations of the mathematical tools developed in the emerging area of Compressed Sensing (CS) [4, 5]. The theory of [1] suggests that a wide variety of signal processing tasks can be performed directly on the random projections acquired by these devices, thus saving valuable sensing, storage and processing costs.

The advantages of random projections extend even to cases where the original data is available in the ambient space R^N. For example, consider a wireless network of cameras observing a scene. To perform joint image analysis, the following steps might be executed:

1. Collate: Each camera node transmits its respective captured image (of size N) to a central processing unit.

2. Preprocess: The central processor estimates the intrinsic dimension K of the underlying image manifold.

3. Learn: The central processor performs a nonlinear embedding of the data points – for instance, using Isomap [6] – into a K-dimensional Euclidean space, using the estimate of K from the previous step.

In situations where N is large and communication bandwidth is limited, the dominating cost is the first transmission/collation step. On the one hand, to reduce the communication load one may perform nonlinear image compression (such as JPEG) at each node before transmitting to the central processing unit. But this requires a good deal of processing power at each sensor, and the compression would have to be undone during the learning step, adding to the overall computational cost. On the other hand, every camera could encode its image by computing (either directly or indirectly) a small number of random projections to communicate to the central processor. These random projections are obtained by linear operations on the data, and thus are cheaply computed. Clearly, in many situations it will be less expensive to store, transmit, and process such randomly projected versions of the sensed images. The question now becomes: how much information about the manifold is conveyed by these random projections, and is there any advantage in analyzing such measurements from a manifold learning perspective?
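To make the encoding step concrete, the following Python sketch (our illustration, not code from the paper; the image size, number of measurements, and fixed random seed are arbitrary choices) shows how a sensor node might compute such random projections: each measurement is a single inner product of the vectorized image with one row of a random matrix Φ, so no transform coding or iterative processing is needed at the sensor.

import numpy as np

# Illustrative sizes: a 64x64 image (N = 4096) reduced to M = 15 random measurements.
N, M = 64 * 64, 15

# Sensor and central processor share the seed, so Phi itself never needs to be transmitted.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # random measurement matrix

x = rng.random(N)    # stand-in for the vectorized image captured at this node
y = Phi @ x          # M-dimensional measurement vector sent to the central processor
print(y.shape)       # (15,)

Sharing a pseudorandom seed rather than the matrix itself is one common way to keep per-node storage and communication overhead negligible.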
In this paper, we provide theoretical and experimental evidence that reliable learning of a K-dimensional manifold can be performed not just in the high-dimensional ambient space R^N but also in an intermediate, much lower-dimensional random projection space R^M, where M = CK log(N). See, for example, the toy example of Figure 1. Our contributions are as follows. First, we present a theoretical bound on the minimum number of measurements per sample point required to estimate the intrinsic dimension (ID) of the underlying manifold, up to an accuracy level comparable to that of the Grassberger-Procaccia algorithm [7, 8], a widely used geometric approach for dimensionality estimation. Second, we present a similar bound on the number of measurements M required for Isomap [6] – a popular manifold learning algorithm – to be "reliably" used to discover the nonlinear structure of the manifold. In both cases, M is shown to be linear in K and logarithmic in N. Third, we formulate a procedure to determine, in practical settings, this minimum value of M with no a priori information about the data points. This paves the way for a weakly adaptive, linear algorithm (ML-RP) for dimensionality reduction and manifold learning.

The rest of the paper is organized as follows. Section 2 recaps the manifold learning approaches we utilize. Section 3 presents our main theoretical contributions, namely, the bounds on M required to perform reliable dimensionality estimation and manifold learning from random projections. Section 4 describes a new adaptive algorithm that estimates the minimum value of M required to provide a faithful representation of the data so that manifold learning can be performed. Experimental results on a variety of real and simulated data are provided in Section 5. Section 6 concludes with a discussion of potential applications and future work.

2 Background

An important input parameter for all manifold learning algorithms is the intrinsic dimension (ID) of a point cloud. We aim to embed the data points in as low-dimensional a space as possible in order to avoid the curse of dimensionality. However, if the embedding dimension is too small, then distinct data points might be collapsed onto the same embedded point. Hence a natural question to ask is: given a point cloud in N-dimensional Euclidean space, what is the dimension of the manifold that best captures the structure of this data set? This problem has received considerable attention in the literature and remains an active area of research [7, 9, 10].

For the purposes of this paper, we focus our attention on the Grassberger-Procaccia (GP) [7] algorithm for ID estimation. This is a widely used geometric technique that takes as input the set of pairwise distances between sample points. It then computes the scale-dependent correlation dimension of the data, defined as follows.

Definition 2.1 Suppose X = (x1, x2, ..., xn) is a finite dataset of underlying dimension K. Define

    $C_n(r) = \frac{1}{n(n-1)} \sum_{i \neq j} I_{\|x_i - x_j\| < r}$,

where I is the indicator function. The scale-dependent correlation dimension of X is defined as

    $\hat{D}_{corr}(r_1, r_2) = \frac{\log C_n(r_1) - \log C_n(r_2)}{\log r_1 - \log r_2}$.
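As a concrete illustration of Definition 2.1, the short Python sketch below (our own illustrative implementation, not code from the paper; in practice r1 and r2 must still be chosen from the linear region of the log C_n(r) versus log r plot) computes C_n(r) and the scale-dependent correlation dimension of a point cloud.

import numpy as np
from scipy.spatial.distance import pdist

def correlation_integral(X, r):
    """C_n(r): fraction of ordered pairs (i != j) with ||x_i - x_j|| < r."""
    d = pdist(X)                        # each unordered pair appears once
    n = len(X)
    return 2.0 * np.sum(d < r) / (n * (n - 1))

def correlation_dimension(X, r1, r2):
    """Scale-dependent correlation dimension of Definition 2.1."""
    c1, c2 = correlation_integral(X, r1), correlation_integral(X, r2)
    return (np.log(c1) - np.log(c2)) / (np.log(r1) - np.log(r2))

# Example: a 2-D Gaussian cloud embedded in R^10 should yield an estimate close to 2.
rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((1000, 2)), np.zeros((1000, 8))])
print(correlation_dimension(X, r1=0.2, r2=0.6))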
The best possible approximation to K (call this K̂) is obtained by fixing r1 and r2 to the largest range over which the log-log plot is linear and then calculating D̂_corr over that range. There are a number of practical issues involved with this approach; indeed, it has been shown that geometric ID estimation algorithms based on finite sampling yield biased estimates of intrinsic dimension [10, 11]. In our theoretical derivations, we do not attempt to take this bias into account; instead, we prove that running the GP algorithm on a sufficient number of random projections produces a dimension estimate that well-approximates the GP estimate obtained from analyzing the original point cloud.

The estimate K̂ of the ID of the point cloud is used by nonlinear manifold learning algorithms (e.g., Isomap [6], Locally Linear Embedding (LLE) [12], and Hessian Eigenmaps [13], among many others) to generate a K̂-dimensional coordinate representation of the input data points. Our main analysis will be centered around Isomap. Isomap attempts to preserve the metric structure of the manifold, i.e., the set of pairwise geodesic distances of any given point cloud sampled from the manifold. In essence, Isomap approximates the geodesic distances using a suitably defined graph and performs classical multidimensional scaling (MDS) to obtain a reduced K-dimensional representation of the data [6]. A key parameter in the Isomap algorithm is the residual variance, which is equivalent to the stress function encountered in classical MDS. The residual variance is a measure of how well the given dataset can be embedded into a Euclidean space of dimension K. In the next section, we prescribe a specific number of measurements per data point so that performing Isomap on the randomly projected data yields a residual variance that is arbitrarily close to the variance produced by Isomap on the original dataset.

We conclude this section by revisiting the results derived in [1], which form the basis for our development. Consider the effect of projecting a smooth K-dimensional manifold residing in R^N onto a random M-dimensional subspace (isomorphic to R^M). If M is sufficiently large, a stable near-isometric embedding of the manifold in the lower-dimensional subspace is ensured. The key advantage is that M needs only to be linear in the intrinsic dimension of the manifold K. In addition, M depends only logarithmically on other properties of the manifold, such as its volume, curvature, etc. The result can be summarized in the following theorem.

Theorem 2.2 [1] Let M be a compact K-dimensional manifold in R^N having volume V and condition number 1/τ. Fix 0 < ε < 1 and 0 < ρ < 1. Let Φ be a random orthoprojector¹ from R^N to R^M and

    $M \geq O\left( \frac{K \log(N V \tau^{-1}) \log(\rho^{-1})}{\epsilon^2} \right)$.    (1)

Suppose M < N. Then, with probability exceeding 1 − ρ, the following statement holds: For every pair of points x, y ∈ M, and i ∈ {1, 2},

    $(1 - \epsilon)\sqrt{M/N} \leq \frac{d_i(\Phi x, \Phi y)}{d_i(x, y)} \leq (1 + \epsilon)\sqrt{M/N}$,    (2)

where d_1(x, y) (respectively, d_2(x, y)) stands for the geodesic (respectively, ℓ2) distance between points x and y.

¹ Such a matrix is formed by orthogonalizing M vectors of length N having, for example, i.i.d. Gaussian or Bernoulli distributed entries.
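A random orthoprojector of the kind used in Theorem 2.2 is straightforward to construct numerically. The sketch below (an illustration with arbitrarily chosen sizes, not the authors' code) orthogonalizes M Gaussian vectors of length N, as described in the footnote, and empirically checks how tightly the scaled ℓ2 distance ratios concentrate around 1 for a random point cloud.

import numpy as np
from scipy.spatial.distance import pdist

N, M, n = 1000, 50, 200
rng = np.random.default_rng(0)

# Orthogonalize M Gaussian vectors of length N; QR yields orthonormal rows for Phi.
Phi = np.linalg.qr(rng.standard_normal((N, M)))[0].T    # shape (M, N), Phi @ Phi.T ≈ I

X = rng.standard_normal((n, N))              # stand-in point cloud in R^N
d_orig = pdist(X)                            # pairwise l2 distances in the ambient space
d_proj = pdist(X @ Phi.T)                    # pairwise l2 distances after projection to R^M

ratios = d_proj / (d_orig * np.sqrt(M / N))  # ideally concentrated around 1; cf. equation (2)
print(ratios.min(), ratios.max())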
The condition number τ controls the local, as well as global, curvature of the manifold – the smaller the τ, the less well-conditioned the manifold, with higher "twistedness" [1]. Theorem 2.2 is proved by first specifying a finite high-resolution sampling of the manifold, the nature of which depends on its intrinsic properties; for instance, a planar manifold can be sampled coarsely. The Johnson-Lindenstrauss Lemma [14] is then applied to these points to guarantee the isometry bound (2) with "isometry constant" ε.

3 Bounds on the performance of ID estimation and manifold learning algorithms under random projection

We saw above that random projections essentially ensure that the metric structure of a high-dimensional input point cloud (i.e., the set of all pairwise distances between points belonging to the dataset) is preserved up to a distortion that depends on ε. This immediately suggests that geometry-based ID estimation and manifold learning algorithms could be applied to the lower-dimensional, randomly projected version of the dataset.

The first of our main results establishes a sufficient dimension of random projection M required to maintain the fidelity of the estimated correlation dimension using the GP algorithm. The proof of the following is detailed in [15].

Theorem 3.1 Let M be a compact K-dimensional manifold in R^N having volume V and condition number 1/τ. Let X = {x1, x2, ...} be a sequence of samples drawn from a uniform density supported on M. Let K̂ be the dimension estimate of the GP algorithm on X over the range (r_min, r_max). Let β = ln(r_max/r_min). Fix 0 < δ < 1 and 0 < ρ < 1. Suppose the following condition holds:

    $r_{\max} < \tau/2$.    (3)

Let Φ be a random orthoprojector from R^N to R^M with M < N and

    $M \geq O\left( \frac{K \log(N V \tau^{-1}) \log(\rho^{-1})}{\beta^2 \delta^2} \right)$.    (4)

Let K̂_Φ be the estimated correlation dimension on ΦX in the projected space over the range (r_min √(M/N), r_max √(M/N)). Then, K̂_Φ is bounded by

    $(1 - \delta)\,\hat{K} \leq \hat{K}_\Phi \leq (1 + \delta)\,\hat{K}$    (5)

with probability exceeding 1 − ρ.

Theorem 3.1 is a worst-case bound and serves as a sufficient condition for stable ID estimation using random projections. Thus, if we choose sufficiently small values for δ and ρ, we are guaranteed estimation accuracy levels as close as desired to those obtained with ID estimation in the original signal space. Note that the bound on K̂_Φ is multiplicative. This implies that in the worst case, the number of projections required to estimate K̂_Φ very close to K̂ (say, within integer roundoff error) becomes higher with increasing manifold dimension K.
The second of our main results prescribes the minimum dimension of random projections required to maintain the residual variance produced by Isomap in the projected domain within an arbitrary additive constant of that produced by Isomap with the full data in the ambient space. The proof of this theorem [15] relies on the proof technique used in [16].

Theorem 3.2 Let M be a compact K-dimensional manifold in R^N having volume V and condition number 1/τ. Let X = {x1, x2, ..., xn} be a finite set of samples drawn from a sufficiently fine density supported on M. Let Φ be a random orthoprojector from R^N to R^M with M < N. Fix 0 < ε < 1 and 0 < ρ < 1. Suppose

    $M \geq O\left( \frac{K \log(N V \tau^{-1}) \log(\rho^{-1})}{\epsilon^2} \right)$.

Define the diameter Γ of the dataset as follows:

    $\Gamma = \max_{1 \leq i,j \leq n} d_{iso}(x_i, x_j)$,

where d_iso(x, y) stands for the Isomap estimate of the geodesic distance between points x and y. Define R and R_Φ to be the residual variances obtained when Isomap generates a K-dimensional embedding of the original dataset X and of the projected dataset ΦX, respectively. Under suitable constructions of the Isomap connectivity graphs, R_Φ is bounded by

    $R_\Phi < R + C\,\Gamma^2 \epsilon$

with probability exceeding 1 − ρ. Here C is a function only of the number of sample points n.

Since the choice of ε is arbitrary, we can choose a large enough M (which is still only logarithmic in N) such that the residual variance yielded by Isomap on the randomly projected version of the dataset is arbitrarily close to the variance produced with the data in the ambient space. Again, this result is derived from a worst-case analysis. Note that Γ acts as a measure of the scale of the dataset. In practice, we may enforce the condition that the data is normalized (i.e., every pairwise distance calculated by Isomap is divided by Γ). This ensures that the K-dimensional embedded representation is contained within a ball of unit norm centered at the origin.
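Residual variance is not always exposed directly by off-the-shelf Isomap implementations, so the comparison between R and R_Φ in Theorem 3.2 can be scripted along the following lines. This is a minimal sketch under our own assumptions: it uses scikit-learn's Isomap and its dist_matrix_ attribute, takes residual variance to be 1 − R² between geodesic and embedded Euclidean distances, and the swiss-roll data, neighborhood size, and projection dimension are illustrative choices rather than values from this paper.

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

def residual_variance(data, dim, n_neighbors=10):
    """1 - R^2 between Isomap's geodesic distances and the embedded Euclidean distances."""
    iso = Isomap(n_neighbors=n_neighbors, n_components=dim)
    emb = iso.fit_transform(data)
    iu = np.triu_indices(len(data), k=1)
    d_geo = iso.dist_matrix_[iu]
    d_emb = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)[iu]
    return 1.0 - np.corrcoef(d_geo, d_emb)[0, 1] ** 2

rng = np.random.default_rng(0)
roll, _ = make_swiss_roll(n_samples=600, random_state=0)     # 2-D manifold in R^3
X = roll @ rng.standard_normal((3, 500)) / np.sqrt(500)      # lift it into a high-dimensional ambient space
Phi = np.linalg.qr(rng.standard_normal((500, 20)))[0].T      # random orthoprojector, M = 20

print(residual_variance(X, dim=2))            # R: residual variance in the ambient space
print(residual_variance(X @ Phi.T, dim=2))    # R_Phi: residual variance after random projection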
Thus, we have proved that with only an M-dimensional projection of the data (with M ≪ N) we can perform ID estimation and subsequently learn the structure of a K-dimensional manifold, up to accuracy levels obtained by conventional methods. In Section 4, we utilize these sufficiency results to motivate an algorithm for performing practical manifold structure estimation using random projections.

4 How many random projections are enough?

In practice, it is hard to know or estimate the parameters V and τ of the underlying manifold. Also, since we have no a priori information regarding the data, it is impossible to fix K̂ and R, the outputs of GP and Isomap on the point cloud in the ambient space. Thus, we often cannot fix a definitive value for M in advance. To circumvent this problem, we develop the following empirical procedure, which we dub ML-RP, for manifold learning using random projections.

We initialize M to a small number and compute M random projections of the data set X = {x1, x2, ..., xn} (here n denotes the number of points in the point cloud). Using the set ΦX = {Φx : x ∈ X}, we estimate the intrinsic dimension using the GP algorithm. This estimate, say K̂, is used by the Isomap algorithm to produce an embedding into K̂-dimensional space. The residual variance produced by this operation is recorded. We then increment M by 1 and repeat the entire process. The algorithm terminates when the residual variance obtained is smaller than some tolerance parameter δ. A full description is provided in Algorithm 1.

Algorithm 1 ML-RP
  M ← 1
  Φ ← random orthoprojector of size M × N
  while residual variance ≥ δ do
    Run the GP algorithm on ΦX.
    Use the ID estimate K̂ to perform Isomap on ΦX.
    Calculate the residual variance.
    M ← M + 1
    Add one row to Φ.
  end while
  return M, K̂

Figure 2: Performance of ID estimation using GP as a function of the number of random projections. Sample size n = 1000, ambient dimension N = 150. (a) Estimated intrinsic dimension for underlying hyperspherical manifolds of increasing dimension. The solid line indicates the value of the ID estimate obtained by GP performed on the original data. (b) Minimum number of projections required for GP to work with 90% accuracy as compared to GP on the native data.

The essence of ML-RP is as follows. A sufficient number M of random projections is determined by a nonlinear procedure (i.e., sequential computation of the Isomap residual variance) so that conventional manifold learning does almost as well on the projected dataset as on the original. At the same time, the random linear projections provide a faithful representation of the data in the geodesic sense. In this manner, ML-RP helps determine the number of rows that Φ requires in order to act as an operator that preserves metric structure. Therefore, ML-RP can be viewed as an adaptive method for linear reduction of data dimensionality. It is only weakly adaptive in the sense that only the stopping criterion of ML-RP is determined by monitoring the nature of the projected data.

The results derived in Section 3 can be viewed as convergence proofs for ML-RP. The existence of a certain minimum number of measurements for any chosen error value δ ensures that eventually, M in the ML-RP algorithm becomes high enough to ensure "good" Isomap performance. Also, due to the built-in parsimonious nature of ML-RP, we are ensured not to "overmeasure" the manifold; i.e., only the requisite number of projections of the points is obtained.
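A direct transcription of Algorithm 1 into Python might look as follows. This is a sketch only: it reuses the illustrative correlation_dimension and residual_variance helpers defined in the earlier snippets (our own constructions, not the authors' code), rounds the GP estimate to an integer embedding dimension, and grows Φ one row at a time as in the pseudocode, capping M at N as a safeguard.

import numpy as np

def ml_rp(X, delta, r1, r2, rng=None):
    """Greedy ML-RP loop: add random projection rows until Isomap's residual variance < delta."""
    rng = np.random.default_rng(0) if rng is None else rng
    N = X.shape[1]
    Phi = np.empty((0, N))
    res_var = np.inf
    while res_var >= delta and Phi.shape[0] < N:
        # Append one new Gaussian row and re-orthogonalize the rows of Phi.
        Phi = np.vstack([Phi, rng.standard_normal((1, N))])
        Phi = np.linalg.qr(Phi.T)[0].T

        PX = X @ Phi.T                                              # projected data, now M-dimensional
        K_hat = max(1, round(correlation_dimension(PX, r1, r2)))    # GP step
        res_var = residual_variance(PX, dim=K_hat)                  # Isomap step
    return Phi.shape[0], K_hat

For the hypersphere experiments of the next section, for example, one would call ml_rp with a tolerance such as delta = 0.1 and radii r1, r2 chosen from the linear region of the GP plot; the returned pair is the number of projections M at termination together with the final ID estimate K̂.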
5 Experimental results

This section details the results of simulations of ID estimation and subsequent manifold learning on real and synthetic datasets. First, we examine the performance of the GP algorithm on random projections of K-dimensional hyperspheres embedded in an ambient space of dimension N = 150. Figure 2(a) shows the variation of the dimension estimate produced by GP as a function of the number of projections M. The sampled dataset in each case is obtained by drawing n = 1000 samples from a uniform distribution supported on a hypersphere of the corresponding dimension. Figure 2(b) displays the minimum number of projections per sample point required to estimate the scale-dependent correlation dimension directly from the random projections, up to 10% error, when compared to GP estimation on the original data.

We observe that the ID estimate stabilizes quickly with an increasing number of projections, and indeed converges to the estimate obtained by running the GP algorithm on the original data. Figure 2(b) also illustrates the variation of the minimum required projection dimension M vs. K, the intrinsic dimension of the underlying manifold. We plot the intrinsic dimension of the dataset against the minimum number of projections required such that K̂_Φ is within 10% of the conventional GP estimate K̂ (this is equivalent to choosing δ = 0.1 in Theorem 3.1). We observe the predicted linearity (Theorem 3.1) in the variation of M vs. K.

Figure 3: Standard databases. Ambient dimension for the face database N = 4096; ambient dimension for the hand rotation database N = 3840.

Figure 4: Performance of ML-RP on the above databases. (left) ML-RP on the face database (N = 4096). Good approximations are obtained for M > 50. (right) ML-RP on the hand rotation database (N = 3840). For M > 60, the Isomap variance is indistinguishable from the variance obtained in the ambient space.

Finally, we turn our attention to two common datasets (Figure 3) found in the literature on dimension estimation – the face database² [6], and the hand rotation database [17].³ The face database is a collection of 698 artificial snapshots of a face (N = 64 × 64 = 4096) varying under 3 degrees of freedom: 2 angles for pose and 1 for lighting. The signals are therefore believed to reside on a 3D manifold in an ambient space of dimension 4096. The hand rotation database is a set of 90 images (N = 64 × 60 = 3840) of rotations of a hand holding an object. Although the image appearance manifold is ostensibly one-dimensional, estimators in the literature always overestimate its ID [11].

Random projections of each sample in the databases were obtained by computing the inner product of the image samples with an increasing number of rows of the random orthoprojector Φ. We note that in the case of the face database, for M > 60, the Isomap variance on the randomly projected points closely approximates the variance obtained with the full image data (Figure 4). This convergence of the variance to the best possible value is even more sharply observed in the hand rotation database, in which the two variance curves are indistinguishable for M > 60. These results are particularly encouraging and demonstrate the validity of the claims made in Section 3.

6 Discussion

Our main theoretical contributions in this paper are explicit lower bounds on the minimum number of random projections required to perform ID estimation and subsequent manifold learning using Isomap, with high guaranteed accuracy levels. We also developed an empirical greedy algorithm (ML-RP) for practical situations. Experiments on simple cases, such as uniformly generated hyperspheres of varying dimension, and more complex situations, such as the image databases displayed in Figure 3, provide sufficient evidence of the nature of the bounds described above.

² http://isomap.stanford.edu
³ http://vasc.ri.cmu.edu//idb/html/motion/hand/index.html. Note that we use a subsampled version of the database used in the literature, both in terms of the resolution of the images and the sampling of the manifold.
The method of random projections is thus a powerful tool for ensuring the stable embedding of low-dimensional manifolds into an intermediate space of reasonable size. The motivation for developing results and algorithms that involve random measurements of high-dimensional data is significant, particularly due to the increasing attention that Compressive Sensing (CS) has received recently. It is now possible to think of settings involving a huge number of low-power devices that inexpensively capture, store, and transmit a very small number of measurements of high-dimensional data. ML-RP is applicable in all such situations. In settings where the bottleneck lies in the transmission of the data to the central processing node, ML-RP provides a simple solution to the manifold learning problem and ensures that effective manifold learning can be performed with a minimum amount of transmitted information. The metric structure of the projected dataset upon termination of ML-RP closely resembles that of the original dataset with high probability; thus, ML-RP can be viewed as a novel adaptive algorithm for finding an efficient, reduced representation of data of very large dimension.

References

[1] R. G. Baraniuk and M. B. Wakin. Random projections of smooth manifolds. 2007. To appear in Foundations of Computational Mathematics.

[2] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk. An architecture for compressive imaging. In IEEE International Conference on Image Processing (ICIP), pages 1273–1276, Oct. 2006.

[3] S. Kirolos, J. N. Laska, M. B. Wakin, M. F. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. G. Baraniuk. Analog-to-information conversion via random demodulation. In Proc. IEEE Dallas Circuits and Systems Workshop (DCAS), 2006.

[4] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Info. Theory, 52(2):489–509, Feb. 2006.

[5] D. L. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4):1289–1306, Sept. 2006.

[6] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.

[7] P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. Physica D: Nonlinear Phenomena, 9:189–208, 1983.

[8] J. Theiler. Statistical precision of dimension estimators. Physical Review A, 41(6):3038–3051, 1990.

[9] F. Camastra. Data dimensionality estimation methods: a survey. Pattern Recognition, 36:2945–2954, 2003.

[10] J. A. Costa and A. O. Hero. Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Trans. Signal Processing, 52(8):2210–2221, Aug. 2004.

[11] E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in NIPS, volume 17. MIT Press, 2005.

[12] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.

[13] D. Donoho and C. Grimes. Hessian eigenmaps: locally linear embedding techniques for high dimensional data. Proc. of the National Academy of Sciences, 100(10):5591–5596, 2003.
[14] S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report TR-99-006, University of California, Berkeley, 1999.

[15] C. Hegde, M. B. Wakin, and R. G. Baraniuk. Random projections for manifold learning – proofs and analysis. Technical Report TREE 0710, Rice University, 2007.

[16] M. Bernstein, V. de Silva, J. Langford, and J. Tenenbaum. Graph approximations to geodesics on embedded manifolds. Technical report, Stanford University, 2000.

[17] B. Kégl. Intrinsic dimension estimation using packing numbers. In Advances in NIPS, volume 14. MIT Press, 2002.
", "award": [], "sourceid": 1100, "authors": [{"given_name": "Chinmay", "family_name": "Hegde", "institution": null}, {"given_name": "Michael", "family_name": "Wakin", "institution": null}, {"given_name": "Richard", "family_name": "Baraniuk", "institution": null}]}