{"title": "Dimensionality reduction: theoretical perspective on practical measures", "book": "Advances in Neural Information Processing Systems", "page_first": 10577, "page_last": 10589, "abstract": "Dimensionality reduction plays a central role in real-world applications for Machine Learning, among many fields. In particular, metric dimensionality reduction where data from a general metric is mapped into low dimensional space, is often used as a first step before applying machine learning algorithms. In almost all these applications the quality of the embedding is measured by various average case criteria. Metric dimensionality reduction has also been studied in Math and TCS, within the extremely fruitful and influential field of metric embedding. Yet, the vast majority of theoretical research has been devoted to analyzing the worst case behavior of embeddings and therefore has little relevance to practical settings. The goal of this paper is to bridge the gap between theory and practice view-points of metric dimensionality reduction, laying the foundation for a theoretical study of more practically oriented analysis. This paper can be viewed as providing a comprehensive theoretical framework addressing a line of research initiated by VL [NeuroIPS' 18] who have set the goal of analyzing different distortion measurement criteria, with the lens of Machine Learning applicability, from both theoretical and practical perspectives.\nWe complement their work by considering some important and vastly used average case criteria, some of which originated within the well-known Multi-Dimensional Scaling framework. While often studied in practice, no theoretical studies have thus far attempted at providing rigorous analysis of these criteria. In this paper we provide the first analysis of these, as well as the new distortion measure developed by [VL18] designed to possess Machine Learning desired properties. 
Moreover, we show that all measures considered can be adapted to possess similar qualities. The main consequences of our work are nearly tight bounds on the absolute values of all distortion criteria, as well as the first approximation algorithms with provable guarantees.", "full_text": "Dimensionality reduction: theoretical perspective on practical measures

Yair Bartal*
Department of Computer Science, Hebrew University of Jerusalem, Jerusalem, Israel
yair@cs.huji.ac.il

Nova Fandina
Department of Computer Science, Hebrew University of Jerusalem, Jerusalem, Israel
fandina@cs.huji.ac.il

Ofer Neiman
Department of Computer Science, Ben Gurion University of the Negev, Beer-Sheva, Israel
neimano@cs.bgu.ac.il

Abstract

Dimensionality reduction plays a central role in real world applications for Machine Learning, among many fields. In particular, metric dimensionality reduction, where data from a general metric is mapped into low dimensional space, is often used as a first step before applying machine learning algorithms. In almost all these applications the quality of the embedding is measured by various average case criteria. Metric dimensionality reduction has also been studied in Math and TCS, within the extremely fruitful and influential field of metric embedding. Yet, the vast majority of theoretical research has been devoted to analyzing the worst case behavior of embeddings, and therefore has little relevance to practical settings. The goal of this paper is to bridge the gap between theory and practice view-points of metric dimensionality reduction, laying the foundation for a theoretical study of more practically oriented analysis.
This paper can be viewed as providing a comprehensive theoretical framework for analyzing different distortion measurement criteria, with the lens of practical applicability, and in particular for Machine Learning. 
The need for this line of research was recently raised by Chennuru Vankadara and von Luxburg in (13) [NeurIPS'18], who emphasized the importance of pursuing it from both theoretical and practical perspectives.
We consider some important and vastly used average case criteria, some of which originated within the well-known Multi-Dimensional Scaling framework. While often studied in practice, no theoretical studies have thus far attempted to provide a rigorous analysis of these criteria. In this paper we provide the first analysis of these, as well as of the new distortion measure developed in (13), designed to possess Machine Learning desired properties. Moreover, we show that all measures considered can be adapted to possess similar qualities. The main consequences of our work are nearly tight bounds on the absolute values of all distortion criteria, as well as the first approximation algorithms with provable guarantees.
All our theoretical results are backed by empirical experiments.

* Author names are ordered alphabetically.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

1 Introduction

Metric Embedding plays an important role in a vast range of application areas such as machine learning, computer vision, computational biology, networking, statistics, data mining, neuroscience and mathematical psychology, to name a few. 
Perhaps the most significant is the application of metric dimensionality reduction for large data sets, where the data is represented by points in a metric space. It is desirable to efficiently embed the data into low dimensional space, which would allow compact representation, efficient use of resources, efficient access and interpretation, and enable operations to be carried out significantly faster.
In machine learning, this task is often used as a preliminary step before applying various machine learning algorithms, and is sometimes referred to as unsupervised metric dimensionality reduction. Some studies of dimensionality reduction within ML include (10; 11; 13; 42; 49). Moreover, there are numerous practical studies of metric embedding and dimensionality reduction appearing in a plethora of papers ranging over a wide scope of research areas, including work on Internet coordinate systems, feature extraction, similarity search, visual recognition, and computational biology applications; the papers (25; 41; 24; 5; 21; 17; 45; 48; 51; 52; 44; 32; 12; 11; 47) are just a small sample.
In nearly all practical applications of metric embedding and dimensionality reduction methods, the fundamental criterion for measuring the quality of the embedding is its average performance over all pairs, where the measure of quality per pair is often the distortion, the square distortion and similar related notions. Such experimental results often indicate that metric embeddings and dimensionality reduction techniques behave very well in practice.
In contrast, the classic theory of metric embedding has mostly failed to address this phenomenon. Developed over the past few decades by both mathematicians and theoretical computer scientists (see (26; 34; 27) for surveys), it has been extremely fruitful in analyzing the worst case distortion of embeddings. However, worst case analysis results often exhibit extremely high lower bounds. 
Indeed, in most cases, the worst case bounds grow, in terms of both distortion and dimension, as a function of the size of the space. Such bounds are often irrelevant in practical terms.
These concerns were recently raised in the context of Machine Learning in (13) (NeurIPS'18), stressing the desire for embeddings into constant dimension with constant distortion. The authors of (13) state the necessity of a systematic study of different average distortion measures. Their main motivation is to examine the relevance of these measures for machine learning applications. Here, the first step is made to tackle this challenge.
The goal of this paper is to bridge between the theory and practice outlooks on metric embedding and dimensionality reduction. In particular, we provide the first comprehensive rigorous analysis of the most basic practically oriented average case quality measurement criteria, using methods and techniques developed within the classic theory of metric embedding, thereby providing new insights for both theory and practice.
We focus on some of the most basic and commonly used average distortion measurement criteria:
Moments analysis: moments of distortion and Relative Error. The most basic average case performance criterion is the average distortion. More generally, one could study all q-moments of the distortion, for every 1 ≤ q ≤ ∞. This notion was first studied in (1). For a non-contractive embedding f, whose distortion for a pair of points u, v is denoted dist_f(u, v):

Definition 1 (ℓ_q-distortion). Let (X, d_X) and (Y, d_Y) be any metric spaces, and let f : X → Y be an embedding. For any distribution Π over (X choose 2) and q ≥ 1, the ℓ_q-distortion of f with respect to Π is defined by:

  ℓ_q-dist^(Π)(f) = (E_Π[(dist_f(u, v))^q])^{1/q},   ℓ_∞-dist^(Π)(f) = sup_{Π(u,v)≠0} {dist_f(u, v)}.

The most natural case is where Π is the uniform distribution (and will be omitted from the notation). In order for this definition to extend to handle embeddings in their full generality and address important applications such as dimensionality reduction, it turns out that one should remove the assumption that the embedding is non-contractive.
We therefore naturally extend the above definition to deal with arbitrary embeddings by letting dist_f(u, v) = max{expans_f(u, v), contr_f(u, v)}, where expans_f(u, v) = d_Y(f(u), f(v)) / d_X(u, v) and contr_f(u, v) = d_X(u, v) / d_Y(f(u), f(v)). In the full version (in supplementary materials) we provide justification of the necessity of this definition. Observe that this definition is not scale invariant².
In many practical cases, where we may expect a near isometry for most pairs, the moments of distortion may not be sensitive enough, and more delicate measures of quality, which examine directly the pairwise additive error, may be desired. The relative error measure (REM), commonly used in network applications (45; 44; 15), is the most natural choice. It turns out that this measure can be viewed as the moment of distortion about 1. This gives rise to the following generalization of Definition 1:

Definition 2 (ℓ_q-distortion about c, REM). For c ≥ 0, the ℓ_q-distortion of f about c is given by:

  ℓ_q-dist^(Π)_(c)(f) = (E_Π[|dist_f(u, v) − c|^q])^{1/q},   REM^(Π)_q(f) = ℓ_q-dist^(Π)_(1)(f).

Additive distortion measures: Stress and Energy. 
Multi-Dimensional Scaling (see (16; 8)) is a well-established methodology aiming at embedding a metric representing the relations between objects into (usually Euclidean) low-dimensional space, to allow feature extraction often used for indexing, clustering, nearest neighbor searching and visualization in many application areas, including machine learning (42). Several average additive error criteria for the embedding's quality have been suggested in the context of MDS over the years. Perhaps the most popular is the stress measure, going back to (30). For d_uv = d_X(u, v) and d̂_uv = d_Y(f(u), f(v)), for normalized nonnegative weights Π(u, v) (or a distribution) we define the following natural generalizations, which include the classic Kruskal stress Stress*_2(f) and normalized stress Stress_2(f) measures, as well as other common variants in the literature (e.g. (23; 46; 20; 9; 50; 11)):

  Stress^(Π)_q(f) = (E_Π[|d̂_uv − d_uv|^q] / E_Π[(d_uv)^q])^{1/q},  and  Stress*^(Π)_q(f) = (E_Π[|d̂_uv − d_uv|^q] / E_Π[(d̂_uv)^q])^{1/q}.

Another popular and widely used additive error measure is energy and its special case, Sammon cost (see e.g. (43; 7; 14; 36; 37; 12)). We define the following generalizations, which include some common variants (e.g. (41; 45; 44; 33)):

  Energy^(Π)_q(f) = (E_Π[(|d̂_uv − d_uv| / d_uv)^q])^{1/q},  and  REM^(Π)_q(f) = (E_Π[(|d̂_uv − d_uv| / min{d̂_uv, d_uv})^q])^{1/q}.

It immediately follows from the definitions that: Energy^(Π)_q(f) ≤ REM^(Π)_q(f) ≤ ℓ_q-dist^(Π)(f). Also, it is not hard to observe that Stress^(Π)_q(f) and Energy^(Π')_q(f) are equivalent via a simple transformation of weights.
ML motivated measure: σ-distortion. The recently published paper (13) studies various existing and commonly used quality criteria in terms of their relevance in machine learning applications. In particular, the authors suggest a new measure, σ-distortion, which is claimed to possess all the necessary properties for machine learning applications. We consider a generalized version of σ-distortion³. Let ℓ_r-expans(f) = ((n choose 2)^{−1} Σ_{u≠v} (expans_f(u, v))^r)^{1/r}. For a distribution Π over (X choose 2), let

  Φ^(Π)_{σ,q,r}(f) = (E_Π[|expans_f(u, v) / ℓ_r-expans(f) − 1|^q])^{1/q}

(for q = 2, r = 1 this is the square root of the measure defined by (13)). 
We show that the tools we develop in this paper can be applied to σ-distortion to obtain theoretical bounds on its value.
We further show (Section 7), generalizing (13), that all other average distortion measures considered here can be easily adapted to satisfy similar ML motivated properties.
A basic contribution of our paper is showing deeper tight relations between these different objective functions, and further developing properties and tools for analyzing embeddings for these measures. While these measures have been extensively studied from a practical point of view, and many heuristics are known in the literature, almost nothing is known in terms of rigorous analysis and absolute bounds. Moreover, many real-world misconceptions exist about what dimension may be necessary for good embeddings. In this paper we present the first theoretical analysis of all these measures, providing absolute bounds that shed light on these questions. We exhibit approximation algorithms for optimizing these measures, and further applications.

² We note that if one desires scale invariance, it may always be achieved by defining the scale-invariant measure to be the minimization of the measure over all possible scalings of the embedding. For simplicity we focus on the non-scalable version.
³ It is easy to verify that the general version satisfies all the properties considered in (13).

In this paper we focus only on analyzing objective measures that attempt to preserve metric structure. As a result, some popular objective measures used in applied settings are beyond the scope of this paper; this includes the widely used t-SNE heuristic (which aims at reflecting the cluster structure of the data, and generally does not preserve metric structure), and various heuristics with local structure objectives. 
When validating our theoretical findings experimentally (Section 6), we chose to compare our results with the heuristics most common in practice, PCA/classical-MDS and Isomap, amongst the various methods that appear in the literature.
Moment analysis of dimensionality reduction. The main theoretical question our paper studies is:
Problem 1 ((k, q)-Dimension Reduction). Given a dimension bound k and 1 ≤ q ≤ ∞, what is the least α(k, q) such that every finite subset of Euclidean space embeds into k dimensions with Measure_q ≤ α(k, q)?
This question can be phrased for each Measure_q of practical importance. A stronger demand would be to require a single embedding to simultaneously achieve the best possible bounds for all values of q.
We answer Problem 1 by providing (almost tight for most values of k and q) upper and lower bounds on α(k, q). In particular, we prove that the Johnson-Lindenstrauss (JL) dimensionality reduction achieves bounds in terms of q and k that dramatically outperform the widely used PCA algorithm. Moreover, our experiments show that the same holds for the Isomap and classical MDS methods.
The bounds we obtain provide several interesting conclusions regarding the expected behavior of dimensionality reduction methods. As expected, the bound for the JL method improves as k grows, confirming the intuition expressed in (13). Yet, countering their intuition, the bound does not increase as a function of the original dimension d. A phase transition, exhibited in our bounds, provides guidance on how to choose the target dimension k.
Another consequence arises by combining our result with the embedding of (1), composing it with the JL: we obtain an embedding of general spaces into a constant dimensional Euclidean space with constant distortion, for all discussed measures (presented in the full version). 
Here, the dimension is constant even if the original space is not doubling, improving on the result obtained in (13).
Approximation algorithms. The bounds achieved for the Euclidean (k, q)-dimension reduction are then applied to provide the first approximation algorithms for embedding general metric spaces into low dimensional Euclidean space, for all the various distortion criteria. This is based on composing convex programming with the JL-transform. It should be stressed that such a composition may not necessarily work in general; however, we are able to show that this yields efficient approximation algorithms for all the criteria considered in this paper.
The results on the JL transform yield bounds on distance oracles. In the full version, we provide additional applications, including metric hyper sketching, a generalization of standard sketching.
Empirical Experiments. We validate our theoretical findings experimentally on various randomly generated Euclidean and non-Euclidean metric spaces, in Section 6. In particular, as predicted by our lower bounds, the phase transition is clearly seen in the JL, PCA and Isomap embeddings for all the measurement criteria. Moreover, in our simulations the JL based approximation algorithm (as well as the JL itself, when applied to Euclidean metrics) has shown dramatically better performance than the PCA and Isomap heuristics for all distortion measures, indicating that the JL-based approximation algorithm is a preferable choice when the preservation of metric properties is desirable.
Related work. For Euclidean embedding, it was shown in (35) that using SDP one can obtain an arbitrarily good approximation of the distortion. However, such a result is impossible when restricting the target dimension to k, as in (39) it was shown that unless P=NP, the approximation factor must be n^{Ω(1/k)}. Of all the measures studied in this paper, only Stress_q was previously studied. 
In (10), it was shown that computing an embedding into R^1 with optimal Stress_q is NP-hard, for any given q. The only approximation algorithms known for this problem are the following: a 2-approximation to Stress_∞ for embedding into R^1 (22); an O(log^{1/q} n)-approximation to Stress_q for embedding into R^1 (19); and an O(1)-approximation to Stress_∞ for embedding into ℓ_1^2 (6).
All proofs are omitted from this version and appear in the full version (in supplemental material).

2 On the limitations of classical MDS

Practitioners have developed various heuristics to cope with dimensionality reduction (see (49) for a comprehensive overview). Most of the suggested methods are based on iterative improvement of various objectives. None of these strategies provides theoretical guarantees on convergence to the global minimum, and most of them do not necessarily converge at all. Furthermore, classical MDS or PCA, one of the most widely used heuristics, is usually referred to as the method that computes the optimal solution for minimizing Stress_2. We show that this is in fact false: PCA can produce an embedding whose Stress_q value is far from optimal, even for a space that can be efficiently embedded into a line.⁴ Consider the following subset of R^d. For any α < 1/2, for all i ∈ [1, d], for any q ≥ 1, let s_i = 1/(α^i)^q. Let X_i ⊂ ℓ_2^d be the (multi)set of size 2s_i that contains s_i copies of the vector α^i · e_i, denoted by X_i^+, and s_i copies of the antipodal vector −α^i · e_i, denoted by X_i^−, where e_i is the standard basis vector of R^d. Define X as the union of all the X_i. In the full version of the paper, we show that X can be embedded into a line with Stress_q/Energy_q(f) = O(α/d^{1/q}), for any q ≥ 1. 
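A small numerical illustration of this construction (with s_i rounded to integers, the concrete choice α = 0.4, q = 3, d = 3, and a PCA-via-SVD helper of our own) shows the failure mode directly: the heavily replicated high-index axes dominate the variance, so the top principal direction ignores the low-index axes, and a pair at positive original distance collapses to a single image point.

```python
import numpy as np

def pca_project(X, k):
    """Project the centered point set onto its top-k principal directions."""
    Xc = X - X.mean(0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Small instance of the construction: s_i = round(alpha^{-i q}) copies
# of +alpha^i e_i and of -alpha^i e_i, for i = 1, ..., d.
alpha, q, d = 0.4, 3, 3
points = []
for i in range(1, d + 1):
    s = int(round(alpha ** (-i * q)))
    v = np.zeros(d)
    v[i - 1] = alpha ** i
    points += [v] * s + [-v] * s
X = np.asarray(points)

# Variance along axis i grows with i here, so PCA with k = 1 keeps axis d.
Y = pca_project(X, k=1)
# The pair {alpha * e_1, -alpha * e_1} is at distance 2 * alpha > 0 in X,
# but both copies land at the origin of the line: infinite contraction,
# hence infinite l_q-dist and REM_q.
```

This is an illustration only, not the paper's proof: the formal claim (stated next for any k ≤ β·d) is established in the full version.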
Yet, for the PCA algorithm applied to X, embedding into k ≤ β·d dimensions (β < 1), it holds that Stress_q/Energy_q(F) = Ω(1), and ℓ_q-dist/REM_q(F) = ∞.
Moreover, our empirical experiments show that the PCA and Isomap methods have significantly worse performance than the JL on a variety of randomly generated families of metric spaces.

3 Euclidean dimension reduction: moment analysis of the JL transform

From a theoretical perspective, dimensionality reduction is known to be possible in Euclidean space via the Johnson-Lindenstrauss Lemma (29), a cornerstone of Banach space analysis and metric embedding theory, playing a central role in a plethora of applications. The lemma states that every n point subset of Euclidean space can be embedded in O(ε^{−2} log n) dimensions with worst case 1 + ε distortion. The dimension bound is shown to be tight in (31) (improving upon (4)). When applied in a fixed dimension k, the worst case distortion becomes as bad as O(n^{2/k} √log n). Moreover, in (40) a lower bound of n^{Ω(1/k)} on the worst case distortion of any embedding in k dimensions was proven.
However, as explained above, in many practical instances it is desirable to replace the demand for worst case guarantees with average case guarantees. It should be noted, though, that the JL transform does have good properties, even when applied in k dimensions. The JL lemma in fact implies that in dimension k, for every pair there is some constant probability (≈ exp(−ε²k)) that a 1 + ε distortion is achieved. While this is in itself an appealing property, it should be stressed that standard tail bound arguments cannot imply that the average (or higher moments of the) distortion is bounded. 
Indeed, we show that for certain specialized implementations of the JL embedding, such as those of (2) (e.g., using a Rademacher entries matrix), (3) (fast JL), and (18) (sparse JL), the ℓ_q-dist and REM_q are unbounded.

Observation 1. Let k ≥ 1, and d > k. Let E_d = {e_i}_{1≤i≤d} ⊆ ℓ_2^d be the set of standard basis vectors. Assume that a linear map f : ℓ_2^d → ℓ_2^k is given by a transformation matrix P_{k×d}, such that for all i, j, P[i, j] ∈ U for some finite set U ⊂ R. If |U| < d^{1/k}, then for the set E_d, for all q ≥ 1, ℓ_q-dist(f), REM_q(f) = ∞.

The proof follows by a volume argument: for the matrix P, the set f(E_d) = {P e_i}_{1≤i≤d} is exactly the set of columns of P. Since the entries of P belong to U, there can be at most |U|^k < d different columns in the set f(E_d). Therefore, there is at least one pair of vectors in E_d that will be mapped into the same image by f. This implies the observation, as the ℓ_q-distortion and REM_q measures depend on the inverse of the embedded distance.
Yet, focusing on the Gaussian entries implementation of (28), we show that it behaves dramatically better. Let X ⊂ ℓ_2^d be an n-point set, and let k ≥ 1 be an integer. The JL transform of dimension k, f : X → ℓ_2^k, is defined by generating a random matrix T of size k × d, with i.i.d. standard normal entries, and setting f(x) = (1/√k) T x, for all x ∈ X.

Theorem 1. Let X ⊂ ℓ_2^d be an n-point set, and let k ≥ 1. Given any distribution Π over (X choose 2), the JL transform f : X → ℓ_2^k is such that with probability at least 1/2, ℓ_q-dist^(Π)(f) is bounded by:

  1 ≤ q < √k:  1 + O(1/√k);   √k ≤ q ≤ k/4:  1 + O(q/(k − q));   k/4 ≤ q ⪅ k:  (k/(k − q))^{O(1/q)};   q = k:  (log n)^{O(1/k)};   k ⪅ q ≤ ∞:  n^{O(1/k − 1/q)}.

The bounds are asymptotically tight for most values of k and q when the embedding is required to maintain all bounds simultaneously. For fixed q, tightness holds for most values of q ≥ √k.
Note that for large q our theorem shows that a phase transition emerges around q = k. The necessity of this phenomenon is implied by nearly tight lower bounds given in Section 4.1.
Additive distortion and σ-distortion measures analysis. The following theorem provides tight upper bounds for all the additive distortion measures and for σ-distortion, for 1 ≤ q ≤ k − 1. This follows from analyzing the REM (via a similar approach to the raw moments analysis):

Theorem 2. Given a finite set X ⊂ ℓ_2^d and an integer k ≥ 2, let f : X → ℓ_2^k be the JL transform of dimension k. For any distribution Π over (X choose 2), with constant probability, for all 1 ≤ r ≤ q ≤ k − 1:

  REM^(Π)_q(f), Energy^(Π)_q(f), Φ^(Π)_{σ,q,r}(f), Stress^(Π)_q(f), Stress*^(Π)_q(f) = O(√(q/k)).

The more challenging part of the analysis is figuring out how good the JL performance bounds are. Therefore our main goal is the task of establishing lower bounds for Problem 1.

⁴ We note that PCA is proven to minimize Σ_{u≠v∈X} (d_uv² − d̂_uv²) over all projections into k dimensions (38), but not over embeddings (not even linear maps).

4 Partially tight lower bounds: q < k

In the full version we show that JL is essentially optimal when simultaneous guarantees are required. If that requirement is removed, this is still the case for most of the ranges of q. Providing lower bounds for each range requires a different technique. One of the most interesting cases is the proof of the lower bound of 1 + Ω(q/(k − q)) for the range 1 ≤ q ≤ k − 1. For q ≤ √k, this turns out to be a consequence of the tightness for the additive distortion measures and σ-distortion, shown to be tight for q ≥ 2. The proof is based on a delicate application of the technique of (4). We show that the analysis of the JL transform for the additive measures and σ-distortion provides tight bounds for all values of 2 ≤ q ≤ k. Due to the tight relations between the additive measures, the lower bounds for all measures follow from the Energy measure. Let E_n denote an n-point equilateral metric space.

Claim 3. For all k ≥ 2, k ≥ q ≥ 2, and n ≥ (4/9)·(k/q)^{q/2}, for any embedding f : E_n → ℓ_2^k it holds that Energy_q(f) = Ω(√(q/k)).

Claim 4. For all k ≥ 1, 1 ≤ q < 2, and n ≥ 18k, for all f : E_n → ℓ_2^k, Energy_q(f) = Ω(1/k^{1/q}).

A more involved argument shows that Claim 3 implies:

Corollary 1. For any k ≥ 1 and any n ≥ 18k, for any embedding f : E_n → ℓ_2^k it holds that ℓ_q-dist(f) = 1 + Ω(q/k), for all 1 ≤ q ≤ √k.

Based on (31), we also prove:

Theorem 5. For all k ≥ 16, for all N large enough, there is a metric space Z ⊆ ℓ_2 on N points, such that for any F : Z → ℓ_2^k it holds that ℓ_q-dist(F) ≥ 1 + Ω(q/(k − q)), for all q = Ω(√(k log k)).

4.1 Phase transition: moment analysis lower bounds for q ≥ k

An important consequence of our analysis is that the q-moments of the distortion (including REM_q) exhibit an impressive phase transition phenomenon occurring around q = k. This follows from lower bounds for q ≥ k. The case q = k (and ≈ k) is of special interest, where we obtain a tight bound:

Theorem 6. Any embedding f : E_n → ℓ_2^k has ℓ_k-dist(f) = Ω((√log n)^{1/k} / k^{1/4}), for any k ≥ 1.

Hence, for any q, the theorem tells us that only k ≥ 1.01q may be suitable for dimensionality reduction. This new consequence may serve as an important guide for practical considerations, one that seems to have been missing prior to our work. We also prove the following claim for large values of q:

Claim 7. For any embedding f : E_n → ℓ_2^k, for all k ≥ 1, for all q > k, ℓ_q-dist(f) = Ω(max{n^{1/(2⌈k/2⌉) − 1/(2q)}, n^{1/(2k) − 1/q}}).

5 Approximate optimal embedding of general metrics

Perhaps the most basic goal in dimensionality reduction theory, and essentially the main problem of MDS, is: Given an arbitrary metric space, compute an embedding into k dimensional Euclidean space which approximates the best possible embedding, in terms of minimizing a particular distortion measure objective. 
Except for some very special cases, no such approximation algorithms were known prior to this work. Applying our moment analysis bounds for the JL, we are able to obtain the first general approximation guarantees for all the discussed measures. The bounds are obtained via convex programming combined with the JL-transform. While the basic idea is quite simple, it is not obvious that it can actually go through. The main obstacle is that all q-moment measures are not associative. In fact, it is not generally the case that combining two embeddings results in a good final embedding. However, as we show, this is indeed true specifically for JL-type embeddings.
Let OBJ^(Π)_q = {ℓ_q-dist^(Π), REM^(Π)_q, Energy^(Π)_q, Φ^(Π)_{σ,q,2}, Stress^(Π)_q, Stress*^(Π)_q} denote the set of the objective measures. For Obj^(Π)_q ∈ OBJ^(Π)_q, denote OPT(n) = inf_{f : X → ℓ_2^n} {Obj^(Π)_q(f)}, and OPT = inf_{h : X → ℓ_2^k} {Obj^(Π)_q(h)}. Note that OPT(n) ≤ OPT. The first step of the approximation algorithm is to compute OPT(n) for a given Obj^(Π)_q, without constraining the target dimension.

Theorem 8. Let (X, d_X) be an n-point metric space and let Π be any distribution. Then for any q ≥ 2 and for Obj^(Π)_q ≠ Stress*^(Π)_q, there exists a polynomial time algorithm that computes an embedding f : X → ℓ_2^n such that Obj^(Π)_q(f) approximates OPT(n) to within any level of precision. For Obj^(Π)_q = Stress*^(Π)_q, there exists a polynomial time algorithm that computes an embedding f : X → ℓ_2^n with Stress*^(Π)_q(f) = O(OPT(n)).

The proof is based on formulating the appropriate convex optimization program, which can be solved in polynomial time by interior-point methods. The exception is Stress*_q, which is inherently non-convex. We show that Stress*_q can be reduced to the case of Stress_q, with an additional constant factor loss, and that optimizing for Φ_{σ,q,2} can be reduced to the case of Energy_q. The second step in the algorithm is applying the JL to reduce the dimension to the desired number of dimensions k.

Theorem 9. For any finite metric (X, d_X), any distribution Π over (X choose 2), for any k ≥ 3 and 2 ≤ q ≤ k − 1, there is a randomized polynomial time algorithm that finds an embedding F : X → ℓ_2^k, such that with high probability: ℓ_q-dist^(Π)(F) = (1 + O(1/√k + q/(k − q))) OPT; and Obj^(Π)_q(F) = O(OPT) + O(√(q/k)), for Obj^(Π)_q ∈ {REM^(Π)_q, Energy^(Π)_q, Φ^(Π)_{σ,q,2}, Stress^(Π)_q, Stress*^(Π)_q}.

6 Empirical experiments

In this section we provide experiments to demonstrate that the theoretical results are exhibited in practical settings. We also compare in the experiments the bounds of the theoretical algorithms (JL and the approximation algorithm based on it) to some of the most common heuristics. In all the experiments, we use the Normal distribution (with random variance) for sampling Euclidean input spaces.⁵ Tests were made for a large range of parameters, averaging over at least 10 independent tests. 
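The Gaussian JL transform at the heart of this pipeline is a one-liner. The following sketch (function names and the scaled-down parameters are ours) mirrors the experimental setup: sample a Normal point set with a random variance, apply f(x) = (1/√k) T x, and estimate the resulting ℓ_q-distortion under the uniform distribution:

```python
import numpy as np

def jl_transform(X, k, rng=None):
    """Gaussian JL map: f(x) = (1/sqrt(k)) T x with T_{ij} i.i.d. N(0,1)."""
    rng = np.random.default_rng(rng)
    T = rng.standard_normal((k, X.shape[1]))
    return X @ T.T / np.sqrt(k)

# Normal input with a random variance, as in the experimental setup
# (a scaled-down instance; the paper uses much larger n, d).
rng = np.random.default_rng(0)
X = rng.normal(scale=rng.uniform(0.5, 2.0), size=(200, 100))
Y = jl_transform(X, k=40, rng=1)

# Empirical l_q-distortion of the map (uniform Pi over the pairs).
iu = np.triu_indices(len(X), k=1)
d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))[iu]
dhat = np.sqrt(((Y[:, None] - Y[None, :]) ** 2).sum(-1))[iu]
dist = np.maximum(dhat / d, d / dhat)
q = 5.0
lq = np.mean(dist ** q) ** (1.0 / q)   # stays close to 1 when q is well below k
```

Note that the map is data-oblivious: T is drawn without looking at X, in contrast to PCA, which is one reason composing it with an optimal high-dimensional embedding is plausible.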
The results are consistent for all settings and measures.

We first recall the main theoretical results to be verified. In Theorem 1 and Theorem 2 we showed that for q < k the ℓ_q-distortion is bounded by 1 + O(1/√k) + O(q/k), and all the remaining measures are bounded by O(√(q/k)). In particular, the bounds are independent of the size n and the dimension d of the input data set. In addition, our lower bounds in Section 4.1 show that for the ℓ_q-distortion and REM_q measures a phase transition must occur at q ∼ k for any dimensionality reduction method: the bounds dramatically increase from a constant for q < k to poly(n) for q > k. Finally, in Section 5 we exhibited an approximation algorithm for all distortion measures.

The graphs in Fig. 1 and Fig. 2a describe the following setting: a random Euclidean space X of fixed size n and dimension d = n = 800 was embedded into k ∈ [4, 30] dimensions with q = 5, by the JL/PCA/Isomap methods. We stress that we ran many more experiments, over a wide range of parameter values n ∈ [100, 3000], k ∈ [2, 100], q ∈ [1, 10], and obtained essentially identical qualitative behavior. In Fig. 1a, the ℓ_q-distortion of the JL embedding is shown as a function of k, for q = 8, 10, 12. The phase transitions are seen at around k ∼ q, as predicted. In Fig. 1b, the bounds and the phase transitions of the PCA and Isomap methods are presented for the same setting, again as predicted. In Fig. 1c, ℓ_q-distortion bounds are shown for increasing values of k > q. Note that the ℓ_q-distortion of JL is a small constant close to 1, as predicted, compared to values significantly above 2 for the compared heuristics. Overall, Fig. 1 clearly shows the superiority of JL over the other methods for the entire range of values of k.

⁵We note that (13) used similar settings with Normal/Gamma distributions. Most of our experimental results hold also for the Gamma distribution.

Figure 1: Validating ℓ_q-distortion behavior. (a) Phase transition: JL. (b) Phase transition: PCA, Isomap. (c) Comparing ℓ_q-distortions for k > q.

The same conclusions hold for the σ-distortion as well, as shown in Fig. 2a. In the experiment shown in Fig. 2b, we tested the behavior of the σ-distortion as a function of d, the dimension of the input data set, similarly to (13) (Fig. 2); tests are shown for embedding dimension k = 20 and q = 2. According to our theorems, the σ-distortion of the JL transform is bounded above by a constant independent of d, for q < k. Our experiment shows that the σ-distortion grows as d increases for both PCA and Isomap, whereas it remains constant for JL. Moreover, JL obtains a significantly smaller value of σ-distortion.

Figure 2: Validating σ-distortion behavior. (a) σ-distortion. (b) σ-distortion as a function of d.

Figure 3: Non-Euclidean input metric: ℓ_q-distortion behavior.

In the last experiment (Fig. 3), we tested the quality of our approximation algorithm on non-Euclidean input spaces against the classical MDS and Isomap methods (adapted for non-Euclidean input spaces). The construction of the space is as follows: first, a sampled Euclidean space X, of size and dimension n = d = 100, is generated as above; second, the interpoint distances of X are distorted by a noise factor 1 + ε, with ε ∼ N(0, δ) for δ < 1. We ensure that the resulting space is a valid non-Euclidean metric. We then embed the final space into k ∈ [10, 30] dimensions with q = 5. Since the non-Euclidean space is 1 + ε far from being Euclidean, we expect behavior similar to that shown in Fig. 1c.
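The construction just described can be sketched as follows; this is a minimal sketch in which the rejection-sampling loop used to enforce metric validity is our assumption (the paper only states that validity is ensured, without specifying a mechanism).

```python
import numpy as np

def perturbed_metric(X, delta, rng):
    """Distort the interpoint distances of the point set X by symmetric
    factors (1 + eps), eps ~ N(0, delta), resampling the noise until the
    triangle inequality holds, yielding a valid (generally non-Euclidean)
    metric."""
    n = len(X)
    D0 = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while True:
        eps = rng.normal(0.0, delta, size=(n, n))
        eps = np.triu(eps, 1)
        eps = eps + eps.T                  # symmetric noise, zero diagonal
        D = D0 * (1.0 + eps)
        # triangle inequality over all triples: D[i,j] <= D[i,k] + D[k,j]
        if np.all(D[:, :, None] <= D[:, None, :] + D[None, :, :] + 1e-12):
            return D

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 30))              # smaller than n = d = 100, for speed
D = perturbed_metric(X, delta=0.01, rng=rng)
```

For small δ the slack in the triangle inequalities of a random Gaussian point set is large relative to the noise, so the loop typically accepts the first sample.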
The result clearly demonstrates the superiority of the JL-based approximation algorithm.

7 On relevance of distortion measures for ML

In (13) the authors developed a set of properties that a distortion measure has to satisfy in order to be useful for machine learning. Here we show that these properties can be generalized, and that appropriate modifications of all the measurement criteria discussed in this paper satisfy all of them.

For an embedding f : X → Y, let ρ_f(u, v) be an error function of a pair u ≠ v ∈ X, which is a function of the embedded distance and the original distance between u and v. Let ρ(f) = (ρ_f(u, v))_{u≠v∈X} denote the vector of ρ_f(u, v) over all pairs u ≠ v ∈ X, and let M^{(Π)}_q : ρ(f) → R+ be a measure function, for any distribution Π over the pairs of X. For instance, for the ℓ_q-distortion measure and for REM_q, ρ_f(u, v) := dist_f(u, v) and ρ_f(u, v) := dist_f(u, v) − 1, respectively; for the Energy_q and Stress_q measures, ρ_f(u, v) := |expans_f(u, v) − 1|; for Φ_{σ,q,r}, ρ_f(u, v) := |expans_f(u, v)/ℓ_r-expans(f) − 1|. All the measures are then defined by M^{(Π)}_q(ρ(f)) := (E_Π[‖ρ(f)‖_q^q])^{1/q}. In what follows we omit Π from the notation. We propose the following generalizations of the ML-motivated properties defined in (13):

Scalability. Although a measurement criterion may not necessarily be scalable, it can be naturally modified into a scalable version as follows. For every M_q, define M̂_q(ρ(f)) = min_{α>0} M_q(ρ(α · f)). Note that the upper and lower bounds that hold for M_q also hold for its scalable version M̂_q.

Monotonicity. We generalize this property as follows.
Let f, g : X → Y be any embeddings. For a given measure M_q, let f̂ and ĝ be the embeddings minimizing M_q(ρ(α · f)) and M_q(ρ(α · g)), respectively, over all scaling factors α > 0. If f̂ and ĝ are such that for every pair u ≠ v ∈ X it holds that ρ_{f̂}(u, v) ≥ ρ_{ĝ}(u, v), then the measure M̂_q is monotone iff M_q(ρ(f̂)) ≥ M_q(ρ(ĝ)).

Robustness to outliers in data/in distances. The measure M̂_q is said to be robust to outliers if for any embedding f_n of an n-point space and any modification f̃_n in which a constant number of changes occurs in either points or distances, it holds that lim_{n→∞} M_q(ρ(f_n)) = lim_{n→∞} M_q(ρ(f̃_n)).

Incorporation of the probability distribution. Let h : X → Y be an embedding and let u ≠ v ∈ X and x ≠ y ∈ X be such that Π(u, v) > Π(x, y) and ρ_h(u, v) = ρ_h(x, y). Assume that f : X → Y is identical to h except over (u, v), that g is identical to h except over (x, y), and that ρ_f(u, v) = ρ_g(x, y). Now let f̂ and ĝ be defined as above, and assume ρ_{f̂}(u, v) ≥ ρ_{ĝ}(x, y). Then the measure M̂^{(Π)}_q is said to incorporate the probability distribution Π if M^{(Π)}_q(ρ(f̂)) > M^{(Π)}_q(ρ(ĝ)).

Robustness to noise. This property was not formally defined in (13). Assuming a model of noise that affects the error ρ of each pair by at most a factor of 1 + ε (alternatively, by an additive error of ε), the requirement is that the measure M̂_q changes by at most a factor of 1 + O(ε) (respectively, by an additive O(ε)).

It is easy to see that all the distortion criteria discussed in this paper (adapted to be scalable as in the first property) obey all the above properties, implying their relevance to ML applications.

8 Discussion

This work provides a new framework for theoretical analysis of embeddings in terms of performance measures that are of practical relevance, initiating a theoretical study of a wide range of average case quality measurement criteria and providing the first rigorous analysis of these criteria.

We use this framework to analyze the new distortion measure developed in (13), designed to possess properties desirable for machine learning, and show that all the distortion measures considered can be adapted to possess similar qualities.

We show nearly tight bounds on the absolute values of all distortion criteria, essentially showing that the JL transform is near optimal for dimensionality reduction in most parameter regimes. When considering other methods, the JL bound can serve as guidance, and it would make sense to regard a method as useful only when it beats the JL bound. The phase transition exhibited in our bounds gives direction on how to choose the target dimension k: k should be greater than q by a factor larger than 1, which ensures that the fraction of outlier pairs diminishes as k grows.

A major contribution of our paper is providing the first approximation algorithms for embedding any finite metric (possibly non-Euclidean) into k-dimensional Euclidean space with provable approximation guarantees.
Since these approximation algorithms achieve near optimal distortion bounds, they are expected to beat most common heuristics in terms of the relevant distortion measures. There is evidence of a correlation between lower distortion measures and the quality of machine learning algorithms applied on the resulting space; for example, in (13) such a correlation is shown experimentally between σ-distortion and error bounds in classification. This evidence suggests that the improvement in distortion bounds should be reflected in better bounds for machine learning applications.

Our experiments show that the conclusions above hold in practical settings as well.

Acknowledgments

This work is supported by ISF grant #1817/17 and BSF grant #2015813.

References

[1] Ittai Abraham, Yair Bartal, and Ofer Neiman. Advances in metric embedding theory. Advances in Mathematics, 228(6):3026–3126, 2011. ISSN 0001-8708. doi: 10.1016/j.aim.2011.08.003. URL http://www.sciencedirect.com/science/article/pii/S000187081100288X.

[2] Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003. ISSN 0022-0000. doi: 10.1016/S0022-0000(03)00025-4. URL http://www.sciencedirect.com/science/article/pii/S0022000003000254. Special Issue on PODS 2001.

[3] Nir Ailon and Edo Liberty. An almost optimal unrestricted fast Johnson-Lindenstrauss transform. ACM Trans. Algorithms, 9(3):21:1–21:12, June 2013. ISSN 1549-6325. doi: 10.1145/2483699.2483701. URL http://doi.acm.org/10.1145/2483699.2483701.

[4] Noga Alon. Perturbed identity matrices have high rank: Proof and applications. Combinatorics, Probability & Computing, 18(1-2):3–15, 2009. doi: 10.1017/S0963548307008917. URL http://dx.doi.org/10.1017/S0963548307008917.

[5] Vassilis Athitsos and Stan Sclaroff.
Database indexing methods for 3d hand pose estimation. In Gesture Workshop, pages 288–299, 2003.

[6] Mihai Badoiu. Approximation algorithm for embedding metrics into a two-dimensional space. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, pages 434–443. Society for Industrial and Applied Mathematics, 2003.

[7] Wojciech Basalaj. Proximity visualisation of abstract data. Technical Report UCAM-CL-TR-509, University of Cambridge, Computer Laboratory, January 2001.

[8] I. Borg and P. J. F. Groenen. Modern Multidimensional Scaling: Theory and Applications (Springer Series in Statistics). Springer, Berlin, 2nd edition, 2005.

[9] Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel. Generalized multidimensional scaling: A framework for isometry-invariant partial surface matching. Proceedings of the National Academy of Sciences, 103(5):1168–1172, 2006.

[10] Lawrence Cayton and Sanjoy Dasgupta. Robust euclidean embedding. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 169–176, New York, NY, USA, 2006. ACM. ISBN 1-59593-383-2. doi: 10.1145/1143844.1143866. URL http://doi.acm.org/10.1145/1143844.1143866.

[11] A. Censi and D. Scaramuzza. Calibration by correlation using metric embedding from nonmetric similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(10):2357–2370, Oct. 2013. ISSN 0162-8828. doi: 10.1109/TPAMI.2013.34. URL doi.ieeecomputersociety.org/10.1109/TPAMI.2013.34.

[12] Samidh Chatterjee, Bradley Neff, and Piyush Kumar. Instant approximate 1-center on road networks via embeddings. In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '11, pages 369–372, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-1031-4. doi: 10.1145/2093973.2094025.
URL http://doi.acm.org/10.1145/2093973.2094025.

[13] Leena Chennuru Vankadara and Ulrike von Luxburg. Measures of distortion for machine learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 4891–4900. Curran Associates, Inc., 2018. URL http://papers.nips.cc/paper/7737-measures-of-distortion-for-machine-learning.pdf.

[14] M. Costa, M. Castro, R. Rowstron, and P. Key. PIC: practical internet coordinates for distance estimation. In 24th International Conference on Distributed Computing Systems, 2004. Proceedings., pages 178–187, 2004. doi: 10.1109/ICDCS.2004.1281582.

[15] Russ Cox, Frank Dabek, Frans Kaashoek, Jinyang Li, and Robert Morris. Practical, distributed network coordinates. SIGCOMM Comput. Commun. Rev., 34(1):113–118, January 2004. ISSN 0146-4833. doi: 10.1145/972374.972394. URL http://doi.acm.org/10.1145/972374.972394.

[16] T. F. Cox and M. A. A. Cox. Multidimensional Scaling (Monographs on Statistics and Applied Probability). Chapman and Hall/CRC, 2nd edition, 2000.

[17] Frank Dabek, Russ Cox, M. Frans Kaashoek, and Robert Morris. Vivaldi: a decentralized network coordinate system. In Proceedings of the ACM SIGCOMM 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 30 - September 3, 2004, Portland, Oregon, USA, pages 15–26, 2004. doi: 10.1145/1015467.1015471. URL http://doi.acm.org/10.1145/1015467.1015471.

[18] Anirban Dasgupta, Ravi Kumar, and Tamás Sarlós. A sparse Johnson-Lindenstrauss transform. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 341–350. ACM, 2010.

[19] Kedar Dhamdhere. Approximating additive distortion of embeddings into line metrics. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 96–104.
Springer, 2004.

[20] Patrick J. F. Groenen, Rudolf Mathar, and Willem J. Heiser. The majorization approach to multidimensional scaling for Minkowski distances. Journal of Classification, 12(1):3–19, 1995.

[21] Eran Halperin, Jeremy Buhler, Richard M. Karp, Robert Krauthgamer, and B. Westover. Detecting protein sequence conservation via metric embeddings. In ISMB (Supplement of Bioinformatics), pages 122–129, 2003.

[22] Johan Håstad, Lars Ivansson, and Jens Lagergren. Fitting points on the real line and its application to RH mapping. J. Algorithms, 49(1):42–62, October 2003. ISSN 0196-6774. doi: 10.1016/S0196-6774(03)00083-X. URL http://dx.doi.org/10.1016/S0196-6774(03)00083-X.

[23] W. J. Heiser. Multidimensional scaling with least absolute residuals. In H. H. Bock (Ed.), Classification and Related Methods, pages 455–462. Amsterdam: North-Holland, 1988.

[24] Gísli R. Hjaltason and Hanan Samet. Properties of embedding methods for similarity searching in metric spaces. IEEE Trans. Pattern Anal. Mach. Intell., 25(5):530–549, 2003. doi: 10.1109/TPAMI.2003.1195989. URL http://dx.doi.org/10.1109/TPAMI.2003.1195989.

[25] Gabriela Hristescu and Martin Farach-Colton. Cofe: A scalable method for feature extraction from complex objects. In Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2000, pages 358–371, London, UK, 2000. Springer-Verlag. ISBN 3-540-67980-4. URL http://portal.acm.org/citation.cfm?id=646109.756709.

[26] P. Indyk. Algorithmic applications of low-distortion geometric embeddings. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 10–33, Oct 2001. doi: 10.1109/SFCS.2001.959878.

[27] Piotr Indyk and Jiri Matoušek. Low-distortion embeddings of finite metric spaces. URL citeseer.ist.psu.edu/672933.html.

[28] Piotr Indyk and Rajeev Motwani.
Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC '98, pages 604–613, New York, NY, USA, 1998. ACM. ISBN 0-89791-962-9. doi: 10.1145/276698.276876. URL http://doi.acm.org/10.1145/276698.276876.

[29] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Conference in modern analysis and probability (New Haven, Conn., 1982), pages 189–206. American Mathematical Society, Providence, RI, 1984.

[30] J. B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1):1–27, 1964.

[31] Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. arXiv preprint arXiv:1609.02094, 2016.

[32] Sanghwan Lee, Zhi-Li Zhang, Sambit Sahu, Debanjan Saha, and Mukund Srinivasan. Fundamental effects of clustering on the euclidean embedding of internet hosts. In NETWORKING 2007. Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, 6th International IFIP-TC6 Networking Conference, Atlanta, GA, USA, May 14-18, 2007, Proceedings, pages 890–901, 2007. doi: 10.1007/978-3-540-72606-7_76. URL http://dx.doi.org/10.1007/978-3-540-72606-7_76.

[33] Sanghwan Lee, Zhi-Li Zhang, Sambit Sahu, and Debanjan Saha. On suitability of euclidean embedding for host-based network coordinate systems. IEEE/ACM Trans. Netw., 18(1):27–40, February 2010. ISSN 1063-6692. doi: 10.1109/TNET.2009.2023322. URL http://dx.doi.org/10.1109/TNET.2009.2023322.

[34] N. Linial. Finite metric spaces – combinatorics, geometry and algorithms. In Proceedings of the ICM, 2002.

[35] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995. ISSN 1439-6912. doi: 10.1007/BF01200757.
URL http://dx.doi.org/10.1007/BF01200757.

[36] Eng Keong Lua, Timothy Griffin, Marcelo Pias, Han Zheng, and Jon Crowcroft. On the accuracy of embeddings for internet coordinate systems. In Proceedings of the 5th ACM SIGCOMM Conference on Internet Measurement, IMC '05, pages 11–11, Berkeley, CA, USA, 2005. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1251086.1251097.

[37] C. Lumezanu and N. Spring. Measurement manipulation and space selection in network coordinates. In 2008 The 28th International Conference on Distributed Computing Systems, pages 361–368, June 2008. doi: 10.1109/ICDCS.2008.27.

[38] Kantilal Varichand Mardia, John T. Kent, and John M. Bibby. Multivariate analysis. Probability and mathematical statistics. Acad. Press, London, 1979. ISBN 0124712509. URL http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+02434995X&sourceid=fbw_bibsonomy.

[39] Jiri Matousek and Anastasios Sidiropoulos. Inapproximability for metric embeddings into R^d. In Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS '08, pages 405–413, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0-7695-3436-7. doi: 10.1109/FOCS.2008.21. URL http://dx.doi.org/10.1109/FOCS.2008.21.

[40] Jiří Matoušek. Bi-Lipschitz embeddings into low-dimensional Euclidean spaces. Commentat. Math. Univ. Carol., 31(3):589–600, 1990. ISSN 0010-2628.

[41] T. S. Eugene Ng and Hui Zhang. Predicting internet network distance with coordinates-based approaches. In Proceedings IEEE INFOCOM 2002, The 21st Annual Joint Conference of the IEEE Computer and Communications Societies, New York, USA, June 23-27, 2002, 2002. URL http://www.ieee-infocom.org/2002/papers/785.pdf.

[42] Michael Quist, Golan Yona, and Bin Yu. Distributional scaling: An algorithm for structure-preserving embedding of metric and nonmetric spaces.
Journal of Machine Learning Research, pages 399–420, 2004.

[43] J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5):401–409, May 1969. ISSN 0018-9340. doi: 10.1109/T-C.1969.222678.

[44] Puneet Sharma, Zhichen Xu, Sujata Banerjee, and Sung-Ju Lee. Estimating network proximity and latency. Computer Communication Review, 36(3):39–50, 2006. doi: 10.1145/1140086.1140092. URL http://doi.acm.org/10.1145/1140086.1140092.

[45] Yuval Shavitt and Tomer Tankel. Big-bang simulation for embedding network distances in euclidean space. IEEE/ACM Trans. Netw., 12(6):993–1006, December 2004. ISSN 1063-6692. doi: 10.1109/TNET.2004.838597. URL http://dx.doi.org/10.1109/TNET.2004.838597.

[46] Ian Spence and Stephan Lewandowsky. Robust multidimensional scaling. Psychometrika, 54(3):501–513, 1989.

[47] Sahaana Suri and Peter Bailis. DROP: dimensionality reduction optimization for time series. CoRR, abs/1708.00183, 2017. URL http://arxiv.org/abs/1708.00183.

[48] Liying Tang and Mark Crovella. Geometric exploration of the landmark selection problem. In Passive and Active Network Measurement, 5th International Workshop, PAM 2004, Antibes, pages 63–72, 2004.

[49] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10:66–71, 2009.

[50] J. Fernando Vera, Willem J. Heiser, and Alex Murillo. Global optimization in any Minkowski metric: A permutation-translation simulated annealing algorithm for multidimensional scaling. J. Classif., 24(2):277–301, September 2007. ISSN 0176-4268.

[51] Jason Tsong-Li Wang, Xiong Wang, Dennis E. Shasha, and Kaizhong Zhang. MetricMap: an embedding technique for processing distance-based queries in metric spaces. IEEE Trans. Systems, Man, and Cybernetics, Part B, 35(5):973–987, 2005. doi: 10.1109/TSMCB.2005.848489.
URL http://dx.doi.org/10.1109/TSMCB.2005.848489.

[52] Rongmei Zhang, Y. Charlie Hu, Xiaojun Lin, and Sonia Fahmy. A hierarchical approach to internet distance prediction. In 26th IEEE International Conference on Distributed Computing Systems (ICDCS 2006), 4-7 July 2006, Lisboa, Portugal, page 73, 2006. doi: 10.1109/ICDCS.2006.7. URL http://dx.doi.org/10.1109/ICDCS.2006.7.