{"title": "Improved Error Bounds for Tree Representations of Metric Spaces", "book": "Advances in Neural Information Processing Systems", "page_first": 2838, "page_last": 2846, "abstract": "Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortion bounds depending on cardinality. Because cardinality is a simple property of any set, we argue that such bounds do not fully capture the rich structure endowed by the metric. We consider an embedding of a metric space into a tree proposed by Gromov. By proving a stability result, we obtain an improved additive distortion bound depending only on the hyperbolicity and doubling dimension of the metric. We observe that Gromov's method is dual to the well-known single linkage hierarchical clustering (SLHC) method. By means of this duality, we are able to transport our results to the setting of SLHC, where such additive distortion bounds were previously unknown.", "full_text": "Improved Error Bounds for Tree Representations of\n\nMetric Spaces\n\nSamir Chowdhury\n\nDepartment of Mathematics\nThe Ohio State University\n\nColumbus, OH 43210\n\nchowdhury.57@osu.edu\n\nDepartment of Computer Science and Engineering\n\nFacundo M\u00e9moli\n\nDepartment of Mathematics\n\nThe Ohio State University\n\nColumbus, OH 43210\nmemoli@math.osu.edu\n\nDepartment of Computer Science and Engineering\n\nZane Smith\n\nThe Ohio State University\n\nColumbus, OH 43210\nsmith.9911@osu.edu\n\nAbstract\n\nEstimating optimal phylogenetic trees or hierarchical clustering trees from metric\ndata is an important problem in evolutionary biology and data analysis. 
Intuitively,\nthe goodness-of-\ufb01t of a metric space to a tree depends on its inherent treeness, as\nwell as other metric properties such as intrinsic dimension. Existing algorithms for\nembedding metric spaces into tree metrics provide distortion bounds depending on\ncardinality. Because cardinality is a simple property of any set, we argue that such\nbounds do not fully capture the rich structure endowed by the metric. We consider\nan embedding of a metric space into a tree proposed by Gromov. By proving a\nstability result, we obtain an improved additive distortion bound depending only on\nthe hyperbolicity and doubling dimension of the metric. We observe that Gromov\u2019s\nmethod is dual to the well-known single linkage hierarchical clustering (SLHC)\nmethod. By means of this duality, we are able to transport our results to the setting\nof SLHC, where such additive distortion bounds were previously unknown.\n\n1\n\nIntroduction\n\nNumerous problems in data analysis are formulated as the question of embedding high-dimensional\nmetric spaces into \u201csimpler\" spaces, typically of lower dimension. In classical multidimensional\nscaling (MDS) techniques [18], the goal is to embed a space into two or three dimensional Euclidean\nspace while preserving interpoint distances. Classical MDS is helpful in exploratory data analysis,\nbecause it allows one to \ufb01nd hidden groupings in amorphous data by simple visual inspection.\nGeneralizations of MDS exist for which the target space can be a tree metric space\u2014see [13] for a\nsummary of some of these approaches, written from the point of view of metric embeddings. 
The metric embeddings literature, which grew out of MDS, typically highlights the algorithmic gains made possible by embedding a complicated metric space into a simpler one [13].\nThe special case of MDS where the target space is a tree has been of interest in phylogenetics for quite some time [19, 5]; the numerical taxonomy problem (NTP) is that of finding an optimal tree embedding for a given metric space (X, dX), i.e. a tree (X, tX) such that the additive distortion, defined as \u2016dX \u2212 tX\u2016_{\u2113\u221e(X\u00d7X)}, is minimal over all possible tree metrics on X. This problem turns out to be NP-hard [3]; however, a 3-approximation algorithm exists [3], and a variant of this problem, that of finding an optimal ultrametric tree, can be solved in polynomial time [11].\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n\fAn ultrametric tree is a rooted tree where every point is equidistant from the root\u2014for example, ultrametric trees are the outputs of hierarchical clustering (HC) methods that show groupings in data across different resolutions. A known connection between HC and MDS is that the output ultrametric of single linkage hierarchical clustering (SLHC) is a 2-approximation to the optimal ultrametric tree embedding [16], thus providing a partial answer to the NTP. However, it appears that the existing line of work regarding NTP does not address the question of quantifying the \u2113\u221e distance between a metric (X, dX) and its optimal tree metric, or even the optimal ultrametric. More specifically, we can ask:\nQuestion 1. Given a set X, a metric dX, and an optimal tree metric t^opt_X (or an optimal ultrametric u^opt_X), can one find a nontrivial upper bound on \u2016dX \u2212 t^opt_X\u2016_{\u2113\u221e(X\u00d7X)} (resp. \u2016dX \u2212 u^opt_X\u2016_{\u2113\u221e(X\u00d7X)}) depending on properties of the metric dX?\n\nThe question of distortion bounds is treated from a different perspective in the discrete algorithms literature. In this domain, tree embeddings are typically described with multiplicative distortion bounds (described in \u00a72) depending on the cardinality of the underlying metric space, along with (typically) pathological counterexamples showing that these bounds are tight [4, 10]. We remark immediately that (1) multiplicative distortion is distinct from the additive distortion encountered in the NTP, and (2) these embeddings are rarely used in machine learning, where HC and MDS methods are the main workhorses. Moreover, such multiplicative distortion bounds do not take two considerations into account: (1) the ubiquitousness of very large data sets means that a bound dependent on cardinality is not desirable, and (2) \u201cnice\u201d properties such as low intrinsic dimensionality or treeness of real-world datasets are not exploited in cardinality bounds.\nWe prove novel additive distortion bounds for two methods of tree embeddings: one into general trees, and one into ultrametric trees. These additive distortion bounds take into account (1) whether the data is treelike, and (2) whether the data has low doubling dimension, which is a measure of its intrinsic dimension. Thus we prove an answer to Question 1 above, namely, that the approximation error made by an optimal tree metric (or optimal ultrametric) can be bounded nontrivially.\nRemark 1. The trivial upper bound is \u2016dX \u2212 t^opt_X\u2016_{\u2113\u221e(X\u00d7X)} \u2264 diam(X, dX). To see this, observe that any ultrametric is a tree, and that SLHC yields an ultrametric uX that is bounded above by dX.\n\nAn overview of our approach. A common measure of treeness is Gromov\u2019s \u03b4-hyperbolicity, which is a local condition on 4-point subsets of a metric space. 
Hyperbolicity has been shown to be a useful statistic for evaluating the quality of trees in [7]. The starting point for our work is a method used by Gromov to embed metric spaces into trees, which we call Gromov\u2019s embedding [12]. A known result, which we call Gromov\u2019s embedding theorem, is that if every 4-point subset of an n-point metric space is \u03b4-hyperbolic, then the metric space embeds into a tree with \u2113\u221e distortion bounded above by 2\u03b4 log2(2n). The proof proceeds by a linkage argument, i.e. by invoking the definition of hyperbolicity at different scales along chains of points. By virtue of the embedding theorem, one can argue that hyperbolicity is a measure of the \u201ctreeness\u201d of a given metric space. It has been shown in [1, 2] that various real-world data sets, such as Internet latencies and biological, social, and collaboration networks, are inherently treelike, i.e. have low hyperbolicity. Thus, by Gromov\u2019s result, these real-world data sets can be embedded into trees with additive distortion controlled by their respective cardinalities. The cardinality bound might of course be undesirable, especially for very large data sets such as the Internet. However, it has been claimed without proof in [1] that Gromov\u2019s embedding can yield a 3-approximation to the NTP, independent of [3].\nWe note that the assumption of a metric input is not apparent in Gromov\u2019s embedding theorem. Moreover, the proof of the theorem does not utilize any metric property. This leads one to hope for bounds where the dependence on cardinality is replaced by a dependence on some metric notion. A natural candidate for such a metric notion is the doubling dimension of a space [15], which has already found applications in learning [17] and algorithm design [15]. 
In this paper, we present novel upper bounds on the additive distortion of a Gromov embedding, depending only on the hyperbolicity and doubling dimension of the metric space.\nOur main tool is a stability theorem that we prove using a metric induced by a Voronoi partition. This result is then combined with the results of Gromov\u2019s linkage argument. Both the stability theorem and Gromov\u2019s theorem rely on the embedding satisfying a particular linkage condition, which can be described as follows: for any embedding f : (X, dX) \u2192 (X, tX), and any x, x\u2032 \u2208 X, we have tX(x, x\u2032) = max_c min_i \u03a8(xi, xi+1), where c = {xi}_{i=0}^{k} is a chain of points joining x to x\u2032 and \u03a8 is some function of dX.\n\n\fA dual notion is to flip the order of the max, min operations. Interestingly, under the correct objective function \u03a8, this leads to the well-studied notion of SLHC. By virtue of this duality, the arguments of both the stability theorem and the scaling theorem apply in the SLHC setting. We introduce a new metric space statistic that we call ultrametricity (analogous to hyperbolicity), and are then able to obtain novel upper bounds, depending only on doubling dimension and ultrametricity, for the distortion incurred by a metric space when embedding into an ultrametric tree via SLHC.\nWe remark that just by virtue of the duality between Gromov\u2019s embedding and the SLHC embedding, it is possible to obtain a distortion bound for SLHC depending on cardinality. We were unable to find such a bound in the existing HC literature, so it appears that even the knowledge of this duality, which bridges the domains of HC and MDS methods, is not prevalent in the community.\nThe paper is organized as follows. The main thrust of our work is explained in \u00a71. In \u00a72 we develop the context of our work by highlighting some of the surrounding literature. 
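The dual pair of linkage conditions can be made concrete in a few lines of code. The sketch below is our own illustration (the paper itself ships Matlab demos): it computes the SLHC output ultrametric as the min-max chain cost over a finite metric, via a minimax variant of the Floyd-Warshall recursion; flipping min and max with the appropriate objective \u03a8 would give the Gromov-style linkage.

```python
def slhc_ultrametric(d):
    """Single-linkage ultrametric u(x, x') = min over chains c from x to x'
    of the maximum hop length max_i d(x_i, x_{i+1}): the min-max flip of the
    max-min linkage condition, computed by a minimax Floyd-Warshall recursion."""
    n = len(d)
    u = [row[:] for row in d]  # start from the input metric
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # a chain routed through k costs the larger of its two halves
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u
```

Note that the output is an ultrametric bounded above by the input metric, consistent with Remark 1.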
We provide\nall de\ufb01nitions and notation, including the Voronoi partition construction, in \u00a73. In \u00a74 we describe\nGromov\u2019s embedding and present Gromov\u2019s distortion bound in Theorem 3. Our contributions begin\nwith Theorem 4 in \u00a74 and include all the results that follow: namely the stability results in \u00a75, the\nimproved distortion bounds in \u00a76, and the proof of tightness in \u00a77.\nThe supplementary material contains (1) an appendix with proofs omitted from the body, (2) a\npractical demonstration in \u00a7A where we apply Gromov\u2019s embedding to a bitmap image of a tree\nand show that our upper bounds perform better than the bounds suggested by Gromov\u2019s embedding\ntheorem, and (3) Matlab .m \ufb01les containing demos of Gromov\u2019s embedding being applied to various\nimages of trees.\n\n2 Related Literature\n\nMDS is explained thoroughly in [18]. In metric MDS [18] one attempts to \ufb01nd an embedding of the\ndata X into a low dimensional Euclidean space given by a point cloud Y \u2282 Rd (where often d = 2\nor d = 3) such that the metric distortion (measured by the Frobenius norm of the difference of the\nGram matrices of X and Y ) is minimized. The most common non-metric variant of MDS is referred\nto as ordinal embedding, and has been studied in [14].\nA common problem with metric MDS is that when the intrinsic dimension of the data is higher than\nthe embedding dimension, the clustering in the original data may not be preserved [21]. One variant\nof MDS that preserves clusters is the tree preserving embedding [20], where the goal is to preserve\nthe single linkage (SL) dendrogram from the original data. 
This is especially important for certain types of biological data, for the following reasons: (1) due to speciation, many biological datasets are inherently \u201ctreelike\u201d, and (2) the SL dendrogram is a 2-approximation to the optimal ultrametric tree embedding [16], so intuitively, preserving the SL dendrogram preserves the \u201ctreeness\u201d of the data. Preserving the treeness of a metric space is related to the notion of finding an optimal embedding into a tree, which ties back to the numerical taxonomy problem. The SL dendrogram is an embedding of a metric space into an ultrametric tree, and can be used to find the optimal ultrametric tree [8].\nThe quality of an embedding is measured by computing its distortion, which has different definitions in different domain areas. Typically, a tree embedding is defined to be an injective map f : X \u2192 Y between metric spaces (X, dX) and (Y, tY), where the target space is a tree. We have defined the additive distortion of a tree embedding in an \u2113\u221e setting above, but \u2113p notions, for p \u2208 [1, \u221e), can also be defined. Past efforts to embed a metric into a tree with low additive distortion are described in [19, Chapter 7]. One can also define a multiplicative distortion [4, 10], but this is studied in the domain of discrete algorithms and is not our focus in the current work.\n\n3 Preliminaries on metric spaces, distances, and doubling dimension\n\nA finite metric space (X, dX) is a finite set X together with a function dX : X \u00d7 X \u2192 R+ such that: (1) dX(x, x\u2032) = 0 \u21d0\u21d2 x = x\u2032, (2) dX(x, x\u2032) = dX(x\u2032, x), and (3) dX(x, x\u2032) \u2264 dX(x, x\u2032\u2032) + dX(x\u2032\u2032, x\u2032) for any x, x\u2032, x\u2032\u2032 \u2208 X. A pointed metric space is a triple (X, dX, p), where (X, dX) is a finite metric space and p \u2208 X. 
All the spaces we consider are assumed to be finite.\n\n\fFor a metric space (X, dX), the diameter is defined to be diam(X, dX) := max_{x,x\u2032 \u2208 X} dX(x, x\u2032). The hyperbolicity of (X, dX) was defined by Gromov [12] as follows:\n\nhyp(X, dX) := max_{x1,x2,x3,x4 \u2208 X} \u03a8^hyp_X(x1, x2, x3, x4), where\n\n\u03a8^hyp_X(x1, x2, x3, x4) := (1/2) ( dX(x1, x2) + dX(x3, x4) \u2212 max( dX(x1, x3) + dX(x2, x4), dX(x1, x4) + dX(x2, x3) ) ).\n\nA tree metric space (X, tX) is a finite metric space such that hyp(X, tX) = 0 [19]. In our work, we strengthen the preceding characterization of trees to the special class of ultrametric trees. Recall that an ultrametric space (X, uX) is a metric space satisfying the strong triangle inequality:\n\nuX(x, x\u2032) \u2264 max( uX(x, x\u2032\u2032), uX(x\u2032\u2032, x\u2032) ), for all x, x\u2032, x\u2032\u2032 \u2208 X.\n\nDefinition 1. We define the ultrametricity of a metric space (X, dX) as:\n\nult(X, dX) := max_{x1,x2,x3 \u2208 X} \u03a8^ult_X(x1, x2, x3), where\n\n\u03a8^ult_X(x1, x2, x3) := dX(x1, x3) \u2212 max( dX(x1, x2), dX(x2, x3) ).\n\nWe introduce ultrametricity to quantify the deviation of a metric space from being ultrametric. Notice that (X, uX) is an ultrametric space if and only if ult(X, uX) = 0. One can verify that an ultrametric space is a tree metric space.\nWe will denote the cardinality of a set X by writing |X|. Given a set X and two metrics dX, d\u2032X defined on X \u00d7 X, we denote the \u2113\u221e distance between dX and d\u2032X as follows:\n\n\u2016dX \u2212 d\u2032X\u2016_{\u2113\u221e(X\u00d7X)} := max_{x,x\u2032 \u2208 X} |dX(x, x\u2032) \u2212 d\u2032X(x, x\u2032)|.\n\nWe use the shorthand \u2016dX \u2212 d\u2032X\u2016_\u221e to mean \u2016dX \u2212 d\u2032X\u2016_{\u2113\u221e(X\u00d7X)}. 
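Both statistics can be computed from a distance matrix by direct enumeration over quadruples and triples. The brute-force sketch below is our own illustration, written for clarity rather than speed (the loops are O(n^4) and O(n^3)):

```python
from itertools import product

def hyperbolicity(d):
    """hyp(X, dX): max over 4-tuples of
    0.5 * (d12 + d34 - max(d13 + d24, d14 + d23))."""
    n = len(d)
    best = 0.0  # degenerate tuples (all points equal) contribute 0
    for x1, x2, x3, x4 in product(range(n), repeat=4):
        psi = 0.5 * (d[x1][x2] + d[x3][x4]
                     - max(d[x1][x3] + d[x2][x4], d[x1][x4] + d[x2][x3]))
        best = max(best, psi)
    return best

def ultrametricity(d):
    """ult(X, dX): max over triples of d13 - max(d12, d23)."""
    n = len(d)
    return max(d[x1][x3] - max(d[x1][x2], d[x2][x3])
               for x1, x2, x3 in product(range(n), repeat=3))
```

For three collinear points {0, 1, 3} on the real line (a tree metric), hyp is 0 while ult is 1, which separates the two statistics.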
We write \u2248 to mean \u201capproximately equal to.\u201d Given two functions f, g : N \u2192 R, we will write f \u224d g to mean asymptotic tightness, i.e. that there exist constants c1, c2 such that c1 |f(n)| \u2264 |g(n)| \u2264 c2 |f(n)| for sufficiently large n \u2208 N.\n\nInduced metrics from Voronoi partitions. A key ingredient of our stability result involves a Voronoi partition construction. Given a metric space (X, dX) and a subset A \u2286 X, possibly with its own metric dA, we can define a new metric d^A_X on X \u00d7 X using a Voronoi partition. First write A = {x1, . . . , xn}. For each 1 \u2264 i \u2264 n, we define \u1e7ci := {x \u2208 X : dX(x, xi) \u2264 min_{j\u2260i} dX(x, xj)}. Then X = \u222a_{i=1}^{n} \u1e7ci. Next we perform the following disjointification trick:\n\nV1 := \u1e7c1, V2 := \u1e7c2 \\ \u1e7c1, . . . , Vn := \u1e7cn \\ ( \u222a_{i=1}^{n\u22121} \u1e7ci ).\n\nThen X = \u2294_{i=1}^{n} Vi, a disjoint union of Voronoi cells Vi.\nNext define the nearest-neighbor map \u03b7 : X \u2192 A by \u03b7(x) = xi for each x \u2208 Vi. The map \u03b7 simply sends each x \u2208 X to its closest neighbor in A, up to a choice when there are multiple nearest neighbors. Then we can define a new (pseudo)metric d^A_X : X \u00d7 X \u2192 R+ as follows:\n\nd^A_X(x, x\u2032) := dA(\u03b7(x), \u03b7(x\u2032)).\n\nObserve that d^A_X(x, x\u2032) = 0 if and only if x, x\u2032 \u2208 Vi for some 1 \u2264 i \u2264 n. Symmetry also holds, as does the triangle inequality.\nA special case of this construction occurs when A is an \u03b5-net of X endowed with a restriction of the metric dX. Given a finite metric space (X, dX), an \u03b5-net is a subset X^\u03b5 \u2282 X such that: (1) for any x \u2208 X, there exists s \u2208 X^\u03b5 such that dX(x, s) < \u03b5, and (2) for any s, s\u2032 \u2208 X^\u03b5, we have dX(s, s\u2032) \u2265 \u03b5 [15]. 
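The \u03b5-net case can be sketched directly: greedily pick net points (greedy selection is one standard way to produce an \u03b5-net; it is our illustrative choice here, not a construction prescribed by the paper), map every point to a nearest net point, and pull back the metric. Function names are our own.

```python
def greedy_eps_net(d, eps):
    """Greedily build an eps-net: net points end up pairwise >= eps apart,
    and every point of X lies within < eps of some net point."""
    net = []
    for x in range(len(d)):
        if all(d[x][s] >= eps for s in net):
            net.append(x)
    return net

def induced_metric(d, net):
    """The induced (pseudo)metric d^A_X(x, x') := d(eta(x), eta(x')),
    where eta sends x to a nearest point of the net (first one on ties)."""
    def eta(x):
        return min(net, key=lambda s: d[x][s])
    return lambda x, y: d[eta(x)][eta(y)]
```

For four unit-spaced points on a line and \u03b5 = 2, the net is {0, 2}, and every induced distance differs from the original by at most 2\u03b5.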
For notational convenience, we write d^\u03b5_X to refer to d^{X^\u03b5}_X. In this case, we obtain:\n\n\u2016dX \u2212 d^\u03b5_X\u2016_{\u2113\u221e(X\u00d7X)} = max_{x,x\u2032 \u2208 X} |dX(x, x\u2032) \u2212 d^\u03b5_X(x, x\u2032)|\n= max_{1\u2264i,j\u2264n} max_{x \u2208 Vi, x\u2032 \u2208 Vj} |dX(x, x\u2032) \u2212 d^\u03b5_X(x, x\u2032)|\n= max_{1\u2264i,j\u2264n} max_{x \u2208 Vi, x\u2032 \u2208 Vj} |dX(x, x\u2032) \u2212 dX(xi, xj)|\n\u2264 max_{1\u2264i,j\u2264n} max_{x \u2208 Vi, x\u2032 \u2208 Vj} ( dX(x, xi) + dX(x\u2032, xj) ) \u2264 2\u03b5.    (1)\n\n\fCovering numbers and doubling dimension. For a finite metric space (X, dX), the open ball of radius \u03b5 centered at x \u2208 X is denoted B(x, \u03b5). The \u03b5-covering number of (X, dX) is defined as:\n\nNX(\u03b5) := min { n \u2208 N : X \u2282 \u222a_{i=1}^{n} B(xi, \u03b5) for some x1, . . . , xn \u2208 X }.\n\nNotice that the \u03b5-covering number of X is always bounded above by the cardinality of an \u03b5-net. A related quantity is the doubling dimension ddim(X, dX) of a metric space (X, dX), which is defined to be the minimal value \u03c1 such that any \u03b5-ball in X can be covered by at most 2^\u03c1 \u03b5/2-balls [15]. The covering number and doubling dimension of a metric space (X, dX) are related as follows:\nLemma 2. Let (X, dX) be a finite metric space with doubling dimension bounded above by \u03c1 > 0. Then for all \u03b5 \u2208 (0, diam(X)], we have NX(\u03b5) \u2264 (8 diam(X)/\u03b5)^\u03c1.\n\n4 Duality between Gromov\u2019s embedding and SLHC\n\nGiven a metric space (X, dX) and any two points x, x\u2032 \u2208 X, we define a chain from x to x\u2032 over X as an ordered set of points in X starting at x and ending at x\u2032:\n\nc = {x0, x1, x2, . . . 
, xn : x0 = x, xn = x\u2032, xi \u2208 X for all 0 \u2264 i \u2264 n}.\n\nThe set of all chains from x to x\u2032 over X will be denoted CX(x, x\u2032). The cost of a chain c = {x0, . . . , xn} over X is defined to be costX(c) := max_{0\u2264i<n} dX(xi, xi+1).\n\nN_{Xn}(\u03b5) =\n{ NV(\u03b5) : \u03b5 > sep(V, dV),\n|V| : diam(Ln, dLn) < \u03b5 \u2264 sep(V, dV),\n|V| N_{Ln}(\u03b5) : \u03b5 \u2264 diam(Ln, dLn).\n\nTo see this, note that in the first two cases, any \u03b5-ball centered at a point (v, l) automatically contains all of {v} \u00d7 Ln, so N_{Xn}(\u03b5) = NV(\u03b5). Specifically in the range diam(Ln, dLn) < \u03b5 \u2264 sep(V, dV), we need exactly one \u03b5-ball for each v \u2208 V to cover Xn. Finally in the third case, we need N_{Ln}(\u03b5) \u03b5-balls to cover {v} \u00d7 Ln for each v \u2208 V. This yields the stated estimate.\nBy the preceding claims, we now have the following for each n \u2208 N, \u03b5 \u2208 (0, Dn]:\n\n\u03c1(n, \u03b5) \u2248 4\u03b5 + (1/(2n)) log2(2 N_{Xn}(\u03b5)) =\n{ 4\u03b5 + (1/(2n)) log2(2 NV(\u03b5)) : \u03b5 > sep(V),\n4\u03b5 + (1/(2n)) log2(2 |V|) : diam(Ln) < \u03b5 \u2264 sep(V),\n4\u03b5 + (1/(2n)) log2(2 |V| N_{Ln}(\u03b5)) : \u03b5 \u2264 diam(Ln).\n\nNotice that for sufficiently large n, inf_{\u03b5 > diam(Ln)} \u03c1(n, \u03b5) = \u03c1(n, 1/n). Then we have:\n\n1/n \u2264 En \u2264 Bn = min_{\u03b5 \u2208 (0, Dn]} \u03c1(n, \u03b5) \u2264 \u03c1(n, 1/n) \u2248 C/n,\n\nfor some constant C > 0. Here the first inequality follows from the proof of Claim 2, the second from Theorem 8, and the third from our observation above. It follows that En \u224d Bn \u224d 1/n \u2192 0.\nRemark 12. Given the setup of the preceding proof, note that the Gromov-style bound behaves as:\n\nGn = \u03c1(n, 0) = (1/(2n)) log2(2 |V| (n + 1)) \u2248 C\u2032 log2(n + 1)/n,\n\nfor some constant C\u2032 > 0. 
So Gn approaches 0 at a rate strictly slower than that of En and Bn.\n\n8 Discussion\n\nWe are motivated by a particular aspect of the numerical taxonomy problem, namely, the distortion incurred when passing from a metric to its optimal tree embedding. We describe and explore a duality between a tree embedding method proposed by Gromov and the well-known SLHC method for embedding a metric space into an ultrametric tree. Motivated by this duality, we propose a novel metric space statistic that we call ultrametricity, and give a novel, tight bound on the distortion of the SLHC method depending on cardinality and ultrametricity. We improve this Gromov-style bound by replacing the dependence on cardinality by a dependence on doubling dimension, and produce a family of examples proving tightness of this dimension-based bound. By invoking duality again, we are able to improve Gromov\u2019s original bound on the distortion of his tree embedding method. More specifically, we replace the dependence on cardinality\u2014a set-theoretic notion\u2014by a dependence on doubling dimension, which is truly a metric notion.\nThrough Proposition 11, we are able to prove that our bound is not just asymptotically tight, but that it is strictly better than the corresponding Gromov-style bound. Indeed, Gromov\u2019s bound can perform arbitrarily worse than our dimension-based bound. We construct an explicit example to verify this claim in Appendix A, Remark 14, where we also provide a practical demonstration of our methods.\n\n\fReferences\n[1] Ittai Abraham, Mahesh Balakrishnan, Fabian Kuhn, Dahlia Malkhi, Venugopalan Ramasubramanian, and Kunal Talwar. Reconstructing approximate tree metrics. In Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing. ACM, 2007.\n\n[2] Muad Abu-Ata and Feodor F. Dragan. Metric tree-like structures in real-life networks: an empirical study. 
arXiv preprint arXiv:1402.3364, 2014.\n\n[3] Richa Agarwala, Vineet Bafna, Martin Farach, Mike Paterson, and Mikkel Thorup. On the approximability of numerical taxonomy (fitting distances by tree metrics). SIAM Journal on Computing, 28(3):1073\u20131085, 1998.\n\n[4] Yair Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In Foundations of Computer Science. IEEE, 1996.\n\n[5] Jean-Pierre Barth\u00e9lemy and Alain Gu\u00e9noche. Trees and Proximity Representations. 1991.\n\n[6] Gunnar Carlsson and Facundo M\u00e9moli. Characterization, stability and convergence of hierarchical clustering methods. The Journal of Machine Learning Research, 2010.\n\n[7] John Chakerian and Susan Holmes. Computational tools for evaluating phylogenetic and hierarchical clustering trees. Journal of Computational and Graphical Statistics, 2012.\n\n[8] Victor Chepoi and Bernard Fichet. \u2113\u221e approximation via subdominants. Journal of Mathematical Psychology, 44(4):600\u2013616, 2000.\n\n[9] Michel Marie Deza and Elena Deza. Encyclopedia of Distances. Springer, 2009.\n\n[10] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pages 448\u2013455. ACM, 2003.\n\n[11] Martin Farach, Sampath Kannan, and Tandy Warnow. A robust model for finding optimal evolutionary trees. Algorithmica, 13(1-2):155\u2013179, 1995.\n\n[12] Mikhael Gromov. Hyperbolic groups. Springer, 1987.\n\n[13] Piotr Indyk and Jiri Matousek. Low-distortion embeddings of finite metric spaces. In Handbook of Discrete and Computational Geometry, pages 177\u2013196. 2004.\n\n[14] Matth\u00e4us Kleindessner and Ulrike von Luxburg. Uniqueness of ordinal embedding. In COLT, pages 40\u201367, 2014.\n\n[15] Robert Krauthgamer and James R. Lee. 
Navigating nets: simple algorithms for proximity search. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 798\u2013807. Society for Industrial and Applied Mathematics, 2004.\n\n[16] Mirko Krivanek. The complexity of ultrametric partitions on graphs. Information Processing Letters, 27(5):265\u2013270, 1988.\n\n[17] Yi Li and Philip M. Long. Learnability and the doubling dimension. In Advances in Neural Information Processing Systems, pages 889\u2013896, 2006.\n\n[18] Kantilal Varichand Mardia, John T. Kent, and John M. Bibby. Multivariate Analysis. 1980.\n\n[19] Charles Semple and Mike A. Steel. Phylogenetics, volume 24. Oxford University Press on Demand, 2003.\n\n[20] Albert D. Shieh, Tatsunori B. Hashimoto, and Edoardo M. Airoldi. Tree preserving embedding. Proceedings of the National Academy of Sciences of the United States of America, 108(41):16916\u201316921, 2011.\n\n[21] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579\u20132605, 2008.\n", "award": [], "sourceid": 1433, "authors": [{"given_name": "Samir", "family_name": "Chowdhury", "institution": "The Ohio State University"}, {"given_name": "Facundo", "family_name": "M\u00e9moli", "institution": "The Ohio State University"}, {"given_name": "Zane", "family_name": "Smith", "institution": "Ohio State University"}]}