{"title": "Numerically Accurate Hyperbolic Embeddings Using Tiling-Based Models", "book": "Advances in Neural Information Processing Systems", "page_first": 2023, "page_last": 2033, "abstract": "Hyperbolic embeddings achieve excellent performance when embedding hierarchical data structures like synonym or type hierarchies, but they can be limited by numerical error when ordinary floating-point numbers are used to represent points in hyperbolic space. Standard models such as the Poincar{\\'e} disk and the Lorentz model have unbounded numerical error as points get far from the origin.\nTo address this, we propose a new model which uses an integer-based tiling to represent \\emph{any} point in hyperbolic space with provably bounded numerical error. This allows us to learn high-precision embeddings without using BigFloats, and enables us to store the resulting embeddings with fewer bits. We evaluate our tiling-based model empirically, and show that it can both compress hyperbolic embeddings (down to $2\\%$ of a Poincar{\\'e} embedding on WordNet Nouns) and learn more accurate embeddings on real-world datasets.", "full_text": "Numerically Accurate Hyperbolic Embeddings Using\n\nTiling-Based Models\n\nTao Yu\n\nChristopher De Sa\n\nDepartment of Computer Science\n\nDepartment of Computer Science\n\nCornell University\nIthaca, NY, USA\n\ntyu@cs.cornell.edu\n\nCornell University\nIthaca, NY, USA\n\ncdesa@cs.cornell.edu\n\nAbstract\n\nHyperbolic embeddings achieve excellent performance when embedding hierar-\nchical data structures like synonym or type hierarchies, but they can be limited by\nnumerical error when ordinary \ufb02oating-point numbers are used to represent points\nin hyperbolic space. Standard models such as the Poincar\u00e9 disk and the Lorentz\nmodel have unbounded numerical error as points get far from the origin. 
To address this, we propose a new model which uses an integer-based tiling to represent any point in hyperbolic space with provably bounded numerical error. This allows us to learn high-precision embeddings without using BigFloats, and enables us to store the resulting embeddings with fewer bits. We evaluate our tiling-based model empirically, and show that it can both compress hyperbolic embeddings (down to 2% of a Poincaré embedding on WordNet Nouns) and learn more accurate embeddings on real-world datasets.

1 Introduction

In the real world, valuable knowledge is encoded in datasets with hierarchical structure, such as the IBM Information Management System for describing document structure, the large lexical database WordNet [14], various networks [8], and natural language sentences [24, 5]. It is challenging but necessary to embed these structured data for use with modern machine learning methods. Recent work [11, 26, 27, 7] proposed using hyperbolic spaces to embed these structures and has achieved exciting results. A hyperbolic space is a manifold with constant negative curvature, endowed with various geometric properties; in particular, Bowditch [4] shows that any finite subset of a hyperbolic space looks like a finite tree according to the definition in [18]. Hyperbolic space is therefore well suited to modeling hierarchical structures.
A major difficulty that arises when learning with hyperbolic embeddings is numerical instability, sometimes informally called "the NaN problem". Models of hyperbolic space commonly used to learn embeddings, such as the Poincaré ball model [26] and the Lorentz hyperboloid model [27], suffer from significant numerical error caused by floating-point computation and amplified by the ill-conditioned Riemannian metrics involved in their construction. 
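To make the ill-conditioning concrete, here is a minimal sketch using the Poincaré-ball distance formula that Section 3 defines; the specific coordinates are illustrative choices of ours, not values from the paper.

```python
import math

def poincare_dist(x, y):
    """d_p(x, y) = arcosh(1 + 2||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2)))."""
    sq = lambda v: sum(c * c for c in v)
    num = 2.0 * sq([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + num / ((1.0 - sq(x)) * (1.0 - sq(y))))

# Two points that differ by only 1e-9 in Euclidean coordinates, but sit near
# the boundary of the disk: their hyperbolic separation is about log 2, so a
# coordinate perturbation on the order of a float ulp moves a point a long
# hyperbolic distance.
print(poincare_dist([1.0 - 1e-9, 0.0], [1.0 - 2e-9, 0.0]))  # ~0.69
```

The same computation at points near the origin is perfectly well conditioned; the trouble is specific to the boundary region, which is exactly where deep hierarchy levels get embedded.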
To address this when embedding a graph, one technical solution exploited by Sarkar [32] is to carefully scale down all the edge lengths by a factor before embedding, then recover the original distances afterwards by dividing by the factor. However, this scaling increases the distortion of the embedding, and the distortion gets worse as the scale factor increases [30]. Sala et al. [30] suggested that, to produce a good embedding in hyperbolic space, one can either increase the number of bits used for the floating-point numbers or increase the dimension of the space.
While these methods can greatly improve the accuracy of an embedding empirically, they come with a computational cost, and the floating-point error is still unbounded everywhere. Despite these previously adopted methods, as points move far away from the origin, the error caused by using floating-point numbers to represent them will be unbounded. Even if we try to compensate for this effect by using BigFloats (non-standard floating-point numbers that use a large quantity of bits), no matter how many bits we use, there will always be numerical issues for points sufficiently far away from the origin. No amount of BigFloat precision is sufficient to accurately represent points everywhere in hyperbolic space.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

To address this problem, it is desirable to have a way of representing points in hyperbolic space that: (1) can represent any point in the space with small fixed bounded error; (2) supports standard geometric computations, such as hyperbolic distances, with small numerical error; and (3) avoids potentially expensive BigFloat arithmetic.
One solution is to avoid floating-point arithmetic and do as much computation as possible with integer arithmetic, which introduces no error. 
To gain intuition, imagine solving the same problem in the more familiar setting of the Euclidean plane. A simple way to construct a constant-error representation is by using the integer-lattice square tiling (or tessellation) [9] of the Euclidean plane. With this, we can represent any point in the plane by (1) storing the coordinates of the square where the point is located as integers and (2) storing the coordinates of the point within that square as floating-point numbers. In this way, the worst-case representation error (Definition 1) will only be proportional to the machine epsilon of the floating-point format, not to the distance of the point from the origin.
We propose to do the same thing in hyperbolic space: we call this a tiling-based model. Given some tiling of hyperbolic space, we can represent a point in hyperbolic space as a pair of (1) the tile it is on and (2) its position within the tile, represented with floating-point coordinates. In this paper, we show how we can do this, and we make the following contributions:

• We identify tiling-based models for both the hyperbolic plane and for higher-dimensional hyperbolic space in various dimensions. We prove that the representation error (Definition 1) is bounded by a fixed value; further, the error of computing distances and gradients is independent of how far the points are from the origin.
• We show how to compute efficiently over tiling-based models, and we offer algorithms to compress and learn embeddings for real-world datasets.

The remainder of this paper is organized as follows. In Section 2, we discuss related work regarding hyperbolic embeddings on various models. In Section 3, we detail the standard models of hyperbolic space which we use in our theory and experiments. 
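Returning to the Euclidean-lattice intuition above: a minimal one-dimensional sketch of the idea (the `TiledPoint` name is ours, for illustration) stores the tile index exactly as an integer and only the within-tile offset as a float.

```python
# Tiled representation of a 1-D Euclidean point: an exact integer tile index
# plus a float offset in [0, 1). The float rounding error depends only on the
# offset, never on the distance of the point from the origin.
from dataclasses import dataclass

@dataclass
class TiledPoint:
    tile: int      # exact integer part (no rounding error, arbitrary precision)
    offset: float  # position within the unit tile, in [0, 1)

def tiled_diff(p, q):
    """Exact integer tile difference plus a small float offset difference."""
    return (p.tile - q.tile) + (p.offset - q.offset)

p = TiledPoint(10**17, 0.25)
q = TiledPoint(10**17, 0.0)
print(tiled_diff(p, q))          # 0.25 exactly
print((1e17 + 0.25) - 1e17)      # 0.0: a naive float coordinate loses the offset
```

At 1e17 the spacing between adjacent float64 values is 16, so a plain floating-point coordinate cannot even see the 0.25 offset, while the tiled pair keeps it exactly.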
In Section 4, we introduce the L-tiling model and show how it can be used to accurately represent any point in the hyperbolic plane (2-dimensional hyperbolic space). In Section 5, we show how to use the L-tiling model to learn embeddings with traditional manifold optimization algorithms. In Section 6, we develop the H-tiling model, which generalizes our methods to higher-dimensional spaces. Finally, in Section 7, we evaluate our methods on two different tasks: (1) compressing a learned embedding and (2) learning embeddings on multiple real-world datasets.

2 Related Work

Hyperbolic space [1] is a simply connected Riemannian manifold with constant negative (sectional) curvature, analogous to a high-dimensional sphere, which has constant positive curvature. The negative-curvature metric of hyperbolic space results in very different geometric properties, which makes it widely employed in many settings. One noticeable property concerns the volume of a ball in hyperbolic space: it increases exponentially with respect to the radius (for large radius), rather than polynomially as in the Euclidean case [6]. Compare this to hierarchical data: in a tree with branching factor b, the number of leaf nodes increases exponentially with the tree depth [27]. This property makes hyperbolic space particularly well suited to represent hierarchies.
Nickel and Kiela [26] introduced the Poincaré embedding for learning hierarchical representations of symbolic data, which captured the attention of the machine learning community. The Poincaré ball model of hyperbolic space was used to embed taxonomies and graphs with state-of-the-art results in link prediction and lexical entailment. Similarly, it was also proposed in [7] to learn neural embeddings of graphs in hyperbolic space, where the performance on downstream tasks was improved significantly. 
The Poincaré ball model was used in several subsequent works, including unsupervised learning of word and sentence embeddings [35, 13], and directed-acyclic-graph embedding and hierarchical-relation learning using a family of nested geodesically convex cones [16]. Further, Ganea et al. [15] proposed hyperbolic neural networks to embed sequential data and perform classification based on the Poincaré ball model.
In a later work [27], the Poincaré model and the Lorentz model of hyperbolic space were compared on learning the same embeddings, and the Lorentz model was observed to be substantially more efficient than the Poincaré model at learning high-quality embeddings of large taxonomies, especially in low dimensions. Similarly, Gulcehre et al. [20] built their new hyperbolic attention networks on top of the Lorentz model rather than the Poincaré model. Further along this direction, Gu et al. [19] explored a product manifold combining multiple copies of different model spaces to get better performance on a range of datasets and reconstruction tasks. This suggests that the numerical model used for learning embeddings can have a significant impact on performance. Sala et al. [30] analyzed the tradeoffs between precision and dimensionality of hyperbolic embeddings to show that this is a fundamental problem when using floating-point arithmetic. More broadly, different models have been used for different tasks like hierarchy embedding [26], text embedding [35, 13], and question answering systems [34]. However, all these models can be limited by numerical precision issues.

3 Models of Hyperbolic Space

Typically, people work with hyperbolic space by using a model, a representation of hyperbolic space within Euclidean space. 
There exist multiple important models of hyperbolic space, most notably the Poincaré ball model, the Lorentz hyperboloid model, and the Poincaré upper-half-space model [1], which are described in this section. These all model the same geometry, in the sense that any two of them can be related by a transformation that preserves all the geometric properties of the space, including distances and gradients [6]. Generally, one can choose whichever model is best suited for a given task [27].
Poincaré ball model. The Poincaré ball model is the Riemannian manifold (B^n, g_p), where B^n = {x ∈ R^n : ‖x‖ < 1} is the open unit ball. The metric and distance on B^n are defined as

g_p(x) = (2 / (1 − ‖x‖^2))^2 g_e,    d_p(x, y) = arcosh(1 + 2‖x − y‖^2 / ((1 − ‖x‖^2)(1 − ‖y‖^2))),

where g_e is the Euclidean metric. Due to its conformality (angles measured at a point are the same as they are in the actual hyperbolic space), its convenient parameterization, and its clear visualizations, the Poincaré ball model is widely used in many applications. However, it can be seen from this equation that the distance within the Poincaré ball model changes rapidly when the points are close to the boundary (i.e., ‖x‖ ≈ 1), and hence it is very poorly conditioned.
Lorentz hyperboloid model. The Lorentz model is arguably the most natural model algebraically for hyperbolic space. It is defined in terms of a nonstandard scalar product called the Lorentzian scalar product. 
For two-dimensional hyperbolic space, it is defined as

⟨x, y⟩_L = x^T g_l y,  where  g_l = [[−1, 0, 0], [0, 1, 0], [0, 0, 1]].

The Lorentz model of 2-dimensional hyperbolic space is then defined as the Riemannian manifold (L^2, g_l), where L^2 and the associated distance function are given as

L^2 = {x ∈ R^3 : ⟨x, x⟩_L = −1, x_0 > 0},    d_l(x, y) = arcosh(−⟨x, y⟩_L).

This model generalizes easily to higher-dimensional spaces by increasing the number of 1s on the diagonal of the matrix g_l. Points in the Lorentz model lie on the upper sheet of a two-sheeted n-dimensional hyperboloid. Unlike the Poincaré disk model, which is confined to the Euclidean unit ball, the Lorentz model is unbounded. However, like other models, it can experience severe numerical error for points far away in hyperbolic distance from the origin, as shown in Theorem 1.
Definition 1. [Representation error] We are concerned with representing points in hyperbolic space H^n using floating-point numbers fl. Define the representation error of a particular point x ∈ H^n as δ_fl(x) = d_{H^n}(x, fl(x)), and the worst-case representation error of a floating-point representation as a function of the distance-to-origin d, which is the maximum representation error of any point with a distance-to-origin at most d,

δ^d_fl = max_{x ∈ H^n, d_{H^n}(x, O) ≤ d} δ_fl(x).

Theorem 1. The worst-case representation error (Definition 1) in the Lorentz model using floating-point arithmetic (with machine epsilon ε_m) is δ^d_l = arcosh(1 + ε_m(2 cosh^2(d) − 1)), where d is the hyperbolic distance to the origin. This becomes δ^d_l = 2d + log(ε_m) + o(ε_m^{−1} exp(−2d)) if d = O(− log ε_m).
Poincaré half-space model. 
The Poincaré upper-half-space model of hyperbolic space is the manifold (U^n, g_u), where U^n = {x ∈ R^n : x_n > 0} is the upper half space of n-dimensional Euclidean space. The metric and corresponding distance function are

g_u(x) = g_e / x_n^2,    d_u(x, y) = arcosh(1 + ‖x − y‖^2 / (2 x_n y_n)).

Here g_e is the Euclidean metric. The half-space model is also unbounded and conformal, and has a particularly nice interpretation in two dimensions as a mapping on the complex plane. Note that although it is unbounded, this model still has an "edge" where x_n = 0, and it can exhibit numerical issues similar to the Poincaré ball as x_n approaches 0.

4 A Tiling-Based Model for the Hyperbolic Plane

As we saw in the previous section, the standard models of hyperbolic space exhibit unbounded numerical error as the hyperbolic distance from the origin increases. In this section, we will describe a tiling-based model that avoids this problem. Our model is constructed on top of the Lorentz model for the two-dimensional hyperbolic plane H^2.
In hyperbolic geometry, a uniform tiling [9, 12, 33] is an edge-to-edge filling of the hyperbolic plane which has regular congruent polygons as faces and is vertex-transitive (there is an isometry mapping any vertex onto any other) [28]. Any tiling is associated with a discrete group G of orientation-preserving isometries of H^2 that preserve the tiling [38, 22]; discrete subgroups of isometries of H^2 (like G) are called Fuchsian groups [21, 2, 37]. Importantly, not only does the tiling determine G, but G also determines the shape of the tiling. One way to see this is to consider the images of a single point in H^2 under the group action of G (called an orbit of the action). 
Then the Voronoi diagram associated with the orbit (which partitions H^2 into tiles based on which point of the orbit each point is closest to) will be a regular tiling of H^2. This equivalence between tilings and groups means that we can reason about tilings by reasoning about Fuchsian groups.
In the 2-dimensional Lorentz model, isometries can be represented as matrices operating on R^3 that preserve the Lorentzian scalar product. That is, a matrix A ∈ R^{3×3} is an isometry if A^T g_l A = g_l. If we have some discrete group of isometries G, and we choose the tile which contains the origin to be the fundamental domain [37, 36] F, then we can start to define a tiling-based model on top of the Lorentz model of the hyperbolic plane.
L-tiling model. Our first insight is to represent points in the hyperbolic plane as a pair consisting of an element of the group and an element of the fundamental domain. The point represented by this pair is the result of the group element applied to the fundamental domain element. For example, the ordered pair (g, x) ∈ G × F would represent the point gx. The L-tiling model of the hyperbolic plane is defined as the Riemannian manifold (T^2_lt, g_lt), where g_lt = g_l and

T^2_lt = {(g, x) ∈ G × F : ⟨x, x⟩_L = −1},    d_lt((g_x, x), (g_y, y)) = arcosh(−x^T g_x^T g_lt g_y y).

Of course, this is useless unless we have a group G that we can store and compute with easily. Our second insight is to construct a Fuchsian group that can be represented with integers so that group operations can be computed exactly and efficiently. 
The naive way to do this is to try the subgroup of orientation-preserving isometries in R^{3×3} that have all-integer coordinates: unfortunately, this group (called the modular group) results in a tiling with an unbounded fundamental domain, which makes it impossible to bound the representation error, so it is not suitable for our purpose. Instead, we construct a special Fuchsian group to get a particularly useful L-tiling model of the hyperbolic plane.
Definition 2. Let g_a, g_b ∈ Z^{3×3} and L ∈ R^{3×3} be defined as

g_a = [[2, 1, 0], [0, 0, −1], [3, 2, 0]],    g_b = [[2, −1, 0], [0, 0, −1], [−3, 2, 0]],    L = [[√3, 0, 0], [0, 1, 0], [0, 0, 1]].

Define G to be the Fuchsian group generated by L · g_a · L^{−1} and L · g_b · L^{−1}. It is straightforward to verify that (L · g_a · L^{−1})^T g_l (L · g_a · L^{−1}) = (L · g_b · L^{−1})^T g_l (L · g_b · L^{−1}) = g_l. 
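The verification can be carried out entirely in exact integer arithmetic: (L g L^{-1})^T g_l (L g L^{-1}) = g_l is equivalent to g^T J g = J with J = L^T g_l L = diag(−3, 1, 1), which involves only integers. The sketch below checks this, plus the group-presentation identities, for the matrix values used here (our reading of Definition 2).

```python
# Exact integer check that the Definition 2 generators are Lorentz isometries
# and satisfy g_a^6 = g_b^6 = (g_a g_b)^3 = I, using only Python integers.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(r) for r in zip(*A)]

ga = [[2, 1, 0], [0, 0, -1], [3, 2, 0]]
gb = [[2, -1, 0], [0, 0, -1], [-3, 2, 0]]
J  = [[-3, 0, 0], [0, 1, 0], [0, 0, 1]]   # J = L^T g_l L
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def power(A, n):
    out = I3
    for _ in range(n):
        out = matmul(out, A)
    return out

assert matmul(transpose(ga), matmul(J, ga)) == J   # g_a preserves the form
assert matmul(transpose(gb), matmul(J, gb)) == J   # g_b preserves the form
assert power(ga, 6) == I3 and power(gb, 6) == I3   # order-6 generators
assert power(matmul(ga, gb), 3) == I3              # (g_a g_b)^3 = I
print("all identities hold")
```

Because all of these checks are integer equalities, they hold exactly rather than up to floating-point tolerance, which is precisely the property the L-tiling model exploits.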
Note that g_a^6 = g_b^6 = (g_a g_b)^3 = I, and so this group has presentation

G = L · ⟨g_a, g_b | g_a^6 = g_b^6 = (g_a g_b)^3 = 1⟩ · L^{−1}.

Importantly, any element of G can be represented in the form g = L Z L^{−1}, where Z ∈ Z^{3×3} is an all-integer matrix. For this reason, we can store elements of G and take group products and inverses using only integer arithmetic. This property makes G of particular interest for use with an L-tiling model. But before we can construct an L-tiling model for this group, we need to choose an appropriate fundamental domain.
Theorem 2. F = {(x_1, x_2, x_3) ∈ L^2 | max(2x_2^2 − x_3^2, 2x_3^2 − x_2^2) < 1} is a fundamental domain of G. Any point in L^2 can be mapped by G to one unique point in F or to a point on its boundary.

Figure 1: The regular quadrilateral tiling of hyperbolic space produced by the group G on the Poincaré disk.

Figure 1 illustrates the tiling generated by the group G and F centered at the origin in the Poincaré disk model.

Algorithm 1 Map Lorentz model to L-tiling model
Require: x ∈ L^2
  initialize R ← I
  while x ∉ F do
    if x_2 ≤ −|x_3| then S ← g_a^{−1}
    else if x_2 ≥ |x_3| then S ← g_b^{−1}
    else if x_3 < −|x_2| then S ← g_b
    else if x_3 > |x_2| then S ← g_a
    (R, x) ← (R · S, L · S^{−1} · L^{−1} · x)
    x_1 ← √(x_2^2 + x_3^2 + 1)    ▷ renormalize x
  end while
  output (R, x)

Now that we have a group and a fundamental domain, we can start computing with our new L-tiling model. 
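A simplified sketch of the normalization idea follows. Instead of Algorithm 1's sign tests, this greedy variant (ours, for illustration) tries all four conjugated generators and keeps whichever most reduces the first coordinate, which is the cosh of the distance to the origin; the generator matrices are the values we use for Definition 2.

```python
# Greedy normalization sketch: walk a Lorentz-model point back into the
# fundamental domain F, accumulating the tile as an exact integer matrix R.
import math
import numpy as np

S3 = math.sqrt(3.0)
L, L_inv = np.diag([S3, 1.0, 1.0]), np.diag([1.0 / S3, 1.0, 1.0])
ga = np.array([[2, 1, 0], [0, 0, -1], [3, 2, 0]])
gb = np.array([[2, -1, 0], [0, 0, -1], [-3, 2, 0]])
ga_inv = np.array([[2, 0, -1], [-3, 0, 2], [0, -1, 0]])  # exact integer inverses
gb_inv = np.array([[2, 0, 1], [3, 0, 2], [0, -1, 0]])
GEN = [(ga, ga_inv), (ga_inv, ga), (gb, gb_inv), (gb_inv, gb)]  # (S, S^-1)

def in_F(x):
    # Fundamental domain of Theorem 2: max(2*x2^2 - x3^2, 2*x3^2 - x2^2) < 1.
    return max(2 * x[1] ** 2 - x[2] ** 2, 2 * x[2] ** 2 - x[1] ** 2) < 1.0

def normalize(x, max_steps=1000):
    """Split a Lorentz-model point into (integer tile matrix R, point in F)."""
    R = np.eye(3, dtype=int)
    for _ in range(max_steps):
        if in_F(x):
            break
        # Greedy step: apply the conjugated generator that most reduces x[0].
        x, S = min(((L @ S_inv @ L_inv @ x, S) for S, S_inv in GEN),
                   key=lambda c: c[0][0])
        R = R @ S
        x[0] = math.sqrt(x[1] ** 2 + x[2] ** 2 + 1.0)  # snap back onto L^2
    return R, x

# A point two tiles from the origin: the walk recovers the tile word g_a g_b.
x0 = L @ (ga @ gb) @ L_inv @ np.array([1.0, 0.0, 0.0])
R, x = normalize(x0)
print(R.tolist(), np.round(x, 6).tolist())
```

The renormalization of x[0] after every step mirrors Algorithm 1 and keeps the floating-point part on the hyperboloid, so error never accumulates in the exactly-stored tile matrix.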
The first step is to build a relationship between standard hyperbolic models and the L-tiling model, i.e., to convert points into the L-tiling model from other models: to this end, we offer a "normalization" procedure (Algorithm 1), which transforms the Lorentz model to the L-tiling model. The convergence and complexity of this algorithm are characterized in Theorem 3.
Theorem 3. For any point in the Lorentz model, Algorithm 1 converges and stops within 1 + 7d steps, where d = d(x, O) denotes the distance from x to the origin.
Representing points. For a point (g, x) in the L-tiling model, where g ∈ G, x ∈ F, we represent this point with (g, fl(x)). Here g is exact because it is represented by the related integer matrix, while fl denotes floating-point arithmetic with error bounded by some machine epsilon ε_m. This floating-point arithmetic introduces some representation error, which we can bound as follows:

d_lt((g, x), (g, fl(x))) = arcosh(−x^T g^T g_lt g fl(x)) = arcosh(−x^T g_lt fl(x)).

Since x ∈ F, which is bounded as shown in Theorem 2, this approximation error can also be bounded (Theorem 4). In comparison, for the Lorentz model, the worst-case error (Theorem 1) is unbounded.
Theorem 4. The representation error (Definition 1) in the L-tiling model is bounded as δ^d_lt ≤ √(5ε_m) + 15ε_m/4 + o(ε_m), where ε_m is the machine epsilon.
By convention, for (g, x) in the L-tiling model, where g ∈ G, x ∈ F, we will firstly denote g using its related integer matrix ĝ = L^{−1} g L. Secondly, for the point x ∈ F, even though x is part of the Lorentz model and lies in 3-dimensional space, in fact only two coordinates suffice to determine its position. For simplicity, we define a bijective function h(x_2, x_3) = (√(1 + x_2^2 + x_3^2), x_2, x_3), which maps R^2 to the hyperboloid model (this is sometimes called the Gans model [17]). In this way, we can represent (g, x) ∈ T^2_lt as (ĝ, h^{−1}(x)). We can then store the integer matrix and the floating-point coordinates h^{−1}(x) ∈ R^2. In future sections, we assume we will use this integer-matrix and two-coordinate representation rather than (g, x) unless otherwise specified.

5 Learning in the L-tiling Model

In this section, we provide an efficient and precise way to compute distances and gradients in the L-tiling model, with which we can construct learning algorithms to train and derive embeddings. We also present error bounds for these computations, which avoid the "NaN" problem.
Distance and Gradient. For two points (U, u), (V, v) in the L-tiling model, the formula to compute distance is

d((U, u), (V, v)) = arcosh(h(u)^T L^{−T} Q L^{−1} h(v)),

where Q = −U^T L^T g_lt L V can be computed exactly with integer arithmetic. A potential difficulty here is that the entries in Q can be very large (possibly even larger than can be represented in floating-point). To solve this, observe that Q_11 has the largest absolute value in the matrix (Lemma 2). So we define and compute Q̂ = Q/Q_11, which is guaranteed not to overflow the floating-point format, since all the entries of Q̂ are in [−1, 1]. 
Let d_c = h(u)^T L^{−T} Q̂ L^{−1} h(v); this reduces our distance to

d((U, u), (V, v)) = arcosh(Q_11 · d_c) = log(Q_11) + log(d_c + √(d_c^2 − Q_11^{−2})).

Note that (assuming that we can compute log(Q_11) without overflow) this expression can be computed in floating-point without any overflow, since all the numbers involved are well within range. The corresponding formula for the gradient can also be derived as

∇_u d((U, u), (V, v)) = (∇h(u)^T L^{−T} Q̂ L^{−1} h(v)) / √(d_c^2 − Q_11^{−2}),  where  ∇h(u) = [u / √(1 + ‖u‖^2), I].

Again, this avoids any possibility of overflow. We provide the error of computing distances (Theorem 6) and gradients (Theorem 7) in the L-tiling model, together with that in the Lorentz model, in the Appendix.

Algorithm 2 RSGD in the L-tiling model
Require: Objective function f, Fuchsian group G with fundamental domain F, exponential map exp_{β_t}(v) = cosh(‖v‖_L) β_t + sinh(‖v‖_L) v/‖v‖_L, where ‖v‖_L = √⟨v, v⟩_L.
Require: (β_t, U_t) ∈ F × G, epochs T, and learning rate η
  for t = 0 to T − 1 do
    l_t ⇐ g_{β_t}^{−1} ∇_{β_t} f(L U_t L^{−1} β_t)    ▷ Riemannian gradient
    grad f ⇐ l_t + ⟨β_t, l_t⟩_L β_t    ▷ Projection
    β_{t+1} ⇐ exp_{β_t}(−η grad f)    ▷ Update
    if β_{t+1} ∉ F then
      W ⇐ arg min_{W ∈ G} d(L W^{−1} L^{−1} β_{t+1}, O)
      U_{t+1} ⇐ U_t · W    ▷ Normalize if β_{t+1} ∉ F
      β_{t+1} ⇐ L W^{−1} L^{−1} β_{t+1}
    else
      U_{t+1} ⇐ U_t
    end if
  end for
  output (β_{t+1}, U_{t+1})
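A sketch of this distance computation (our illustration, using the generator values we adopt for Definition 2): Q = −U^T J V is formed in exact integer arithmetic with J = L^T g_lt L = diag(−3, 1, 1), and only the rescaled Q̂ = Q/Q_11 ever touches floating point.

```python
import math
import numpy as np

S3 = math.sqrt(3.0)
J = np.array([[-3, 0, 0], [0, 1, 0], [0, 0, 1]])   # J = L^T g_l L, all integers
L_inv = np.diag([1.0 / S3, 1.0, 1.0])

def h(u):
    """Gans-model lift (u2, u3) -> (sqrt(1 + u2^2 + u3^2), u2, u3) onto L^2."""
    return np.array([math.sqrt(1.0 + u[0] ** 2 + u[1] ** 2), u[0], u[1]])

def tiled_dist(U, u, V, v):
    """d((U,u),(V,v)) = log(Q11) + log(dc + sqrt(dc^2 - Q11^-2))."""
    Q = -(U.T @ J @ V)                 # exact: integer tile matrices only
    q11 = int(Q[0, 0])
    dc = h(u) @ L_inv @ (Q / q11) @ L_inv @ h(v)
    rad = max(dc * dc - 1.0 / (q11 * q11), 0.0)  # clamp float dust at d = 0
    return math.log(q11) + math.log(dc + math.sqrt(rad))

I3 = np.eye(3, dtype=int)
ga = np.array([[2, 1, 0], [0, 0, -1], [3, 2, 0]])  # one generator tile
origin = np.array([0.0, 0.0])                      # Gans coordinates of O
print(tiled_dist(I3, origin, I3, origin))          # ~0
print(tiled_dist(I3, origin, ga, origin))          # arcosh(2) ~ 1.317
```

The second call measures the origin against its image in the adjacent g_a tile; it agrees with the direct Lorentz formula arcosh(−⟨O, (L g_a L^{-1}) O⟩_L) = arcosh(2), but never forms the possibly huge entries of Q in floating point.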
By computing with integer arithmetic, the error will be independent of how far the points are from the origin, which guarantees that it avoids the "NaN" problem. Since we can compute distances and derivatives, we can use all the standard gradient-based optimization algorithms. In Algorithm 2 we present the most powerful one, RSGD, adapted for use with the L-tiling model.

6 Extension to Higher-Dimensional Space

Extending the L-tiling model to higher dimensions seems simple: just find a cocompact (to ensure a bounded fundamental domain) discrete subgroup of the higher-dimensional space's isometry group. Such a group would induce a honeycomb, a higher-dimensional analog of a regular tiling of the hyperbolic plane. Unfortunately, a classic result by Coxeter [10] says this is impossible in general: there are no such regular honeycombs in six or more dimensions.
In order to derive a high-dimensional tiling-based model, which may be necessary for complicated datasets, we consider two possibilities.
• Take the Cartesian product of multiple copies of the L-tiling model in the hyperbolic plane. 
The use of multiple copies of models in the hyperbolic plane was previously proposed in Gu et al. [19].
• Construct honeycombs and tilings from a set of isometries that is not a group.
Practically, we can embed data into products of H^2s, as we do in Section 7; however, the first possibility (tilings over H^2 × H^2 × ··· × H^2) is something fundamentally different from tiling a single high-dimensional hyperbolic space (tilings over H^n), which is what we aim to do in this section. Fortunately, for the second possibility, in the half-space model we find that horizontal translations and homotheties are hyperbolic isometries, which can produce the (infinite) square tiling illustrated in Figure 2 [1, 6]. It consists of the images of the unit square S, with vertical and horizontal sides and whose lower-left corner is at (0, ..., 0, 1), under the maps

p → 2^j (p + k),    (j, k) ∈ Z × (Z^{n−1} × {0}).

Here each square is isometric to every other square, and the unit square S takes on the role of the fundamental domain in Theorem 2. With these maps, we can define a tiling-based model on top of the half-space model as follows.
H-tiling model. The H-tiling model of hyperbolic space is defined as the Riemannian manifold (T^n_h, g_ht), where

T^n_h = {(j, k, x) ∈ Z × (Z^{n−1} × {0}) × S},    g_ht(j, k, x) = g_e / (2^j x_n)^2.

The associated distance function on T^n_h is then given as

d((j_1, k_1, x), (j_2, k_2, y)) = arcosh(1 + ‖2^{j_1} x − 2^{j_2} y + 2^{j_1} k_1 − 2^{j_2} k_2‖^2 / (2^{j_1+j_2+1} x_n y_n)).

Figure 2: (Left) The infinite square tiling of hyperbolic space on the half-plane model; (Right) Datasets.

Datasets        Nodes   Edges
Bio-yeast [29]  1458    1948
WordNet [14]    74374   75834
 ▷ Nouns        82115   769130
 ▷ Verbs        13542   35079
 ▷ Mammals      1181    6541
Gr-QC [23]      4158    13422

Similarly, we derive the representation error for this model, which is bounded by a constant depending on the machine epsilon, as shown in Theorem 5.
Theorem 5. The representation error (Definition 1) in the H-tiling model is bounded as δ^d_ht = √((n + 3)ε_m / 2) + (n + 3)ε_m / 4 + o(ε_m), where ε_m is the machine epsilon.
We can compute distances and gradients in a numerically accurate way, and run the RSGD algorithm on this model for optimization, just as we could in the L-tiling model. For lack of space, we defer that discussion and more learning details of Sections 5 and 6 to Appendix A. Also note that we are not tied to the half-space model here: while the half-space model gives a convenient way to describe the set of transformations we are using, we could use the same transformations with any underlying model we choose by adding an appropriate conversion.

7 Experiments

Compressing embeddings. We consider storing 2-dimensional embeddings using the L-tiling model for compression: storage using few bits. While storing the integer matrices exactly is convenient for computation, it does tend to take up a lot of extra memory (especially when BigInts are needed to store the integer values in the matrix). This motivates us to look for alternative storage methods. 
To store the matrix g, we propose and evaluate the following methods:

• Matrix: store all 9 integers in the matrix g as Int or BigInt.
• Entries: store just g_21, g_31 as Int or BigInt, which we can show is sufficient to reconstruct the whole matrix (Lemma 1 in Appendix D).
• Order: store the generator order with respect to g_a, g_b as a string.
• VBW: store the generator order with respect to g_a, g_b using a variable bit-width encoding. We use the binary code 10 to represent g_a^1 and g_b^1, 11 to represent g_a^5 and g_b^5, 010 to represent g_a^2 and g_b^2, 011 to represent g_a^4 and g_b^4, 001 to represent g_a^3 and g_b^3, and 000 to represent the end of the string. This encoding disambiguates the generators by taking advantage of the fact that powers of g_a and g_b must alternate in the string.

Figure 3: (Left) Hyperbolic error for WordNet Nouns; (Right) Compression statistics for WordNet under the same MSHE. The first block contains the size of the original Poincaré embedding, the second block contains the sizes of the compressed baseline models, the third block contains the size of the matrix part of the L-tiling model (the size of the integers compressed using VLQ is also reported), and the last block contains the size of the floating-point part (fpt, f32 or f16) in the fundamental domain of the L-tiling model.

Models              size (MB)   bzip (MB)
Poincaré(16512B)    372         119
Poincaré(12688B)    287         81
Lorentz(11898B)     396         171
Matrix(VLQ)         600(286)    260(251)
Entries(VLQ)        132(63)     57(55)
Order               111         8.52
VBW                 33.1        6.07
fpt-f32             6.2         1.96
fpt-f16             4.25        1.07

The generator order and corresponding VBW encoding of a given matrix can be derived using Algorithm 1, as shown in Lemma 1. Additionally, for Int or BigInt, we can use a variable-length quantity (VLQ) encoding to compress [31]. 
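For reference, one common form of VLQ stores seven payload bits per byte with the high bit as a continuation flag; the sketch below follows that convention and is not necessarily the exact variant of [31].

```python
def vlq_encode(n):
    """Encode a non-negative integer: 7 payload bits per byte, MSB-first,
    with the high bit set on every byte except the last."""
    assert n >= 0
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def vlq_decode(data):
    n = 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
    return n

# Small tile-matrix entries cost a single byte; only tiles far from the origin
# (whose integer entries grow with distance) need longer codes.
print(len(vlq_encode(127)), len(vlq_encode(128)), len(vlq_encode(10**9)))  # 1 2 5
```

This is why VLQ pays off here: most embedded points sit in tiles with small integer entries, so the bulk of the matrix bytes shrink to one byte each while rare far-out tiles remain representable.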
To test our compression methods, we use the combinatorial construction [30] to derive 2-dimensional Poincaré disk embeddings for the WordNet (tree-like) and Bio-yeast datasets (Figure 2), and then transform and compress the embeddings. We calculate the mean squared hyperbolic error (MSHE) with respect to the original embedding to measure the error introduced by compression. For Bio-yeast, we evaluate the different compression methods using MSHE and mean average precision (MAP). As shown in Table 1, representation and compression in the L-tiling model (with different floating-point formats for points in the fundamental domain) does not hurt MAP performance, while compressing the Poincaré embedding to the same size hurts MAP severely. For WordNet, we plot the relationship between log(MSHE) and the number of bits stored per node in Figure 3. Under the same MSHE, the L-tiling model requires approximately 2/3 fewer bits per node than the Lorentz and Poincaré models. We also measure the sizes of the different models under the same MSHE in Figure 3. The L-tiling model can represent the hyperbolic embedding with only (6.07+1.07) MB, which is 2% of the original 372 MB, while any reasonably accurate baseline model costs at least 81 MB.

Learning embeddings. As we have shown, our tiling-based models represent hyperbolic space accurately, and so they can be used for learning embeddings with generic objective functions. However, since we analyzed hyperbolic distance and gradient computation error in this paper, we evaluate our learning methods empirically on objective functions that depend on distances. As proposed by Nickel and Kiela [26], to assess the ability to embed data that exhibits a clear latent hierarchical structure, we conduct reconstruction experiments on the transitive closures of the Gr-QC, WordNet Nouns, Verbs and Mammals hierarchies, as summarized in Table 2.
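The MSHE used above to compare compressed and original embeddings is not spelled out in this excerpt; a minimal sketch, assuming it means the squared hyperbolic (Poincaré) distance between each original point and its decompressed counterpart, averaged over nodes:

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between points u, v inside the unit ball
    (Poincaré model): arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    sq = lambda w: sum(c * c for c in w)
    diff = sq(tuple(a - b for a, b in zip(u, v)))
    return math.acosh(1 + 2 * diff / ((1 - sq(u)) * (1 - sq(v))))

def mshe(original, reconstructed):
    """Mean squared hyperbolic error between two embeddings.
    This per-node squared-distance average is our assumed definition."""
    return sum(poincare_distance(u, v) ** 2
               for u, v in zip(original, reconstructed)) / len(original)
```

As a sanity check, the distance from the origin to the point (0.5, 0) is ln((1 + 0.5)/(1 − 0.5)) = ln 3, and a lossless reconstruction gives MSHE = 0.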
We first embed the data and then reconstruct it from the embedding to evaluate the representation capacity of the embedding. Let D = {(u, v)} be the set of observed relations between objects. We aim to learn embeddings of D such that related objects are close in the embedding space. To do this, we minimize the loss [26]

L(Θ) = Σ_{(u,v)∈D} log ( e^{−d(u,v)} / Σ_{v′∈N(u)} e^{−d(u,v′)} ),   (1)

where N(u) = {v | (u, v) ∉ D} ∪ {u} is the set of negative examples for u (including u). We randomly sample |N(u)| = 50 negative examples per positive example during training.

Table 1: Compression of Bio-yeast

Models                MSHE       MAP
Poincaré(8128B)       0.00       0.873
Poincaré(6360B)       4.84e-17   0.873
Poincaré(1832B)       1.01e+03   0.310
L-tiling-f64(1832B)   9.76e-17   0.873
L-tiling-f32(1768B)   5.12e-08   0.873
L-tiling-f16(1736B)   4.30e-05   0.873
L-tiling-f0(1704B)    5.90e-01   0.873

Table 2: Learning Mammals

Models          MAP           MR
Poincaré        0.805±0.011   2.22±0.10
Lorentz         0.855±0.013   1.89±0.13
L-tiling-SGD    0.892±0.031   2.14±0.70
L-tiling-RSGD   0.930±0.005   1.49±0.09
H-tiling-RSGD   0.923±0.016   1.56±0.20
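The training objective of Eq. (1) with negative sampling can be sketched as follows; as commonly implemented, the minimized quantity below is the negative log-likelihood form of the printed log-ratio. The distance function d and the sampled negative sets are passed in, and the function name is ours.

```python
import math

def reconstruction_loss(d, positives, negatives):
    """Negative log-likelihood form of the loss in Eq. (1).

    d(u, v)      -> hyperbolic distance between embedded objects u and v
    positives    -> iterable of observed relations (u, v) in D
    negatives[u] -> the sampled negative set N(u) for u
    """
    total = 0.0
    for u, v in positives:
        denom = sum(math.exp(-d(u, w)) for w in negatives[u])
        total -= math.log(math.exp(-d(u, v)) / denom)  # softmax over N(u)
    return total
```

Each term is a softmax over the sampled set N(u); minimizing it pulls related pairs closer relative to the sampled negatives.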
Dim  Model            WordNet Nouns             WordNet Verbs            Gr-QC
                      MAP          MR           MAP          MR          MAP          MR
2    Poincaré         0.124±0.001  68.75±0.26   0.537±0.005  4.74±0.17   0.561±0.004  67.91±1.14
2    Lorentz          0.382±0.004  17.80±0.55   0.750±0.004  2.11±0.06   0.563±0.003  68.40±1.20
2    H-tiling-RSGD    0.390±0.002  17.18±0.52   0.747±0.003  2.10±0.05   0.560±0.004  66.17±1.05
2    L-tiling-SGD     0.341±0.001  20.27±0.39   0.696±0.003  2.33±0.07   0.574±0.005  63.04±1.97
2    L-tiling-RSGD    0.413±0.007  15.26±0.57   0.746±0.004  2.07±0.03   0.564±0.002  63.88±1.47
4    2×Lorentz        0.460±0.001  10.12±0.03   0.873±0.001  1.31±0.01   0.718±0.003  11.59±0.32
4    2×L-tiling-RSGD  0.464±0.002  9.99±0.09    0.871±0.004  1.33±0.01   0.716±0.005  10.88±0.42
5    Poincaré         0.848±0.001  4.16±0.04    0.948±0.001  1.19±0.01   0.714±0.000  34.60±0.52
5    Lorentz          0.865±0.005  3.70±0.12    0.947±0.001  1.16±0.00   0.715±0.003  33.51±1.04
5    H-tiling-RSGD    0.869±0.001  3.70±0.06    0.949±0.001  1.16±0.01   0.714±0.002  33.46±0.66
10   Poincaré         0.876±0.001  3.47±0.02    0.953±0.002  1.16±0.01   0.729±0.000  29.51±0.21
10   Lorentz          0.865±0.004  3.36±0.04    0.948±0.001  1.15±0.00   0.724±0.001  29.34±0.23
10   H-tiling-RSGD    0.888±0.004  3.22±0.02    0.954±0.002  1.15±0.00   0.729±0.001  27.75±0.39
10   5×Lorentz        0.672±0.000  4.42±0.00    0.958±0.003  1.07±0.01   0.944±0.007  3.06±0.03
10   5×L-tiling-RSGD  0.674±0.000  4.41±0.00    0.961±0.002  1.06±0.00   0.953±0.002  3.03±0.01

Table 3: Learning experiments on different datasets. Results are averaged over 5 runs and reported as mean±std.

We consider the L-tiling model trained with RSGD and SGD, the H-tiling model trained with RSGD, and Cartesian products of multiple copies of 2-dimensional L-tiling models (proposed in Gu et al. [19]). The Poincaré ball model [26] and the Lorentz model [27] were included as baselines. All models were trained in float64 for 1000 epochs with the same hyperparameters. To evaluate the quality of the embeddings, we use the standard graph embedding metrics of [3, 25]. For an observed relationship (u, v), we rank the distance d(u, v) among the set {d(u, v′) | (u, v′) ∉ D}; we then evaluate the ranking over all objects in the dataset and record the mean rank (MR) as well as the mean average precision (MAP) of the ranking.

We start by evaluating all 2-dimensional embeddings on the Mammals dataset. As shown in Table 2, all tiling-based models outperform the baseline models, and the performances of the L-tiling and H-tiling models with RSGD are nearly the same. In particular, the L-tiling model achieves an 8.8% MAP improvement on Mammals compared to the Lorentz model.

Embedding experiments on the three other large datasets are presented in Table 3. These results show that tiling-based models generally perform better than the baseline models across dimensions.
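A sketch of the MR/MAP evaluation described above, assuming each observed distance d(u, v) is ranked only against distances to objects unrelated to u (function and variable names are ours):

```python
import bisect

def mean_rank_and_map(d, nodes, relations):
    """relations: dict mapping u -> set of objects related to u.

    For each observed pair (u, v), the rank of d(u, v) is one plus the
    number of unrelated objects strictly closer to u. MAP averages, per
    object u, the precision at each retrieved positive."""
    ranks, aps = [], []
    for u, related in relations.items():
        neg = sorted(d(u, w) for w in nodes if w != u and w not in related)
        r = sorted(1 + bisect.bisect_left(neg, d(u, v)) for v in related)
        ranks.extend(r)
        # Precision at the i-th positive: i positives appear among the
        # first (rank_i + i - 1) candidates when positives are merged in.
        aps.append(sum((i + 1) / (ri + i) for i, ri in enumerate(r)) / len(r))
    return sum(ranks) / len(ranks), sum(aps) / len(aps)
```

For instance, if u has one related object whose distance falls between those of its two unrelated objects, the relation gets rank 2 and average precision 1/2.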
We found three observations particularly interesting. First, the group-based tiling model (L-tiling) performs better than the non-group tiling model (H-tiling) in two dimensions. Second, tiling-based models outperform the baseline models by a particularly large margin on the largest dataset, WordNet Nouns, which further validates that numerical issues arise when embeddings are far from the origin and degrade embedding performance. Third, the Cartesian product of multiple copies of 2-dimensional L-tiling models performs even better than high-dimensional models when the datasets are not too large and complex, such as WordNet Verbs and Gr-QC, especially for the dense graph Gr-QC.

More experimental details are provided in Appendix B. We publicly release our compression code* in Julia and learning code† in PyTorch for reproducibility.

8 Discussions and Conclusions

In this paper, we introduced tiling-based models of hyperbolic space, which use a tiling backed by integer arithmetic to represent any point in hyperbolic space with fixed and provably bounded error. We showed that the L-tiling model using one particular group G can achieve substantial compression of an embedding with minimal loss, and performs well on embedding tasks compared with other methods. A notable observation that could motivate future work is that our group-based tiling model (L-tiling) performs better than the non-group tiling model (H-tiling) in two dimensions: it is interesting to ask whether this reflects some advantage of the group structure, and whether we can use it to find better non-regular tilings in high dimensions. Overall, it is our hope that this work can help make hyperbolic embeddings more numerically robust and thereby easier for practitioners to use.

*https://github.com/ydtydr/HyperbolicTiling_Compression
†https://github.com/ydtydr/HyperbolicTiling_Learning

References

[1] J. Anderson. Hyperbolic Geometry.
Springer Undergraduate Mathematics Series. Springer London, 2005. ISBN 9781852339340.

[2] Alan F Beardon. The Geometry of Discrete Groups, volume 91. Springer Science & Business Media, 2012.

[3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795, 2013.

[4] Brian H Bowditch. A course on geometric group theory. Mathematical Society of Japan, volume 16 of MSJ Memoirs, 2006.

[5] Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2015.

[6] James W. Cannon, William J. Floyd, Richard Kenyon, and Walter R. Parry. Hyperbolic geometry. In Flavors of Geometry, pages 59–115. University Press, 1997.

[7] Benjamin Paul Chamberlain, James Clough, and Marc Peter Deisenroth. Neural embeddings of graphs in hyperbolic space. Proceedings of the 13th International Workshop on Mining and Learning from Graphs, held in conjunction with KDD, 2017.

[8] Aaron Clauset, Cristopher Moore, and Mark EJ Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98, 2008.

[9] J.H. Conway, H. Burgiel, and C. Goodman-Strauss. The Symmetries of Things. Ak Peters Series. Taylor & Francis, 2008. ISBN 9781568812205. URL https://books.google.com/books?id=EtQCk0TNafsC.

[10] H. S. M. Coxeter. Regular honeycombs in hyperbolic space. In III, Noordhoff, Groningen, and North-Holland, page 155, 1956.

[11] Andrej Cvetkovski and Mark Crovella. Multidimensional scaling in the Poincaré disk. Applied Mathematics & Information Sciences, abs/1105.5332, 05 2011.
doi: 10.18576/amis/100112.

[12] Basudeb Datta and Subhojoy Gupta. Uniform tilings of the hyperbolic plane. arXiv e-prints, art. arXiv:1806.11393, Jun 2018.

[13] Bhuwan Dhingra, Christopher Shallue, Mohammad Norouzi, Andrew Dai, and George Dahl. Embedding text in hyperbolic spaces. In Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12), pages 59–69. Association for Computational Linguistics, 2018.

[14] Christiane Fellbaum. WordNet: An Electronic Lexical Database. Bradford Books, 1998.

[15] Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. In Advances in Neural Information Processing Systems, pages 5345–5355, 2018.

[16] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In Proceedings of the 35th International Conference on Machine Learning, 2018.

[17] David Gans. A new model of the hyperbolic plane. The American Mathematical Monthly, 73(3):291–295, 1966. ISSN 00029890, 19300972. URL http://www.jstor.org/stable/2315350.

[18] Mikhael Gromov. Hyperbolic groups. Essays in Group Theory, pages 75–263, 1987.

[19] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=HJxeWnCcF7.

[20] Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, et al. Hyperbolic attention networks. International Conference on Learning Representations, 2018.

[21] Svetlana Katok. Fuchsian Groups. University of Chicago Press, 1992.

[22] Benedikt Kolbe and Vanessa Robins.
Tiling the Euclidean and hyperbolic planes with ribbons. arXiv preprint arXiv:1904.03788, 2019.

[23] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.

[24] Shigeru Miyagawa, Robert C Berwick, and Kazuo Okanoya. The emergence of hierarchical structure in human language. Frontiers in Psychology, 4:71, 2013.

[25] Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. Holographic embeddings of knowledge graphs. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.

[26] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 6338–6347. Curran Associates, Inc., 2017.

[27] Maximillian Nickel and Douwe Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 3779–3788. PMLR, 2018.

[28] Melissa Potter and Jason M. Ribando. Isometries, tessellations and Escher, oh my. American Journal of Undergraduate Research, 3, 03 2005. doi: 10.33697/ajur.2005.005.

[29] Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. URL http://networkrepository.com.

[30] Frederic Sala, Chris De Sa, Albert Gu, and Christopher Re. Representation tradeoffs for hyperbolic embeddings. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 4460–4469, Stockholmsmässan, Stockholm, Sweden, 2018.
PMLR.

[31] David Salomon. Variable-length Codes for Data Compression. Springer-Verlag, Berlin, Heidelberg, 2007. ISBN 1846289580.

[32] Rik Sarkar. Low distortion Delaunay embedding of trees in hyperbolic plane. In International Symposium on Graph Drawing, pages 355–366. Springer, 2011.

[33] Teruhisa Sugimoto. Convex pentagons for edge-to-edge tiling, II. Graphs and Combinatorics, 31(1):281–298, 2015.

[34] Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. Hyperbolic representation learning for fast and efficient neural question answering. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 583–591. ACM, 2018.

[35] Alexandru Tifrea, Gary Becigneul, and Octavian-Eugen Ganea. Poincaré GloVe: Hyperbolic word embeddings. In International Conference on Learning Representations, 2019.

[36] John Voight. Computing fundamental domains for Fuchsian groups. Journal de théorie des nombres de Bordeaux, 21(2):467–489, 2009. doi: 10.5802/jtnb.683. URL http://www.numdam.org/item/JTNB_2009__21_2_467_0.

[37] John Voight. The arithmetic of quaternion algebras. Preprint, 2014.

[38] Robert Yuncken. Regular tessellations of the hyperbolic plane by fundamental domains of a Fuchsian group. Moscow Mathematical Journal, 3, 03 2011. doi: 10.17323/1609-4514-2003-3-1-249-252.