Review for NeurIPS paper: Ultrahyperbolic Representation Learning

NeurIPS 2020

Ultrahyperbolic Representation Learning

Review 1

Summary and Contributions: The paper presents a generalization of the hyperbolic and spherical manifolds. The main contribution of the paper is to show the computation of geodesics, exponential, and logarithm maps that allow optimization on this manifold. The experiment results look solid.

Strengths: A good work on a particular manifold, especially for optimization on the manifold. It details relevant developments.

Weaknesses: The paper could benefit from a few more explanations as mentioned in the detailed feedback.

Correctness: Yes. The developments look correct though I have not done a deep-dive over all the proofs.

Clarity: Yes, mostly. Certain parts can be improved (see comments in detailed feedback)

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: While the paper is mostly easy to follow, it becomes confusing at certain places. The metric discussion is missing (at least not highlighted properly), e.g., the definition of g_x(.,.) I understand that it means <>_q in (1), but it should be explicitly called out. Discuss the non positive definiteness of the metric on the tangent space explicitly. Without loss of generality, \beta can be fixed to -1. This will make certain expressions fewer notations heavy. In Section 3, the discussion between lines 83 to 88 looks odd and not relevant. Where is (4) used? It would be good to have a table in the paper that shows all the manifold related formulas in one place for easy referencing. While Figure 3 is great, it would be great to plot loss versus time to show the benefit of the proposed optimization approach over the Euclidean modeling. It would be great to have the discussion in Lines 496 to 500 in the main paper as well. ------- After rebuttal -------- I appreciate the author's response.

Review 2

Summary and Contributions: Authors propose a new representation for structured data: the ultrahyperbolic representation. It combines spherical and hyperbolic geometries by "stacking" them. Several expressions are given for this space, including geodesics, expo/log maps and a dissimilarity function. Optimization frameworks are then proposed to optimise a given cost function in that space. The optimisation deserves a specific attention as tangent vectors can have negative square norms, and authors propose a novel direction scheme. Experiments on the karate club dataset (+nips dataset on the supplementary) are then given. AFTER REBUTAL ------------------- I thank the authors for their response. I still think that a deeper investigation of the conditions/cases on which the geometry can be useful is missing, and after reding the rebutal, my concerns on this part remain. Nevertheless, I believe that the theoretical developments of the paper are interesting and that the paper may be published at NeurIPS.

Strengths: The paper is theoretical and introduces the ultrahyperbolic geometry, that combines both spherical and hyperbolic geometries. Tools to optimise on this geometry are then given, paving the way for new applications in machine learning. The experiments on the karate dataset show that the geometry can be useful when graphs contains cycles, which is a typical scenario on which the hyperbolic geometry fails (as it is supposed to represent tree-like data). As such, the work is a pioneer work in the field of hyperbolic geometry for machine learning ; the claims are sounds.

Weaknesses: The main weaknesses of the paper are the following: - it is unclear in which case this geometry can be useful "in real applications". In the experiments, authors state that it better embeds graph that contain cycles: a deeper analysis of this statement should be provided, together with all other relevant cases. If I have a problem with some tree- or graph-like data, in which case should I consider the ultrahyperbolic geometry rather than the hyperbolic one? - regarding the experimental evaluation, several questions are also raised. Authors propose to use the capacity of the nodes to learn the representations, as it is difficult to get ancestors in the presence of cycles. How does the ultrahyperbolic geometry (with q>1) behaves in "classical" scenario in which the hyperbolic assumption makes sense? How choosing the appropriate q value ? To what phenomenom it relates to?

Correctness: To the best of my knowledge, the claims are correct.

Clarity: The paper is well written and easy to follow.

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback: It is probably beyond the scope of the paper but I was wondering how dedicated methods for graphs embedding behave w.r.t. the proposed method in the karateclub dataset?

Review 3

Summary and Contributions: The paper considers a pseudo-Riemannian geometry that extend the traditional hyperbolic and spherical space. Some theory regarding the geometry and optimization is provided. An application on graph-valued data is provided. == After the rebuttal == I have lowered my score for this paper after the rebuttal. My main concern with the paper is that the theory is too decoupled from the application. I cannot tell a reason why the developed representation should be applicable to graph-data. I had given the authors the benefit of the doubt hoping for an explanation in the rebuttal, but the rebuttal did not have a (to this reviewer) convincing argument why the representation should work well for graphs.

Strengths: The key contribution of the paper is theoretical. As far as I can tell, the paper provides two novel theoretical results: *) Closed-form expression for geodesics in the considered space is provided. Derivations seem to closely match corresponding results for spheres and hyperbolic space. This is to be expected; and the result remain novel. *) From my perspective the most interesting result is the derivation of a non-trivial descent direction over the considered space.

Weaknesses: The key contribution of the present paper is theoretical. The main issue is that the theory seems largely to have limited practical use. The paper propose to use the considered space for representing graphs, but this appears to largely be ad hoc with little to no grounding in first principles. The argument is that since trees are known to be well-represented in hyperbolic spaces, then graphs should naturally fit into a space that extend hyperbolic space. From what I can tell, this conclusion has no grounding in mathematics. If I have missed something, I would appreciate if the authors provide more detail in the rebuttal.

Correctness: The theory appear to be correct. The empirical work appears to be largely heuristic.

Clarity: The paper is generally easy to read, but may require prior experience in differential geometry. Given the theoretical nature of the paper, I would have preferred that a larger portion of the paper was devoted to explaining and deriving the theory (the current paper largely states results and refer the reader the supplementary material for details).

Relation to Prior Work: Yes. I miss a discussion of the work of Feragen et al. that were among the first to give tree-data a well-defined geometry: (*) Means in spaces of tree-like shapes A Feragen, S Hauberg, M Nielsen, F Lauze 2011 International Conference on Computer Vision (*) Towards a theory of statistical tree-shape analysis A Feragen, P Lo, M de Bruijne, M Nielsen, F Lauze IEEE Transactions on Pattern Analysis and Machine Intelligence

Reproducibility: Yes

Additional Feedback: Generally, I would wish for a greater emphasis on explaining and deriving the theory as this is where the main contributions lie. I had trouble understanding Eq. 9. In particular, I did not understand the motivation for the 'otherwise' branch -- where does this expression come from? In general, I found the graph-analysis aspect of the paper less convincing and this is the reason for my fairly low score. It seems to me that quite a few 'hacks' are needed to force graphs into the proposed representation (lines 220-229; the symmetrization of C in line 241). If I an missing a great principle here, then I would be interested to learn more.

Review 4

Summary and Contributions: This paper proposed a novel approach in representation learning context using by studying Riemannian geometry properties of pseudo-hyperboloids. The novelty of this paper is mainly theoretical. In particular, the authors proposed for the first time the explicit formulas of the geodesic distance on ultrahyperbolic manifolds and the logarithm map for pseudo-hyperboloids. Also, an optimization scheme using the ultrahyperbolic geometry is proposed, in which the authors have shown a relationship to representation learning of non-tree graphs both theoretically and experimentally.

Strengths: The theoretical grounding of this paper is solid. With precise geometric analysis, the authors further develop theories on pseudo-hyperboloids, e.g. Riemannian metrics, geodesics, exponential maps, and (pseudo-)distances. Formulas are explicit and concrete, and thus laid a solid foundation for numerical computations. For the numerical algorithm part, the author proposed a novel approach of computing representation learning, with the aim of improving performance on non-tree graphs. The experimental results also the efficacy of the proposed algorithm and its performance on various datasets.

Weaknesses: Theoretically, the dissimilarity metric (Eq.8) is locally defined due to the (non)existence of the logarithm map in the global scope. The effectiveness of the dissimilar metric is limited by the geometry of the underlying manifold, which might in turn limit the effectiveness of the proposed method on complicated real-world applications. The authors could have justified the applicability of the algorithm by experimenting it on more real-world datasets, alongside the two reported in the paper and the supplementary materials. The proposed optimization algorithm, as the authors pointed out in the conclusion section, lacks a convergence rate analysis. However, this is not a major issue as this is not the emphasis of the paper. Finally, it would be great if more experiments were conducted with various real-world datasets and under different settings of model hyper-parameters.

Correctness: This paper involves intensive Riemannian geometry computations, which require extensive background knowledge. However, the statements are clear and well organized. And the empirical methodology is consistent with previous state-of-arts.

Clarity: The paper is well written, all the mathematical concepts are explained clearly. The computational algorithms are represented in details.

Relation to Prior Work: This paper proposed a novel method based on the geometries of pseudo-hyperboloids, which generalized the hyperboloid models studied in existing representation learning literature. Both of the theoretic and numerical novelties are clearly stated in the paper.

Reproducibility: Yes

Additional Feedback: The paper is well presented and novelties are clearly stated, and paved a new way of solving representation leaning problems with non-Riemannian nature. AFTER REBUTAL ------------------- I thank the authors for their response. My concern has been fully addressed, especially why the proposed method is better than hyperbolic geometry for graph embedding/representation. I believe that the theoretical developments of the paper are interesting and that the paper should be published at NeurIPS. So I keep my original score.