NeurIPS 2020

High-Dimensional Bayesian Optimization via Nested Riemannian Manifolds

Review 1

Summary and Contributions: This manuscript considers Bayesian optimization of (nominally) high-dimensional functions. This is a perennial challenge for Bayesian optimization, as many off-the-shelf models are not able to effectively cope with the curse of dimensionality, and the sample efficiency improvements of Bayesian optimization seen in lower dimensions do not necessarily carry over. The folk intuition -- that had better be true of we are to have any hope -- is that most real objective functions are not truly high dimensional, but rather vary primarily along some much lower dimensional subspace. If we could only identify that subspace, we could reduce the problem to a much easier, lower-dimensional version. This vision has inspired many previous investigations, which for example sought to exploit low-dimensional linear projections and/or additive decompositions with low-dimensional components. Here the authors seek to learn a low-dimensional Riemannian manifold embedded in the domain and transfer the optimization problem onto this intrinsically lower-dimensional space. The bulk of the paper (and the supplement) is devoted to building theoretical machinery to support this vision, with a brief empirical investigation as well showing the proposed methods have promise. After the discussion phase I am somewhat less enthusiastic about this paper as the other reviewers have convinced me that the scope of the proposed methods may be somewhat limited in practical settings. I am still positive overall but I would encourage the authors to better motivate the framework (and the decisions made along the way) in terms of foreseen applications.

Strengths: This work is a technical tour de force that manages to introduce and distill numerous deep ideas into a plausible, working system. The vision is sound and the approach is sophisticated. The approach would unquestionably be of interest to the NeurIPS community (or at least the subset interested in Bayesian optimization and Gaussian process modeling at large), and provides tools opening doors to a variety of lines of future research.

Weaknesses: Quite honestly, there is little to complain about here. Perhaps the biggest limitation of the work is a slight disconnect between the framework and actual practical applications, but this is mitigated by the fact that it is mostly a theoretical contribution introducing new modeling approaches rather than targeting a specific application. I did wish when reading through that a paragraph or two might have been devoted to some practical commentary, but with so much to cover (and the authors _very obviously_ struggling with the space limitations), I understand that it simply would not fit. However, the authors may want to consider whether a venue allowing more space might be a more effective means of communicating their message. NeurIPS offers a big audience but precious little bandwidth to convey complicated ideas. Just a thought.

Correctness: Yes, as far as I can tell.

Clarity: Ignoring the expected terseness and a couple of typos, yes. Just a couple of very minor suggestions I marked while reading through: - 52: robot ability -> robot's ability - 59: you say "high-diemensional" Riemannian manifolds here but I suppose you mean low-dimensional manifolds embedded in a high-dimensional space? - 75: They -> they - 82: permits -> permits us - 140: even if "supervisedly" is a word (and it doesn't read like one to me), it's an awkward one

Relation to Prior Work: Yes, the discussion and consideration of previous work is thorough. Just a couple more related works you can consider citing if you see fit: - Garnett, et al. Active learning of linear embeddings for Gasussian processes, UAI 2014. [also considers active sampling to learn a linear embedding from a high- to low-dimensional space] - Gardner, et al. Discovering and exploiting additive structure for Bayesian optimization, AISTATS 2017. [yet another work attempting to discover and exploit additive structure for Bayesian optimization]

Reproducibility: Yes

Additional Feedback: Just one technical comment: the inverse map m⁻¹ does not necessarily exist if the manifold contains self-intersection: consider for example an immersion of a Klein bottle in R³. Is this possibility somehow explicitly ruled out?

Review 2

Summary and Contributions: The paper describes the inclusion of Riemannian manifold geometric information in Gaussian process regression for Bayesian optimization. Two different manifolds are treated: the sphere and symmetric positive definite matrices. The approach incorporates dimension reduction. Experimental results are presented on several synthetic functions.

Strengths: The main interest is to brings novel ideas to Bayesian optimization.

Weaknesses: Only toy examples are provided, without a realistic test case such as the ones discussed in introduction (e.g., robot arms). While the results are good overall, the difference with other methods (without prior geometric information) is not so striking on several instances. The geometric structure must be known beforehand.

Correctness: Yes

Clarity: The paper is clear and pedagogical.

Relation to Prior Work: See Duvenaud, D. K. Automatic Model Construction with Gaussian Processes University of Cambridge, University of Cambridge, 2014 (for topological manifolds) Ginsbourger, D., Roustant, O., & Durrande, N. (2016). On degeneracy and invariances of random fields paths with applications in Gaussian process modelling. Journal of statistical planning and inference, 170, 117-128. Gaudrie, D., Le Riche, R., Picheny, V., Enaux, B., & Herbert, V. (2020). Modeling and optimization with Gaussian processes in reduced eigenbases. Structural and Multidisciplinary Optimization, 1-19.

Reproducibility: No

Additional Feedback: What is the effect of model misspecification, e.g., if d is not known? Adding an algorithmic summary would be beneficial. ### Post rebuttal comments ### The authors have responded to my comments. This paper has generated interesting discussions about the interest and limitations of using the appropriate geometrical information (which could be also included in the additional page of content). I thus increased my score to accept.

Review 3

Summary and Contributions: The authors consider the problem of Bayesian Optimisation in high dimensional spaces and propose a method to reduce this to a problem in lower dimension.

Strengths: The proposed technique appears to be built on sound ideas which are well motivated (from a theoretical point of view, not from an application point of view, see 'weaknesses below).

Weaknesses: The main weakness I'd say is that it is unclear what the applications of the proposed techniques are. The introduction mentions applications such as robotic arms or learning of directions, but these are all low dimensional. The experiment section does not help here since it concentrate on mostly artificial data.

Correctness: The claims appear to be correct.

Clarity: Overall yes, but please see the Additional Feedback section below for specific comments and suggestions as some terms used in the paper can be confusing.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: * Figure 1: these examples are interesting applications in general, but are not good examples for this paper since they are not high dimensional examples. * Figure 1 again: Please define what are S^3_{++} and S^2_{++} here * What does it mean for an objective function to only vary within a low dimensional latent space? Please define this precisely in the paper. * What is a structure preserving mapping mapping between M^D and M^d? Such a map cannot preserve the Riemannian structure since its differential cannot have full rank with d < D. Please make this clearer. * Would it be possible to use another term for the map m^{-1}, since this map is not really the inverse of m (do you assume it is a right inverse? section 3.2 does suggest so)? ======================== Update after authors feedback ======================== I thank the authors for their feedback. I will maintain my score of 7 as I found the ideas in this paper interesting and overall well introduced. I maintain some reservations on the applicability of these ideas, even if high dimensional Bayesian optimisation is a hard problem in general. I would also like to point out that, in my opinion, the paragraph the authors said they would add in the introduction in too technical for an introduction.

Review 4

Summary and Contributions: This paper proposes a Bayesian optimization method called high-dimensional geometry aware Bayesian optimization (HD-GaBO) for a setting where the parameter space is high-dimensional and non-Euclidean. It uses a surrogate model that learns a mapping from a high dimensional Riemannian manifold, where the parameters lie, to a lower dimensional Riemannian manifold and a representation of the objective in this lower dimensional manifold. Update after the rebuttal - Keeping my original score since the authors have not satisfactorily answered the key concern in my original review: "We have an objective function to optimize where the optimizing parameters lie on a high dimensional Riemannian manifold and the authors use a latent space, which is again a Riemannian manifold. Problems where the optimizing parameters lie on a Riemannian manifold can naturally occur, but why does the latent space have to be a Riemannian manifold as well? Why would a Euclidean latent space be not sufficient?". From the reviewer discussions, it seemed that some other reviewers also share this concern.

Strengths: 1) Generally speaking, the idea is somewhat interesting, but it needs to be motivated and empirically backed.

Weaknesses: 1) The problem setting is very specific and hypothetical. We have an objective function to optimize where the optimizing parameters lie on a high dimensional Riemannian manifold and it somehow makes sense to use a latent space, which is again a Riemannian manifold. Problem where the optimizing parameters lie on a Riemannian manifold can naturally occur. But why does the latent space have to be a Riemannian manifold as well? Why would a Euclidean latent space be not sufficient. Figure 1 shows two somewhat abstract examples of lower dimensional manifolds as subspaces of higher dimensional manifolds of the same geometry. But could the authors elaborate more on where these kind of settings occur in real world machine learning/optimization problems? Lines 111-113 say: "We assume here that the objective function satisfies the low-dimensional assumption and thus only varies within a low-dimensional latent space. Moreover, we assume that this latent space can be identified as a low-dimensional Riemannian manifold Md inheriting the geometry of the original manifold MD, with d ≪ D" - authors should describe real-world problem settings where these assumptions make sense. 2) The experimental evaluation is not sufficient. No real-world datasets have been used. Probably because the problem setting is too specific/hypothetical to find real worl examples? (see the point (1) above). 3) The key contribution of this paper seem to be a combination of the ideas presented in [26] (Bayesian optimization on Riemannian manifolds), [35] (Bayesian optimization using low dimensional latent spaces). Therefore, the novelty of this work is not very high.

Correctness: 1) As detailed about the empirical evaluation is not sufficient to judge the merits of this work. Appropriate real-world datasets should be used.

Clarity: It is relatively well written and key points are sufficiently described.

Relation to Prior Work: Authors appropriately discuss relevant prior work such as [26] and [35].

Reproducibility: No

Additional Feedback: This problem setting is way too specific and might not be broad enough to publish in a major machine learning conference such as NeurIPS.