Paper ID: 537

Title: Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach

Bayesian Optimization algorithms for deterministic functions (possibly observed in noise) with continuous input domains have been intensively addressed in recent decades within the framework of Gaussian Process modelling. Here the authors focus on an important issue with most state-of-the-art strategies, namely that the acquisition function often relies on one-step-lookahead considerations and does not account for the information brought by candidate points regarding potential pay-offs in future iterations. This is known as the finite budget / finite time horizon set-up, and has inspired several of the papers referenced in the bibliography. The focus here is on transferring know-how and formalism from the world of Dynamic Programming (DP) to Bayesian Optimization (BO). In particular, it is recalled and formalized how the finite-time BO problem can be cast as a DP one, and more centrally how an Approximate DP approach called Rollout can be adapted to it. A finite-time BO algorithm based on Rollout is proposed and tested on Gaussian Process realizations and four classical test functions. Empirical results suggest that Rollout outperforms the considered state-of-the-art one-step-lookahead criteria on GPs with fixed covariance parameters, while on test functions results are more mixed and further highlight the importance of adequately choosing the Rollout horizon and discount factor.

I enjoyed this paper, where an effort has been made to transfer a relevant formalism and an appropriate technique from the world of Operations Research and Dynamic Programming to Bayesian Optimization. While this was partly done in previous work, here it seems to go one step further, and I am not aware of publications where Rollout was adapted to this precise problem. The results look promising, especially on the GP realizations, but I really felt the absence of a comparison to other strategies recently proposed to address this very issue; GLASSES of [5] seems a natural competitor here (and maybe also the MCTS of [13]). Also, I was wondering if further improvements could be reached at reasonable research investment regarding the (currently rather simple) base policies. As for the empirical comparisons on functions, the way the models are set appears a bit contrived, and the conclusions would have more weight with some experiments in more realistic conditions. Overall, I assess this work as good and promising, but I feel that it should be seriously improved before publication. Beyond the points mentioned above, here are a few additional (essentially minor) remarks:
* Abstract, line 5: about `uncountable', isn't it the case already in most usual sequential settings (without the finite budget issue)?
* Lines 23-24 after the colon: this formulation sounds unclear to me.
* Line 28: `optimal' with respect to which cost?
* Equation 1 (and later, e.g. in Equation 5): why should the maximizer be unique? Besides, what guarantees its existence unless assumptions such as continuous f and compact X are made?
* Equations 3-4: noise appears all of a sudden. It was not really mentioned earlier (unless I overlooked it), nor were its consequences for Bayesian optimization.
* Lines 99 and 103: the set W_k first depends on z_k and u_k, then not anymore. As formalisation is an important part of this paper, I guess it should be all the more taken care of.
* Equation 7: shouldn't w_k be upper-cased here, as it is a random variable with respect to which the expectation is taken? Similar questions for further quantities at stake might be relevant.
* Equation 9: where are u_{k+1} and w_{k+1} now?
* The beginning of the sentence in line 118 seems somewhat redundant.
* Line 212 (and/or around): is h typically assumed much smaller than N?
* Section 6.1: how are the GPs simulated? What is the discretization? (And what software was used?)

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The paper proposes a new Bayesian optimisation framework in which the remaining evaluation budget is taken into account to determine the location of the next evaluation of the objective. Since the problem is intractable, an approximation to the optimal policy is computed by dynamic programming using rollout. Several experiments show interesting results achieved by this method compared to standard Bayesian optimization approaches.

I think that this is a good paper that tackles, in my opinion, one of the most interesting open problems in Bayesian optimization. The proper use of the available resources to define good policies, and how this can be adapted to the existing acquisition functions, is crucial to push the field forward. The paper itself is a nice piece of work, well structured and easy to read. The idea of using dynamic programming to integrate the effect of new evaluations in future decisions is a natural one, and it is nicely described and executed. My main concern about this paper is the experimental results. Although the authors show some improvements with respect to standard acquisitions, comparisons with existing non-myopic methods or with non-greedy acquisitions, such as Entropy Search, are missing in the paper. These comparisons would make this paper much stronger, but even without them I think that this work is an incremental contribution to the Bayesian optimization literature.

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The paper addresses Gaussian process-based optimization. The question of interest is to find new points where to sample a function f, in order to minimize it over a bounded Euclidean domain. The new points are selected by maximizing the 'potential reward' brought by an evaluation of f at this point. In the literature, the authors mention two kinds of choices for the 'potential rewards' to maximize. The 'potential rewards' of the first kind are simple to calculate but potentially sub-optimal (expected improvement, probability of improvement, upper confidence bound). Those of the second kind have Bayesian optimality properties, but are close to intractable in practice. The authors propose an approximation of the optimal reward functions, which leads to a new Gaussian process-based optimization method. They compare their proposed method to those based on simple reward functions (expected improvement, probability of improvement, upper confidence bound), on simulated Gaussian process trajectories and analytical functions.
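As a reader's aid, all three 'simple' rewards admit closed forms under the Gaussian posterior; here is a minimal self-contained sketch for a minimization problem (the function name and the `kappa` trade-off parameter are my illustrative choices, not the paper's notation):

```python
import math

def acquisitions(mu, sigma, f_best, kappa=2.0):
    """Closed-form one-step acquisition values for minimization, given the
    GP posterior mean `mu` and std `sigma` (> 0) at a candidate point and
    the best observed value `f_best`."""
    # Standard normal pdf and cdf.
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    z = (f_best - mu) / sigma
    ei = (f_best - mu) * Phi(z) + sigma * phi(z)   # expected improvement
    pi = Phi(z)                                    # probability of improvement
    lcb = mu - kappa * sigma                       # lower confidence bound (minimization)
    return ei, pi, lcb
```

For instance, at a point where the posterior is N(0, 1) and the incumbent value is 0, this gives EI ≈ 0.399 and PI = 0.5.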

The paper starts by presenting the optimization problem, its practical importance, and the use of Gaussian processes in this context. I found this first part of the paper clear and pleasant to read. Then, the paper presents dynamic programming in general, its application to Gaussian process-based optimization, and the approximate dynamic programming approach the authors propose. This second part is thus at the core of the innovation brought by the paper. My important concern is that, here, I find the exposition unclear and difficult to follow (even though I am not an expert in dynamic programming). In my opinion, the exposition should be improved significantly. In particular, I think that it would be valuable for the authors to summarize in an algorithm how exactly they select the new point where to evaluate the function. I would like to see clearly what exactly is optimized, which mean values are approximated numerically, etc. I think the exposition in Figure 1 should be improved. In a last part, the authors compare their proposed algorithm with standard greedy procedures for selecting the evaluation points (expected improvement, probability of improvement, upper confidence bound). After reading this part, I am not sure that the improvement brought by the paper is significant. First of all, the standard greedy methods have the advantage that they are simple to use, and that the time to select the new function evaluation point is small. Unfortunately, the authors do not provide information about the time taken by their algorithm to select an evaluation point. I find it essential to provide such information, at least for the settings of Tables 1 and 2, because the reader should be convinced that the algorithm proposed by the authors is numerically tractable. This issue is even more important in higher-dimensional cases than those addressed in the paper (dimensions 1 and 2).
My other concern is that, setting the algorithmic cost issue aside, I do not find the improvement brought by the paper very significant compared to the standard greedy methods. In particular in Table 2, addressing the more interesting two-dimensional cases, the 3 standard methods perform best in 3 cases, while the 4 new methods of the paper perform best in 4 cases (actually these 4 new methods are 4 different configurations of the algorithm proposed in the paper). This seems relatively even to me. Also, since there are different configurations of their algorithm, perhaps the authors could give advice on how to configure it in practice.

1-Less confident (might not have understood significant parts)

The paper presents a Bayesian optimization approach with a finite budget. In particular, it presents a heuristic for solving the resulting dynamic programming problem using a rollout strategy.

I think the idea is not totally new. In the context of model predictive control and look-ahead strategies this approach is used very often. However, this technique can be useful in Bayesian Optimization. The present version of the paper is missing an analysis of the computational load of the various algorithms.

2-Confident (read it all; understood it all reasonably well)

The paper shows how solving a global optimization problem under a finite objective evaluation budget can be posed as a dynamic programming (DP) problem. The "dynamic state" of the problem is represented by the x-y data collected so far, the "control" is the design point to evaluate next, and the "stochastic disturbance" arises from the uncertainty in the predictive distribution of the design-objective map. The key lies in choosing the reward function of the DP to be the improvement in the solution to the optimization problem. With the exception of the choice of the reward function, the formulation of optimization as a DP problem is not new. Similar ideas can be found in W. B. Powell's book "Optimal Learning" (see in particular Chapter 7). I have to mention, however, that I am not familiar with anything identical to the present work. The full DP problem is extremely complex (especially because the state space is growing) and cannot be solved computationally. The authors employ the well-known rollout technique. Rollout approximates the value-to-go function by simulation, assuming that in the future one would follow a sub-optimal heuristic strategy. The authors demonstrate that their approach performs better than the examined greedy heuristics (probability of improvement, expected improvement, and upper confidence bound).
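To make the rollout idea concrete: the value-to-go of a candidate evaluation is estimated by simulating future observations from the current model and letting a cheap base policy choose the subsequent points. The sketch below does this on a deliberately simplified surrogate, independent Gaussian beliefs over a finite set of candidate points rather than a full GP, with a one-step-EI base policy; the names, the belief-collapse update, and the discounting scheme are all my illustrative assumptions, not the paper's implementation.

```python
import math
import random

def ei(mu, sigma, best):
    """Closed-form expected improvement (minimization) under a N(mu, sigma^2) belief."""
    if sigma <= 0.0:
        return max(0.0, best - mu)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

def rollout_value(beliefs, best, first_arm, horizon=3, gamma=0.9,
                  n_sims=200, seed=0):
    """Monte Carlo rollout estimate of the value-to-go of evaluating
    `first_arm` next. `beliefs` maps each candidate point to (mu, sigma);
    future observations are sampled from the beliefs, an observed point's
    belief collapses to the sampled value (deterministic objective), and
    subsequent points are chosen by the one-step-EI base policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        b, cur_best, arm, value, disc = dict(beliefs), best, first_arm, 0.0, 1.0
        for _ in range(horizon):
            mu, sigma = b[arm]
            y = rng.gauss(mu, sigma)                 # simulated observation
            value += disc * max(0.0, cur_best - y)   # one-step improvement reward
            cur_best = min(cur_best, y)
            b[arm] = (y, 0.0)                        # belief collapses after evaluation
            arm = max(b, key=lambda a: ei(b[a][0], b[a][1], cur_best))
            disc *= gamma
        total += value
    return total / n_sims
```

The point to evaluate next is then the candidate maximizing `rollout_value`, which is exactly where the horizon and discount factor discussed in the experiments enter.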

There are some issues/concerns that the authors should address:
1. It would be nice to see a more challenging example (beyond 2D). There you would start seeing the importance of going beyond the simple greedy approximations.
2. Why do you pick the last suboptimal heuristic policy to be the one that minimizes the posterior mean? Please explain in the text.
3. How would the results change if you finished all strategies (PI, EI, UCB, and R-?-?) with a final step that just minimizes the mean of the GP?
4. Your method is outperformed by EI and UCB on two of the test functions. Why do you think that happens? Is there anything peculiar about these functions?

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The paper proposes a new acquisition function for Bayesian optimization that takes the remaining evaluations into account. The main idea is to formulate the problem as a finite-horizon dynamic programming problem. The solution is approximated by computing not the optimal value function but the value function of a heuristic policy, using Gauss-Hermite quadrature and restricting the horizon.
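For context, Gauss-Hermite quadrature replaces the Gaussian expectations arising in the value-function recursion with a few weighted function evaluations. A minimal 3-point sketch (the helper name is mine, not the paper's; nodes and weights are the standard ones for the physicists' weight exp(-x^2)):

```python
import math

def gauss_hermite_expectation(g, mu, sigma):
    """Approximate E[g(Y)] for Y ~ N(mu, sigma^2) with 3-point
    Gauss-Hermite quadrature, via the change of variable y = mu + sqrt(2)*sigma*x."""
    nodes = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
    weights = [math.sqrt(math.pi) / 6.0,
               2.0 * math.sqrt(math.pi) / 3.0,
               math.sqrt(math.pi) / 6.0]
    return sum(w * g(mu + math.sqrt(2.0) * sigma * x)
               for x, w in zip(nodes, weights)) / math.sqrt(math.pi)
```

With 3 nodes the rule is exact for polynomials up to degree 5, so e.g. E[Y^2] = mu^2 + sigma^2 is recovered exactly; this is what makes the restricted-horizon recursion cheap per candidate point.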

Technical quality: 3/5. The derivation of the acquisition function is well motivated. The final experiments should contain a direct comparison to the closest competitor (GLASSES). Since the code is freely available, this shouldn't be too much to ask. After that I'd be happy to rate a 4.
Novelty: 4/5. I am not aware of other literature that uses DP to obtain an acquisition function for BO. The rolling horizon approximation reintroduces a certain myopia, but a BO strategy that is aware that there are at least 5 function evaluations to go is still a great step.
Impact: 3/5. The results are not overwhelming, but it is a promising approach.
Clarity: 4/5. Excellent in every respect. Please explicitly state the computational effort of the acquisition function (per evaluation).
Minor issues:
* Reference [5]: replace the arXiv citation with the AISTATS one.
* Please add a reference for Gauss-Hermite quadrature (l. 213).
* Line 151: It seems intuitively clear that E[...] is Expected Improvement. However, a little formal proof in the appendix would complete the picture.
* For the sake of completeness, mention Entropy Search as an acquisition function.
* Please augment every book reference with page numbers, especially [1] and [15].

3-Expert (read the paper in detail, know the area, quite certain of my opinion)