NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:4226
Title:Max-value Entropy Search for Multi-Objective Bayesian Optimization

Reviewer 1

Originality 3/5: Given the PESMO and MES work, this is a straightforward direction of work. Quality 5/5: There are a few places that could use some clarification: - Algorithm return statement: How is this Pareto front constructed? Should it not also return the Pareto set, which is probably of greater interest to a practitioner? How would the set be constructed? These are not immediately obvious to me. - Line 259 and equation 5.1: How do you compute this "ideal" Pareto front? Especially for the real-world problems. Clarity 5/5: Very clearly written paper. Structured in a way that is very easy to follow. Conclusions from experimental results are nicely summarized.

Reviewer 2

Before getting into the review, I want to emphasize the high level of similarity between this submission and (Wang and Jegelka, 2017, I recommend reading the two papers side by side, since it is clear that this work was inspired by it to a high degree. I believe the comparison of the two works should be the central question of the review process. (1) Quality: Is the submission technically sound? Are claims well supported by theoretical analysis or experimental results? Is this a complete piece of work or work in progress? Are the authors careful and honest about evaluating both the strengths and weaknesses of their work? The submission is of very high quality. Max-value entropy search is well motivated, demonstrably works well in practice. The authors deserve credit for the careful, high quality experiments. Furthermore, the paper provides a theoretical guarantee on the performance, although I lack the expertise to properly judge the significance of this. (2) Clarity: Is the submission clearly written? Is it well organized? (If not, please make constructive suggestions for improving its clarity.) Does it adequately inform the reader? (Note: a superbly written paper provides enough information for an expert reader to reproduce its results.) The paper is an excellent read. It presents the algorithm in a structured way, with sufficient detail for reproduction. I would recommend it for reading to anyone interested in the general topic. (3) Originality: Are the tasks or methods new? Is the work a novel combination of well-known techniques? Is it clear how this work differs from previous contributions? Is related work adequately cited? As mentioned above, it is clear that this work was heavily inspired by MES (Wang and Jegelka, 2017). This work does not adequately cite related works. My problem is that the way this submission puts it, 'our work is inspired by the recent success of single objective BO algorithms based on the idea of optimizing output-space information gain', might give the wrong impression to the reader. MESMO is the natural extension of MES (Wang and Jegelka, 2017) to the multiobjective domain. It also positions MESMO against PESMO and it compares/contrasts the two. This comparison might make it seem like the contribution is greater than it is in reality. On multiple occasion, the paper presents work without disclosing the strong connection to (Wang and Jegelka, 2017). Examples: - Section 4, Equations 4.4-4.6. These equations literally come from (Wang and Jegelka, 2017). - Section 4 1) and 2), The algorithm this paper uses to approximate MESMO is the same algorithm as in (Wang and Jegelka, 2017). - Section 4.1 The theoretical result is analogous to Section 3.4 of (Wang and Jegelka, 2017). Based on the arguments above, I will rephrase the question: Are the ideas in this paper novel enough to warrant a new publication over (Wang and Jegelka, 2017)? I would argue that they are not. The reason is that there does not seem to be a significant hurdle that this work needed to overcome in order to extend MES to the multiobjective domain. Both the formula for MESMO and the algorithm to approximate it extend to multiobjective problems. I cannot comment on Theorem 1. (4) Significance: Are the results important? Are others (researchers or practitioners) likely to use the ideas or build on them? Does the submission address a difficult task in a better way than previous work? Does it advance the state of the art in a demonstrable way? Does it provide unique data, unique conclusions about existing data, or a unique theoretical or experimental approach? The method is certainly interesting to practitioners. The work demonstrates the performance of MESMO on a variety of benchmarks and it consistently outperforms the baselines. _________________________________________________________________________ After reading the author's reply, I decided to raise my score. I expect that the connection to (Wang and Jegelka, 2017) is properly disclosed in the revised version. The paper is very well written but it has shortcomings in originality. The explanation for my score is that I think a practitioner might find this paper useful, but I don't expect it to have a large impact in the research community. I think this paper is truly borderline. I do not have strong arguments for or against acceptance.

Reviewer 3

This paper presents a novel multi-objective BO (MO-BO) algorithm that extends max-value entropy search to the multi-objective case. This is an original and significant contribution. Overall the paper is well written and structured. There are however few aspects of the paper that are not sufficiently thorough and overall decrease my score of the current manuscript. One aspect of the paper that should be improved is the related work. The current manuscript does not accurately place the current work in the existing literature: - In the introduction, it is mentioned only PES as a MO-BO algorithm (line 31). It would be good to also mention the more widely known ParEGO, SMS-ego, and EIHV. - page 3 line 94 state that HV-based MO-BO is not feasible for more than 2 objectives. This is not accurate (as also demonstrated in the experimental section when using SMSego on a 6-objective task) and it should be rectified. - The connection to the max-value entropy search acquisition function for SOO should be highlighted much earlier and more clearly in the text. Another aspect is that the benchmark functions used to evaluate MESMO are not standard functions in the MOO community. To increase reproducibility and allow direct comparisons to past literature it would be good to include also experiments with more standard MOO functions. Some of these functions can also be scaled to arbitrary numbers of objectives, which would also be valuable when computing the computational time, e.g., by plotting the computational time of the different algorithms vs the number of objectives. Additional comments: - In page 3 line 96, it is stated that reducing to single-objective is sub-optimal. This statement is ambiguous and potentially inaccurate and should be better explained (e.g., by defining sub-optimality w.r.t. some criteria) and motivated (and cite appropriate references). - In page 4, paragraph "cheap MO solver" it might be clearer to explicitly define the optimization that you are solving. It might also be interesting to connect this optimization to previous literature such as [Binois 2015] and [Calandra et al. 2014] given the similar underlying idea (but different tools). references: - Binois, Mickaƫl. Uncertainty quantification on pareto fronts and high-dimensional strategies in bayesian optimization, with applications in multi-objective automotive design. Diss. 2015. - Calandra, Roberto, Jan Peters, and M. P. Deisenrothy. "Pareto front modeling for sensitivity analysis in multi-objective bayesian optimization." NIPS Workshop on Bayesian Optimization. Vol. 5. 2014. --- Edited after rebuttal: Thank you for taking the time to answer my comments. As a result of the rebuttal, I raised my score. Here are some remaining comments: - Point 3: This point is still inaccurate. It is well known that linear scalarizations might result in sub-optimal solutions (depending from the convexity of the PF), however claiming the same for the non-linear case is not something to be done lightly (E.g., arguing about the sub-optimality of the Tchebishev scalarization in ParEGO would require appropriate references). - Point 5: I am familiar with Entropy based methods. Your reply does not however really answer the question of why using entropy works better than previous MO approaches. - Point 7: You should include clearly in the text a reference to pygmo and the algorithm used (Nowak et al.). Generally, this is not the most efficient algorithm to compute HV, and I would suggest you to mention this in the text as well. For more efficient implementations see "Lacour, Renaud, Kathrin Klamroth, and Carlos M. Fonseca. A box decomposition algorithm to compute the hypervolume indicator. Computers & Operations Research 79 (2017): 347-360.", "Jaszkiewicz, Andrzej. "Improved quick hypervolume algorithm." Computers & Operations Research 90 (2018): 72-83.", and follow-ups.