__ Summary and Contributions__: The paper proposes a batched Bayesian multi-objective optimization method that uses diversity in the design and performance space during sampling to balance exploration and exploitation.
The key contribution is the batch selection strategy, which is guided by a diversity measure and a hypervolume-improvement metric. The diversity measure is obtained by grouping candidate points into diversity regions using a performance-buffer data structure, in a manner similar to graph cut. The actual selection is then a greedy approach that tries to maximize both diversity and hypervolume improvement.
The paper presents experimental results that show superior performance both quantitatively (hypervolume indicator) and qualitatively over several synthetic and real-world datasets.

__ Strengths__: The proposed algorithm seems sound. Diversity in sampling is known to help avoid local-minima problems, and this paper leverages this in the MOBO domain quite efficiently. I think the paper adds a valuable contribution to the batch-MOBO domain.
The authors propose several simplifications that keep the method's runtime in the same league as related methods (only a few iterations of Pareto-front approximation, greedy selection).
Besides the superior performance, several ablation studies over the batch size are provided, showing robust performance and the algorithm's flexibility.

__ Weaknesses__: The proposed method builds upon many previously well-studied elements, and I find only the diversity technique in the sampling process somewhat novel. Still, it introduces quite a few extra calculations; in fact, only ParEGO and MOEA are slower. These algorithms are, however, not designed for batches, so the comparison is questionable.
I am also wondering how the grouping method's buffer capacity impacts the quality of the results.

__ Correctness__: Yes, the paper clearly describes the method via algorithm sections that are easy to understand.

__ Clarity__: The paper is well-written, easy to follow. I found the math solid and Figure 1 particularly useful in understanding how the proposed method differs from related ones.

__ Relation to Prior Work__: The paper does a good job going over related work in the MOBO domain and makes notable efforts to distinguish its contributions.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper considers the problem of black-box optimization of multiple objectives using a batch of inputs for function evaluation in each iteration of Bayesian optimization (BO). The paper provides an approach for selecting a batch of inputs by balancing diversity and potential improvement in the Pareto set of solutions. There are three key steps in the algorithm:
1) Solve a cheap multi-objective optimization problem using the mean of the GP posterior for each black-box function via a recent local-search-style approach.
2) Identify linear subspaces as diverse regions by taking into account accuracy and neighborhood information in the input space. This step is tied to the local search method in step #1.
3) Run a greedy algorithm to maximize the hypervolume improvement by selecting inputs uniformly from the diverse regions.
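A minimal, hypothetical sketch of step 3 (not the paper's implementation): the batch is filled greedily by cycling through diversity regions and, within the current region, picking the candidate whose predicted objectives give the largest hypervolume improvement. All names and the restriction to two objectives are illustrative assumptions.

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-D front (minimization) w.r.t. reference point ref."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(front):
        if y < prev_y:  # skip dominated points
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

def greedy_batch(regions, batch_size, ref, observed):
    """regions: dict region_id -> list of predicted objective vectors (2-D tuples)."""
    batch, front = [], list(observed)
    region_ids = sorted(regions)
    i = 0
    while len(batch) < batch_size and any(
            p not in batch for pts in regions.values() for p in pts):
        rid = region_ids[i % len(region_ids)]  # round-robin over diversity regions
        i += 1
        cands = [p for p in regions[rid] if p not in batch]
        if not cands:
            continue  # this region is exhausted; move on to the next one
        best = max(cands, key=lambda p: hypervolume_2d(front + [p], ref))
        batch.append(best)
        front.append(best)
    return batch
```

With two regions and a batch of two, for instance, the procedure alternates between regions, so the batch cannot collapse onto a single part of the front.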
Experiments are performed on both synthetic and real-world benchmarks with good results.

__ Strengths__: + Considers a relatively less-studied problem.
+ Extremely well-written paper.
+ Good experimental results.

__ Weaknesses__: - Technical novelty is low.
- Experimental evaluation is lacking to some extent.

__ Correctness__: Claims and method are correct.
Empirical methodology is correct, but evaluation can be improved.

__ Clarity__: Extremely well-written paper except for a small part.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: Comments:
1. This is a very well-written and well-executed paper on a relatively less-studied problem.
2. Technical novelty is on the lower side. Step #2 is tied to the local search method in Step #1 and is the core technical innovation.
3. There are many existing methods from batch BO in single-objective setting, which can be easily used as wrappers on top of any given multi-objective acquisition function. Some examples:
Local penalization (https://arxiv.org/abs/1505.08052)
Determinantal Point Processes (DPPs)
Stein method (http://proceedings.mlr.press/v97/gong19b/gong19b.pdf)
I would have liked to see some experimental results for some instantiation of the above methods (e.g., local penalization is simple to implement) for multi-objective BO.
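To make the local-penalization suggestion concrete, here is a hypothetical sketch of the wrapper idea (a simplified soft penalizer, not the exact hammer function of the cited paper): points already chosen for the batch down-weight the acquisition value in their neighborhood, so successive maximizations spread out. The `lipschitz` parameter and penalizer form are assumptions for illustration; the acquisition is assumed non-negative.

```python
import math

def penalized_acq(acq, chosen, lipschitz):
    """Wrap a (non-negative) acquisition so it is suppressed near chosen points."""
    def wrapped(x):
        value = acq(x)
        for xc in chosen:
            dist = math.dist(x, xc)
            # soft penalizer in [0, 1): ~0 at xc, approaching 1 far away
            value *= 1.0 - math.exp(-lipschitz * dist)
        return value
    return wrapped

def select_batch(acq, candidates, batch_size, lipschitz=5.0):
    """Greedily select a batch by repeatedly maximizing the penalized acquisition."""
    chosen = []
    for _ in range(batch_size):
        score = penalized_acq(acq, chosen, lipschitz)
        chosen.append(max(candidates, key=score))
    return chosen
```

Because the wrapper only needs acquisition values, it applies unchanged on top of any multi-objective acquisition function, which is the point of the comment above.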
4. The exposition for step #2 can be improved to make the paper self-contained. In its current form, the text is too high-level and all the details are not clear.
Post rebuttal:
Thanks for the detailed response. I strongly encourage the authors to include a baseline from my comment #3. As I mentioned, the local penalization method is simple to implement and test.

__ Summary and Contributions__: The article describes a new algorithm for conducting multiobjective
optimization using Gaussian process models to improve sample-efficiency.
Their main contribution involves recognizing that Pareto frontiers can
likely be described as having multiple components, each of which is
continuous but which represent disjoint regions in parameter space. The
authors propose a strategy to exploit this knowledge, implement that
strategy (which is provided in the supplementary material), and test it
on a selection of sample problems. Performance is defined in terms of the
hypervolume of the associated approximate Pareto frontier, and their
algorithm has superior performance on some of the problems.
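For readers less familiar with this metric, the hypervolume indicator used as the performance measure has a standard definition (stated here for minimization; this is textbook material, not a detail specific to the paper):

```latex
% Hypervolume of an approximation set P = {p^1, ..., p^k} with respect to a
% reference point r: the Lebesgue measure of the region dominated by P and
% bounded above by r.
HV(P) = \lambda\!\left( \bigcup_{p \in P} [p_1, r_1] \times \cdots \times [p_m, r_m] \right)
```

A larger hypervolume means the approximate front dominates more of the objective space relative to the reference point, which is why it serves as the quantitative score in the experiments.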

__ Strengths__: This article takes on a very timely topic (multi-objective optimization of
expensive objectives) and correctly identifies an element which has been
missing from existing strategies (the fact that Pareto frontiers are often
comprised of multiple disjoint continuous sections). The authors effectively
set the scene for the reader and concisely present their approach to exploiting
this Pareto frontier knowledge. The authors have tested their strategy on a
spread of problems, though the actual breadth of them is a bit limited (all
these problems are fully continuous and, I think, <=6 dimensions). The article
is well-written as well.

__ Weaknesses__: The description of the diversity regions feels extremely important to the
contributions of this paper. The section explaining how the diversity region
is defined and implemented is only a single paragraph, which provides no
insights whatsoever into the actual process that is occurring. It also
provides no insight into the implications of various decisions regarding
this diversity-region definition. Personally, I would see this as a
fundamental and necessary element of the analysis and experiments which
are presented, given that the key novelty of your method, as you have
stated, is this desire for diversity.
I would recommend the following:
1) Skip some of the intro content or push that to an appendix. Use the
additional space to build out your explanation of the "Split points from
..." step in your algorithm, which is, I think, the key novel part to this
article.
2) Provide a more complete explanation of how your diversity region algorithm
works, what free parameters are present in it (and how they impact the result),
how it functions under different circumstances (number of diversity regions,
distance between them, size of diversity regions, extreme circumstances such as
only a single point as the Pareto frontier), how many points are needed
to find diversity regions with any accuracy.
3) Explain how noise/randomness in function evaluations impacts
your results. Realistically, any problem of interest (certainly in the NeurIPS
community) will likely have randomness/uncertainty in it. How does your
algorithm deal with noisy objectives? How, specifically, does your diversity
mechanism deal with noisy objectives (since noise complicates the ability to
accurately define points as being on the Pareto frontier)? Similarly, any
algorithm that requires solving the dual from the KKT conditions feels very
subject to uncertainty (since the quality of your approximation to the Pareto
frontier is complicated significantly by noise).
4) Explain how your ability to build these diversity regions plays a role in
your algorithm's performance. Your Table 1 does not seem to provide any
relevant information (though you can keep it if you would like). Your Figure 2
provides 20 experimental results, but no ability to understand what aspect of
these problems made your algorithm more successful (in the cases where it was
more successful). Can you explain or depict what part of the frontier your
algorithm was able to find that the other methods were not? Can you identify
any common structures/behaviors which yielded wins for your method? It seems
like your hypervolume was much better for ZDT3, DTLZ4, VLMOP2, OKA1, RE2, RE6:
do these have certain things in common? Or is this a situation where the
baselines against which you are comparing are just very weak? Personally, I
would only really have expected the TSEMO baseline to be of comparable
quality.
Another point I would like to make is that the numerical experiments are not
terribly convincing. First off, there is no ability to interpret increased
hypervolume from a practical perspective. I do not dispute that a greater hypervolume is
good, but I do not think that this provides a comprehensive sense of performance
of a multiobjective optimization tool. Maybe providing the frontiers and actually
pointing to the improvements which are observed would be beneficial. At present
I cannot interpret these numbers or how relevant they are.
Another point of interest for me is the batch situation. Obviously, the ability
to execute in parallel is a relevant part of any practical sample efficient
optimization strategy, but I feel like asynchronous parallelism is more the
standard among open source tools than batch parallelism. Ignoring that, I'm also
surprised that there are no sequential results. In principle, is your algorithm
supposed to perform better or worse in a sequential setting, compared to the
performance gap we see in Figure 2? Along those lines, I do not understand why
you are comparing against methods which do not have a parallel implementation, such
as ParEGO (and I assume others). Why not simply run a sequential comparison, if
what you are trying to argue is that your diversity methodology is a fundamentally
beneficial tool? If it is only beneficial in a batch parallelism setting, I think
you should be stating that explicitly to readers (and not handicapping other
baselines by running them in unfavorable circumstances).
One fundamental weakness of this algorithm is that it relies on a continuous
approximation of the Pareto frontier. This seems impossible in circumstances with
discrete parameters (which includes most problems of relevance) and also seems
problematic in noisy circumstances (where these continuous predictions are then
also subject to noise). This alone is not a barrier to publication, but would
be an impassable barrier to usage in any actual circumstance (if the continuous
approximation were a fundamental driver for quality).

__ Correctness__: How was the choice of DGEMO hyperparameters made? How sensitive is the performance
of your algorithm to those choices? I realize, of course, that some choice needs to
be made; I just want to understand how this particular decision was made. Was there
a great deal of experimentation, or were they just chosen randomly? Was any
experimentation done using the functions that you show us in Table 2? If so,
might that be biasing the performance of your algorithm on those problems? Was any
experimentation done on the baseline hyperparameters to help improve performance?
If not, how much of the gap that you are seeing in performance do you think could
be accounted for if the same care was applied to the baselines as was applied to
your algorithm? I would love to have also seen a uniform random baseline, just to
set a lower bound on how hard/easy these problems are (and the gap between the other
baselines and the "nothing" baseline).
What is the implication of the dimension in the problem? What about the number of
metrics? How do these impact performance of your method (relative to the
alternatives)?

__ Clarity__: The publication is clearly written in good English. A solid introduction
and background discussion is provided to help set the scene for the reader.

__ Relation to Prior Work__: This article has presented some discussion of other work. I would not say that all
the relevant work has been mentioned (though I am not someone who cares that much).
For example, http://proceedings.mlr.press/v48/hernandez-lobatoa16.html on the pure
BO side. I also think that most of the good work on this topic has probably been
done more on the application side. Some quick examples that I was able to find
with only minor searching include: https://dl.acm.org/doi/abs/10.1145/3195970.3196078,
https://www.osapublishing.org/optica/abstract.cfm?uri=optica-7-7-784,
https://www.sciencedirect.com/science/article/abs/pii/S0925231219308525. You
should not feel compelled to cite any/all of these, nor test against them as
baselines (I only remember the Hernandez-Lobato paper because I happened to be
there when it was presented to a skeptical audience). And I would never condemn
anyone to having to run entropy search code. I really am bringing this
up more to state that the application publications/community may have much more to
say on this topic than the pure BO community.

__ Reproducibility__: Yes

__ Additional Feedback__: Page 6: spelling -- enfocing
I think that this topic is good and worth talking about, and I feel that the authors
have a good idea and the wherewithal to study it, but I feel like the analysis
is incomplete. I think that if this algorithm can only work effectively in certain
situations (such as only continuous parameters with >=4 batch parallelism and no
noise) and will show no improvement in other situations, these things need to be
explicitly stated. If this works fine in a broad set of circumstances, then I
would like to hear about that also, but the experiments that are presented seem
to just be chosen because they were readily available, not because they represent
some specific topic or set of problems with specific characteristics which this
algorithm is meant to address effectively. Without that structure, it's impossible
to understand whether any of the progress claimed by this paper is actually a
function of the decisions they have made to address those circumstances, or simply
a byproduct of weak/untuned baselines.

__ Summary and Contributions__: This paper proposes a novel batch Bayesian optimization method for multi-objective optimization. The batch selection method considers the diversity of both design space and performance space. The design space is first partitioned into multiple regions, then a greedy procedure picks points one by one, excluding previously chosen regions, by maximizing hypervolume improvement. Extensive experiments are conducted, both on synthetic and real problems, and superior results are observed compared to several baselines.

__ Strengths__: Extensive experiments and superior empirical results.
Code is provided; I'm convinced the experiments are reproducible.

__ Weaknesses__: The technical content seems to be mostly based on [39].

__ Correctness__: Appears to be correct.

__ Clarity__: this paper is well-written overall, but the main technical part can be hard to follow for people not familiar with the previous paper [39].

__ Relation to Prior Work__: There are two main parts of the algorithm:
1. Pareto front approximation. This part seems to be adopted from [39].
2. Batch selection. This part also uses the idea of the performance buffer, which is likewise introduced in [39].
It's not clear to me which part exactly is novel.

__ Reproducibility__: Yes

__ Additional Feedback__: The technical novelty is somewhat limited in my opinion, since most of the work is based on [39], but the paper appears to have solid empirical results.
Update after rebuttal:
The author feedback partially addressed my concern on novelty; I agree to accept the paper, but my score remains unchanged.