NeurIPS 2020

Review 1

Summary and Contributions: The authors propose a meta-algorithm that learns promising regions of the search space in high-dimensional optimization problems. The benefit of the presented framework is that it can easily be combined with existing methods, e.g., Bayesian optimization, to scale them to high-dimensional search spaces. The main contribution is using MCTS to learn promising regions, i.e., to partition the search space, in order to avoid over-exploration.
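The MCTS selection step that drives this region learning can be illustrated with the standard UCB rule. The following is a hypothetical, stdlib-only sketch (the node statistics, the `select_region` helper, and the exploration constant `c_p` are my own illustration, not the authors' code):

```python
import math

def ucb_score(node_value, node_visits, parent_visits, c_p):
    """UCB used in MCTS selection: mean value plus an exploration bonus.
    A large c_p favors rarely visited regions; c_p = 0 is pure exploitation."""
    if node_visits == 0:
        return float("inf")  # always try unvisited regions first
    return node_value / node_visits + 2 * c_p * math.sqrt(
        2 * math.log(parent_visits) / node_visits)

def select_region(children, parent_visits, c_p=1.0):
    """Pick the child region with the highest UCB score."""
    return max(children, key=lambda c: ucb_score(c["value"], c["visits"],
                                                 parent_visits, c_p))

# Toy example: a well-explored good region vs. a barely explored one.
children = [{"name": "good", "value": 9.0, "visits": 10},
            {"name": "rare", "value": 0.5, "visits": 1}]
best = select_region(children, parent_visits=11, c_p=1.0)  # exploration bonus favors "rare"
```

With `c_p = 0` the same call would pick the well-explored "good" region, which is exactly the over-exploitation the exploration term is meant to counteract.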

Strengths: The paper considers an interesting problem that has seen a lot of attention in recent years.
- Soundness of the claims in terms of empirical evaluation: The presented algorithm has been evaluated extensively on a wide range of test functions and RL problems. The comparison to other state-of-the-art methods is impressive. Further, the comparison to gradient-based approaches on the RL tasks is interesting. The additional visualization and analysis in Figure 5 help to understand how the algorithm works in practice.
- Significance and novelty: The idea of learning interesting regions of the search space is appealing and seems to work well for the problems evaluated in the paper. Focusing only on local regions of the search space is not novel in itself, see, e.g., TuRBO. Using MCTS in combination with BO is (apart from LaNAS) a novel contribution.
- Relevance to the community: This work is in line with many other papers presented at top-tier ML venues such as NeurIPS and ICML, and the community would certainly benefit from the general idea proposed in this paper.

Correctness: The methods have been tested on a range of standard benchmark functions that are typically used in the community. The methodology seems correct overall. The numbers for ARS, NG, and TRPO presented in Table 2 appear to be copied from [54]; however, a different version of MuJoCo has been used there. Line 76: "BOHB [23] further combines TPE with Hyperband [24] to achieve strong any time performance. Therefore, we choose TPE in comparison." This sounds like TPE is used in the numerical experiments; however, results for BOHB are presented in Section 4.

Clarity: The paper is a bit hard to read at times due to grammatical errors and unclear formulations. To name only a few examples:
- Line 6: "… recent works like LaNAS shows good …"
- Line 13: "… local model fits well with …"
- Line 26: "… the optimizer needs to search every x to find the optimal."
- Line 38: "… K-ary uniform partitions"
- Line 58: "… use existing function approximator like BO to find a promising …" (BO is not a function approximator)
- Line 77: "Therefore, we choose TPE in comparison."
- Line 117: "This section describes LA-MCTS that progressively learn and generalize …"
- Line 119: "… to Table. 1 for …"
- Lines 64, 219, 223, 224: switching between different capitalizations of MuJoCo.
- Line 153: "Please note the simplicity is critical to the node model for having a tree of them."
- Figure 4: "(collected from runs lasting for 3 day)"

Relation to Prior Work: The overall relation to prior work is discussed, and all important papers in the field are cited. The general idea seems to stem from the LaNAS algorithm, which uses linear decision boundaries instead of nonlinear ones. The authors make it clear how LA-MCTS differs from LaNAS.

Reproducibility: Yes

Additional Feedback: In general, please spell-check your paper before submission. Some questions have been raised in the Weaknesses section.

Review 2

Summary and Contributions: LA-MCTS is an MCTS algorithm based on search space partitioning. It adaptively partitions the search space using Kmeans and SVM. It seems to be a simple and effective approach for domains where a large number of samples can be obtained quickly and the dimension is not too high.
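The partitioning step summarized here can be sketched in miniature. The toy below replaces the paper's clustering in a learned feature space and the SVM boundary with a stdlib-only 1-D 2-means on function values and a nearest-centroid rule; it is an illustrative assumption of the idea (splitting samples into a "good" and a "bad" region), not the authors' implementation:

```python
def two_means_1d(values, iters=50):
    """Split scalar values into two clusters (classic k-means with k=2)."""
    lo, hi = min(values), max(values)
    c = [lo, hi]  # initialize centroids at the extremes
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # bool index: False (0) = closer to c[0], True (1) = closer to c[1]
            groups[abs(v - c[0]) > abs(v - c[1])].append(v)
        new_c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
        if new_c == c:
            break
        c = new_c
    return c

def split_samples(samples):
    """Label each (x, fx) sample 'good' or 'bad' by which f-value centroid
    it is closer to (assuming minimization: lower f is better)."""
    c = sorted(two_means_1d([fx for _, fx in samples]))
    good = [s for s in samples if abs(s[1] - c[0]) <= abs(s[1] - c[1])]
    bad = [s for s in samples if abs(s[1] - c[0]) > abs(s[1] - c[1])]
    return good, bad

samples = [(0.1, 1.0), (0.2, 1.2), (0.8, 9.5), (0.9, 10.1)]
good, bad = split_samples(samples)  # low-f samples land in the good region
```

In the actual method a classifier learned on the two clusters (an SVM in the paper) then defines the geometric boundary of the region, so new candidate points can be classified without re-clustering.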

Strengths: The idea is simple and well described. The experimental results are adequate. This paper analyzes both the strengths and limitations of its algorithm. I highly appreciate this, because most papers hide their limitations. At first glance, I worried that the expansion threshold \theta might be too large, but the results in Fig. 6 (c) show that the best value was $\theta = 10$, so it is not. It is similar to the expansion threshold used in standard MCTS for games (FYI, AlphaGo [42] used 40).

Weaknesses: The method works well in some domains but not in others. However, the paper describes its own weaknesses, which is very useful for readers.

Correctness: It seemed mostly correct. My questions and concerns are described in the questions for the authors. The comparison with other algorithms in the experiments seems adequate.

Clarity: The explanation of the algorithm was easy to understand and well described with the help of beautiful figures. I had several small concerns, which I wrote in the feedback.

Relation to Prior Work: This paper covers a broad range of related work in recent black-box optimization. The novelty of the proposed algorithms is clear.

Reproducibility: Yes

Additional Feedback: Misc. comments:
- Lines in Figure 3 were hard to distinguish.
- Where are the references in the supplementary?
Some awkward questions:
- Is this a maximization problem or a minimization problem? Is it correct that min f(x) is sampled and a higher f(x) indicates a better region?
- Does STOA mean SoTA?
- Should Kmean be Kmeans?
- What does the citation [54] at line 225 mean?
- Does "gibbson sampling [58]" (Appendix A) mean "Gibbs sampling"?
(Comments after rebuttal) Additional experiments and explanations would improve the readability of this paper. However, my main concern with this paper is not the content but the writing. I still have some concerns about that.

Review 3

Summary and Contributions: This paper extends the LaNAS method to LA-MCTS for scalable black-box optimization. It makes a few improvements, including a nonlinear classifier to partition the search space and the use of BO or TuRBO for sampling in a selected region. It conducts an extensive comparison with baselines from BO, EA, MCTS, and gradient-based methods on MuJoCo locomotion tasks and a few small-scale benchmarks. The experiments show competitive performance as a black-box optimization method.

Strengths: While the main algorithm is a moderate modification of LaNAS, the two main modifications are good choices for scaling black-box optimization to high-dimensional search spaces. The thorough empirical comparison with various baselines shows convincingly competitive performance, especially in high-dimensional spaces. The ablation study is very helpful for understanding the choice of hyper-parameters.

Weaknesses: One concern I have is about the choice of hyper-parameters. It is important for a practical black-box optimization algorithm to have a robust setting in the absence of prior knowledge about a problem. While the authors argue that the performance is not sensitive to the hyper-parameters, the ablation study does show the impact of those parameters. In the experiments, most baselines use their default hyper-parameters for all experiments, but C_p, \theta, and the type of kernel of LA-MCTS are adjusted per task. This makes me wonder whether LA-MCTS can maintain its performance with a single setting. Besides, how is the length-scale of the RBF kernel decided?
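For context on the length-scale question: a common default (not necessarily what the paper does) is the median heuristic, i.e., setting the RBF length-scale to the median pairwise distance between samples. A minimal sketch, where `rbf` and `median_heuristic` are my own illustrative names:

```python
import math
from itertools import combinations

def rbf(x, y, length_scale):
    """Squared-exponential (RBF) kernel between two vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * length_scale ** 2))

def median_heuristic(points):
    """Common default for the RBF length-scale: the median pairwise distance."""
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    mid = len(dists) // 2
    return dists[mid] if len(dists) % 2 else 0.5 * (dists[mid - 1] + dists[mid])
```

Whether the authors use such a heuristic, a fixed value, or per-task tuning is exactly what the question above asks them to state.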

Correctness: The method is correctly presented. The empirical evaluation is correct, except for the choice of hyper-parameters, as explained in the Weaknesses section.

Clarity: The paper is well written. Each component of the whole algorithm is clearly and concisely explained in section 3 despite the limit of space.

Relation to Prior Work: The authors have done a great job explaining the relation of the proposed algorithm to the work of LaNAS, Bayesian optimization, evolutionary algorithms and MCTS.

Reproducibility: Yes

Additional Feedback: Typo: "STOA" -> "SOTA"