NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
EDIT: I have read the author response and appreciate the proposed improvements for the final version. I do not consider it necessary to change my score or the review.

The paper provides a new evolution strategy (ASEBO) for black-box optimization. The authors point out that ES methods scale poorly with dimensionality because they try to accurately estimate the gradient of the objective. Their method uses online PCA to estimate a subspace of the gradients, whose dimensionality varies during optimization, and a factor controlling exploration is estimated from the gradient norms on the subspace and on its orthogonal complement (see the illustrative sketch after this review). Experiments on RL tasks and synthetic functions show better performance of ASEBO compared to other ES methods and commonly used algorithms.

Originality / Quality: The paper presents a novel algorithm for black-box optimization. Both theoretical and experimental results are provided to demonstrate its effectiveness. The experiments are sound, and the authors compare against a number of popular methods in the domain.

Clarity: Overall, the paper is well written and easy to follow. The theoretical results, and their significance, need to be discussed in more detail. This is especially true for Theorem 3.3.

Significance: Given the renewed interest in black-box optimization and the popularity of ES methods, I believe this work is an important contribution.
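For concreteness, here is a minimal, illustrative sketch (not the authors' code) of the two ingredients summarized above: (i) PCA over a buffer of recent gradient estimates to obtain an active subspace whose rank adapts, and (ii) an exploration factor from the gradient norm inside versus outside that subspace. Names such as grad_buffer and energy_threshold are placeholders, not the paper's notation.

```python
import numpy as np

def active_subspace(grad_buffer, energy_threshold=0.95):
    """PCA of recent gradient estimates; keep enough principal directions
    to explain `energy_threshold` of the total variance, so the rank adapts."""
    G = np.asarray(grad_buffer)                    # shape (s, d): s past gradient estimates
    cov = G.T @ G / len(G)                         # (d, d) empirical second-moment matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    energy = np.cumsum(eigvals) / np.sum(eigvals)
    r = int(np.searchsorted(energy, energy_threshold)) + 1
    return eigvecs[:, :r]                          # (d, r) orthonormal basis U

def exploration_factor(grad, U):
    """Ratio of gradient mass outside the active subspace to total mass:
    a large value suggests the subspace is stale, so sample more in its complement."""
    g_sub = U @ (U.T @ grad)                       # projection onto span(U)
    g_perp = grad - g_sub                          # component in the orthogonal complement
    return np.linalg.norm(g_perp) / (np.linalg.norm(grad) + 1e-12)
```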
Reviewer 2
It is to be noted that the algorithm does not assume that the function is differentiable, but instead relies on Gaussian smoothing of the original function. Typical blackbox strategies estimate the gradient of the Gaussian smoothing via MC techniques and then apply standard first-order optimization (a minimal sketch of this baseline estimator is given after this review). However, sampling these gradients in high dimension is quite costly, which the article proposes to improve. To do so, it makes the assumption (validated by Fig. 1) that the gradients along the optimization process lie in a lower-dimensional subspace, the "active subspace", paving the way for cheaper evaluation. A first idea is to estimate these subspaces via PCA of the last s (a parameter) estimated gradients, but s and the dimension of the PCA are difficult to tune. Instead, ASEBO uses bandit techniques to balance exploitation (sampling gradients along the active subspace) and exploration (sampling orthogonally to this subspace, via compressed sensing). The proposed methodology is sound, theoretical guarantees are provided, and the experiments seem to show that ASEBO needs fewer function evaluations than other existing approaches.

Overall, the paper is easy to read even with only mild knowledge of bandit techniques. A few points can be improved:

- The formulas in Algorithm 2 are not very intuitive and could benefit from better explanations.
- Several expressions are made clear long after they have been used for the first time:
  - The meaning of "learn the bias of the low dimensional model" (L53) is only explained in Section 2 and was not clear to me before.
  - Similarly, the "hardness of the optimization problem" is not made clear from the beginning.
  - The active subspace is only explained at L147.
  - When you speak about the covariance matrix, you mean the covariance of the gradients, which was not obvious to me until ~L131.
  These could be put in Definition environments, with links to these definitions given the first time each expression is mentioned, e.g. "see the formal definition in Def. X".

Cosmetic remarks:

- At L131 you use lambda for the decay rate, but gamma in the Algorithm 1 hyperparameters (better IMO, since lambda is already taken for the eigenvalues of Cov_t).
- Technically, lambda_i should be lambda_{i,t} since they depend on Cov_t (a nitpick, but it would be more explicit; otherwise r does not depend on t).
- At L210, epsilon clashes with the epsilon of the PCA used in Algorithm 1.
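The sketch below (not the ASEBO implementation) illustrates the baseline estimator described above: an antithetic Monte Carlo estimate of the gradient of the Gaussian smoothing f_sigma(x) = E[f(x + sigma*g)], plus a variant that restricts the sampled directions to a given active subspace U, which is where the cost savings come from. The names f, U, sigma, and num_samples are illustrative, not the paper's notation.

```python
import numpy as np

def mc_smoothed_gradient(f, x, sigma=0.1, num_samples=100, rng=None):
    """Antithetic MC estimate of grad f_sigma(x); cost: 2*num_samples function evaluations."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_samples):
        eps = rng.standard_normal(d)               # isotropic Gaussian direction in R^d
        g += (f(x + sigma * eps) - f(x - sigma * eps)) / (2.0 * sigma) * eps
    return g / num_samples

def mc_subspace_gradient(f, x, U, sigma=0.1, num_samples=20, rng=None):
    """Same estimator, but perturbations live in span(U) (orthonormal columns),
    so far fewer samples suffice when the active subspace is low-dimensional."""
    rng = np.random.default_rng() if rng is None else rng
    r = U.shape[1]
    g = np.zeros(x.shape[0])
    for _ in range(num_samples):
        eps = U @ rng.standard_normal(r)           # Gaussian direction inside the subspace
        g += (f(x + sigma * eps) - f(x - sigma * eps)) / (2.0 * sigma) * eps
    return g / num_samples
```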
Reviewer 3
This work primarily introduces adaptive subspace methods to the standard continuous control benchmarks through a BBO scheme. This introduction is particularly useful, as these search methods have proven competent in this setting; more efficient methods can open up new doors in related fields such as meta-learning. The authors are thorough in their introduction of the method as well as in their theoretical discussions, and they clearly describe the active subspace methods, accurately relating them to previous work.