NeurIPS 2019
Sunday, December 8 through Saturday, December 14, 2019, Vancouver Convention Center
Reviewer 1
Originality: The paper provides a novel extension of the Structured Procrastination (SP) algorithm together with a novel theoretical analysis. The algorithm seems to be a clear improvement over previous work.

Quality: The authors describe and analyze their approach in detail. I would have liked to see Figure 2 discussed in more detail, as it is the main empirical evaluation of the approach. I was surprised to see the deep dip around 10 CPU days that is not reached again until much later. Is there an explanation or interpretation of this?

Clarity: The paper is written clearly, and the algorithm, though somewhat complex, is described in approachable terms.

Significance: The proposed algorithm improves over existing work in that it is both anytime and adaptive. In contrast to LB, it does not require specifying additional parameters, which makes it more practical and easier to use.

This looks like very solid work, but I'm not very familiar with the area and did not attempt to follow the proofs. I have read the author response.
Reviewer 2
This is a theoretical paper extending Structured Procrastination to make it anytime. The paper is easy to follow, but I'm unfamiliar with the related work that it builds on, and therefore I'm ill-equipped to judge the merits of the paper in the context of existing work (e.g., the novelty of the proof techniques).
Reviewer 3
This paper studies provable guarantees for algorithm configuration. Suppose you need to solve a series of closely related computational problems drawn i.i.d. from an unknown distribution (e.g., SAT instances from some specific application domain). You have a number of different algorithms, or "configurations," you could use for this task (potentially infinitely many), and you want to figure out which configuration has the smallest expected runtime, where the expectation is over the unknown distribution. In the past few years, three papers with provable guarantees have come out on this topic: one by Kleinberg et al. [2017] and two by Weisz et al. [2018, 2019]. This paper builds directly on the former, which introduced an algorithm called Structured Procrastination (SP). The authors of the submission call their new algorithm Structured Procrastination with Confidence (SPC).

To describe the guarantees that come with SPC, let OPT be the smallest expected running time over all configurations. Given a configuration i and a problem instance j, let R(i, j) be the time it takes to solve instance j using configuration i (this might be infinite). Given a runtime cap t, let R(i, j, t) = min{t, R(i, j)}. A configuration i is (epsilon, delta)-optimal if there exists a runtime cap t such that R(i, j) <= t for all but a delta fraction of the problem instances, and E[R(i, j, t)] <= (1 + epsilon) * OPT.

One of the main differences between this work (as well as SP) and the work by Weisz et al. [2018, 2019] is that the former are anytime algorithms and the latter are not: the longer you run SP or SPC, the better the values of epsilon and delta for which the returned configuration is (epsilon, delta)-optimal, whereas the algorithms of Weisz et al. [2018, 2019] require epsilon and delta as input.

The authors write that the main issue with SP is that it runs each configuration for long enough to accurately estimate its runtime to within a (1 + epsilon) factor. However, all you really need is to be able to separate the good configurations (which are (epsilon, delta)-optimal) from the bad configurations (which are not). The authors identify the bad configurations via a new lower confidence bound (bottom of page 5) on each configuration's expected running time. This means of computing the LCB seems technically novel, though a bit hard to wrap one's head around. The authors make a reasonable effort to explain the intuition behind the formula on page six, and Figure 1 is especially helpful.

It would be great to have more concrete comparisons with SP. For example:
- Can you give a simple, but concrete, example of an algorithm configuration problem where SPC is much better than SP?
- How exactly do the runtime bounds for SP and SPC compare?

On the second question, here is the difference as I understand it. Let n be the number of configurations. SP will terminate with an (epsilon, delta)-optimal configuration given running time roughly OPT * n / (delta * epsilon^2). This is similar to the runtime bound provided in Theorem 3.3, except that instead of multiplying each of the n configurations by 1 / (delta * epsilon^2), the SPC bound multiplies each suboptimal configuration i by 1 / (delta_i * epsilon_i^2), where epsilon_i and delta_i are chosen such that configuration i is (epsilon_i, delta_i)-suboptimal and 1 / (delta_i * epsilon_i^2) is as small as possible.
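In symbols, the comparison I have in mind is roughly the following; this is my own paraphrase of the two bounds (constants and logarithmic factors omitted), not a statement taken verbatim from either paper:

```latex
% Reviewer's paraphrase of the bounds, with constants and log factors omitted.
T_{\mathrm{SP}}  \;\approx\; \mathrm{OPT} \cdot \frac{n}{\delta\,\epsilon^{2}},
\qquad
T_{\mathrm{SPC}} \;\approx\; \mathrm{OPT} \cdot \sum_{i\ \text{suboptimal}} \frac{1}{\delta_{i}\,\epsilon_{i}^{2}}.
```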
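Relatedly, to make the (epsilon, delta)-optimality definition above concrete, here is a small sketch of the check I have in mind on a finite sample of runtimes. The function and variable names are my own (nothing here is taken from the paper), and the "there exists a cap t" part of the definition would in practice be handled by trying candidate caps:

```python
import numpy as np

def is_eps_delta_optimal(runtimes_i, opt, eps, delta, t):
    """Check the (eps, delta)-optimality condition for one configuration i,
    at a *given* cap t, on a finite sample of instances j.

    runtimes_i : observed R(i, j) values; np.inf marks runs that never finish.
    opt        : smallest expected runtime over all configurations (assumed known here).
    """
    runtimes_i = np.asarray(runtimes_i, dtype=float)
    capped = np.minimum(runtimes_i, t)         # R(i, j, t) = min{t, R(i, j)}
    frac_over_cap = np.mean(runtimes_i > t)    # fraction of instances with R(i, j) > t
    return frac_over_cap <= delta and capped.mean() <= (1 + eps) * opt

# Configuration i is (eps, delta)-optimal if the check passes for *some* cap t, e.g.:
# any(is_eps_delta_optimal(r_i, opt, eps, delta, t) for t in candidate_caps)
```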
A couple of further suggestions:
- It would be great to compare SPC against SP in the experimental section, rather than just against LeapsAndBounds [Weisz et al., 2018].
- SP takes as input a parameter epsilon, and returns both a configuration and a value delta such that the configuration is (epsilon, delta)-optimal. In contrast, SPC does not take epsilon as input and does not return any delta; it only returns a configuration. Is there any hope that SPC could be modified to return epsilon, delta, and a configuration that is (epsilon, delta)-optimal?

It would also be great to have a more robust comparison (at least in terms of the theoretical guarantees) between SPC and CapsAndRuns [Weisz et al., 2019], which appeared in ICML 2019 (and is concurrent work).

Smaller comments:
- In line 7 of the pseudocode, I think it should be C_i.ExecuteStep().
- In the caption of Figure 1, it would be helpful to use the same notation as the rest of the paper (e.g., G(x) instead of \hat{f}(x)).