NeurIPS 2020

An Optimal Elimination Algorithm for Learning a Best Arm


Meta Review

This paper studies the PAC best-arm identification problem where an algorithm seeks to return an arm with a mean within epsilon of the best with probability at least 1-delta using as few total pulls as possible. It was known almost a decade ago that this was possibly with O(n epsilon^{-2} log(1/delta) ) pulls using the celebrated median elimination algorithm. However, median elimination has a constant in the thousands and is not practical. This paper seeks the optimal constant as n -> infinity. While finite-sample results are proven, the asymptotic regime is of interest because it is shown in the minimax regime that the leading constant in their upperbound is tight. Authors raised concerns that this instance-independent sample complexity can be much larger than instance-dependent sample complexities, and there exist algorithms that achieve optimal instance-dependent sample complexities, at least asymptotically. However, the novelty of the solution and fundamental importance of the problem merit acceptance.