Finding All $\epsilon$-Good Arms in Stochastic Bandits

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

AuthorFeedback Bibtex MetaReview Paper Review Supplemental

Authors

Blake Mason, Lalit Jain, Ardhendu Tripathy, Robert Nowak

Abstract

The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an $\epsilon$-good arm, best-arm identification, top-$k$ arm identification, and finding all arms with means above a specified threshold. However, the problem of finding \emph{all} $\epsilon$-good arms has been overlooked in past work, although arguably this may be the most natural objective in many applications. For example, a virologist may conduct preliminary laboratory experiments on a large candidate set of treatments and move all $\epsilon$-good treatments into more expensive clinical trials. Since the ultimate clinical efficacy is uncertain, it is important to identify all $\epsilon$-good candidates. Mathematically, the all-$\epsilon$-good arm identification problem is presents significant new challenges and surprises that do not arise in the pure-exploration objectives studied in the past. We introduce two algorithms to overcome these and demonstrate their great empirical performance on a large-scale crowd-sourced dataset of $2.2$M ratings collected by the New Yorker Caption Contest as well as a dataset testing hundreds of possible cancer drugs.