Reviews: Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits

The paper makes conceptual, algorithmic and theoretical contributions to optimization in multi armed bandits. It introduces a new problem motivated by settings where there is a testing phase with a cost structure proportional to the utility derived from playing each alternative, and gives a novel algorithm with a sample complexity bound for identifying a near-best arm. All the reviewers agree that the paper's contributions as above are significant. The only concern expressed was about the potentially narrow scope of the problem formulation, but the author feedback has helped in clarifying this aspect. It appears that the learning problem studied here, i.e., its cost structure specifically, could emerge in settings beyond e-commerce and advertising, and is likely to be of broader interest.

Paper ID:	1663
Title:	Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits