NeurIPS 2020

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms


Meta Review

All reviewers agree that the paper considers a problem of relevance (bandits with many arms) and shows interesting results about simple-to-implement learning algorithms based on the greedy principle. However, one lingering concern that arose during the discussions among the reviewers was whether/how the results obtained in the paper applied for the case when the number of arms is larger than the time horizon of the game (k >T). It appears that the author response to this question has not been substantial. Though I can see that this will not be an issue -- the proof of Lemma 2 bounds regret with respect to the best possible reward of 1, the author(s) is/are requested to add a precise clarification of this regime in the updated version.