Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh


In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose p-th moment is bounded by a constant nu_p for 1 < p <= 2.
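
To make the bounded p-th moment condition concrete, the following is a minimal sketch rather than anything taken from the paper. It assumes a Pareto reward distribution with scale 1 and shape `ALPHA = 1.8`, and an illustrative moment order `P = 1.5`; both values, and the helper names `pareto_moment` and `empirical_moment`, are hypothetical choices made for this example. For such a distribution the p-th moment is finite only when p is below the shape parameter, so the variance is infinite while a moment of order 1.5 still exists.

```python
import random


def pareto_moment(alpha: float, p: float) -> float:
    """Exact E[X^p] for a Pareto variable with scale 1 and shape alpha.

    The moment is finite only when p < alpha; otherwise it diverges.
    """
    if p >= alpha:
        return float("inf")
    return alpha / (alpha - p)


def empirical_moment(samples: list[float], p: float) -> float:
    """Sample average of |x|^p, an estimate of the p-th absolute moment."""
    return sum(abs(x) ** p for x in samples) / len(samples)


# Illustrative (assumed) parameters: heavy tail with infinite variance.
ALPHA = 1.8  # Pareto shape; variance is infinite because ALPHA <= 2
P = 1.5      # moment order that is still finite because P < ALPHA

rng = random.Random(0)
samples = [rng.paretovariate(ALPHA) for _ in range(100_000)]

nu_p = pareto_moment(ALPHA, P)            # plays the role of the bound nu_p
second_moment = pareto_moment(ALPHA, 2.0)  # diverges for this shape

print(f"exact {P}-th moment (nu_p): {nu_p:.3f}")
print(f"empirical {P}-th moment:    {empirical_moment(samples, P):.3f}")
print(f"exact second moment:        {second_moment}")
```

The empirical estimate converges slowly for heavy-tailed samples, so it can differ noticeably from the exact value even with many draws; that slow convergence is one reason the plain sample mean is a fragile estimator in this setting.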