Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Kyungjae Lee, Hongjun Yang, Sungbin Lim, Songhwai Oh


In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose p-th moment is bounded by a constant nu_p for 1