NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper considers the multiplayer stochastic multi-armed bandit (MAB) problem, in which M players compete over K > M arms. The reward of an arm is sampled i.i.d. unless there is a collision (two or more players pull the same arm), in which case the reward is zero. The authors give new algorithms with regret bounds comparable to the best existing bounds for centralized algorithms. The reviewers unanimously agreed (and I concur) that the results are significant and the paper is well-written. A clear accept.
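For readers unfamiliar with the collision model summarized above, the following is a minimal sketch of a single round. It is not the authors' algorithm. It illustrates only the reward rule, and it assumes Bernoulli rewards for concreteness (the review says only that rewards are i.i.d.). The names `play_round`, `choices`, and `means` are hypothetical.

```python
import random


def play_round(choices, means, rng):
    """Return one reward per player for a single round.

    choices[j] is the arm index pulled by player j.
    means[k] is the mean of arm k (Bernoulli rewards are an assumption).
    """
    # Count how many players pulled each arm.
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1

    rewards = []
    for arm in choices:
        if counts[arm] > 1:
            # Collision: two or more players pulled this arm, so reward is zero.
            rewards.append(0.0)
        else:
            # No collision: draw an i.i.d. Bernoulli reward for this arm.
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
    return rewards


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.9, 0.5, 0.2]  # K = 3 arms
    print(play_round([0, 0], means, rng))  # M = 2 players collide on arm 0
    print(play_round([0, 1], means, rng))  # no collision
```

In this sketch a colliding player cannot tell a zero caused by a collision from a zero drawn from the arm itself. Whether players observe collisions directly is a modeling choice that the summary above does not specify.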