NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6497
Title:SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits

Reviewer 1

Originality: This paper invents the trick of implicit communication through collisions for synchronization case and it is used in subsequent papers. Also, it proposes the first algorithm without synchronization, which is an unrealistic assumption in real case. Quality: The proofs seem to be right and it clearly support the theorem given and the claim that the existing lower bound has potential problem. Clarity: The paper is well-organized. Significance: The paper gives a useful trick for synchronization case which uses the information deduced from collisions and explains why the previous lower bound has potential problem. It also invents the first algorithm with log regret in dynamic setting, which is more realistic.

Reviewer 2

Originality: The paper studies the multiplayer bandit problem and is largely based on the Musical Chairs paper (Rosenski et al.). The fact that the players can use the collisions to their advantage, and that the resulting algorithm enjoys a similar regret bound to the centralized setting, is nontrivial. Quality: The results are sound. Clarity: The paper is well-written overall. However, the authors do not explicitly address the fact that SIC-MMAB2 and its analysis are only found in the supplementary material. Significance: The theoretical results are important as they contradict two previously known lower bounds. --- After rebuttal --- I have read the authors feedback and am satisfied with the fixes planned for the camera-ready version. I wholeheartedly suggest the papers acceptance.

Reviewer 3

This paper deals with the problem of multi-player MAB, where the players either get collision information or do not have this information, and when the setting is static (i.e. all players start and finish at the same time) or dynamic (where players can join during the time horizon at different time points). The algorithms proposed in the paper are very interesting and innovative, and the ideas are original. The paper is written very well, clear and organised. The theoretical guarantees of the first algorithm have profound implication on the lower bounds of the problem of decentralized MMAB, showing that the existing lower bound are incorrect, and that synchronization is in some sense equivalent to centralized control (since it allows communication). The second algorithm which deals with the more complex setting, pf a dynamic and no-sensing MMAB problem, is able to achieve logarithmic regret and offer interesting ideas as well. I found the paper very significant and interesting. In terms of the technical details - I did not go over the proofs thus I cannot asses the correctness of the analysis. General Comments: - line 37 - In line 15 you are defining X_k(t) as the reward of arm k at time t. Then you say the X_k(t) is "additional information" and not just the regular reward information in the MAB. Since you are only defining r(t) later, this is confusing. - lines 155-156 - "Notice that players even use their own quantized statistics to accept/reject an arm instead of the exact ones. Otherwise, the sets of accepted or rejected arms could differ between the players" - this sentence seem to be self contradicting... I am not sure I understand this - if players use their own quantized statistics, how do they not end up with different accept/reject decisions? I suspect there is a mistake in the phrasing of this sentence. - line 196 - you mention experiments you performed for which the results are in the appendix, maybe you can state what they were (in a sentence, no need for more). - When you state that your regret is close to the lower bound on the centralized case - it will be helpful if you also state what is that lower bound. In addition, when stating that existing lower bounds for the decentralized case are incorrect - please also note what they are so that it is easy to see where is the gap. Technical Comments: - line 148 - "During," -> remove "," -- Added after author feedback -- I have read the author feedback and am satisfied with their response. I leave my score as is and advocate for acceptance.