NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:3526
Title:Nonstochastic Multiarmed Bandits with Unrestricted Delays


		
This paper significantly extends previous work in multi-armed bandits with delayed feedback by allowing for unbounded delays. While theoretical, this work should be practically relevant as it only uses bandit feedback, it avoids making stochastic assumptions on the losses, and (of course) it allows for delayed feedback, the latter being needed to capture many real-world problems. This paper generated a good amount of discussion, so I’ll mention the pros and cons. On the positive side, as mentioned above, this paper significantly extends previous work in multi-armed bandits with delayed feedback by allowing for unbounded delays. As the first paper to consider bandits with unbounded delays, an important restriction in all previous works has been relaxed, and given that this work is for non-stochastic bandits, there could be many practical applications. Additionally, the algorithmic approach is interesting and original, and the proofs appear to be technically correct. On the (somewhat) negative side, the way in which unbounded delays are handled is less than ideal. For one result, the total (cumulative) delay D needs to be known. For another result which avoids needing to know D, the delay is known at the time of pulling the arm. Relaxing this assumption to expected delay (or not needing to know this delay until observation time while still adapting to D and T) would be better. That said, the paper’s results and analysis still appear to be technically sophisticated, so it would be fine to leave these potential improvements for the future. One controversial negative aspect was the lack of experiments, about which one reviewer was very disappointed. However, this reviewer also expressed low confidence for their review, and I felt that the paper can stand on its theory; moreover, doing meaningful experiments (with meaningful baselines) is quite challenging here. While i.i.d. equal mean sequences can work well in the vanilla non-stochastic setting, in the delayed feedback setting this becomes less clear. This work will be a welcome addition to the proceedings. I ask the authors to please improve the writing / presentation as per Reviewer 1’s suggestions, as Reviewer 1 argued for acceptance assuming these would be improved.