NeurIPS 2020

Finite-Time Analysis of Round-Robin Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards

Meta Review

This paper makes good theoretical progress on Markovian bandits. Still, the rebuttal to the concerns raised (the model is artificial, the round-robin scheme is not novel and incurs a cost in finite time, etc.) is not convincing, although the theoretical contribution outweighs these concerns. We therefore expect the true contribution to be presented more clearly in the next version.