Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper extends the i.i.d. best arm identification problem to a special Markovian setting. The technical tool is a new Chernoff-type inequality for a special class of Markov chains, which allows the method for the i.i.d. problem to be transferred to the Markov case considered. While the reviewers found the results interesting, they raised a number of questions that should be addressed in the final version. These include, among others, providing a better motivation (note that the motivation given in the response does not apply, since the chain there is not ergodic), discussing the limitations of the reward models, and arguing why it is not a problem that the algorithm needs some extra information about the reward model.