NeurIPS 2020

Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Meta Review

While the reviewers initially had some mild divergence of opinion on this paper, after the author response and detailed discussion it was agreed that the paper makes a solid contribution (please see the revised reviews). It is certainly of relevance to NeurIPS. After discussion, there was agreement on the significance of the conceptual contribution, namely the treatment of the cross-component bonuses. Several reviewers note that the mathematics is fairly “standard” (Bernstein-bound machinery), though in the end that should not be considered a drawback. At least one reviewer notes that the 31-page appendix means that it is not possible to verify the mathematical results during the review period. (That said, I’m not sure that much can be done to address this in the revised paper!) The reviews, including their initial pre-rebuttal critiques, suggest several ways in which the paper can be improved to maximize its impact. The author response did a good job of conveying the significance of the results, and the author(s) is/are strongly encouraged to revise the paper to ensure these points are not lost on future readers. Please also incorporate other reviewer suggestions as appropriate. Solid contribution!