NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper contributes new algorithmic ideas and theoretical results for regret minimization in Markov Decision Processes (MDPs) with known transition kernels but arbitrary cost functions. The reviewers broadly agree that the paper's theoretical and algorithmic techniques are novel, namely the use of the FTRL (Follow-the-Regularized-Leader) online learning approach and its extension to large MDPs via linear function approximation, and that the paper therefore deserves to be published. However, the setting of a known MDP with unknown costs may be somewhat narrow in its practical applicability.