Planning in entropy-regularized Markov decision processes and games

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Authors

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{O}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
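The smoothing effect of entropy regularization on a single Bellman backup can be illustrated with a minimal sketch. This is not the SmoothCruiser algorithm. It assumes Shannon-entropy regularization with a hypothetical temperature `lam` over a finite action set, and all function names are illustrative. The paper's exact regularizer and notation may differ. With this regularizer, the hard maximum $\max_a q_a$ becomes $\max_\pi \sum_a \pi_a q_a + \lambda H(\pi) = \lambda \log \sum_a e^{q_a/\lambda}$, whose maximizer is $\mathrm{softmax}(q/\lambda)$.

```python
import math


def hard_value(q):
    """Unregularized backup over actions: max_a q[a] (non-smooth at ties)."""
    return max(q)


def hard_greedy_policy(q):
    """Deterministic argmax policy; it jumps discontinuously when q values cross."""
    best = max(range(len(q)), key=lambda a: q[a])
    return [1.0 if a == best else 0.0 for a in range(len(q))]


def soft_value(q, lam):
    """Entropy-regularized backup (assumed Shannon entropy, temperature lam):
    max over distributions pi of sum_a pi[a] * q[a] + lam * H(pi),
    with closed form lam * log(sum_a exp(q[a] / lam)).
    Shifting by max(q) avoids overflow in exp."""
    m = max(q)
    return m + lam * math.log(sum(math.exp((x - m) / lam) for x in q))


def soft_greedy_policy(q, lam):
    """Maximizer of the regularized objective: softmax(q / lam).
    It is also the gradient of soft_value with respect to q."""
    m = max(q)
    w = [math.exp((x - m) / lam) for x in q]
    z = sum(w)
    return [x / z for x in w]


def entropy(p):
    """Shannon entropy H(p) = -sum_a p[a] * log p[a]."""
    return -sum(x * math.log(x) for x in p if x > 0)


def l1(p, r):
    """L1 distance between two policies over the same action set."""
    return sum(abs(x - y) for x, y in zip(p, r))


if __name__ == "__main__":
    lam = 0.1
    q = [1.0, 0.5, -0.2]
    v, pi = soft_value(q, lam), soft_greedy_policy(q, lam)
    # The closed form equals the regularized objective evaluated at the softmax policy.
    objective = sum(p * x for p, x in zip(pi, q)) + lam * entropy(pi)
    print(f"soft value {v:.6f}, objective at softmax {objective:.6f}")
    # The soft value lies between max(q) and max(q) + lam * log(number of actions).
    upper = hard_value(q) + lam * math.log(len(q))
    print(f"bounds: {hard_value(q)} <= {v:.6f} <= {upper:.6f}")

    # Perturb two nearly tied action values so that their order flips.
    q1, q2 = [0.0, 1e-6], [1e-6, 0.0]
    print("hard policy change:", l1(hard_greedy_policy(q1), hard_greedy_policy(q2)))
    print("soft policy change:", l1(soft_greedy_policy(q1, lam), soft_greedy_policy(q2, lam)))
```

A tiny change in the action values flips the argmax policy entirely, while the softmax policy barely moves. Because the gradient of the soft value is a softmax that is Lipschitz with constant at most $1/\lambda$, a smaller temperature brings the backup closer to the hard maximum but makes it less smooth.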