
Policy gradients in linearly-solvable MDPs

Part of: Advances in Neural Information Processing Systems 23 (NIPS 2010)


Authors

Emanuel Todorov
Abstract

We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
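For readers who want a concrete picture of the parameterization the abstract refers to, the sketch below is a minimal illustration, not the paper's algorithm: in a linearly-solvable MDP the policy is induced by a cost-to-go approximation v_theta(x) = theta' phi(x), so the score of the policy depends only on state features rather than state-action features. The passive dynamics P, state costs q, features phi, and the plain REINFORCE-style update are all illustrative assumptions; the paper's compatible function approximators and natural-gradient estimator are not reproduced here.

```python
import numpy as np

# Minimal sketch (not from the paper) of a score-function policy gradient in a
# tabular linearly-solvable MDP (LMDP). Assumed setup: n states, passive
# dynamics P[x, x'], state cost q[x], and a cost-to-go model
# v_theta(x) = theta @ phi[x]. The induced policy is
# u_theta(x'|x) proportional to P[x, x'] * exp(-theta @ phi[x']).

rng = np.random.default_rng(0)
n, d, horizon, episodes = 5, 3, 20, 200

P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)   # passive dynamics, rows sum to 1
q = rng.random(n)                   # state costs
phi = rng.random((n, d))            # state features
theta = np.zeros(d)                 # cost-to-go weights

def policy(theta, x):
    """u_theta(.|x) proportional to P[x, .] * exp(-phi @ theta)."""
    w = P[x] * np.exp(-phi @ theta)
    return w / w.sum()

def grad_log_policy(theta, x, x_next):
    """Score uses only state features: -phi[x'] + E_{u_theta(.|x)}[phi]."""
    u = policy(theta, x)
    return -phi[x_next] + u @ phi

alpha = 0.05
for _ in range(episodes):
    x = int(rng.integers(n))
    scores, total_cost = [], 0.0
    for _ in range(horizon):
        u = policy(theta, x)
        x_next = int(rng.choice(n, p=u))
        # LMDP cost: state cost plus KL divergence from the passive dynamics
        total_cost += q[x] + float(np.sum(u * np.log(u / P[x])))
        scores.append(grad_log_policy(theta, x, x_next))
        x = x_next
    # Plain REINFORCE update on total episode cost; the paper's estimator
    # further exploits the cost-to-go structure and a natural-gradient metric.
    theta -= alpha * total_cost * np.sum(scores, axis=0)

print("learned cost-to-go weights:", theta)
```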