
RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning

Part of: Advances in Neural Information Processing Systems 27 (NIPS 2014)



Conference Event Type: Spotlight


We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness reduces the sensitivity of the resulting sub-optimal policies to approximation error, in comparison with classical methods such as fitted value iteration. This reduces the bound on the gamma-discounted infinite-horizon performance loss by a factor of 1/(1-gamma) while preserving polynomial-time computational complexity. Our experimental results show that using the robust representation can significantly improve the solution quality with minimal additional computational cost.
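
To make the aggregation idea concrete, below is a minimal illustrative sketch, not the paper's RAAM algorithm: value iteration on a small tabular MDP with a state-aggregation map, comparing a classical backup that lifts aggregate values directly against a robust backup that resolves transitions into an aggregate pessimistically (worst member state). The toy MDP, the partition, and all names are hypothetical.

```python
"""Illustrative sketch only (not the paper's RAAM algorithm): value iteration on
an aggregated tabular MDP, contrasting an average-style backup with a robust
(worst-case) backup over the states grouped into each aggregate."""
import numpy as np

# Hypothetical toy MDP: transitions P[a, s, s'], rewards R[s, a], discount gamma.
n_states, n_actions, gamma = 6, 2, 0.95
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # rows sum to 1
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Hypothetical aggregation: map each state to an aggregate (meta-)state.
agg = np.array([0, 0, 1, 1, 2, 2])
n_agg = agg.max() + 1
members = [np.flatnonzero(agg == k) for k in range(n_agg)]

def backup(v_agg, robust):
    """One Bellman backup on aggregate values.

    robust=False: each successor state uses its aggregate's value directly,
                  in the spirit of classical aggregation / fitted value iteration.
    robust=True : transitions into an aggregate are resolved pessimistically,
                  using the worst member state's lifted value.
    """
    v_states = v_agg[agg]                    # lift aggregate values to states
    if robust:
        v_worst = np.array([v_states[m].min() for m in members])
        v_next = v_worst[agg]
    else:
        v_next = v_states
    # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * v_next[s']
    Q = R + gamma * np.einsum("ast,t->sa", P, v_next)
    v_new_states = Q.max(axis=1)
    # Collapse back to aggregates (uniform weights over members, for simplicity).
    return np.array([v_new_states[m].mean() for m in members])

def solve(robust, iters=500):
    v = np.zeros(n_agg)
    for _ in range(iters):
        v = backup(v, robust)
    return v

print("average-style aggregate values:", solve(robust=False))
print("robust aggregate values:       ", solve(robust=True))
```

The robust backup is deliberately pessimistic about which member state a transition into an aggregate lands in, which reflects the intuition behind the reduced sensitivity to approximation error that the abstract describes; the sketch does not reproduce the paper's bounds or its solution method.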