RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning
Conference Event Type: Spotlight
We describe how to use robust Markov decision processes for value function approximation with state aggregation. Robustness reduces the sensitivity to the approximation error of sub-optimal policies compared with classical methods such as fitted value iteration. This reduces the bounds on the gamma-discounted infinite-horizon performance loss by a factor of 1/(1-gamma) while preserving polynomial-time computational complexity. Our experimental results show that the robust representation can significantly improve solution quality at minimal additional computational cost.
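To make the idea of a robust Bellman update over aggregated states concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the simplest possible uncertainty model: within each aggregate, an adversary may pick any member state, so the value of an aggregate is the best action's worst-case one-step return. The function name, the argument layout (`P`, `R`, `groups`), and the fixed iteration count are all hypothetical choices made for illustration.

```python
def robust_aggregated_value_iteration(P, R, groups, gamma=0.9, iters=500):
    """Illustrative robust value iteration over aggregate states.

    P[a][s][t] : probability of moving from state s to state t under action a
    R[a][s]    : reward for taking action a in state s
    groups[s]  : index of the aggregate that state s belongs to
    Assumption : every aggregate index in 0..max(groups) has at least one state,
                 and the adversary may choose any state inside an aggregate.
    """
    n_actions = len(P)
    n_states = len(groups)
    n_groups = max(groups) + 1
    v = [0.0] * n_groups  # one value per aggregate

    for _ in range(iters):
        new_v = []
        for g in range(n_groups):
            members = [s for s in range(n_states) if groups[s] == g]
            best = float("-inf")
            for a in range(n_actions):
                # Worst-case one-step return over the states in this aggregate.
                worst = min(
                    R[a][s]
                    + gamma * sum(P[a][s][t] * v[groups[t]] for t in range(n_states))
                    for s in members
                )
                best = max(best, worst)  # the agent picks the best action
            new_v.append(best)
        v = new_v
    return v


# Toy input: two self-looping states merged into one aggregate, one action.
P = [[[1.0, 0.0], [0.0, 1.0]]]
R = [[1.0, 2.0]]
print(robust_aggregated_value_iteration(P, R, groups=[0, 0], gamma=0.5))
```

In this toy input the adversary always selects the state with reward 1, so the aggregate's value approaches the fixed point of v = 1 + 0.5 v, which is 2. A non-robust aggregation that averaged the two states would instead report a value between the two states' true values, which is the kind of optimistic error the robust formulation is meant to avoid.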