Risk Aversion in Markov Decision Processes via Near Optimal Chernoff Bounds

Part of Advances in Neural Information Processing Systems 25 (NIPS 2012)


Authors

Teodor Moldovan, Pieter Abbeel

Abstract

The expected return is a widely used objective in decision making under uncertainty. Many algorithms, such as value iteration, have been proposed to optimize it. In risk-aware settings, however, the expected return is often not an appropriate objective to optimize. We propose a new optimization objective for risk-aware planning and show that it has desirable theoretical properties. We also draw connections to previously proposed objectives for risk-aware planning: minmax, exponential utility, percentile and mean minus variance. Our method applies to an extended class of Markov decision processes: we allow costs to be stochastic as long as they are bounded. Additionally, we present an efficient algorithm for optimizing the proposed objective. Synthetic and real-world experiments illustrate the effectiveness of our method at scale.