
Action-Gap Phenomenon in Reinforcement Learning

Part of: Advances in Neural Information Processing Systems 24 (NIPS 2011)


Authors

Amir-massoud Farahmand

Abstract

Many practitioners of reinforcement learning have observed that the performance of an agent often gets very close to optimal even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity. As a typical result, we prove that for an agent following the greedy policy \(\hat{\pi}\) with respect to an action-value function \(\hat{Q}\), the performance loss \(E[V^*(X) - V^{\hat{\pi}}(X)]\) is upper bounded by \(O(\|\hat{Q} - Q^*\|_\infty^{1+\zeta})\), in which \(\zeta \geq 0\) is the parameter quantifying the action-gap regularity. For \(\zeta > 0\), our results indicate a smaller performance loss than previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
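
The sketch below is not taken from the paper; it is a minimal numerical illustration of the phenomenon the abstract describes. On a small random MDP (the state/action sizes, reward and noise models, discount factor, and uniform initial-state distribution are all illustrative assumptions), the greedy policy extracted from a perturbed \(\hat{Q}\) typically incurs a performance loss much smaller than both the sup-norm error \(\|\hat{Q} - Q^*\|_\infty\) and the classical worst-case bound \(2\|\hat{Q} - Q^*\|_\infty/(1-\gamma)\) would suggest.

# Illustrative sketch (assumed setup, not the paper's experiments): perturb Q*
# on a random MDP and compare ||Q_hat - Q*||_inf with the greedy policy's loss.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 4, 0.95

# Random MDP: transition kernel P[s, a, s'] and rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def value_iteration(P, R, gamma, iters=2000):
    """Compute Q* by standard value iteration."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def policy_value(P, R, gamma, pi):
    """Exact V^pi by solving (I - gamma * P_pi) V = R_pi."""
    idx = np.arange(n_states)
    P_pi = P[idx, pi]          # row s is P(. | s, pi(s))
    R_pi = R[idx, pi]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

Q_star = value_iteration(P, R, gamma)
V_star = Q_star.max(axis=1)

for eps in [0.01, 0.05, 0.2]:
    Q_hat = Q_star + rng.uniform(-eps, eps, size=Q_star.shape)
    pi_hat = Q_hat.argmax(axis=1)                  # greedy policy w.r.t. Q_hat
    V_pi_hat = policy_value(P, R, gamma, pi_hat)
    q_err = np.abs(Q_hat - Q_star).max()           # ||Q_hat - Q*||_inf
    loss = (V_star - V_pi_hat).mean()              # E[V*(X) - V^pi_hat(X)], uniform X
    classic = 2 * q_err / (1 - gamma)              # standard worst-case bound
    print(f"q_err={q_err:.3f}  loss={loss:.5f}  classic_bound={classic:.3f}")

In this toy setting the printed loss is usually orders of magnitude below the classical bound, which is the gap between observed and predicted performance that the action-gap regularity parameter \(\zeta\) is meant to capture.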