Action-Gap Phenomenon in Reinforcement Learning

Farahmand, Amir-massoud

Action-Gap Phenomenon in Reinforcement Learning

Amir-massoud Farahmand

Advances in Neural Information Processing Systems 24 (NIPS 2011)

Abstract

Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an agent following the greedy policy (\hat{\pi}) with respect to an action-value function (\hat{Q}), the performance loss (E[V^(X) - V^{\hat{X}} (X)]) is upper bounded by (O(|| \hat{Q} - Q^||_\infty^{1+\zeta})), in which (\zeta >= 0) is the parameter quantifying the action-gap regularity. For (\zeta > 0), our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.

Abstract

Name Change Policy