# NIPS Proceedings

## Action-Gap Phenomenon in Reinforcement Learning


### Abstract

Many practitioners of reinforcement learning have observed that the performance of an agent often gets very close to optimal even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity. As a typical result, we prove that for an agent following the greedy policy $$\hat{\pi}$$ with respect to an action-value function $$\hat{Q}$$, the performance loss $$E[V^*(X) - V^{\hat{\pi}}(X)]$$ is upper bounded by $$O(\| \hat{Q} - Q^* \|_\infty^{1+\zeta})$$, in which $$\zeta \geq 0$$ is a parameter quantifying the action-gap regularity. For $$\zeta > 0$$, our results indicate a smaller performance loss than previous analyses suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms.
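
To make the phenomenon concrete, below is a minimal Python sketch, not taken from the paper: the randomly generated MDP, the uniform noise model, and all routine names are illustrative assumptions. It perturbs the optimal action-value function $$Q^*$$ of a small MDP, acts greedily with respect to the perturbed $$\hat{Q}$$, and compares the sup-norm estimation error with the resulting performance loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny random MDP (hypothetical, for illustration): n_s states, n_a actions.
n_s, n_a, gamma = 20, 4, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = distribution over next states
R = rng.uniform(size=(n_s, n_a))                  # deterministic reward for (s, a)

def value_iteration(P, R, gamma, iters=1000):
    """Compute the optimal action-value function Q* by value iteration."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def policy_value(pi, P, R, gamma):
    """Exact value V^pi of a deterministic policy pi (one action per state)."""
    s = np.arange(len(pi))
    P_pi, r_pi = P[s, pi], R[s, pi]
    return np.linalg.solve(np.eye(len(pi)) - gamma * P_pi, r_pi)

Q_star = value_iteration(P, R, gamma)
V_star = Q_star.max(axis=1)

# Perturb Q*, act greedily w.r.t. the perturbed Q_hat, and compare the
# sup-norm estimation error with the resulting performance loss.
for eps in [0.5, 0.1, 0.02]:
    Q_hat = Q_star + rng.uniform(-eps, eps, size=Q_star.shape)
    err = np.abs(Q_hat - Q_star).max()                # ||Q_hat - Q*||_inf
    pi_hat = Q_hat.argmax(axis=1)                     # greedy policy w.r.t. Q_hat
    loss = np.mean(V_star - policy_value(pi_hat, P, R, gamma))
    print(f"||Q_hat - Q*||_inf = {err:.4f}   E[V* - V^pi_hat] = {loss:.6f}")
```

When run, the loss shrinks much faster than the estimation error: once the noise level falls below the gaps between the best and second-best action values, the greedy policy recovers the optimal actions and the loss drops to zero, which is the behavior the action-gap regularity formalizes.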