Approximating Shapley Explanations in Reinforcement Learning

Beechey, Daniel; Şimşek, Özgür

Advances in Neural Information Processing Systems 38 (NeurIPS 2025) Main Conference Track

Abstract

Reinforcement learning has achieved remarkable success in complex decision-making environments, yet its lack of transparency limits its deployment in practice, especially in safety-critical settings. Shapley values from cooperative game theory provide a principled framework for explaining reinforcement learning; however, the computational cost of Shapley explanations is an obstacle to their use. We introduce FastSVERL, a scalable method for explaining reinforcement learning by approximating Shapley values. FastSVERL is designed to handle the unique challenges of reinforcement learning, including temporal dependencies across multi-step trajectories, learning from off-policy data, and adapting to evolving agent behaviours in real time. FastSVERL offers a practical, scalable route to principled and rigorous interpretability in reinforcement learning.
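
The computational obstacle mentioned above can be made concrete with a minimal sketch of exact Shapley values for a generic cooperative game, using the standard formula phi_i = sum over S in N\{i} of |S|!(n-|S|-1)!/n! * [v(S u {i}) - v(S)]. This is not FastSVERL and not the paper's code; `exact_shapley`, `toy_value` and its weights are hypothetical illustrations. In an explanation setting the players might, for instance, be state features, but that mapping is an assumption made here only for intuition.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

# A characteristic function maps a coalition (set of players) to a real value.
ValueFn = Callable[[FrozenSet[str]], float]


def exact_shapley(players: Sequence[str], value: ValueFn) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (hypothetical helper).

    For each player i this visits all 2^(n-1) coalitions of the other
    players, so the total number of value() calls grows as n * 2^n.
    """
    n = len(players)
    phi: Dict[str, float] = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of player i to coalition S
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical game: additive weights plus a bonus when x and y cooperate."""
    weights = {"x": 1.0, "y": 2.0, "z": 0.5}
    bonus = 1.0 if {"x", "y"} <= coalition else 0.0
    return sum(weights[p] for p in coalition) + bonus


if __name__ == "__main__":
    phi = exact_shapley(["x", "y", "z"], toy_value)
    print(phi)
    # Efficiency: the values sum to v(all players) - v(empty coalition).
    print(sum(phi.values()), toy_value(frozenset("xyz")) - toy_value(frozenset()))
```

For this toy game the values come out at approximately 1.5, 2.5 and 0.5 for x, y and z, with the interaction bonus split equally between x and y. Because the number of value evaluations roughly doubles with each added player, exact enumeration quickly becomes impractical, which is the kind of cost that approximation methods aim to avoid.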
