#### Authors

Liam Dermed, Charles Isbell

#### Abstract

Solving multi-agent reinforcement learning problems has proven difficult because of the lack of tractable algorithms. We provide the first approximation algorithm which solves stochastic games to within $\epsilon$ relative error of the optimal game-theoretic solution, in time polynomial in $1/\epsilon$. Our algorithm extends Murrays and Gordon’s (2007) modified Bellman equation which determines the \emph{set} of all possible achievable utilities; this provides us a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and find the computational cost to be orders of magnitude less than what the theory predicts.