Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper introduces a mean-field model of multiagent Q-learning in repeated symmetric games. The model assumes that at each time step every agent plays symmetric games with m other randomly chosen agents, and considers the limit in which both the number of agents n and the number of opponents m tend to infinity. Under this setting, the authors derive the Fokker-Planck equation governing the time evolution of the distribution of the agents' Q-values.

The review scores showed quite a large split. Two reviewers rated this paper well above the acceptance threshold, whereas Reviewer #1 rated it negatively. Reviewer #1's main criticisms are that the paper is derivative and incremental, and that the experiments are insufficient. In my own view, however, the proposal of a mean-field model of multiagent Q-learning whose dynamics can be described by a Fokker-Planck equation is original enough to stand as a theoretical contribution.

I therefore recommend acceptance of this paper, with the expectation that the authors will add citations to previously published work on multiagent RL in the mean-field setting, together with a discussion that situates this paper's contribution relative to that work, as Reviewer #1 suggested.
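To make the reviewed setting concrete, a minimal simulation sketch of the finite-population system is given below: n agents run Boltzmann Q-learning in a repeated symmetric 2-action game, and at each step every agent is matched with m randomly chosen opponents. The payoff matrix, learning rate, and temperature are placeholder choices for illustration, not values taken from the paper; the paper's contribution is the Fokker-Planck description of the Q-value distribution in the n, m to infinity limit, which this finite sketch only approximates empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 10          # number of agents, opponents sampled per step (illustrative)
alpha, tau = 0.1, 1.0   # learning rate and Boltzmann temperature (placeholder values)
payoff = np.array([[3.0, 0.0],   # an arbitrary symmetric 2x2 coordination game
                   [0.0, 1.0]])  # (row player's payoffs; symmetric by construction)

Q = rng.normal(size=(n, 2))      # each agent's Q-values over the 2 actions

def boltzmann(Q, tau):
    """Softmax (Boltzmann) action probabilities from Q-values."""
    z = np.exp(Q / tau)
    return z / z.sum(axis=1, keepdims=True)

for step in range(500):
    probs = boltzmann(Q, tau)
    # Each agent samples one action from its Boltzmann policy.
    actions = (rng.random(n) < probs[:, 1]).astype(int)
    # Each agent plays against m randomly chosen opponents and
    # receives the average payoff of those matches.
    opponents = rng.integers(0, n, size=(n, m))
    rewards = payoff[actions[:, None], actions[opponents]].mean(axis=1)
    # Standard Q-learning update, applied only to the action taken.
    idx = np.arange(n)
    Q[idx, actions] += alpha * (rewards - Q[idx, actions])

# The empirical distribution of the n agents' Q-values is the finite-n
# analogue of the density whose evolution the paper characterizes
# with a Fokker-Planck equation.
```

Inspecting the histogram of `Q` over time for large n gives an empirical approximation of the distributional dynamics that the paper analyzes in the mean-field limit.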