Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
This paper examines the stationary points of the control policy space in zero-sum linear-quadratic games and shows that they are Nash equilibria. The authors also propose three nested gradient methods for policy optimization in such games, with theoretical proofs of convergence.

This paper was flagged early on as potentially similar to submission #5637, "Provable Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case". For reference, the NeurIPS policy on this issue is that the originality of each submission must be evaluated within the group of potentially overlapping submissions; in other words, this submission should be evaluated as if #5637 had already been published, and vice versa. One of the reviewers assigned to this paper was also assigned to #5637 in order to report on possible similarities. In addition, another reviewer of this paper was asked to anonymously compare notes with a reviewer of #5637, and vice versa; neither reviewer had seen the other paper before submitting their review.

In the author feedback phase, the authors explained that the author lists are not identical; they also pointed out that this work is more fundamental than #5637, and that #5637 provides a model-free extension of this work. In the discussion phase, the reviewers were asked to re-evaluate their assessment as if #5637 had already been published, and the majority view was that this paper contains substantial theoretical contributions that are not present in #5637. [The paper was in fact championed by the reviewer who was initially assigned both papers.]

After my own reading of the paper, I concur with the reviewers' assessment that this paper would make a worthwhile addition to the NeurIPS 2019 technical program, and I recommend acceptance.