Review for NeurIPS paper: A mean-field analysis of two-player zero-sum games

NeurIPS 2020

A mean-field analysis of two-player zero-sum games

Review 1

Summary and Contributions: In this paper, the authors consider the problem of computing Nash equilibria for two-player zero-sum continuous games. In particular, the focus is on games with the loss function does not have the requisite convex-concave structure, and therefore, computing pure strategy Nash is computationally hard. Allowing mixed strategy Nash equilibria is similar to lifting non-convex problems into measure space to convexify them. Mixed Nash equilibria exist in the space of measures; therefore, the mixed strategy equilibria can be computed in polynomial time using mirror descent. That said, mirror descent is very inefficient for high dimensional problems. The approach taken in this paper is to parametrize mixed strategies as a weighted combination of delta functions supported on specific points (strategies), and then compute the optimal set of weights and positions. The problem is non-convex in this new parametrization, and so one has to develop new methods to establish the converge to the global optimum. The authors propose two algorithms for solving this problem -- one that uses noisy gradient descent to escape local minima (Alg 1) and another that uses gradient descent to update the position variables, and a multiplicative weight update for the weights (Alg 2). They show that these algorithms are plausible by first considering convergence properties of the the dynamics induced on the space of measures, and then establishing a law of large numbers to show that the finite particle algorithms are close to the continuum limit. They show the computationally efficiency of their method on a set of synthetic problems, and on GANs, both with synthetic and real life data.

Strengths: This is an interesting paper in that it proposes a non-convex approach to solving a problem that is convex -- the mixed Nash equilibrium or equivalently pure strategy Nash equilibrium over the space of measures is convex. The claim is that this non-convex problem converges faster than the canonical convex optimization approach, i.e. the mirror descent algorithm, for high dimensional problems. The authors show how to map the finite particle algorithms to corresponding dynamical system over the space of measures, and then establish that the dynamical system converges to an approximate Nash equilibrium in finite time. The dynamical system has connections to optimal transport. The proposed methodology is novel, and the possibility of interesting results in the future. This paper is clearly of interest to the NeurIPS community -- both from the perspective of the problem and the proposed methodology.

Weaknesses: There are several issues with this paper: 1) Why are mixed strategies of interest in the GAN setting where the goal is to compute a parameter setting for the forward NN g and the discriminator network f? The mixed strategy approach will compute a distribution over the parameters of the forward and discriminator networks -- how is one to interpret this? More generally, if the game of interest is truly non-convex (i.e. does not satisfy the proper convexity/concavity properties) relaxing the game to allow for mixed strategies convexifies the problem. But one is either left with the problem of generating a solution for the original game, or if one is okay with mixed strategies, then the game is convex to begin with. So, it is not clear what the authors mean when they say that they are addressing non-convex games? 2) Alg 2 appears to reduce to gradient descent in the x space, and multiplicative weight update in the w space. This may be more than just a passing resemblance since the multiplicative weight algorithm is also related to entropy smoothing -- the same approach used here. The authors should explore this connection more fully. 3) Given comment 2, is it really necessary to go down the route of posing the problem over measure spaces? Can one not analyze the finite particle problem directly. Put another way, what is the advantage of introducing the measure theoretic machinery if in the end one is analyzing a finite particle system? In particular, when, by the author's own admission, the convergence results for these methods are very weak -- finite time convergence and no rates, and also slow convergence because the entropic regularization dominates over the drift term. 4) In the very beginning of the introduction, the authors appear to be confused between multi-objective and multi-agent optimization. These are two, very different, problems. This paper is about multi-agent problems and not multi-objective problems.

Correctness: The claims and methods, to the best of my knowledge, appear to be correct.

Clarity: Reasonably well written -- not withstanding the confusion between multi-agent and multi-objective problem as detailed above. That said, the paper is more of a list of results, with no intuition as to why the proposed methodology is appropriate.

Relation to Prior Work: The paper approaches problem from a GAN/two person game perspective. However, there is a literature on regret minimization over experts and online optimization that also results in similar algorithms. The authors should discuss how their work fits with that literature.

Reproducibility: No

Additional Feedback: I read through the other reviews (to understand the issues raised there) and also the author responses. The responses to the comments are quite generic. For example, in response to the comment seeking a connection to previous work on experts, the authors responded by referring to previous work on the Nash equilibrium problems! Similarly, in response to the comment about scaling the algorithm to larger datasets, I read the response as stating that algorithm is not likely to scale well. I continue to think that the direction proposed in this paper is interesting, however, the authors need to address the issues raised in the reviews before I would be inclined to move my review from "accept" to something stronger.

Review 2

Summary and Contributions: This paper focuses on two player zero-sum continuous games and provide convergence guarantee for Wasserstein gradient descent-ascend dynamical systems in looking for mixed Nash equilibrium. It is an interesting submission in the sense that there are very few work in the literature dealing with manifold games from algorithmic perspective.

Strengths: This paper provides a concrete framework for two-player zero-sum games using all necessary techniques i.e. Fokker-Planck as the foundation, Langiven descent-ascent and Wasserstein descent-ascent as two schemes of discretization. Although natural to come up with these approach, but there are some nontrivial generalization from classic potential games to manifold zero-sum games in proving uniqueness of the solution.

Weaknesses: My main concern is the approximation of integrals for each step of update. It does not seem to be clear to me that there can be a good threshold for sample complexity in Langevin descent ascent at step 0. This seems difficult but it is a very critical issue. Another problem is that the authors did not provide explanation on the assumption of X,Y to be closed manifolds. This excludes even the case of Euclidean spaces, and a reasonable amount of applications questions focuses on manifolds with boundary, simplex, polytopes. Is boundaryless a must condition to ensure the results or just to annihilate boundary integral? If the technique can be used to manifold with boundary, there might be more application potential.

Correctness: The proofs are very detailed and carefully written.

Clarity: Yes, well written and the main idea is easy to follow.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: For the proof of uniqueness, theorem 4, is that possible to apply the current technique to potential games with independent playing, which is a direct analogy of Rosen 1965, for certain locally uniqueness in P(M_1)\times ...\times P(M_n)?

Review 3

Summary and Contributions: This paper focuses on the two-player zero-sum continuous games, and reframe it from the OT perspective. The optimization strategy is investigated after parametrizing mixed strategies as mixtures of particles. Theoretical properties such as the convergence are proved. Experiments on synthetic data show the proposed method can be incorporated with the GAN training.

Strengths: This paper analyzes the two-player zero-sum games from a novel perspective. Although the mirror descent could not solve the Mixed Nash equilibria in high dimensions, the authors turn to the OT measure, which leads to geometric benefits. Theoretical results establish approximate mean-field convergence. Empirical results show the proposed WFR-DA is stable and efficient.

Weaknesses: 1. For the experiments, the authors should apply the method on large-scale datasets, and test with different dimensions (model size). The complexity of the proposed method should also be analyzed. 2. It's better to show more applications of the proposed analysis. For example, training "robust models".

Correctness: It seems the claims and method are correct. Do not check the proofs in detail.

Clarity: Since there are a lot of theoretical proofs in the paper, the authors can summarize the main claims of the theorems and thread them for better understanding.

Relation to Prior Work: The related methods are cited and discussed in the paper.

Reproducibility: Yes

Additional Feedback: This paper provides a novel analysis of the two-player zero-sum games and proposes strategies to solve it. There are two main concerns: 1. The high dimension is one of the main problems of the vanilla approach. The authors should clarify the dimension (e.g., to which scale) and show some comparisons on the efficiency/stableness with the previous methods. 2. Parametrizing mixed strategies as mixtures of particlesis is indeed a novel way to solve the problem. However, the main advantage of such a view should be emphasized and stressed in the GAN experiments since there are many choices to train such models.