Review for NeurIPS paper: Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

NeurIPS 2020

Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

Review 1

Summary and Contributions: The paper characterizes a new, larger, subset of games where an optimal correlated equilibrium can be computed in polynomial time, using the concept of triangle-free games. The polytope of von Stengel and Forge, V, can be decomposed and an algorithm built on that decomposition. In the case of triangle-free games, V is shown to be equivalent to the set of correlation plans.

Strengths: The set of triangle-free games is a non-trivial extension of the class of extensive-form games where it is known to be possible to compute an optimal correlated equilibrium: it now includes games with some types of chance events - even including some (variants of?) real world games.

Weaknesses: While there was nothing in particular wrong with the experiments, I wasn't sure what I was supposed to take away from the experiments. Maybe also have some explanation of how decomposing the polytope now allows it to be mapped?

Correctness: Yes, with the caveats mentioned above w.r.t. experiments.(maybe some explanation of how decomposing the polytope now allows it to be mapped?)

Clarity: The paper is generally well written, but I found myself getting confused between whether an actual value is being assigned to a policy. For example, in the First Example, "Then, this value is split arbitrarily into the two (non-negative) entries v[∅, 1], v[∅, 2] so that v[∅, 1] + v[∅It was not impossible to read, but , 2] = v[∅, ∅]" constructing one specific policy, while the describing the (general version of the) operation as the step in constructing the polytope. The later usage of terms like "filled in" in the algorithm and its description are then harder to immediately parse: as far as I can tell values for sequences are not actually being filled in, but instead we are marking down that a particular constaint about filling in values has now been added to the scaled-extension polytope. I found the boxes in Figure 2 slightly confusing as well. Following the text for the leftmost example, going from v[0,1] to v[1,1] + v[2,1] and v[3,1] + v[4,1], I would have expected to see a longer column in the figure for those fields. (Or conversely, given the figure, I would have expected to go from v[0,1] and v[0,2] to v[1,1]+v[2,1] and v[1,2]+v[2,2].) The connection between the boxes and what is going on should be made more clear. Consider using some of the space from the experiment section to expand section 3. For example, a sketch of the argument for Theorem 3 seems more informative than knowing that Gurobi struggles to solve the LP based on the decomposition (when this does not really seem to be part of message of this paper.)

Relation to Prior Work: Yes, the distinction is clear, and appears to cite relevant related works.

Reproducibility: Yes

Additional Feedback: line 244 "topmost game in Figure 2" leftmost ---- comments after feedback Thank you for the response. It does clarify what the experimental section is trying to show, and I think making those points explicit would also help the paper. a) the proof says you can safely run an LP based on V rather than Xi -- which you could have done before, just without a theoretical argument of soundness. b) one could actually implement and run the decomposition algorithm that is part of the proof, whichis important because of (c) c) the regret-based alogrithm, which *does* require the decomposition, is (potentially much) faster than the LP algorithm.

Review 2

Summary and Contributions: The paper builds on work of on tractable computation of EFCE in two player games. Prior art permitted (tractable) solving two player extensive form games without chance moves. This work extends this capability to games with public chance moves - and more generally to two player EFGs that satisfy a "triangle-freeness" condition. The authors provides proofs, algorithms, and empirical validation of their methods.

Strengths: The paper story is clear and well motivated, and represents an exciting complexity result in EFGs. The figures are beautiful. Empirical evaluation is limited but perhaps sufficient to support the result. I think this is relevant (if a bit niche) to the NeurIPS community, particularly as a lot of the prior work was published in NuerIPS. All in all, a good submission - I enjoyed reading it and learned a thing or two!

Weaknesses: I found the algorithm a little intimidating. Although this cannot always be helped - I feel there could be some clarity improvements. I think the algorithm as presented in the appendix in long form was clearer. I have some nitpicks about the broader impact section (I expand on this in additional feedback). I also think some modest expansion of the empirical results would be of benefit to some readers (again I expand on this in additional feedback).

Correctness: Theoretical claims appear to be correct. Empirical results are plausible. Appendices were not checked for correctness.

Clarity: The technical contributions in this paper are complex. While I feel the authors to try to lead the reader in gently, there is a wall of notation and algorithmic complexity to digest that is daunting. I found the examples and accompanying figure very helpful, and the triangle-freeness easy to understand with the aid of these examples. The algorithm in the main paper could be formatted more clearly. I think the appendix does a better job here. Some undefined values appear (lower case xi) and annoyingly there are differences in notation between main paper and appendix (eg xi to v). I feel that the more complex the arguments a paper makes, the more effort should be expended in optimizing for clarity, otherwise it risks intimidating readers. As it stands, the paper is fairly intimidating and there is room for improvement in terms of clarity. Low hanging fruit: algorithm formatting, pedantic defining of variables, consistency of variables between main paper and appendix. At a high level, I found the story and motivation to be clear, and understand how it fits into the bigger picture.

Relation to Prior Work: Prior work and novel contributions are clearly marked. The paper claims boldly that this is "the biggest positive complexity result surrounding extensive-form correlation in more than a decade" - I have ran this by my colleagues and believe this claim to be true.

Reproducibility: Yes

Additional Feedback: I can see why the authors would call the condition "triangle free", but favoring explicitness, can the authors add a sentence or two giving intuition for the name? Decompose algorithm: can this be presented more clearly? If the loops are nested, can they be indented? I found the phrase "we branch on Player 1" / branching confusing. I don't think the cited paper [12] uses the term "branch" in it either. I also find the sudden use of lower-case Xi jarring - is this a typo? (ninja edit: algorithms in appendix are clearer). Line 347: can the authors also recommend/cite an open source or free LP solver that might also be able to scale to similar sized games to aid reproducibility? Broader impact: Although I think the authors have struck a good balance in this section, and I agree broadly with the potential positive/negative impacts that the authors suggest, I take slight issue with some of the phrasing. Firstly, it is implies that CEs do not have an equilibrium selection problem "to select, among the infinite number of correlated equilibria of the game, *one* that maximizes a given objective". There is still an equilibrium selection problem in constant-sum games, or general-sum games with a linear objective, and, for example, in the example in Figure 3 shows this where there is a frontier which can trade off payoff between the players (the authors do later mention this in the last sentence). Secondly, I dislike using the term "maximize social welfare", (from "For example, this technology could be used to find correlated equilibria than maximize social welfare, leading to highest societal good".). Although I recognize that "maximum social welfare" is a technical term used in the literature, I feel it is a misnomer, particularly in the context of a section discussing potential ethical impact. For example, even if we agreed that equilibria that provide "highest societal good" satisfy linear objective functions (something I do not believe), all the solutions on the frontier in Figure 3 are technically "maximum social welfare" solutions as defined by the literature - but I don't think you will find many people agreeing that all those equilibria are equally "leading to highest societal good". Can lower k-value Goofspiel Performance results also be added please? Although I understand that the point of the figure is to show that the problem is now tractable - for others seeking to reimplement the idea, having a baseline to check against that doesn't require 200GB of RAM and a commercial LP license would be helpful. On a similar line of argument, could the same plots vs iteration count be provided because wall-clock is hard to compare between machines when attempting to reproduce results. I am a little confused about the scales of the infeasibility. Am I right in thinking that there are 2.3e7 actions to spread mass over? And therefore if mass was spread uniformly each would get 1/2.3e7? And the maximum possible payoff is not more than 5? Why do the infeasibility constraints start around 10e2? And is 10e-8 only a little bit better than what one would obtain with a uniform distribution (to a first approximation)? Has Goofspiel been (CE) solved before (if so, please cite)? Can your approach solve it to an acceptable tolerance? If so, is this worth highlighting as a contribution? *** Thanks for your replies. I hadn't appreciated that the LP could be run (unsoundly) before, which is why I am reducing my score by one point. However I think there is a contribution here and think that extending [12] to public chance / triangle free games is a non-trivial result.

Review 3

Summary and Contributions: This paper studies the complexity of computing an optimal correlated equilibrium in two-player extensive-form games. It has been only known that an optimal correlated equilibrium in games without chance nodes can be computed in polynomial time. This paper shows that an optimal correlated equilibrium in triangle-free games, including games with public chance nodes, can be computed in polynomial time.

Strengths: This paper shows that an optimal correlated equilibrium in triangle-free games can be computed in polynomial time. To achieve this, this paper shows that the set of correlation plans of a triangle-free game coincides with the von Stengel-Forges polytope of the game based on a structural decomposition, where the polytope that only requires a polynomial number of linear “probability-mass-conserving” constraints. Sound theoretical results are provided.

Weaknesses: The novelty of this paper is very limited. The approach in this paper follows the existing work (Farina et al. [12] ) for games without chance nodes, and the techniques in both papers are almost the same except that this paper addresses specific features in triangle-freeness games by extending the techniques in Farina et al. [12]. Unfortunately, this extension is very straightforward. Actually, the key steps leading to the results in this paper are almost the same as the work in Farina et al. [12]. For example, Remark 1 in both papers raise the same problem for the decomposition algorithm; The definition of a triangle-freeness game in Definition 3 in this paper is almost the property of Proposition 2 in Farina et al. [12] except for simply adding I_1 \rightleftharpoons J_2. And the decomposition algorithm follows the one in Farina et al. [12] by handing the specific feature in the branching step. Then, it seems that this paper just tells us that the work in Farina et al. [12] is straightforward to extend to the triangle-freeness game. Triangle-freeness games are very special games. It is not clear whether there are many triangle-freeness games in the real world. *********************** Thanks for your response. I have read the author's response and my opinion remains the same because I still cannot see why the extension is not easy.

Correctness: Seems correct.

Clarity: Satisfied.

Relation to Prior Work: Clear.

Reproducibility: Yes

Additional Feedback: This paper highlights that triangle-freeness games are far more general games than games without chance moves, which needs more explanations. It will be better if illustrating how to avoid the problem in the triangle-freeness game. In Line 244, ‘the topmost game in Figure 2’ is not clear as three games are in the same row. In Line 349, the sentence ‘However, even that quickly becomes impractical’ is incomplete. What is ‘infeasibility’? A formal definition is better.

Review 4

Summary and Contributions: In this paper, the authors study the problem of computing several formulations of correlated equilibrium with an objective function in two-player sequential games in polynomial time. It is well known that these solution concepts can be computed efficiently in case the game does not contain random events, but the problem remains intractable if such events occur. The main result of the authors further refines this boundary by showing that if a game satisfies a so-called triangle-freeness condition, the solution can be found in polynomial time even in certain games with chance moves. In triangle-free games, the algorithm of the authors can represent the space of correlation plans using a polynomial number of linear constraints with a quadratic number of variables. In the experimental evaluation, the authors show that the algorithm can construct the representations even for sequential games of larger size very fast.

Strengths: The authors have done a good job in presenting their theoretical results. All notions are well described, and examples accompany the most important concepts so the reader can quickly get familiar with them. I find the results significant and novel, and I believe they will give rise to new algorithms for computing correlated equilibria in sequential games. Overall, I really enjoyed this paper, and I think it makes a valuable contribution to the literature relevant to the NeurIPS community.

Weaknesses: I am aware that the paper's primary focus is the complexity refinement, and the empirical evaluation is not a central topic. Still, in the end, the decomposition should serve to find correlated equilibria more easily. My main (but minor) concerns are hence related to the application of the introduced decomposition. First, did the authors try to compare their algorithm to the algorithm of Dudík and Gordon (A Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games)? It would be interesting to see, e.g., the infeasibility over time for regret minimization vs. the algorithm of Dudík and Gordon. Second, you compute a feasible EFCE without an objective function, which is known to be a polynomial problem in any multi-player sequential game (by Huang, 2008) to start with, if I am not wrong. Therefore it does not take full advantage of your approach in this work. Could you explain why? Is it because you aim to compare regret minimization with linear programming, and the regret-minimization algorithm could not handle an objective? Did you perform any tests with EFCE with an objective function in triangle-free sequential games too? If yes, did you observe any drop in performance in terms of runtime when compared with EFCE without an objective function? Minor note: Could you make clear in Theorem 2 that s_i’s are simplex dimensions?

Correctness: I read through the proofs, and as far as I can tell, they seem correct. In the empirical evaluation, as expected, the authors chose sufficiently large triangle-free games with chance moves to show how their decomposition algorithm scales. Moreover, I am impressed that one-threaded regret minimization performs so well despite Gurobi running in 30 threads.

Clarity: The paper is very well written, and I did notice only a few typos (mentioned below). I read through all the proofs, and as far as I can tell, they seem to be correct. I find it impressive that despite sequential game theory being quite heavy on notation at times, the authors managed to explain their results without confusing the reader with too many symbols. Minor notes: 1/ In Equation 1, the first sum should iterate over A_{I_1}, not A_I and the second sum over A_{I_2}, not A_J. 2/ On line 244, you might be referring to a “leftmost” graph rather than a “topmost” graph. 3/ On line 566, set -> sets.

Relation to Prior Work: The paper describes well how the current research fits into the literature on the topic.

Reproducibility: Yes

Additional Feedback: ---- comments after feedback Thank you for answering my questions. I appreciate your comments on the last three points, regarding optimal correlated equilibria. I still have a few comments/nits, though: 1/ Please cite the work of Dudík and Gordon. It seems relevant to your work, given that literature on EFCE is still very scarce, even though their algorithm might be slow, numerically unstable, and is tailored for maximizing entropy. 2/ Huang gives an explicit LP for EFCE in Theorem 4 of [16] that enables to find optimal EFCE with a linear objective. It can indeed be used to compute optimal equilibria, but not in polynomial time.