Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Edit after author response <<< No concerns after respons and proposed changes for camera ready. I would still like to see OOMD results in addition to the discussion proposed in the response. >>> Given the amount of theory, the paper was fairly clearly written. It extends the recent optimistic regret minimization framework to sequential decision making strategies (treeplexes) -- extending the state of the art for a moderately popular theoretical framework. The paper uses the optimistic regret framework with two saddlepoint solving algorithms, and compares the game solving algorithms to a currently used algorithm CFR+, showing at least one game instance suggesting CFR+ is asymptotically worse in the worst case.
I think the method is interesting, since the use of dilated distance functions it allows interpreting the Mirror Descent type iteration as one of minimizing local regret (at the cost of growing the state space). In addition, a theoretical result of strong convexity is obtained in the case of the Euclidean norm. As far as I can see, the mathematical arguments seem correct. However, I have two questions, regarding the results: -I do not understand where is used the flexibility given by the choice of the $m^t$. This is important, since part of the theoretical work consists in adapting some known results to that case. What kind of vector did you use in these cases? Without further use, it seems that this have been done because it was technically feasible. -The fact that OOMD-type methods do not work in deep games (but those of the OFTRL type do) implies that decomposition as local minimization does not seem to have an important experimental effect. So what could be the advantage of such idea in the general case? %%%%% After Rebuttal After reading the authors' responses and looking at the others reviewers comments and discussions, I agree to change my score. This is a good submission.
------------ Edit after reading the other reviews and the rebuttal: Thank you for answering my questions. I have no concerns. ------------ I have a strong background with CFR / CFR+, but not any in accelerated methods, optimistic regret matching, or the other theoretical components of this work. Through that lens, I liked this paper. This paper considers the application of optimistic regret-minimization algorithms to extensive form games. It follows a similar setup as CFR, the current SOTA algorithm for solving large games, which decomposes the overall regret-minimization problem into independent subproblems at each information set. Theoretically, the new algorithms should have faster convergence than the CFR+ variant of CFR. Empirically, the authors show that these methods converge dramatically faster than CFR+ in small games, but slower in the slightly larger game of Leduc hold'em. I found that the paper was clearly motivated and described. To my knowledge it is novel, and significant: CFR / CFR+ has had a long run as the SOTA algorithm for solving large extensive games, and any technique (such as this one) with stronger theoretical bounds and faster convergence (even if only in smaller games) is of interest. The paper is well structured with ample discussion of each component. While I'm sure that I didn't appreciate all of the details without a theoretical background in this area, I think I could probably implement it from this description. Overall I don't have any substantive issues with this work. I've listed some questions in the Improvements section.