{"title": "A Unified Framework for Extensive-Form Game Abstraction with Bounds", "book": "Advances in Neural Information Processing Systems", "page_first": 615, "page_last": 626, "abstract": "Abstraction has long been a key component in the practical solving of large-scale extensive-form games. Despite this, abstraction remains poorly understood. There have been some recent theoretical results but they have been confined to specific assumptions on abstraction structure and are specific to various disjoint types of abstraction, and specific solution concepts, for example, exact Nash equilibria or strategies with bounded immediate regret. In this paper we present a unified framework for analyzing abstractions that can express all types of abstractions and solution concepts used in prior papers with performance guarantees---while maintaining comparable bounds on abstraction quality. Moreover, our framework gives an exact decomposition of abstraction error in a much broader class of games, albeit only in an ex-post sense, as our results depend on the specific strategy chosen. Nonetheless, we use this ex-post decomposition along with slightly weaker assumptions than prior work to derive generalizations of prior bounds on abstraction quality. We also show, via counterexample, that such assumptions are necessary for some games. Finally, we prove the first bounds for how $\\epsilon$-Nash equilibria computed in abstractions perform in the original game. This is important because often one cannot afford to compute an exact Nash equilibrium in the abstraction. 
All our results apply to general-sum n-player games.", "full_text": "A Uni\ufb01ed Framework for Extensive-Form Game\n\nAbstraction with Bounds\n\nChristian Kroer\n\nComputer Science Department\n\nPittsburgh, PA 15213\nckroer@cs.cmu.edu\n\nTuomas Sandholm\n\nComputer Science Department\n\nPittsburgh, PA 15213\n\nsandholm@cs.cmu.edu\n\nAbstract\n\nAbstraction has long been a key component in the practical solving of large-scale\nextensive-form games. Despite this, abstraction remains poorly understood. There\nhave been some recent theoretical results but they have been con\ufb01ned to speci\ufb01c\nassumptions on abstraction structure and are speci\ufb01c to various disjoint types of\nabstraction, and speci\ufb01c solution concepts, for example, exact Nash equilibria\nor strategies with bounded immediate regret. In this paper we present a uni\ufb01ed\nframework for analyzing abstractions that can express all types of abstractions\nand solution concepts used in prior papers with performance guarantees\u2014while\nmaintaining comparable bounds on abstraction quality. Moreover, our framework\ngives an exact decomposition of abstraction error in a much broader class of games,\nalbeit only in an ex-post sense, as our results depend on the speci\ufb01c strategy\nchosen. Nonetheless, we use this ex-post decomposition along with slightly weaker\nassumptions than prior work to derive generalizations of prior bounds on abstraction\nquality. We also show, via counterexample, that such assumptions are necessary for\nsome games. Finally, we prove the \ufb01rst bounds for how \u270f-Nash equilibria computed\nin abstractions perform in the original game. This is important because often one\ncannot afford to compute an exact Nash equilibrium in the abstraction. 
All our\nresults apply to general-sum n-player games.\n\n1 Introduction\n\nGame-theoretic equilibria have played a key role in several recent advances in the ability to construct\nAIs with superhuman performance in games with imperfect information [5, 9, 32]. In particular, these\nresults rely on computing an approximate Nash equilibrium [33] for the game at hand. In typical\nreal-world situations these games are so large that even approximate equilibria are intractable to compute. Instead,\nthe dominant paradigm has been to first construct some smaller abstraction of the game, apply an\niterative algorithm for computing a Nash equilibrium in the abstraction, and map the resulting strategy\nback to the full game. This approach was used in the recent Libratus agent, which beat four top poker\npros in the game of heads-up no-limit Texas hold\u2019em [9] (in addition to abstraction and equilibrium\napproximation, the agent also utilized real-time subgame solving [8] and action abstraction refinement).\nAbstraction has also been used in trading-agent competitions [39] and security games [1\u20133].\nIn practice, abstractions are generated heuristically with no theoretical guarantees on solution quality\n[4, 10, 14\u201316, 18\u201322, 24, 34, 36]. Ideally, abstraction would be lossless, such that implementing\nan equilibrium from the abstract game results in an equilibrium in the full game. Gilpin and Sandholm\n[17] study lossless abstraction techniques for a structured class of games. Unfortunately, lossless\nabstraction often leads to games that are still too large to solve. Thus, one must turn to lossy\nabstraction. 
However, significant abstraction pathologies (nonmonotonicities) that cannot exist in single-agent\nsettings have been shown in games: if an abstraction is refined, the equilibrium\nstrategy from that new abstraction can be worse in the original game than the equilibrium strategy\nfrom a coarser abstraction [37]! Lossy abstraction remains poorly understood from a theoretical\nperspective. Results have been obtained only for various restricted models of abstraction. Basilico\nand Gatti [3] give bounds for the special game class called patrolling security games. Sandholm\nand Singh [35] provide lossy abstraction algorithms with bounds for stochastic games. Brown and\nSandholm [6], Waugh et al. [38], Brown and Sandholm [7], and \u010cerm\u00e1k et al. [12] develop iterative\nabstraction-refinement schemes that have various forms of convergence guarantees, but they do not give\nsolution-quality guarantees for the original game for strategies computed in limited-size abstractions.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\nResults for extensive-form games (EFGs) are the most closely related to this work. Lanctot et al. [31]\nshow that the counterfactual regret minimization algorithm (CFR) converges to an approximate NE\nwhen run on an imperfect-recall abstraction that is a skew well-formed game (SWF) with respect to\nthe original game, where the error in the NE has a linear dependence on the number of information\nsets. Kroer and Sandholm [27] show that Nash equilibria and strategies with bounded counterfactual\nregret computed in chance-relaxed SWF (CRSWF) games (a generalization of SWF that allows error in\nchance outcomes) are approximate NE in the original game, with a linear dependence on game-tree\nheight. 
Kroer and Sandholm [25] show that NE computed in perfect-recall abstractions that satisfy\nconditions that are similar to those in CRSWF abstractions are approximate NE in the original game\nwith a constant dependence on payoff error (as opposed to a linear dependence on height in Kroer and\nSandholm [27] or linear dependence on information sets in Lanctot et al. [31]). Kroer and Sandholm\n[26] extend the results of Kroer and Sandholm [25] to continuous action spaces.\nThe results in the previous paragraph are all for disparate models of abstraction, a speci\ufb01c solution\nconcept, or speci\ufb01c algorithm. Yet they share a common structure on the assumptions needed in\norder to obtain theoretical results. They assume that information sets (i.e., decision points) are\naggregated into larger information sets. All pairs of information sets that are aggregated together are\ncompared by de\ufb01ning a mapping between subtrees under the information sets. This mapping then\nrequires that the payoffs are similar, the distribution over chance outcomes is similar, and for pairs\nof leaves mapped to each other, the leaves have the same sequence of information-set-action pairs\nleading to them in the abstraction. Having similar payoffs and chance-outcomes under aggregated\ninformation sets is natural. However, the requirement that information-set-action pairs are the same\nfor leaf nodes mapped to each other is not satis\ufb01ed by the best heuristic abstraction algorithms used\nin practice [10, 14, 24]. In this paper we develop an exact decomposition of the solution-quality error\nthat does not require any such assumption. This is the \ufb01rst decomposition of solution-quality error\nresulting from abstraction. This decomposition depends on several quantities that prior results did\nnot (owing to its more general and exact nature). We then show that by making a weaker variant of\nprevious assumptions, our decomposition can recover all previous solution-quality bounds. 
We show\nvia counterexample that there exist games where the assumption on information-set-action pairs is,\nin a sense, necessary in order to avoid large abstraction error that is not measurable by the type of\ntechnique presented here and in prior work.\nFinally, we prove the first bounds for how $\\epsilon$-Nash equilibria computed in abstractions perform in\nthe original game. This is important because often one cannot afford to compute an exact Nash\nequilibrium in the abstraction. All our results apply to general-sum n-player games.\n\n2 Extensive-form games (EFGs)\n\nAn extensive-form game (EFG) is a game tree, where each node in the tree corresponds to some\nhistory of actions taken by the players. Each node belongs to some player, and the actions available to\nthe player at a given node are represented by the branches. Uncertainty is modeled by having a special\nplayer, Chance, that moves with some predefined fixed probability distribution over actions. EFGs\nmodel imperfect information by having groups of nodes in information sets, where an information set\nis a group of nodes all belonging to the same player such that the player cannot distinguish among\nthem. In the original game that we are trying to solve, we assume perfect recall, which requires that\nno player forgets information they knew earlier in the game. This is a natural condition since you\ngenerally cannot force players to forget information, and it would not be in their interest to do so.\nFormally, an extensive-form game $\\Gamma$ is a tuple $(H, Z, A, P, \\pi_0, \\{\\mathcal{I}_i\\}, \\{u_i\\})$. $H$ is the set of nodes\nin the game tree, corresponding to sequences (or histories) of actions. $H_i$ is the subset of histories\nbelonging to Player $i$. $Z \\subseteq H$ is the set of terminal histories, or leaves. $A$ is the set of actions in the\ngame. $A_I$ denotes the set of actions available at nodes in information set $I$. $P$, the player function,\nmaps each non-terminal history $h \\in H \\setminus Z$ to $\\{0, \\ldots, n\\}$, representing the player whose turn it is to\nmove after history $h$. If $P(h) = 0$, the player is Chance. $\\pi_0$ is a function that assigns to each $h \\in H_0$\nthe probability of reaching $h$ due to Chance (i.e., assuming that all players play to reach $h$). The\ninformation partition $\\mathcal{I}_i$, for $i \\in \\{1, \\ldots, n\\}$, is a partition of $\\{h \\in H : P(h) = i\\}$ into information sets. The utility function $u_i$\nmaps $z \\in Z$ to the utility obtained by Player $i$ when the terminal history is reached.\nA behavioral strategy $\\sigma_i$ for a player $i$ is a probability distribution over actions at each information\nset in $\\mathcal{I}_i$. A strategy profile $\\sigma$ is a behavioral strategy for each player. The probability that $\\sigma$ puts\non $a \\in A_I$ is denoted $\\sigma(I, a)$. We let $\\pi^\\sigma(z)$ and $\\pi^\\sigma(I)$ denote the probability of reaching $z$ and $I$\nrespectively, if players choose actions according to $\\sigma$. We likewise let $\\pi^\\sigma(z|I)$ and $\\pi^\\sigma(\\hat{I}|I)$ denote\nthe reach probabilities conditioned on being at information set $I$. If the probability of reaching $I$ is\nzero due to players excluding $i$ then we define . For a given strategy profile $\\sigma$ we let $\\sigma_{I \\to a}$ denote the\nsame strategy except that $\\sigma_{I \\to a}(I, a) = 1$.\nWe will often quantify statements over the set of leaves or information sets that are reachable from\nsome given information set $I$ belonging to Player $i$, sometimes conditioned on taking a specific action\n$a \\in A_I$. We let $\\mathcal{Z}_I$ and $\\mathcal{D}_I \\subset \\mathcal{I}_i$ be the sets of leaves and information sets reachable conditioned on\nbeing at information set $I$. We let $Z_I$ and $D_I \\subset \\mathcal{I}_i$ be the sets of leaves and information sets that are\nreachable without Player $i$ taking any further actions before reaching them. 
We let $\\mathcal{Z}_I^a$, $\\mathcal{D}_I^a$, $Z_I^a$, and $D_I^a$ be defined analogously but conditioned on taking action $a \\in A_I$.\nAs is usual we use the subscript $-i$ to denote exclusion of Player $i$; for example, $\\sigma_{-i}$ is the set of\nbehavioral strategies in $\\sigma$ except for the strategy of Player $i$, and $\\pi^\\sigma_{-i}(z)$ is the probability of reaching\nleaf node $z$ disregarding actions taken by Player $i$, that is, assuming that Player $i$ plays to reach $z$.\n\n3 Game abstractions\n\nWe start by giving an intuitive description of how we model abstraction. We are given some perfect-recall\nEFG $\\Gamma$ for which we would like to compute a (possibly approximate) Nash equilibrium. Instead\nof solving $\\Gamma$ directly, we assume that we are given some abstraction of $\\Gamma$ called $\\Gamma'$. Throughout\nwe will assume that $\\Gamma'$ is itself an EFG, though it is allowed to be imperfect recall, unlike the\noriginal game. The high-level idea is to compute some approximate solution to $\\Gamma'$, and then use that\napproximate solution to construct a strategy for $\\Gamma$. The type of approximate solution computed for $\\Gamma'$\nmay vary. For example, computing an exact Nash equilibrium in $\\Gamma'$ may be overkill, since what we\nultimately care about is how strong of a strategy profile we get in the original game. This is especially\ntrue when the abstraction has imperfect recall, in which case a Nash equilibrium is NP-hard to find,\nand it suffices to find a strategy with low counterfactual regret at every information set. We consider\nseveral notions of solution to the abstract game.\nOnce we have an abstraction and a solution thereof, the primary question that we ask in this paper is\nwhether we can construct a solution to the original game that is provably near-optimal. To answer\nthis question we need a way to reason about the differences between the original game and the\nabstract game. 
We do this by setting up a mapping between the real game and the abstract game:\nevery information set in the real game is assumed to map onto a speci\ufb01c abstract information set.\nThe strategy that we construct for the real game is such that the distribution over actions at a given\ninformation set is constructed from the distribution over actions at the abstract information set that it\nmaps onto. In order to analyze the quality of the obtained strategy we propose a two-step process for\nmeasuring differences between the real and abstract game: In the \ufb01rst step we think of the original\ngame mapping onto an information re\ufb01nement of the abstraction, where the re\ufb01nement is the abstract\ngame but with some abstract information sets re\ufb01ned into two or more new information sets. The\ninformation re\ufb01nement has to be at least \ufb01ne-grained enough to entail perfect recall, although it\nmay be useful in practice to consider re\ufb01nement even of perfect recall information sets. We set up\nmeasures of how different payoffs and probability distributions are in the original game versus in\nthe re\ufb01nement, where these measurements are based on how information sets and actions from the\nreal game are mapped onto the re\ufb01nement of the abstraction. In the second step, we measure the\ndifference between the re\ufb01nement and the abstract game. This is again done by measuring differences\nin payoffs and probability distributions, this time between each information set in the re\ufb01nement\nand the larger abstract information set that it was re\ufb01ned from in the abstraction. This process is\nillustrated in Figure 1. 
Figure 2 shows an example of how that construction might be arrived at in\npractice, though note that our framework does not require the abstraction to be one that is arrived at\nvia game-tree modifications like this.\n\nFigure 1: Abstraction example. Left: Original EFG. Right: Abstraction (which has perfect recall\nin this case). Dotted red arrows denote the mapping of information sets in the original game onto\ninformation set partitions in the abstract game. The dotted orange line in the abstract game denotes\nan information set coarsening relative to $\\tilde{\\mathcal{I}}$.\n\nFigure 2 panels: Original game; Perfect-recall refinement; Abstraction. Step labels: Abstract right branch onto middle branch; Coarsen Player 2 information sets.\n\nFigure 2: Example of how an abstraction could be constructed. First the rightmost red branch is\nremoved. 
Second, the information sets for Player 2 are coarsened as shown by the red dotted line.\n\nThe step where the original game is mapped onto a refinement would typically be used to model\naction removal: say we have three actions $a_1, a_2, a_3$ available at an information set; in the abstraction\nwe may want to have only $a_1, a_2$ and consider $a_3$ as mapped onto $a_2$. The refinement step can only\nmodel information coarsening, but is very powerful for modeling certain practical types of abstraction.\nAs an example, in poker research cards have typically been abstracted via information coarsening,\nsay treating a pair of aces and a pair of kings as the same hand in the abstraction. We can model this\nin the refinement step, where aces and kings would be refined into two separate information sets.\nWe now give a formal description of our framework. As noted above, we consider abstractions that\nare themselves EFGs, but we do not require abstractions to have perfect recall (the leading practical\nabstractions are of imperfect recall [10, 14, 24]). We will use the original game to refer to some\nperfect-recall game $\\Gamma = (H, Z, A, P, \\pi_0, \\{\\mathcal{I}_i\\}, \\{u_i\\})$ that we would like to compute a Nash equilibrium\nfor. We use the abstract game to refer to some other game $\\Gamma' = (H', Z', A', P', \\pi'_0, \\{\\mathcal{I}'_i\\}, \\{u'_i\\})$\nthat is an abstraction of $\\Gamma$. The goal is to compute a (possibly approximate) equilibrium in the\nabstraction, and map the resulting strategy profile to the full game. For analytical purposes we will\nalso introduce an intermediary third game, which we will refer to as the perfect-recall refinement\n$\\tilde{\\Gamma} = (\\tilde{H}, \\tilde{Z}, \\tilde{A}, \\tilde{P}, \\tilde{\\pi}_0, \\{\\tilde{\\mathcal{I}}_i\\}, \\{\\tilde{u}_i\\})$. The perfect-recall refinement $\\tilde{\\Gamma}$ has the same game tree as the\nabstraction $\\Gamma'$ (and thus $\\tilde{H} = H'$, $\\tilde{Z} = Z'$, $\\tilde{A} = A'$, $\\tilde{P} = P'$, $\\tilde{\\pi}_0 = \\pi'_0$, and $\\tilde{u} = u'$), but the\ninformation sets must be refined relative to $\\Gamma'$, i.e. 
each information set is either intact, or partitioned\ninto several finer information sets. Thus $\\tilde{\\Gamma}$ has a finer-grained (i.e. less) abstraction than $\\Gamma'$. $\\tilde{\\Gamma}$ is\nassumed to be a perfect-recall game, unlike $\\Gamma'$. Our definition of $\\tilde{\\Gamma}$ is analogous to that of Lanctot\net al. [31] and Kroer and Sandholm [27].\nWe model abstraction as a two-stage process. First, the full game is mapped onto $\\tilde{\\Gamma}$, with every\noriginal information set $I \\in \\mathcal{I}_i$ mapping onto some $\\tilde{I}$ in $\\tilde{\\Gamma}$ via a function $f : \\mathcal{I} \\to \\tilde{\\mathcal{I}}$ that maps $\\mathcal{I}$\nsurjectively onto $\\tilde{\\mathcal{I}}$. In Figure 1, each of the three original information sets belonging to Player 2 maps\nonto the same refinement information set, but the leftmost original information set maps onto the left\npartition, whereas the center and right information sets map onto the right partition. In the abstract\ngame in Figure 1, Player 2 has two subsets in $\\tilde{\\mathcal{I}}$: the left and right sides of their single information set.\nActions are similarly mapped with an action mapping $g : A \\to \\tilde{A}$ that maps each $A_I$ surjectively onto\n$\\tilde{A}_{f(I)}$. It is assumed that $f$ respects the information-set tree structure by mapping $D_I^a$ surjectively\nonto $\\tilde{D}^{g(a)}_{f(I)}$. The final part of the first step is a way to map leaf nodes under original information sets\nto leaf nodes under the corresponding abstract information set. For each information set $I$ and action\n$a \\in A_I$, we require a surjective leaf-node mapping $\\phi$ from the set of leaf nodes reached below $I, a$\nbefore Player $i$ acts again, $Z_I^a$, onto $\\tilde{Z}^{a'}_{f(I)}$, where $a' = g(a)$.\nThe second step in our abstraction model captures the differences between the abstract game $\\Gamma'$\nand $\\tilde{\\Gamma}$. This is done by comparing the distribution over leaf nodes conditioned on being at a given\n$\\tilde{I} \\in \\tilde{\\mathcal{I}}$ versus the distribution conditioned on being at the corresponding abstract information set $I'$.\nIn Figure 1 this would correspond to comparing the leaf nodes under e.g. the right pair of nodes in\nPlayer 2\u2019s information set in the abstraction to the leaf nodes in the overall information set for Player\n2. For each partition $\\tilde{I}$ this is done with a set-valued map $\\phi_{\\tilde{I}}$ that maps the set of leaf nodes $\\tilde{Z}^{a'}_{\\tilde{I}}$ onto\n$Z'^{a'}_{I'}$ for each $a'$ in a way such that $\\{\\phi_{\\tilde{I}}(\\tilde{z}) : \\tilde{z} \\in \\tilde{Z}^{a'}_{\\tilde{I}}\\}$ specifies a partitioning of $Z'^{a'}_{I'}$. For a given\npartition $\\tilde{I}$, we let $\\tilde{\\mathcal{D}}_{\\tilde{I}}$ and $\\tilde{D}_{\\tilde{I}}$ be the set of descendant and child partitions, respectively, that can be\nreached from $\\tilde{I}$.\nFor a strategy profile $\\sigma'$ computed in $\\Gamma'$ we need a way to interpret it as a strategy profile in $\\Gamma$. We\nuse the natural extension of a lifted strategy, originally developed by Sandholm and Singh [35] for\nstochastic games, to EFGs. Intuitively, a lifted strategy $\\sigma^{\\uparrow\\sigma'}$ is a strategy where for any abstract\naction $a'$, the sum of probabilities in $\\sigma^{\\uparrow\\sigma'}$ assigned to actions that map to $a'$ is equal to the probability\nplaced on $a'$ in $\\sigma'$.\nDefinition 1 (Strategy lifting). Given an abstract strategy profile $\\sigma'$, a lifted strategy profile is any\nstrategy profile $\\sigma^{\\uparrow\\sigma'}$ such that for all $I$ and all $a' \\in A'_{f(I)}$: $\\sum_{a \\in g^{-1}(a')} \\sigma^{\\uparrow\\sigma'}(I, a) = \\sigma'(f(I), a')$.\n\nWe use the definition of counterfactual value of an information set, introduced by Zinkevich et al. [40],\nto reason about the value of an information set under a given strategy profile. The counterfactual value\nof an information set $I$ is the expected utility of the information set, assuming that all players follow\nstrategy profile $\\sigma$, except that Player $i$ plays to reach $I$. It is defined as $V_i^\\sigma(I) = \\sum_{z \\in \\mathcal{Z}_I} \\pi^\\sigma(z|I) u_i(z)$\nwhen $\\pi^\\sigma_{-i}(I) > 0$; otherwise it is 0. Analogously, $W_i^{\\sigma'} : \\mathcal{I}'_i \\to \\mathbb{R}$ is the corresponding function\nfor the abstract game. Note that Zinkevich et al. [40] further multiply the value by the reach\nexcluding $i$, whereas we do not. For the information set $I_r$ that contains just the root node $r$, we have\n$V_i^\\sigma(I_r) = V_i^\\sigma(r)$, which is the value of playing the game with strategy profile $\\sigma$. We assume that at\nthe root node it is not Chance\u2019s turn to move. This is without loss of generality since we can insert\ndummy player nodes above a root node belonging to Chance.\nKroer and Sandholm [25] showed that for an information set $I$, $V_i^\\sigma(I)$ can be written as a sum over\ndescendant information sets\n\n$$V_i^\\sigma(I) = \\sum_{a \\in A_I} \\sigma(I, a) \\Big[ \\sum_{J \\in D_I^a} \\pi^\\sigma_{-i}(J|I) V_i^\\sigma(J) + \\sum_{z \\in Z_I^a} \\pi^\\sigma_{-i}(z|I) u_i(z) \\Big]. \\quad (1)$$\n\nThe form stated here is slightly different from the one given by Kroer and Sandholm [25]. They\nassume that information sets have either only leaf nodes or only information sets immediately beneath\nthem, but this slightly more general statement follows easily from their proof. The value of $W_i^{\\sigma'}(\\tilde{I})$\ncan be written similarly.\nWe will show results for three different solution concepts that come up in practice. An $\\epsilon$-Nash\nequilibrium is a strategy profile $\\sigma$ such that $V_i^\\sigma(r) \\ge V_i^{\\hat{\\sigma}}(r) - \\epsilon$ for all players $i$ and $\\hat{\\sigma} = (\\sigma_{-i}, \\hat{\\sigma}_i)$.\nIn other words, each player can gain at most $\\epsilon$ by deviating to any other strategy $\\hat{\\sigma}_i$. This is what\nis computed by approaches based on first-order methods [23, 28, 29]. A Nash equilibrium is an\n$\\epsilon$-Nash equilibrium where $\\epsilon = 0$. Finally, a strategy profile $\\sigma$ has bounded counterfactual regret\nif for all $i$, $I \\in \\mathcal{I}_i$, and $a \\in A_I$, $V_i^{\\sigma_{I \\to a}}(I) \\le V_i^\\sigma(I) + r(I)$. 
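As a concrete illustration of the $\\epsilon$-Nash condition above, the following minimal Python sketch computes the largest unilateral deviation gain in a two-player one-shot bimatrix game, which can be viewed as a very small EFG. The function names (expected_payoff, nash_gap, is_epsilon_nash) and the matching-pennies payoffs are illustrative assumptions and do not come from the paper.

```python
def expected_payoff(payoff, x, y):
    # payoff[r][c] is one player's utility when row plays r and column plays c;
    # x and y are the mixed strategies of the row and column player.
    return sum(x[r] * y[c] * payoff[r][c]
               for r in range(len(x)) for c in range(len(y)))


def nash_gap(payoff_row, payoff_col, x, y):
    # Largest amount any single player can gain by deviating unilaterally.
    # Checking pure deviations is enough because expected utility is linear
    # in the deviating player's own mixed strategy.
    value_row = expected_payoff(payoff_row, x, y)
    value_col = expected_payoff(payoff_col, x, y)
    best_row = max(sum(y[c] * payoff_row[r][c] for c in range(len(y)))
                   for r in range(len(x)))
    best_col = max(sum(x[r] * payoff_col[r][c] for r in range(len(x)))
                   for c in range(len(y)))
    return max(best_row - value_row, best_col - value_col)


def is_epsilon_nash(payoff_row, payoff_col, x, y, eps):
    # Small tolerance guards against floating-point round-off.
    return nash_gap(payoff_row, payoff_col, x, y) <= eps + 1e-12


# Hypothetical example: matching pennies with a slightly biased row player.
pennies_row = [[1, -1], [-1, 1]]
pennies_col = [[-1, 1], [1, -1]]
gap = nash_gap(pennies_row, pennies_col, [0.6, 0.4], [0.5, 0.5])
```

Under these assumptions, a profile is an $\\epsilon$-Nash equilibrium exactly when nash_gap returns at most $\\epsilon$; in the biased example the column player is the one who can profit from deviating.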
Strategy profiles with bounded\ncounterfactual regret are important because regret minimization algorithms for EFGs converge by\nproducing strategies with low $\\pi^\\sigma_{-i}(I) r(I)$ [9, 11, 13, 30, 40].\n\n4 Measuring differences between the original game and the abstract game\n\nOur goal is to show a decomposition of the utility difference between the original game and the\nabstract game when using a lifted strategy. In order to do this, we need a way to measure differences\nbetween the original and the refined game. We measure payoff differences between nodes as\n\n$$\\Delta_i^R(z, \\tilde{z}) = u_i(z) - \\tilde{u}_i(\\tilde{z}).$$\n\nWe measure leaf-node reach-probability differences conditioned on the real and abstract strategies\n$\\sigma, \\sigma'$, and action $a$, at a given information set $I$ versus its corresponding abstract information-set\npartition $f(I)$ as follows, where $\\phi$ is the leaf-node mapping from the first abstraction step:\n\n$$\\Delta^P_{-i}(\\tilde{z}|I, a, \\sigma, \\sigma') = \\sum_{z \\in \\phi^{-1}(\\tilde{z}) : z \\in Z_I^a} \\pi^\\sigma_{-i}(z|I) - \\pi^{\\sigma'}_{-i}(\\tilde{z}|f(I)), \\quad \\text{for } \\tilde{z} \\in \\tilde{Z}^{g(a)}_{f(I)}.$$\n\nWe will also need to measure the difference in probability of reaching information set partitions,\nconditioned on being at the preceding information set partition belonging to the same player,\n\n$$\\Delta^P_{-i}(\\tilde{I}|I, a, \\sigma, \\sigma') = \\sum_{J \\in f^{-1}(\\tilde{I})} \\pi^\\sigma_{-i}(J|I, a) - \\pi^{\\sigma'}_{-i}(\\tilde{I}|f(I)).$$\n\nNote that while the set $f^{-1}(\\tilde{I})$ can include information sets $J$ that do not come after $I, a$, such\ninformation sets are irrelevant since $\\pi^\\sigma_{-i}(J|I, a) = 0$.\nWe now prove a technical lemma that will be used as the primary tool for inductively proving that\nstrategies from abstractions have bounded regret.\nLemma 1. 
For any information set $I$ and pair of lifted strategy profiles $\\sigma, \\sigma'$, assume there is a\nbound $\\xi(J, f(J))$ such that $V_i^\\sigma(J) - W_i^{\\sigma'}(f(J)) \\le \\xi(J, f(J))$ for all $J \\in D_I^a$, $a \\in A_I$. Then\n\n$$V_i^\\sigma(I) - W_i^{\\sigma'}(f(I)) \\le \\sum_{a \\in A_I} \\sigma(I, a) \\Big[ \\sum_{z \\in Z_I^a} \\pi^\\sigma_{-i}(z|I) \\Delta_i^R(z, \\phi(z)) + \\sum_{\\tilde{z} \\in \\tilde{Z}^{g(a)}_{f(I)}} \\Delta^P_{-i}(\\tilde{z}|I, a, \\sigma, \\sigma') \\tilde{u}_i(\\tilde{z}) + \\sum_{J \\in D_I^a} \\pi^\\sigma_{-i}(J|I) \\xi(J, f(J)) + \\sum_{\\tilde{I} \\in \\tilde{D}^{g(a)}_{f(I)}} \\Delta^P_{-i}(\\tilde{I}|I, a, \\sigma, \\sigma') W_i^{\\sigma'}(\\tilde{I}) \\Big].$$\n\nThe above holds with equality if $V_i^\\sigma(J) - W_i^{\\sigma'}(f(J)) = \\xi(J, f(J))$ for all $J \\in D_I^a$ and $a \\in A_I$.\nWe now introduce a shorthand for denoting the utility difference attributable to differences between\na given information set $I$ and its abstract counterpart $f(I)$. This is the utility difference that would\narise from recursively applying Lemma 1 to information sets.\n\n$$\\Delta^M(I, \\sigma, \\sigma'_{-i}) \\overset{\\text{def}}{=} \\sum_{a \\in A_I} \\sigma(I, a) \\Big[ \\sum_{z \\in Z_I^a} \\pi^\\sigma_{-i}(z|I) \\Delta_i^R(z, \\phi(z)) + \\sum_{\\tilde{z} \\in \\tilde{Z}^{g(a)}_{f(I)}} \\Delta^P_{-i}(\\tilde{z}|I, a, \\sigma, \\sigma') \\tilde{u}_i(\\tilde{z}) + \\sum_{J \\in D_I^a} \\pi^\\sigma_{-i}(J|I) \\Delta^M(J, \\sigma, \\sigma'_{-i}) + \\sum_{\\tilde{I} \\in \\tilde{D}^{g(a)}_{f(I)}} \\Delta^P_{-i}(\\tilde{I}|I, a, \\sigma, \\sigma') W_i^{\\sigma'}(\\tilde{I}) \\Big]$$\n\nIt follows from Lemma 1 that the players\u2019 values in any lifted strategy profile in the original game are\nclose to the players\u2019 values of the corresponding abstract strategy profile:\nLemma 2. Given any abstract strategy profile $\\sigma'$, any lifted strategy profile $\\sigma^{\\uparrow\\sigma'}$ achieves utility\n\n$$W_i^{\\sigma'}(r') = V_i^{\\sigma^{\\uparrow\\sigma'}}(r) - \\Delta^M(r, \\sigma^{\\uparrow\\sigma'}, \\sigma'_{-i}).$$\n\nNext we derive an expression for the difference between an abstract information set and any $\\tilde{I}$ in its\npartitioning. We will need a way to measure the difference between an information set $I'$ and any\npartition $\\tilde{I}$. 
For reach probability, we let\n\n$$\\Delta^P(\\tilde{z}|\\tilde{I}, \\sigma') = \\pi^{\\sigma'}(\\tilde{z}|\\tilde{I}) - \\sum_{z' \\in \\phi_{\\tilde{I}}(\\tilde{z})} \\pi^{\\sigma'}(z'|I') \\quad (2)$$\n\nbe the difference between the probability of arriving at $\\tilde{z}$ conditioned on a strategy $\\sigma'$ and being in\npartition $\\tilde{I}$ of $I'$ and the probability of arriving at any leaf node $z' \\in \\phi_{\\tilde{I}}(\\tilde{z})$ conditioned on the same\nstrategy $\\sigma'$ and being in $I'$. For reward differences we let the utility difference between a leaf node\n$z' \\in \\mathcal{Z}'_{I'}$ and its corresponding leaf node $\\tilde{z} = \\phi_{\\tilde{I}}^{-1}(z')$ in $\\tilde{\\mathcal{Z}}_{\\tilde{I}}$ be\n\n$$\\Delta_i^R(z'|\\tilde{I}) = \\tilde{u}_i(\\tilde{z}) - \\delta_{\\tilde{I}} u'_i(z'), \\quad (3)$$\n\nwhere $\\delta_{\\tilde{I}} > 0$ is an arbitrary scalar value that can be chosen to reflect the fact that we only need payoffs\nto be similar in a relative sense (for example, consider two subtrees with the same payoffs except that\none subtree has all payoffs scaled by a constant; these subtrees are strategically equivalent).\nThese terms allow us to measure the difference between the values $W_i^{\\sigma'}(I')$ and $W_i^{\\sigma'}(\\tilde{I})$ for any\ninformation set $I'$ and any $\\tilde{I}$ in its partition. We let $\\Delta^P(\\tilde{I}, \\sigma')$ denote this difference.\nLemma 3. For any player $i$, abstract strategy profile $\\sigma'$, information set $I'$ and any $\\tilde{I}$ in its partition,\n\n$$W_i^{\\sigma'}(\\tilde{I}) - \\delta_{\\tilde{I}} W_i^{\\sigma'}(I') = \\sum_{z' \\in \\mathcal{Z}'_{I'}} \\pi^{\\sigma'}(z'|I') \\Delta_i^R(z'|\\tilde{I}) + \\sum_{\\tilde{z} \\in \\tilde{\\mathcal{Z}}_{\\tilde{I}}} \\Delta^P(\\tilde{z}|\\tilde{I}, \\sigma') \\tilde{u}_i(\\tilde{z}) \\overset{\\text{def}}{=} \\Delta^P(\\tilde{I}, \\sigma').$$\n\n5 An exact decomposition of abstraction error\n\nOur first theorem shows that an $\\epsilon'$-Nash equilibrium in the abstract game maps to an $\\epsilon$-Nash equilibrium\nin the original game, where $\\epsilon$ depends on the difference terms introduced in the previous\nsection. We say that the abstract game has a cycle if there exists a sequence of information sets\n$I'_1, \\ldots, I'_k$ such that for all $j \\ne k$ there exist nodes $h'_j \\in I'_j$, $h'_{j+1} \\in I'_{j+1}$ such that $h'_j$ is an ancestor\nof $h'_{j+1}$, and $I'_1$ is equal to $I'_k$. The next theorem assumes the abstract game is acyclic. This enables\ninduction over information sets.\nTheorem 1. Given an $\\epsilon'$-Nash equilibrium $\\sigma'$ for an acyclic abstract game, any lifted strategy profile\n$\\sigma^{\\uparrow\\sigma'}$ is an $\\epsilon$-Nash equilibrium in the original game where $\\epsilon = \\max_{i \\in N} \\epsilon_i$ and\n\n$$\\epsilon_i = \\epsilon' + \\Delta^M(r, \\sigma^*, \\sigma'_{-i}) - \\Delta^M(r, \\sigma^{\\uparrow\\sigma'}, \\sigma'_{-i}) + \\sum_{I \\in \\mathcal{I}_i} \\pi^{\\sigma^*}(I) \\big[ \\Delta^P(f(I), \\sigma^{*\\prime}_{I' \\to I}) - \\Delta^P(I'_I, \\sigma^{*\\prime}) \\big];$$\n\nhere $\\sigma^* = (\\sigma^*_i, \\sigma^{\\uparrow\\sigma'}_{-i})$ is $\\sigma^{\\uparrow\\sigma'}$ except Player $i$ plays any best response strategy for the original game,\n$\\sigma^{*\\prime} = (\\sigma^{*\\prime}_i, \\sigma'_{-i})$ is such that $\\sigma^{*\\prime}(I', a') = \\sum_{a \\in g^{-1}(a')} \\sigma^*(I, a)$ where $I \\in f^{-1}(I')$ is chosen for each\n$I'$ in order to maximize $W_i^{\\sigma^{*\\prime}}(r')$, and $\\sigma^{*\\prime}_{I' \\to I}$ is $\\sigma^{*\\prime}$ except that at $I'$ we set the strategy according to\n$I$, i.e. $\\sigma^{*\\prime}_{I' \\to I}(I', a') = \\sum_{a \\in g^{-1}(a')} \\sigma^*(I, a)$.\n\nThis theorem is the first to show results for mapping an $\\epsilon'$-Nash equilibrium in the abstract game to\nan $\\epsilon$-Nash equilibrium in the original game. Prior results have been for abstract strategies that are\neither exact Nash equilibria [25] or have bounded counterfactual regret [27, 31]. That is because all\nprior proofs were based on applying a worst-case counterfactual regret bound as part of the inductive\nstep (which works for exact Nash equilibrium or strategies with bounded counterfactual regret but\nnot $\\epsilon$-Nash equilibrium); our proof instead constructs an expression for $W_i^{\\sigma^{*\\prime}}(r')$ (i.e., for the value\nof the whole abstract game) before using the fact that $\\sigma'$ is an $\\epsilon'$-Nash equilibrium. We next show that\nour framework can also measure differences for strategies with bounded counterfactual regret.\nTheorem 2. For an abstract strategy profile $\\sigma'$ with bounded counterfactual regret $r(I')$ at every\ninformation set $I' \\in \\mathcal{I}'$, any lifted strategy profile $\\sigma^{\\uparrow\\sigma'}$ is an $\\epsilon$-Nash equilibrium with\n\n$$\\epsilon = \\max_{i \\in N} \\epsilon_i, \\qquad \\epsilon_i \\le \\sum_{I \\in \\mathcal{I}_i} \\pi^{\\sigma^*}(I) \\big[ \\delta_{f(I)} r(f(I)) + \\Delta^P(f(I), \\sigma'_{I \\to \\sigma^*}) - \\Delta^P(I'_I, \\sigma') \\big] + \\Delta^M(r, \\sigma^*, \\sigma'_{-i}) - \\Delta^M(r, \\sigma^{\\uparrow\\sigma'}, \\sigma'_{-i}),$$\n\nwhere $\\sigma^* = (\\sigma^*_i, \\sigma^{\\uparrow\\sigma'}_{-i})$ is $\\sigma^{\\uparrow\\sigma'}$ except for Player $i$ best responding, and each $\\sigma'_{I \\to \\sigma^*}$ is equal to $\\sigma'$\nexcept that $\\sigma'_{I \\to \\sigma^*}(f(I), a') = \\sum_{a \\in g^{-1}(a')} \\sigma^*(I, a)$ for all $a' \\in A_{f(I)}$.\n\nWe will show in the next sections that our two main theorems generalize prior results. In addition,\nour theorems are the first to give an exact expression for the abstraction error; the inequalities arise\nonly from inexactly solving the abstract game.\n\n6 Generalizing prior results\n\nWe now show that if the error in each conditional distribution over child leaves and information sets\ndepends only on error at Chance nodes then the exact results from the previous section subsume all\nprior solution quality bounds for EFGs [25, 27, 31] (which also make that assumption or stronger\nassumptions). For that we define measures of how well Chance outcomes are approximated in the\nabstraction:\n\n$$\\Delta_0^A(h, \\tilde{a}) = \\sum_{a \\in g^{-1}(\\tilde{a})} \\sigma_0(h, a) - \\sigma_0(\\tilde{h}, \\tilde{a}), \\qquad \\Delta_0^A(h) = \\sum_{\\tilde{a} \\in \\tilde{A}(h)} \\Delta_0^A(h, \\tilde{a}).$$\n\nSimilarly for nodes in infosets belonging to Player $i$ we have the following error\n\n$$\\Delta_0(\\tilde{h}|I) = \\sum_{h \\in I} \\pi_0(h|I) - \\pi_0(\\tilde{h}|f(I)).$$\n\nIn order to avoid dependence on the choice of strategy for Player $i$ our result will measure the\nworst-case loss over pure strategies for Player $i$. We let this set be $\\Sigma_i$. We will use $\\vec{a}$ to denote a\nspecific pure strategy, and we let $Z^{\\vec{a}}_I$ denote the set $Z_I^a$ such that $a$ is the action chosen at $I$ in $\\vec{a}$, and\nsimilarly for $D^{\\vec{a}}_I$. 
In a slight abuse of notation, we let g(~a) denote the pure strategy in the abstract\ngame corresponding to ~awhen applying g.\nProposition 1. If an abstract strategy pro\ufb01le 0 and a lifted strategy pro\ufb01le \"0 are such that for all\ni,0( \u02dcI|I, a,, 0) = 0 then for\ni, I 2I , P\nall players i and  = (i, \"0\n\ni,0(z|I,, 0) = 0, and P\n\ni,0(\u02dcz|I, a,, 0) = 0, P\ni ) we have\n\n(I|~a)\uf8ff Xz2Z~a\n\nI\n\n\u21e1\"0\ni\n\n(h0|I)A\n\n0 (h0)\u21e10\n\n\u21e1\"0\ni\n\n(z|I)R(z, (z))\n\ni(\u02dcz|\u02dch0, a0)\u02dcui(\u02dcz)35\n\n\u21e1\"0\n\ni) \uf8ff 2 max\n\n~a2i XI2Ii\ni(\u02dcz|\u02dcz[I]) +Xh2I Xh02H0:hvh0\n\nM(r, , 0\n\ni)  M(r, \"0, 0\n\nf (I)\n\n240(\u02dcz[I]|I)\u21e10\nI X\u02dch2f (J)\uf8ff0(\u02dch[f (I)]|I)\u21e10\n\n+ X\u02dcz2 \u02dcZg(~a)\n+ XJ2D~a\n+Xh2I Xh02H0:hvh0\n\n(h0|I)A\n\n\u21e1\"0\ni\n\ni(\u02dch|\u02dch[f (I)])\ni(\u02dch|\u02dch0, \u02dca)W 0\n\n0 (h0)\u21e10\n\ni (f ()I)def\n\n= Mi(\"0, 0)\n\nWe can combine Proposition 1 with Theorem 1 to get a bound that is independent of the best-response\nstrategy:\nCorollary 1. If 0 is an abstract \u270f0-Nash equilibrium, satis\ufb01es the condition of Proposition 1, and\nP is zero everywhere, then any lifted strategy pro\ufb01le \"0 is an \u270f-Nash equilibrium where \u270f is less\nthan maxi2N Mi(\"0, 0) + \u270f0\nThis bound generalizes the bound of Sandholm and Singh [35] while simultaneously tightening their\nbound.\nThe game class discussed by Kroer and Sandholm [25] is easily shown to satisfy the assumptions in\nProposition 1. Thus this shows a more general bound similar to that of Kroer and Sandholm [25],\nwhere we leave in several expectations rather than taking maxima everywhere (the result by Kroer\nand Sandholm [25] required taking several maxima where we leave in the expectation because their\nproof is based on upper-bounding as part of the inductive step). 
Therefore, Corollary 1 yields tighter\nresults despite also being more general.\nCorollary 1 shows a result for \u270f-Nash equilibrium computed in the abstraction. An analogous\ncorollary for abstract strategies with bounded immediate regret can easily be obtained by combining\nProposition 1 with Theorem 2.\nWe now show that, similar to mapping error, if the reach of leaf nodes in the original and abstract\ngame are the same without considering Chance moves, we can bound partitioning error with an\nexpression that does not depend on the best response \u21e4i of Player i.\nProposition 2. If 0 is such that \u21e10\nP( \u02dcI,0I!\u21e4)  P( \u02dcI,0) \uf8ff 2 max\n0(\u02dcz| \u02dcI, a0) Xz02 \u02dcI (\u02dcz)h\u21e10\n+ X\u02dcz2Z a0\n\n0(z0|I0, a0) for all \u02dcI, a0, \u02dcz, z0 2  \u02dcI(\u02dcz), then\n\u21e10(z0|I0, a0)R\n0 (\u02dcz| \u02dcI, a0iidef\n\ni (z0| \u02dcI)\n= P( \u02dcI,0I!\u21e4, 0), 80I!\u21e4\n\n0(\u02dcz| \u02dcI, a0) = \u21e10\na02AI0h Xz02Z a0\n0 (z0|I0, a0))  \u21e10\n\n\u21e10\n\nI0\n\n\u02dcI\n\nThis can be combined with our main theorems in order to get results for \u270f-Nash equilibrium or\nstrategies with bounded regret where the partition error does not depend on the best response.\nCorollary 2.\nI0 2I\nany lifted strategy \"0\nPI2Ii\n\nIf 0 has bounded counterfactual regret r(I0) at every information set\nthen\nis an \u270f-Nash equilibrium where \u270f = maxi2N \u270fi and \u270fi \uf8ff\n\n\u21e1\u21e4(I)\u21e5f (I)I r(f (I)) + P(f (I), 0I!\u21e4, 0)\u21e4\n\n0, satis\ufb01es the condition of Proposition 2, and M is zero everywhere,\n\n8\n\n\fKroer and Sandholm [27] took maxima in several places where we left in the expectation: they take a\nmaximum over the decisions of Player i in \u21e1\u21e4(I), and they maximize over the partitions in I0. Taking\nthese maxima avoids dependence on \u21e4. Taking these maxima could easily be done in Corollary 2 as\nwell. 
Kroer and Sandholm [27] also separate the difference in conditional distribution over leaves\ninto separate terms for Chance error that occurs before and after reaching $I'$; this potentially leads\nto a looser bound than ours (and never a tighter one, since we could combine our Corollary 2 with their\nseparation). An analogue to Corollary 2 but for $\\epsilon$-Nash equilibrium can be obtained by combining\nTheorem 1 with Proposition 2.\n\n6.1 Necessity of distributional similarity of reach probabilities\nWe now show that the style of bound given by Lanctot et al. [31], as well as our Corollaries 1 and 2,\ncannot generalize to games where opponents do not have the same sequence of information-set-action\npairs (or, in our case, do not satisfy the slightly weaker requirements in Propositions 1 and 2) for game nodes that\nmap to each other in the abstraction. The two games that we will use as counterexamples are shown\nin Figure 3. From the perspective of our results, the usefulness of assuming the same sequence of\ninformation-set-action pairs is that it implies the condition used in Propositions 1 and 2; the following\ncounterexamples thus also show that this assumption is a useful way to disallow bad abstractions\nsuch as the ones presented here (although it is overly restrictive from a practical perspective). Unlike\nthe prior results, our Theorems 1 and 2 still apply to the games below. Our two theorems would give\nweak bounds commensurate with the large error in the abstract equilibrium; this error is contained in\nthe terms that depend on the partitioning error.\n\nFigure 3: Left: General-sum EFG with abstraction. Right: zero-sum EFG with abstraction where\nPlayer 1 wants to minimize. 
Orange dashed lines denote information sets joined in the abstraction.\nBold edges denote actions taken with probability 1 in the abstracted equilibrium.\n\nOn the left in Figure 3 is a general-sum game where the two nodes belonging to Player 1 are abstracted\ninto a single information set. If we map $\\ell$ onto $\\ell$ and $r$ onto $r$ we get an abstraction with low payoff\nerror: $\\epsilon$ at every node. At a high level, the idea in this counterexample is that Player 2, because\ntheir nodes are not abstracted, can play opposite actions in the left and right subtrees, thus changing\nwhether Player 1 prefers going left or right. In the original game Player 1 can react to this by choosing\ndifferent actions, but not in the abstraction. Formally: let $\\epsilon > 0$. Player 2 plays the bolded edges\nat nodes with non-zero probability of being reached. In the abstraction, Player 1 gets $v/2$ for every\nstrategy. In the full game, Player 1 can choose $\\ell$ in the left subtree and $r$ in the right subtree for a\npayoff of $v$. Thus in every equilibrium where Player 2 plays according to the bolded edges (which\nincludes all equilibrium refinements) Player 1 loses $v/2$ from abstracting, despite the payoff error being\narbitrarily small. If we set $\\epsilon = 0$, equilibria where Player 2 plays the bolded edges still have high\nloss despite zero payoff error. This example shows that information-set-action structure has to be\ntaken into account in order to get satisfying bounds in general. While the example is very simple\n(and can thus easily occur in the context of a larger game), it does exploit the fact that Player 1's utility\nis discontinuous in Player 2's utility. We next show that a more intricate counterexample can avoid\nrelying on this discontinuity.\n\nOn the right in Figure 3 is a zero-sum game where the two bottom information sets belonging to\nPlayer 2 have been abstracted. 
Consider the following abstract equilibrium: Player 1 plays the bolded\nedges with probability 1, and Player 2 plays $\\ell$ and $r$ with equal probability. Player 2 gets expected utility\n$(-1-\\epsilon)/2$, but in the full game Player 2 can choose $\\ell$ ($r$) in the left (right) information set to get utility\n$-\\epsilon/2$. Thus Player 2 has a utility loss of $1/2$ despite a payoff error of 0. The idea in this example is that,\nbecause Player 1 is not abstracted, they control the distribution over nodes in Player 2\u2019s information\nset in the abstraction in a way that is inconsistent with Player 2\u2019s original-game information sets: in\nthe abstraction Player 2 gets an equal distribution over nodes where $\\ell$ or $r$ is the preferred action, whereas\nin the original game the corresponding strategy for Player 1 means that Player 2 knows exactly which\nnode they are at.\n\nAcknowledgments\nThis material is based on work supported by the National Science Foundation under grants\nIIS-1718457, IIS-1617590, and CCF-1733556, and the ARO under award W911NF-17-1-0082. Christian\nKroer is supported by a Facebook Fellowship.\n\nReferences\n[1] A. Basak, F. Fang, T. H. Nguyen, and C. Kiekintveld. Abstraction methods for solving graph-based\nsecurity games. In International Conference on Autonomous Agents and Multiagent\nSystems, pages 13\u201333. Springer, 2016.\n\n[2] A. Basak, F. Fang, T. H. Nguyen, and C. Kiekintveld. Combining graph contraction and strategy\ngeneration for green security games. In International Conference on Decision and Game Theory\nfor Security, pages 251\u2013271. Springer, 2016.\n\n[3] N. Basilico and N. Gatti. Automated abstractions for patrolling security games. In AAAI\nConference on Artificial Intelligence (AAAI), 2011.\n\n[4] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron.\nApproximating game-theoretic optimal strategies for full-scale poker. 
In Proceedings of the\nInternational Joint Conference on Artificial Intelligence (IJCAI), 2003.\n\n[5] M. Bowling, N. Burch, M. Johanson, and O. Tammelin. Heads-up limit hold\u2019em poker is solved.\nScience, 347(6218), Jan. 2015.\n\n[6] N. Brown and T. Sandholm. Regret transfer and parameter optimization. In AAAI Conference\non Artificial Intelligence (AAAI), 2014.\n\n[7] N. Brown and T. Sandholm. Simultaneous abstraction and equilibrium finding in games. In\nProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.\n\n[8] N. Brown and T. Sandholm. Safe and nested subgame solving for imperfect-information games.\nIn Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS),\npages 689\u2013699, 2017.\n\n[9] N. Brown and T. Sandholm. Superhuman AI for heads-up no-limit poker: Libratus beats top\nprofessionals. Science, page eaao1733, Dec. 2017.\n\n[10] N. Brown, S. Ganzfried, and T. Sandholm. Hierarchical abstraction, distributed equilibrium\ncomputation, and post-processing, with application to a champion no-limit Texas Hold\u2019em\nagent. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS),\n2015.\n\n[11] N. Burch, M. Lanctot, D. Szafron, and R. G. Gibson. Efficient Monte Carlo counterfactual regret\nminimization in games with many player actions. In Proceedings of the Annual Conference on\nNeural Information Processing Systems (NIPS), pages 1880\u20131888, 2012.\n\n[12] J. \u010cerm\u00e1k, B. Bo\u0161ansky, and V. Lis\u00fd. An algorithm for constructing and solving imperfect\nrecall abstractions of large extensive-form games. In Proceedings of the International Joint\nConference on Artificial Intelligence (IJCAI), pages 936\u2013942, 2017.\n\n[13] G. Farina, C. Kroer, and T. Sandholm. Regret minimization in behaviorally-constrained zero-sum\ngames. In International Conference on Machine Learning (ICML), 2017.\n\n[14] S. 
Ganzfried and T. Sandholm. Potential-aware imperfect-recall abstraction with earth mover\u2019s\ndistance in imperfect-information games. In AAAI Conference on Artificial Intelligence (AAAI),\n2014.\n\n[15] A. Gilpin and T. Sandholm. A competitive Texas Hold\u2019em poker player via automated abstraction\nand real-time equilibrium computation. In Proceedings of the National Conference on\nArtificial Intelligence (AAAI), pages 1007\u20131013, 2006.\n\n[16] A. Gilpin and T. Sandholm. Better automated abstraction techniques for imperfect information\ngames, with application to Texas Hold\u2019em poker. In International Conference on Autonomous\nAgents and Multi-Agent Systems (AAMAS), pages 1168\u20131175, 2007.\n\n[17] A. Gilpin and T. Sandholm. Lossless abstraction of imperfect information games. Journal of\nthe ACM, 54(5), 2007.\n\n[18] A. Gilpin and T. Sandholm. Expectation-based versus potential-aware automated abstraction in\nimperfect information games: An experimental comparison using poker. In Proceedings of the\nAAAI Conference on Artificial Intelligence (AAAI), 2008. Short paper.\n\n[19] A. Gilpin, T. Sandholm, and T. B. S\u00f8rensen. Potential-aware automated abstraction of sequential\ngames, and holistic equilibrium analysis of Texas Hold\u2019em poker. In Proceedings of the AAAI\nConference on Artificial Intelligence (AAAI), 2007.\n\n[20] A. Gilpin, T. Sandholm, and T. B. S\u00f8rensen. A heads-up no-limit Texas Hold\u2019em poker player:\nDiscretized betting models and automatically generated equilibrium-finding programs. In\nInternational Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2008.\n\n[21] J. Hawkin, R. Holte, and D. Szafron. Automated action abstraction of imperfect information\nextensive-form games. In AAAI Conference on Artificial Intelligence (AAAI), 2011.\n\n[22] J. Hawkin, R. Holte, and D. Szafron. Using sliding windows to generate action abstractions in\nextensive-form games. 
In AAAI Conference on Artificial Intelligence (AAAI), 2012.\n\n[23] S. Hoda, A. Gilpin, J. Pe\u00f1a, and T. Sandholm. Smoothing techniques for computing Nash\nequilibria of sequential games. Mathematics of Operations Research, 35(2), 2010.\n\n[24] M. Johanson, N. Burch, R. Valenzano, and M. Bowling. Evaluating state-space abstractions in\nextensive-form games. In International Conference on Autonomous Agents and Multi-Agent\nSystems (AAMAS), 2013.\n\n[25] C. Kroer and T. Sandholm. Extensive-form game abstraction with bounds. In Proceedings of\nthe ACM Conference on Economics and Computation (EC), 2014.\n\n[26] C. Kroer and T. Sandholm. Discretization of continuous action spaces in extensive-form games.\nIn International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2015.\n\n[27] C. Kroer and T. Sandholm. Imperfect-recall abstractions with bounds in games. In Proceedings\nof the ACM Conference on Economics and Computation (EC), 2016.\n\n[28] C. Kroer, K. Waugh, F. K\u0131l\u0131n\u00e7-Karzan, and T. Sandholm. Faster first-order methods for extensive-form\ngame solving. In Proceedings of the ACM Conference on Economics and Computation\n(EC), 2015.\n\n[29] C. Kroer, K. Waugh, F. K\u0131l\u0131n\u00e7-Karzan, and T. Sandholm. Theoretical and practical advances on\nsmoothing for extensive-form games. In Proceedings of the ACM Conference on Economics\nand Computation (EC), 2017.\n\n[30] M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling. Monte Carlo sampling for regret minimization\nin extensive games. In Proceedings of the Annual Conference on Neural Information\nProcessing Systems (NIPS), 2009.\n\n[31] M. Lanctot, R. Gibson, N. Burch, M. Zinkevich, and M. Bowling. No-regret learning in\nextensive-form games with imperfect recall. In International Conference on Machine Learning\n(ICML), 2012.\n\n[32] M. Morav\u010d\u00edk, M. Schmid, N. Burch, V. Lis\u00fd, D. Morrill, N. Bard, T. Davis, K. Waugh,\nM. Johanson, and M. Bowling. 
DeepStack: Expert-level artificial intelligence in heads-up\nno-limit poker. Science, 356(6337), May 2017.\n\n[33] J. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of\nSciences, 36:48\u201349, 1950.\n\n[34] T. Sandholm. Abstraction for solving large incomplete-information games. In AAAI Conference\non Artificial Intelligence (AAAI), 2015. Senior Member Track.\n\n[35] T. Sandholm and S. Singh. Lossy stochastic game abstraction with bounds. In Proceedings of\nthe ACM Conference on Electronic Commerce (EC), 2012.\n\n[36] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In CG \u201900: Revised\nPapers from the Second International Conference on Computers and Games, pages 333\u2013345,\nLondon, UK, 2000. Springer-Verlag.\n\n[37] K. Waugh. Abstraction in large extensive games. Master\u2019s thesis, University of Alberta, 2009.\n\n[38] K. Waugh, D. Morrill, D. Bagnell, and M. Bowling. Solving games with functional regret\nestimation. In AAAI Conference on Artificial Intelligence (AAAI), 2015.\n\n[39] M. P. Wellman, D. M. Reeves, K. M. Lochner, S.-F. Cheng, and R. Suri. Approximate strategic\nreasoning through hierarchical reduction of large symmetric games. In Proceedings of the\nNational Conference on Artificial Intelligence (AAAI), 2005.\n\n[40] M. Zinkevich, M. Bowling, M. Johanson, and C. Piccione. Regret minimization in games\nwith incomplete information. In Proceedings of the Annual Conference on Neural Information\nProcessing Systems (NIPS), 2007.\n", "award": [], "sourceid": 358, "authors": [{"given_name": "Christian", "family_name": "Kroer", "institution": "Faceook, Core Data Science"}, {"given_name": "Tuomas", "family_name": "Sandholm", "institution": "Carnegie Mellon University"}]}