{"title": "Strategy Grafting in Extensive Games", "book": "Advances in Neural Information Processing Systems", "page_first": 2026, "page_last": 2034, "abstract": "Extensive games are often used to model the interactions of multiple agents within an environment. Much recent work has focused on increasing the size of an extensive game that can be feasibly solved. Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is used in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement.", "full_text": "Strategy Grafting in Extensive Games\n\nKevin Waugh\n\nwaugh@cs.cmu.edu\n\nDepartment of Computer Science\n\nCarnegie Mellon University\n\nNolan Bard, Michael Bowling\n\n{nolan,bowling}@cs.ualberta.ca\n\nDepartment of Computing Science\n\nUniversity of Alberta\n\nAbstract\n\nExtensive games are often used to model the interactions of multiple agents within\nan environment. Much recent work has focused on increasing the size of an ex-\ntensive game that can be feasibly solved. 
Despite these improvements, many interesting games are still too large for such techniques. A common approach for computing strategies in these large games is to first employ an abstraction technique to reduce the original game to an abstract game that is of a manageable size. This abstract game is then solved and the resulting strategy is played in the original game. Most top programs in recent AAAI Computer Poker Competitions use this approach. The trend in this competition has been that strategies found in larger abstract games tend to beat strategies found in smaller abstract games. These larger abstract games have more expressive strategy spaces and therefore contain better strategies. In this paper we present a new method for computing strategies in large games. This method allows us to compute more expressive strategies without increasing the size of abstract games that we are required to solve. We demonstrate the power of the approach experimentally in both small and large games, while also providing a theoretical justification for the resulting improvement.

1 Introduction

Extensive games provide a general model for describing the interactions of multiple agents within an environment. They subsume other sequential decision-making models such as finite horizon MDPs, finite horizon POMDPs, and multiagent scenarios such as stochastic games. This makes extensive games a powerful tool for representing a variety of complex situations. Moreover, it means that techniques for computing strategies in extensive games are a valuable commodity that can be applied in many different domains. The usefulness of the extensive game model is dependent on the availability of solution techniques that scale well with respect to the size of the model. Recent research, particularly motivated by the domain of poker, has made significant developments in scalable solution techniques. 
The classic linear programming techniques [5] can solve games with approximately 10^7 states [1], while more recent techniques [2, 9] can solve games with over 10^12 states.

Despite the improvements in solution techniques for extensive games, even the motivating domain of two-player limit Texas Hold'em is far too large to solve, as the game has approximately 10^18 states. The typical solution to this challenge is abstraction [1]. Abstraction involves constructing a new game that is tractably sized for current solution techniques, but restricts the information or actions available to the players. The hope is that the abstract game preserves the important strategic structure of the game, and so playing a near equilibrium solution of the abstract game will still perform well in the original game. In poker, employed abstractions include limiting the possible betting sequences, replacing all betting in the first round with a fixed policy [1], and, most commonly, grouping the cards dealt to each player into buckets based on a strength metric [4, 9].

With these improvements in solution techniques, larger abstract games have become tractable, and therefore increasingly fine abstractions have been employed. Because a finer abstraction can represent players' information more accurately and provide a more expressive space of strategies, it is generally assumed that a solution to a finer abstraction will produce stronger strategies for the original game than those computed using a coarser abstraction. Although this assumption is in general not true [7], results from the AAAI Computer Poker Competition [10] have shown that it does often hold: near equilibrium strategies with the largest expressive power tend to win the competition.

In this paper, we increase the expressive power of computable strategies without increasing the size of game that can be feasibly solved. 
We do this by partitioning the game into tractably sized\nsub-games called grafts, solving each independently, and then combining the solutions into a single\nstrategy. Unlike previous, subsequently abandoned, attempts to solve independent sub-games [1, 3],\nthe grafting approach uses a base strategy to ensure that the grafts will mesh well as a unit. In fact,\nwe prove that grafted strategies improve on near equilibrium base strategies. We also empirically\ndemonstrate this improvement both in a small poker game as well as limit Texas Hold\u2019em.\n\n2 Background\n\nInformally, an extensive game is a game tree where a player cannot distinguish between two histories\nthat share the same information set. This means a past action, from either chance or another player,\nis not completely observed, allowing one to model situations of imperfect information.\n\nDe\ufb01nition 1 (Extensive Game) [6, p. 200] A \ufb01nite extensive game with imperfect information is\ndenoted \u0393 and has the following components:\n\n\u2022 A \ufb01nite set N of players.\n\u2022 A \ufb01nite set H of sequences, the possible histories of actions, such that the empty sequence\nis in H and every pre\ufb01x of a sequence in H is also in H. Z \u2286 H are the terminal histories.\nNo sequence in Z is a strict pre\ufb01x of any sequence in H. A(h) = {a : (h, a) \u2208 H} are the\nactions available after a non-terminal history h \u2208 H \\ Z.\n\u2022 A player function P that assigns to each non-terminal history a member of N \u222a{c}, where\nc represents chance. P (h) is the player who takes an action after the history h. Let Hi be\nthe set of histories where player i chooses the next action.\n\u2022 A function fc that associates with every history h \u2208 Hc a probability distribution fc(\u00b7|h)\non A(h). 
fc(a|h) is the probability that a occurs given h.

• For each player i ∈ N, a utility function ui that assigns each terminal history a real value. ui(z) is rewarded to player i for reaching terminal history z. If N = {1, 2} and for all z ∈ Z, u1(z) = −u2(z), an extensive game is said to be zero-sum.

• For each player i ∈ N, a partition Ii of Hi with the property that A(h) = A(h′) whenever h and h′ are in the same member of the partition. Ii is the information partition of player i; a set Ii ∈ Ii is an information set of player i.

In this paper, we exclusively focus on two-player zero-sum games with perfect recall, which is a restriction on the information partitions that excludes unrealistic situations where a player is forced to forget her own past information or decisions.

To play an extensive game each player specifies a strategy. A strategy determines how a player makes her decisions when confronted with a choice.

Definition 2 (Strategy) A strategy for player i, σi, is a function that assigns a probability distribution over A(h) to each h ∈ Hi. This function is constrained so that σi(h) = σi(h′) whenever h and h′ are in the same information set. A strategy is pure if no randomization is required. We denote Σi as the set of all strategies for player i.

Definition 3 (Strategy Profile) A strategy profile in extensive game Γ is a set of strategies, σ = {σ1, . . . , σn}, that contains one strategy for each player. We let σ−i denote the set of strategies for all players except player i. We call the set of all strategy profiles Σ.

When all players play according to a strategy profile, σ, we can define the expected utility of each player as ui(σ). 
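To make the expected utility ui(σ) concrete, here is a minimal sketch in Python (the game, names, and representation are illustrative assumptions, not anything specified in the paper): matching pennies encoded as a tiny imperfect-information extensive game, with ui(σ) computed by recursive traversal of the game tree.

```python
# Illustrative sketch (not from the paper): player 1's expected utility u1(sigma)
# by recursive traversal of a tiny two-player zero-sum extensive game.
# Histories are strings of actions; terminal histories map to player 1's utility.

# Matching pennies as an extensive game: player 1 picks H/T, then player 2
# picks H/T without observing player 1's choice (a single information set, "?").
TERMINALS = {"HH": 1.0, "HT": -1.0, "TH": -1.0, "TT": 1.0}

def player_to_act(h):
    return 1 if len(h) == 0 else 2

def infoset(h):
    # Player 2 cannot distinguish history "H" from "T": both map to one set.
    return "" if len(h) == 0 else "?"

def expected_utility(h, sigma1, sigma2):
    """Player 1's expected utility when both players follow their strategies."""
    if h in TERMINALS:
        return TERMINALS[h]
    sigma = sigma1 if player_to_act(h) == 1 else sigma2
    dist = sigma[infoset(h)]
    return sum(p * expected_utility(h + a, sigma1, sigma2)
               for a, p in dist.items())

# Uniform random strategies for both players: the value is 0.
s1 = {"": {"H": 0.5, "T": 0.5}}
s2 = {"?": {"H": 0.5, "T": 0.5}}
print(expected_utility("", s1, s2))  # 0.0
```

Note that the strategies are keyed by information set, not by history, which is exactly the constraint imposed in Definition 2.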
Similarly, ui(σi, σ−i) is the expected utility of player i when all other players play according to σ−i and player i plays according to σi.

The traditional solution concept for extensive games is the Nash equilibrium.

Definition 4 (Nash Equilibrium) A Nash equilibrium is a strategy profile σ where

∀i ∈ N ∀σ′i ∈ Σi, ui(σ) ≥ ui(σ′i, σ−i)   (1)

An approximation of a Nash equilibrium, or ε-Nash equilibrium, is a strategy profile σ where

∀i ∈ N ∀σ′i ∈ Σi, ui(σ) + ε ≥ ui(σ′i, σ−i)   (2)

A Nash (ε-Nash) equilibrium is a strategy profile where no player can gain (more than ε) through unilateral deviation. A Nash equilibrium exists in every finite extensive game. For zero-sum extensive games with perfect recall we can efficiently compute an ε-Nash equilibrium using techniques such as linear programming [5], counterfactual regret minimization [9], and the excessive gap technique [2]. In a zero-sum game it is optimal to play any strategy belonging to an equilibrium, since doing so guarantees the player the highest expected utility in the worst case; any deviation from equilibrium by either player can be exploited by a knowledgeable opponent. In this sense, computing an equilibrium of a zero-sum game is called solving the game.

Many games of interest are far too large to solve directly, and abstraction is often employed to reduce the game to one of a more manageable size. The abstract game is solved and the resulting strategy is presumed to be strong in the original game. Abstraction can be achieved by merging information
Abstraction can be achieved by merging information\nsets together, restricting the actions a player can take from a given history, or a combination of both.\n\nDe\ufb01nition 5 (Abstraction) [7] An abstraction for player i is a pair \u03b1i =(cid:10)\u03b1I\n\n(cid:11), where,\n\ni , \u03b1A\ni\n\n\u2022 \u03b1I\n\u2022 \u03b1A\n\ni\n\nis a function on histories where \u03b1A\n\ni is a partition of Hi, de\ufb01ning a set of abstract information sets coarser1 than Ii, and\nh and h0 in the same abstract information set. We will call this the abstract action set.\n\ni (h0) for all histories\nThe null abstraction for player i, is \u03c6i = hIi, Ai. An abstraction \u03b1 is a set of abstractions \u03b1i,\none for each player. Finally, for any abstraction \u03b1, the abstract game, \u0393\u03b1, is the extensive game\nobtained from \u0393 by replacing Ii with \u03b1I\n\ni (h) \u2286 A(h) and \u03b1A\n\ni (h) when P (h) = i, for all i.\n\ni and A(h) with \u03b1A\n\ni (h) = \u03b1A\n\nStrategies for abstract games are de\ufb01ned in the same manner as for unabstracted games. However,\nthe strategy must assign the same distribution to all histories in the same block of the abstraction\u2019s\ninformation partition, as well as assigning zero probability to actions not in the abstract action set.\n\n3 Strategy Grafting\n\nThough there is no guarantee that optimal strategies in abstract games are strong in the original\ngame [7], these strategies have empirically been shown to perform well against both other com-\nputers [9] and humans [1]. Currently, strong strategies are solved for in one single equilibrium\ncomputation for a single abstract game. Advancement typically involves developing algorithmic im-\nprovements to equilibrium \ufb01nding techniques in order to \ufb01nd solutions to yet larger abstract games.\nIt is simple to show that a strategy space must include at least as good, if not better, strategies than\na smaller space that it re\ufb01nes [7]. 
At first glance, this would seem to imply that a larger abstraction would always be better, but upon closer inspection we see that this depends on our method of selecting a strategy from the space. In poker, when using arbitrary equilibrium strategies that are evaluated in a tournament setting, this intuition empirically holds true.

One potentially important factor for the empirical evidence is the presence of dominated strategies in the support of the abstract equilibrium strategies.

Definition 6 (Dominated Strategy) A dominated strategy for player i is a pure strategy, σi, such that there exists another strategy, σ′i, where for all opponent strategies σ−i,

ui(σ′i, σ−i) ≥ ui(σi, σ−i)   (3)

and the inequality must hold strictly for at least one opponent strategy.

1 Partition A is coarser than partition B if and only if every set in B is a subset of some set in A; equivalently, x and y are in the same set in A whenever x and y are in the same set in B.

This implies that a player can never benefit by playing a dominated strategy. When abstracting, one can, in effect, merge a dominated strategy in with a non-dominated strategy. In the abstract game, this combined strategy might become part of an equilibrium and hence the abstract strategy would make occasional mistakes. That is, abstraction does not necessarily preserve strategy domination. As a result of their expressive power, finer abstractions may better preserve domination and thus can result in less play of dominated strategies.

Decomposition is a natural approach for using larger strategy spaces without incurring additional computational costs, and indeed it has been employed toward this end. In extensive games with imperfect information, though, straightforward decomposition can be problematic. 
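Returning to Definition 6 for a moment, domination by another pure strategy can be checked mechanically. A toy normal-form sketch follows (illustrative only, and deliberately limited: it checks domination by a single pure strategy, for which comparing against the opponent's pure strategies suffices by linearity; detecting domination by a mixed strategy would require a linear program).

```python
# Illustrative sketch of Definition 6 in normal form: pure strategy r for the
# row player is dominated if some other row does at least as well against every
# opponent column and strictly better against at least one.

def dominated_rows(U):
    """Indices of dominated rows in payoff matrix U[row][col] (row player's utility)."""
    out = []
    for r, row in enumerate(U):
        for r2, other in enumerate(U):
            if r2 == r:
                continue
            if all(o >= v for o, v in zip(other, row)) and \
               any(o > v for o, v in zip(other, row)):
                out.append(r)
                break
    return out

# Row 2 is dominated by row 0: never better, sometimes worse.
U = [
    [3, 1],
    [0, 2],
    [2, 1],
]
print(dominated_rows(U))  # [2]
```

Merging rows 0 and 2 of this matrix into one abstract strategy would hide the domination, which is the abstraction pitfall described above.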
One way that equilibrium strategies guard against exploitation is information hiding, i.e., the equilibrium plays in a fashion that hinders an opponent's ability to effectively reconstruct the player's private information. Independent solutions to a set of sub-games, though, may not "mesh", or hide information, effectively as a whole. For example, an observant opponent might be able to determine which sub-game is being played, which is itself valuable information that can be exploited.

Armed with some intuition for why increasing the size of the strategy space may improve the quality of the solution and why decomposition can be problematic, we will now describe the strategy grafting algorithm and provide some theoretical results regarding the quality of grafted strategies. First, we explain how a game of imperfect information is formally divided into sub-games.

Definition 7 (Grafting Partition) G = {G0, G1, . . . , Gp} is a grafting partition for player i if

1. G is a partition of Hi,
2. ∀I ∈ Ii ∃j ∈ {0, . . . , p} such that I ⊆ Gj, and
3. ∀j ∈ {1, . . . , p} if h is a prefix of h′ ∈ Hi and h ∈ Gj then h′ ∈ Gj ∪ G0.

Using the elements of a grafting partition, we construct a set of sub-games. The solutions to these sub-games are called grafts, and, since the blocks are disjoint sets, we can combine them naturally into one single grafted strategy.

Definition 8 (Grafted Strategy) Let σi ∈ Σi be a strategy and G a grafting partition for player i. For j ∈ {1, . . . , p}, define Γ^{σi,j} to be an extensive game derived from the original game Γ where for all h ∈ Hi \ Gj, P(h) = c and fc(a|h) = σi(h, a). That is, player i only controls her actions for histories in Gj and is forced to play according to σi elsewhere. 
Let the graft of Gj, σ^{*,j}, be an ε-Nash equilibrium of the game Γ^{σi,j}. Finally, define the grafted strategy for player i, σ*_i, as

σ*_i(h, a) = σi(h, a) if h ∈ G0, and σ*_i(h, a) = σ^{*,j}_i(h, a) if h ∈ Gj.

We will call σi the base strategy and G the grafting partition for the grafted strategy σ*_i.

There are a few key ideas to observe about grafted strategies that distinguish them from previous sub-game decomposition methods. First, we start out with a base strategy for the player. This base strategy can be constructed using current techniques for a tractably sized abstraction. It is important that we use the same base strategy for all grafts, as it is the only information that is shared between the grafts. Second, when we construct a graft, only the portion of the game that the graft plays is allowed to vary for our player of interest. The actions over the remainder of the game are played according to the base strategy. This allows us to refine the abstraction for that block of the grafting partition, so that it itself is as large as the largest tractably solvable game. Third, note that when we construct a graft, we continue to use an equilibrium finding technique, but we are not interested in the pair of strategies; we are only interested in the strategy for the player of interest. This means that in games like poker, where we are interested in a strategy for both players, we must construct a grafted strategy separately for each player. Finally, when we construct a graft, our opponent must learn a strategy for the entire, potentially abstract, game. By letting our opponent's strategy vary completely, our graft will be a strategy that is less prone to exploitation, forcing each individual graft to mesh well with the base strategy and in turn with each other graft when combined.

Strategy grafting allows us to construct a strategy with more expressive power than what can be computed by solving a single game. 
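Once the grafts are solved, assembling the grafted strategy of Definition 8 is a simple merge over the blocks of the grafting partition. A minimal sketch (the information-set names and dictionary representation are hypothetical, not the authors' implementation):

```python
# Illustrative sketch of Definition 8 (names hypothetical): combine a base
# strategy with independently solved grafts. block[I] gives the index j of the
# grafting-partition block containing information set I (0 = "keep the base").

def grafted_strategy(base, grafts, block):
    """base and grafts[j]: infoset -> action distribution; block: infoset -> j."""
    combined = {}
    for I, dist in base.items():
        j = block.get(I, 0)
        combined[I] = dist if j == 0 else grafts[j][I]
    return combined

base   = {"I1": {"a": 1.0}, "I2": {"a": 0.5, "b": 0.5}, "I3": {"b": 1.0}}
grafts = {1: {"I2": {"a": 0.9, "b": 0.1}},   # graft solved on block G1 = {I2}
          2: {"I3": {"a": 0.3, "b": 0.7}}}   # graft solved on block G2 = {I3}
block  = {"I1": 0, "I2": 1, "I3": 2}

sigma_star = grafted_strategy(base, grafts, block)
print(sigma_star["I1"], sigma_star["I2"])
```

Because the blocks are disjoint, each information set takes its distribution from exactly one source: the base strategy on G0, or the graft solved for its block.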
We now show that strategy grafting uses this expressive power to its advantage, causing an (approximate) improvement over its base strategy. Note that we cannot guarantee a strict improvement as the base strategy may already be an optimal strategy. Recall from Definition 8 that the grafted strategy plays σ*_i(h, a) = σi(h, a) if h ∈ G0 and σ*_i(h, a) = σ^{*,j}_i(h, a) if h ∈ Gj.

Theorem 1 For strategies σ1, σ2 where σ2 is an ε-best response to σ1, if σ*_1 is the grafted strategy for player 1 where σ1 is used as the base strategy and G is the grafting partition, then

u1(σ*_1, σ2) − u1(σ1, σ2) = Σ_{j=1}^{p} ( u1(σ^{*,j}_1, σ2) − u1(σ1, σ2) ) ≥ −3pε.

In other words, the grafted strategy's improvement against σ2 is equal to the sum of the gains of the individual grafts against σ2, and this gain is no less than −3pε.

PROOF. Define Zj as follows,

∀j ∈ {1, . . . , p}  Zj = {z ∈ Z | ∃h ∈ Gj with h a prefix of z}   (4)
Z0 = Z \ (Z1 ∪ · · · ∪ Zp)   (5)

By condition (3) of Definition 7, Z0, . . . , Zp are disjoint and therefore form a partition of Z. So,

Σ_{j=1}^{p} ( u1(σ^{*,j}_1, σ2) − u1(σ1, σ2) )
= Σ_{j=1}^{p} ( Σ_{z∈Z} u1(z) Pr(z|σ^{*,j}_1, σ2) − Σ_{z∈Z} u1(z) Pr(z|σ1, σ2) )   (6)
= Σ_{j=1}^{p} Σ_{k=0}^{p} Σ_{z∈Zk} u1(z) ( Pr(z|σ^{*,j}_1, σ2) − Pr(z|σ1, σ2) )   (7)

Notice that for all z ∈ Zk with k ≠ j, Pr(z|σ^{*,j}_1, σ2) = Pr(z|σ1, σ2), so only when k = j is the summand non-zero:

= Σ_{j=1}^{p} Σ_{z∈Zj} u1(z) ( Pr(z|σ^{*,j}_1, σ2) − Pr(z|σ1, σ2) )   (8)

Moreover, σ*_1 agrees with σ^{*,j}_1 on every history leading to a terminal history in Zj, so Pr(z|σ^{*,j}_1, σ2) = Pr(z|σ*_1, σ2) for all z ∈ Zj:

= Σ_{j=1}^{p} Σ_{z∈Zj} u1(z) ( Pr(z|σ*_1, σ2) − Pr(z|σ1, σ2) )   (9)
= Σ_{z∈Z} u1(z) Pr(z|σ*_1, σ2) − Σ_{z∈Z} u1(z) Pr(z|σ1, σ2)   (10)
= u1(σ*_1, σ2) − u1(σ1, σ2)   (11)

Furthermore, since σ^{*,j}_1 and σ^{*,j}_2 are strategies of the ε-Nash equilibrium σ^{*,j},

u1(σ^{*,j}_1, σ2) + ε ≥ u1(σ^{*,j}_1, σ^{*,j}_2)   (12)
u1(σ^{*,j}_1, σ^{*,j}_2) ≥ u1(σ1, σ^{*,j}_2) − ε   (13)

Moreover, because σ2 is an ε-best response to σ1,

u1(σ1, σ^{*,j}_2) ≥ u1(σ1, σ2) − ε   (14)

Chaining (12)–(14), each term u1(σ^{*,j}_1, σ2) − u1(σ1, σ2) is at least −3ε, so

Σ_{j=1}^{p} ( u1(σ^{*,j}_1, σ2) − u1(σ1, σ2) ) ≥ −3pε.   (15)

The main application of this theorem is in the following corollary, which 
follows immediately from the definition of an ε-Nash equilibrium.

Corollary 1 Let α be an abstraction where α2 = φ2 and let σ be an ε-Nash equilibrium of the game Γ^α. Then any grafted strategy σ*_1 in Γ with σ1 used as the base strategy will be at most 3pε worse than σ1 against σ2.

Although these results suggest that a grafted strategy will (approximately) improve on its base strategy against an optimal opponent, there is one caveat: it assumes we know the opponent's abstraction or can solve a game with the opponent unabstracted. Without this knowledge or ability, this guarantee does not hold. However, all previous work that employs abstract equilibrium strategies also implicitly makes this assumption. Though we know that refining an abstraction has no guarantee of improving worst-case performance in the original game [7], the AAAI Computer Poker Competition [10] has shown that in practice larger abstractions and more expressive strategies consistently perform well in the original game, even though competition opponents are not using the same abstractions. We might expect a similar result even when the theorem's assumptions are not satisfied. In the next section we examine empirically both situations where we know our opponent's abstraction and situations where we do not.

4 Experimental Results

The AAAI Computer Poker Competitions use various types of large Texas Hold'em poker games. These games are quite large and the resulting abstract games can take weeks of computation to solve. We begin our experiments in a smaller poker game called Leduc Hold'em where we can examine several grafted strategies. 
This is followed by analysis of a grafted strategy for two-player limit\nTexas Hold\u2019em that was submitted to the 2009 AAAI Poker Competition.\n\n4.1 Leduc Hold\u2019em\n\nLeduc Hold\u2019em is a two player poker game. The deck used in Leduc Hold\u2019em contains six cards,\ntwo jacks, two queens and two kings, and is shuf\ufb02ed prior to playing a hand. At the beginning of a\nhand, each player pays a one chip ante to the pot and receives one private card. A round of betting\nthen takes place starting with player one. After the round of betting, a single public card is revealed\nfrom the deck, which both players use to construct their hand. This card is called the \ufb02op. Another\nround of betting occurs after the \ufb02op, again starting with player one, and then a showdown takes\nplace. At a showdown, if either player has paired their private card with the public card they win all\nthe chips in the pot. In the event neither player pairs, the player with the higher card is declared the\nwinner. The players split the money in the pot if they have the same private card.\nEach betting round follows the same format. The \ufb01rst player to act has the option to check or bet.\nWhen betting the player adds chips into the pot and action moves to the other player. When a player\nfaces a bet, they have the option to fold, call or raise. When folding, a player forfeits the hand and\nall the money in the pot is awarded to the opposing player. When calling, a player places enough\nchips into the pot to match the bet faced and the betting round is concluded. When raising, the player\nmust put more chips into the pot than the current bet faced and action moves to the opposing player.\nIf the \ufb01rst player checks initially, the second player may check to conclude the betting round or bet.\nIn Leduc Hold\u2019em there is a limit of one bet and one raise per round. The bets and raises are of a\n\ufb01xed size. 
This size is two chips in the first betting round and four chips in the second.

Tournament Setup. Despite using a smaller poker game, we aim to create a tournament setting similar to the AAAI Poker Competition. To accomplish this we create a variety of equilibrium-like players using abstractions of varying size. Each of these strategies is then used as a base strategy to create two grafted strategies. All strategies are then played against each other in a round-robin tournament. A strategy is said to beat another strategy if its expected winnings against the other are positive. Unlike the AAAI Poker Competition, in our smaller game we can feasibly compute the expected value of one strategy against another and thus we are not required to sample.

The abstractions used are J.Q.K, JQ.K, and J.QK. Prior to the flop, the first abstraction can distinguish all three cards, the second abstraction cannot distinguish a jack from a queen, and the third cannot distinguish a queen from a king. Postflop, all three abstractions are aware only of whether they have paired their private card. These three abstractions were hand-chosen as they are representative of how current abstraction techniques group hands together. The first abstraction is the biggest, and hence we would expect it to do the best. The second and third abstractions are the same size.

We chose to train two types of grafted strategies: preflop grafts and flop grafts. Both types consist of three individual grafts for each player: one to play each card with complete information. 
                          (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)   Avg.
(1) J.Q.K preflop grafts    --    2.3   28.0   17.5   12.2   26.6   36.7   22.3   54.7   25.0
(2) J.Q.K flop grafts     -2.3     --   28.6   18.6   16.9   23.9   39.7   24.7   49.6   25.0
(3) JQ.K flop grafts     -28.0  -28.6     --  -47.2   67.0   -0.9   28.5   79.9   89.2   20.0
(4) JQ.K preflop grafts  -17.5  -18.6   47.2     --  -11.2    9.0   67.3    3.7   62.8   17.9
(5) J.QK preflop grafts  -12.2  -16.9  -67.0   11.2     --    8.1  -20.0   30.9  110.0    5.5
(6) J.Q.K                -26.6  -23.9    0.9   -9.0   -8.1     --   13.6    7.5   32.5   -1.6
(7) JQ.K                 -36.7  -39.7  -28.5  -67.3   20.0  -13.6     --   42.2   70.6   -6.6
(8) J.QK flop grafts     -22.3  -24.7  -79.9   -3.7  -30.9   -7.5  -42.2     --   83.3  -16.0
(9) J.QK                 -54.7  -49.6  -89.2  -62.8 -110.0  -32.5  -70.6  -83.3     --  -69.1

Table 1: Expected winnings of the row player against the column player in millibets per hand (mb/h)

Strategy                 Wins  Losses  Exploitability
J.Q.K preflop grafts        8       0           298.3
J.Q.K flop grafts           7       1           321.1
JQ.K preflop grafts         5       3           465.9
JQ.K flop grafts            4       4           509.0
J.QK preflop grafts         4       4           507.3
J.Q.K                       4       4           315.1
JQ.K                        3       5           246.8
J.QK flop grafts            1       7           503.5
J.QK                        0       8           371.1

Table 2: Each strategy's number of wins, losses, and exploitability in unabstracted Leduc Hold'em in millibets per hand (mb/h)

That is, each graft does not abstract the sub-game for the observed card. These two types differ in that the preflop grafts play for the entire game whereas the flop grafts only play the game after the flop. For preflop grafts, this means G0 is empty, i.e., the final grafted strategy is always using the probabilities from some graft and never the base strategy. For flop grafts, the grafted strategy follows the base strategy in all preflop information sets. We use ε-Nash equilibria in the three abstract games as our base strategies. 
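The three hand-chosen abstractions can be viewed as card-bucketing functions; a toy sketch (an illustration of the scheme described above, not the authors' implementation):

```python
# Illustrative sketch of the three hand-chosen Leduc abstractions.
# Preflop buckets depend on the private card; postflop, every abstraction
# sees only whether the private card pairs the public card.

PREFLOP_BUCKETS = {
    "J.Q.K": {"J": "J", "Q": "Q", "K": "K"},   # distinguishes all three cards
    "JQ.K":  {"J": "JQ", "Q": "JQ", "K": "K"}, # cannot tell jack from queen
    "J.QK":  {"J": "J", "Q": "QK", "K": "QK"}, # cannot tell queen from king
}

def bucket(abstraction, private, public=None):
    """Abstract information about the cards (betting history is kept as-is)."""
    if public is None:                     # preflop
        return PREFLOP_BUCKETS[abstraction][private]
    return "paired" if private == public else "unpaired"

print(bucket("JQ.K", "J"), bucket("JQ.K", "Q"))   # JQ JQ
print(bucket("J.QK", "Q", "Q"))                   # paired
```

A graft for, say, the jack then replaces the bucket with the actual card in its block of the grafting partition while the base abstraction is kept everywhere else.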
Each base strategy and graft is trained using counterfactual regret minimization for one billion iterations. The equilibria found are ε-Nash equilibria where no player can benefit more than ε = 10^−5 chips by deviating within the abstract game. We measure the expected winnings in millibets per hand, or mb/h. A millibet is one thousandth of a small bet, or 0.002 chips.

Results. We can see in Table 1 that the grafted strategies perform well in a field of equilibrium-like strategies. The base strategy seems to be of great importance when training a grafted strategy. Though JQ.K and J.QK are the same size, the JQ.K strategy performs better in this tournament setting. Similarly, the grafted strategies appear to maintain the ordering of their base strategies either when considering the expected winnings in Table 1 or the number of wins in Table 2 (though JQ.K flop grafts switches places with JQ.K preflop grafts in the ordering). Although the choice of base strategy is important, the grafted strategies do well under both evaluation criteria, and even the worst base strategy sees great relative improvement when used to train grafted strategies.

There are also a few other interesting trends in these results. First, our intuition that larger strategies perform better seems to hold in all cases except for J.QK flop grafts. Larger abstractions also perform better for the non-grafted strategies, as J.Q.K is the biggest equilibrium strategy and it performs the best out of this group. Second, it appears that the preflop grafts are usually better than the flop grafts. This can be explained by the fact that the preflop grafts have more information about the original game. Finally, observe that the grafted strategies can have worse exploitability in the original game than their corresponding base strategy. 
Although this can make grafted strategies more vulnerable to exploitive strategies, they appear to perform well against a field of equilibrium-like opponents. In fact, in our experiment, grafted strategies appear to only improve upon the base strategy despite not always knowing the opponent's abstraction. This suggests that exploitability is not the only important measure of strategy quality. Contrast the grafted strategies with the strategy that always folds, which is exploitable at 500 mb/h. Although always folding is less exploitable than some of the grafted strategies, it cannot win against any opponent and would place last in this tournament.

                  Relative Size    (1)    (2)    (3)    (4)    (5)    (6)   Avg.
(1) 20x8 Grafted           1.0     --    2.1   14.5   18.1   13.7   18.7   13.4
(2) 20x32                 2.53   -2.1     --    4.9    9.4   11.8   15.5    7.9
(3) 20x8 (Base)            1.0  -14.5   -4.9     --    6.2    7.2   10.7    0.9
(4) 20x7                  0.43  -18.1   -9.4   -6.2     --    1.7    5.0   -5.4
(5) 14                    0.82  -13.7  -11.8   -7.2   -1.7     --    5.3   -5.8
(6) 12                    0.45  -18.7  -15.5  -10.7   -5.0   -5.3     --  -11.0

Table 3: Sampled expected winnings in Texas Hold'em of the row player against the column player in millibets per hand (mb/h). 95% confidence intervals are between 0.8 and 1.6. Relative size is the ratio of the size of the abstract game(s) solved for the row strategy and the base strategy.

4.2 Texas Hold'em

Two-player limit Texas Hold'em bears many similarities to Leduc Hold'em but is much larger in scale with respect to the parameters: cards in the deck, private cards, public cards, betting rounds, and bets per round. Due to the computational cost2 needed to solve a strong equilibrium, our experiments consist of a single grafted strategy. Table 3 shows the results of running this large grafted strategy against equilibrium-like strategies using a variety of abstractions.

The 20x32 strategy comes from the largest single imperfect recall abstract game solved to date. 
It is approximately 2.53 times larger than the base strategy used with grafting, 20x8. The 20x7 (imperfect recall) and 12 (perfect recall) strategies were the entrants put forward by the Computer Poker Research Group for the 2008 and 2007 AAAI Computer Poker Competitions, respectively. The 14 strategy was considered for the 2008 competition, but it was ultimately superseded by the smaller 20x7. For a detailed description of these abstractions and the rules of Texas Hold'em, see A Practical Use of Imperfect Recall [8].

As evident in the results, the grafted strategy beats all of the other players with statistical significance, even the largest single strategy. In addition to these results against other Computer Poker Research Group strategies, the grafted strategy also performed well at the 2009 AAAI Computer Poker Competition. There, against a field of thirteen strong strategies, it placed second in the limit run-off competition and fourth (narrowly behind the third-place entrant) in the limit bankroll competition.

These results demonstrate that strategy grafting is competitive and allows one to augment existing strategies. Any improvement to the quality of a base strategy should in turn improve the quality of the grafted strategy in similar tournament settings. This means that strategy grafting can be used transparently on top of more sophisticated strategy-computing methods.

5 Conclusion

We have introduced a new method, called strategy grafting, for independently solving and combining sub-games in large extensive games. This method allows us to create larger strategies than previously possible by solving many sub-games. These new strategies seem to maintain the features of good equilibrium-like strategies. By creating larger strategies we hope to play fewer dominated strategies and, in turn, make fewer mistakes.
Against a static equilibrium-like opponent, making fewer mistakes should lead to an improvement in the quality of play. Our empirical results confirm this intuition and demonstrate that this new method can improve the performance of the state of the art in both a simulated competition and the actual AAAI Computer Poker Competition. It is likely that much of the strength of these new strategies will be bounded by the quality of the base strategy used. In this regard, we are still limited by the capabilities of current methods.

Acknowledgments

The authors would like to thank the members of the Computer Poker Research Group at the University of Alberta for helpful conversations pertaining to this research. This research was supported by NSERC, iCORE, and Alberta Ingenuity.

²This particular grafted strategy was computed on a large cluster using 640 processors over almost 6 days.

References

[1] Darse Billings, Neil Burch, Aaron Davidson, Robert Holte, Jonathan Schaeffer, Terence Schauenberg, and Duane Szafron. Approximating Game-Theoretic Optimal Strategies for Full-scale Poker. In International Joint Conference on Artificial Intelligence, pages 661–668, 2003.

[2] Andrew Gilpin, Samid Hoda, Javier Peña, and Tuomas Sandholm. Gradient-based Algorithms for Finding Nash Equilibria in Extensive Form Games. In Proceedings of the Eighteenth International Conference on Game Theory, 2007.

[3] Andrew Gilpin and Tuomas Sandholm. A Competitive Texas Hold'em Poker Player via Automated Abstraction and Real-time Equilibrium Computation. In Proceedings of the Twenty-First Conference on Artificial Intelligence, 2006.

[4] Andrew Gilpin and Tuomas Sandholm. Expectation-Based Versus Potential-Aware Automated Abstraction in Imperfect Information Games: An Experimental Comparison Using Poker. In Proceedings of the Twenty-Third Conference on Artificial Intelligence, 2008.

[5] Daphne Koller and Avi Pfeffer.
Representations and Solutions for Game-Theoretic Problems. Artificial Intelligence, 94:167–215, 1997.

[6] Martin Osborne and Ariel Rubinstein. A Course in Game Theory. The MIT Press, Cambridge, Massachusetts, 1994.

[7] Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron. Abstraction Pathologies in Extensive Games. In Proceedings of the Eighth International Joint Conference on Autonomous Agents and Multi-Agent Systems, pages 781–788, 2009.

[8] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation, 2009.

[9] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret Minimization in Games with Incomplete Information. In Advances in Neural Information Processing Systems Twenty, pages 1729–1736, 2008. A longer version is available as a University of Alberta Technical Report, TR07-14.

[10] Martin Zinkevich and Michael Littman. The AAAI Computer Poker Competition. Journal of the International Computer Games Association, 29, 2006. News item.