{"title": "An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games", "book": "Advances in Neural Information Processing Systems", "page_first": 817, "page_last": 823, "abstract": null, "full_text": "An Efficient, Exact Algorithm for Solving \n\nTree-Structured Graphical Games \n\nMichael L. Littman \nAT&T Labs- Research \n\nFlorham Park, NJ 07932-0971 \nmlittman\u00a9research.att.com \n\nMichael Kearns \n\nDepartment of Computer & Information Science \n\nUniversity of Pennsylvania \nPhiladelphia, PA 19104-6389 \n\nmkearns\u00a9cis.upenn.edu \n\nSatinder Singh \nSyntek Capital \n\nNew York, NY 10019-4460 \nbaveja\u00a9cs. colorado. edu \n\nAbstract \n\nWe describe a new algorithm for computing a Nash equilibrium in \ngraphical games, a compact representation for multi-agent systems \nthat we introduced in previous work. The algorithm is the first \nto compute equilibria both efficiently and exactly for a non-trivial \nclass of graphical games. \n\n1 \n\nIntroduction \n\nSeeking to replicate the representational and computational benefits that graph(cid:173)\nical models have provided to probabilistic inference, several recent works \nhave introduced graph-theoretic frameworks for the study of multi-agent sys(cid:173)\ntems (La Mura 2000; Koller and Milch 2001; Kearns et al. 2001). In the simplest \nof these formalisms, each vertex represents a single agent, and the edges represent \npairwise interaction between agents. As with many familiar network models, the \nmacroscopic behavior of a large system is thus implicitly described by its local inter(cid:173)\nactions, and the computational challenge is to extract the global states of interest. \nClassical game theory is typically used to model multi-agent interactions, and the \nglobal states of interest are thus the so-called Nash equilibria, in which no agent \nhas a unilateral incentive to deviate. \n\nIn a recent paper (Kearns et al. 
2001), we introduced such a graphical formalism for multi-agent game theory, and provided two algorithms for computing Nash equilibria when the underlying graph is a tree (or is sufficiently sparse). The first algorithm computes approximations to all Nash equilibria, in time polynomial in the size of the representation and the quality of the desired approximation. A second and related algorithm computes all Nash equilibria exactly, but in time exponential in the number of agents. We thus left open the problem of efficiently computing exact equilibria in sparse graphs. \n\nIn this paper, we describe a new algorithm that solves this problem. Given as input a graphical game that is a tree, the algorithm computes in polynomial time an exact Nash equilibrium for the global multi-agent system. The main advances involve the definition of a new data structure for representing \"upstream\" or partial Nash equilibria, and a proof that this data structure can always be extended to a global equilibrium. The new algorithm can also be extended to efficiently accommodate parametric representations of the local game matrices, which are analogous to parametric conditional probability tables (such as noisy-OR and sigmoids) in Bayesian networks. \n\nThe analogy between graphical models for multi-agent systems and probabilistic inference is tempting and useful to an extent. The problem of computing Nash equilibria in a graphical game, however, appears to be considerably more difficult than computing conditional probabilities in Bayesian networks. Nevertheless, the analogy and the work presented here suggest a number of interesting avenues for further work at the intersection of game theory, network models, probabilistic inference, statistical physics, and other fields. \n\nThe paper is organized as follows. Section 2 introduces graphical games and other necessary notation and definitions. 
Section 3 presents our algorithm and its analysis, and Section 4 gives a brief conclusion. \n\n2 Preliminaries \n\nAn n-player, two-action¹ game is defined by a set of n matrices M_i (1 ≤ i ≤ n), each with n indices. The entry M_i(x_1, ..., x_n) = M_i(x) specifies the payoff to player i when the joint action of the n players is x ∈ {0,1}^n. Thus, each M_i has 2^n entries. If a game is given by simply listing the 2^n entries of each of the n matrices, we will say that it is represented in tabular form. \n\nThe actions 0 and 1 are the pure strategies of each player, while a mixed strategy for player i is given by the probability p_i ∈ [0,1] that the player will play 1. For any joint mixed strategy, given by a product distribution p, we define the expected payoff to player i as M_i(p) = E_{x~p}[M_i(x)], where x ~ p indicates that each x_j is 1 with probability p_j and 0 with probability 1 - p_j. \n\nWe use p[i : p'_i] to denote the vector that is the same as p except in the ith component, where the value has been changed to p'_i. A Nash equilibrium for the game is a mixed strategy p such that for any player i, and for any value p'_i ∈ [0,1], M_i(p) ≥ M_i(p[i : p'_i]). (We say that p_i is a best response to p.) In other words, no player can improve its expected payoff by deviating unilaterally from a Nash equilibrium. The classic theorem of Nash (1951) states that for any game, there exists a Nash equilibrium in the space of joint mixed strategies (product distributions). \n\nAn n-player graphical game is a pair (G, M), where G is an undirected graph² on n \n\n¹At present, no polynomial-time algorithm is known for finding Nash equilibria even in 2-player games with more than two actions, so we leave the extension of our work to the multi-action setting for future work. \n\n²The directed tree-structured case is trivial and is not addressed in this paper. 
\n\n\fvertices and M is a set of n matrices Mi (1 ::; i ::; n), called the local game matrices . \nPlayer i is represented by a vertex labeled i in G. We use N G (i) ~ {I, ... , n} \nto denote the set of neighbors of player i in G-\nthose vertices j such that the \nundirected edge (i , j) appears in G. By convention, NG(i) always includes i itself. \nThe interpretation is that each player is in a game with only his neighbors in G. \nThus, if ING(i) I = k, the matrix Mi has k indices, one for each player in NG(i) , and \nif x E [0, Ilk, Mi(X) denotes the payoff to i when his k neighbors (which include \nhimself) play x. The expected payoff under a mixed strategy jJ E [0, Ilk is defined \nanalogously. Note that in the two-action case, Mi has 2k entries, which may be \nconsiderably smaller than 2n. \nSince we identify players with vertices in G, it will be easier to treat vertices sym(cid:173)\nbolically (such as U, V and W) rather than by integer indices. We thus use Mv to \ndenote the local game matrix for the player identified with vertex V. \n\nNote that our definitions are entirely representational, and alter nothing about the \nunderlying game theory. Thus, every graphical game has a Nash equilibrium. Fur(cid:173)\nthermore, every game can be trivially represented as a graphical game by choosing \nG to be the complete graph and letting the local game matrices be the original \ntabular form matrices. Indeed, in some cases, this may be the most compact graph(cid:173)\nical representation of the tabular game. However, exactly as for Bayesian networks \nand other graphical models for probabilistic inference, any game in which the local \nneighborhoods in G can be bounded by k \u00ab n, exponential space savings accrue. \nThe algorithm presented here demonstrates that for trees, exponential computa(cid:173)\ntional benefits may also be realized. 
\n\n3 The Algorithm \n\nIf (G, M) is a graphical game in which G is a tree, then we can always designate \nsome vertex Z as the root. For any vertex V, the single neighbor of Von the path \nfrom V to Z shall be called the child of V, and the (possibly many) neighbors of V \non paths towards the leaves shall be called the parents of V. Our algorithm consists \nof two passes: a downstream pass in which local data structures are passed from the \nleaves towards the root, and an upstream pass progressing from the root towards \nthe leaves. \n\nThroughout the ensuing discussion, we consider a fixed vertex V with parents \nUI , ... , Uk and child W. On the downstream pass of our algorithm, vertex V will \ncompute and pass to its child W a breakpoint policy, which we now define. \n\nDefinition 1 A breakpoint policy for V consists of an ordered set of W -breakpoints \n\nWo = \u00b0 < WI < W2 < ... < Wt-I < Wt = 1 and an associated set of V-values \n\nVI , . .. ,Vt\u00b7 The interpretation is that for any W E [0,1], if Wi-I < W < Wi for some \nindex i and W plays w, then V shall play Vii and if W = Wi for some index i , then \nV shall play any value between Vi and Vi+I. We say such a breakpoint policy has \nt - 1 breakpoints. \n\nA breakpoint policy for V can thus be seen as assigning a value (or range of values) \nto the mixed strategy played by V in response to the play of its child W. In a slight \nabuse of notation, we will denote this breakpoint policy as a function Fv(w), with \nthe understanding that the assignment V = Fv(w) means that V plays either the \nfixed value determined by the breakpoint policy (in the case that W falls between \nbreakpoints), or plays any value in the interval determined by the breakpoint policy \n(in the case that W equals some breakpoint). 
\n\n\fLet G V denote the subtree of G with root V, and let M~=w denote the subset \nof the set of local game matrices M corresponding to the vertices in GV , except \nthat the matrix M v is collapsed one index by setting W = w, thus marginalizing \nW out. On its downstream pass, our algorithm shall maintain the invariant that if \nwe set the child W = w, then there is a Nash equilibrium for the graphical game \n(G v , M~=w) (an upstream Nash) in which V = Fv(w). If this property is satisfied \nby Fv(w), we shall say that Fv(w) is a Nash breakpoint policy for V. Note that \nsince (Gv, M~=w) is just another graphical game, it of course has (perhaps many) \nNash equilibria, and V is assigned some value in each. The trick is to commit \nto one of these values (as specified by Fv (w)) that can be extended to a Nash \nequilibrium for the entire tree G, before we have even processed the tree below V . \nAccomplishing this efficiently and exactly is one of the main advances in this work \nover our previous algorithm (Kearns et al. 2001). \nThe algorithm and analysis are inductive: V computes a Nash breakpoint policy \nFv(w) from Nash breakpoint policies FUl (v), ... , FUk (v) passed down from its par(cid:173)\nents (and from the local game matrix Mv). The complexity analysis bounds the \nnumber of breakpoints for any vertex in the tree. We now describe the inductive \nstep and its analysis. \n\n3.1 Downstream Pass \nFor any setting it E [0, l]k for -0 and w E [0,1] for W, let us define \n\n~v(i1,w) == Mv(l,it,w) - Mv(O,it,w). \n\nThe sign of ~v(it, w) tells us V's best response to the setting of the local neighbor(cid:173)\nhood -0 = it, W = w; positive sign means V = 1 is the best response, negative that \nV = 0 is the best response, and 0 that V is indifferent and may play any mixed \nstrategy. Note also that we can express ~v(it,w) as a linear function of w: \n\n~v(it,w) = ~v(it, O) + w(~v(it, 1) - ~v(it, 0)). 
\n\nFor the base case, suppose V is a leaf with child W; we want to describe the Nash \nbreakpoint policy for V. If for all w E [0,1], the function ~v(w) is non-negative \n(non-positive, respectively), V can choose 1 (0, respectively) as a best response \n(which in this base case is an upstream Nash) to all values W = w. Otherwise, \n~ v (w) crosses the w-axis, separating the values of w for which V should choose \n1, 0, or be indifferent (at the crossing point). Thus, this crossing point becomes \nthe single breakpoint in Fv(w). Note that if V is indifferent for all values of w, we \nassume without loss of generality that V plays l. \n\nThe following theorem is the centerpiece of the analysis. \n\nTheorem 2 Let vertex V have parents UI , ... ,Uk and child W, and assume V has \nreceived Nash breakpoint policies FUi (v) from each parent Ui . Then V can efficiently \ncompute a Nash breakpoint policy Fv (w). The number of breakpoints is no more \nthan two plus the total number of breakpoints in the FUi (v) policies. \n\nProof: Recall that for any fixed value of v, the breakpoint policy FUi (v) specifies \neither a specific value for Ui (if v falls between two breakpoints of FUi (v)) , or a range \nof allowed values for Ui (if v is equal to a breakpoint). Let us assume without loss of \ngenerality that no two FUi (v) share a breakpoint, and let Vo = 0 < VI < ... < Vs = 1 \nbe the ordered union of the breakpoints of the FUi (v). Thus for any breakpoint Vi, \nthere is at most one distinguished parent Uj (that we shall call the free parent) for \nwhich Fu; (Vi) specifies an allowed interval of play for Uj . All other Ui are assigned \n\n\ffixed values by Fu; (ve). For each breakpoint Ve, we now define the set of values for \nthe child W that, as we let the free parent range across its allowed interval, permit \nV to play any mixed strategy as a best response. \n\nDefinition 3 Let Vo = 0 < VI < ... 
< v_s = 1 be the ordered union of the breakpoints of the parent policies F_{U_i}(v). Fix any breakpoint v_ℓ, and assume without loss of generality that U_1 is the free parent of V for V = v_ℓ. Let [a, b] be the allowed interval of U_1 specified by F_{U_1}(v_ℓ), and let u_i = F_{U_i}(v_ℓ) for all 2 ≤ i ≤ k. We define \n\nW_ℓ = {w ∈ [0,1] : (∃ u_1 ∈ [a, b]) Δ_V(u_1, u_2, ..., u_k, w) = 0}. \n\nIn words, W_ℓ is the set of values that W can play that allow V to play any mixed strategy, preserving the existence of an upstream Nash from V given W = w. \n\nThe next lemma, which we state without proof, is a special case of Lemma 6 in Kearns et al. (2001) and limits the complexity of the sets W_ℓ. It also follows from the earlier work that W_ℓ can be computed in time proportional to the size of V's local game matrix: O(2^k) for a vertex with k parents. \n\nWe say that an interval [a, b] ⊆ [0, 1] is floating if both a ≠ 0 and b ≠ 1. \n\nLemma 4 For any breakpoint v_ℓ, the set W_ℓ is either empty, a single interval, or the union of two intervals that are not floating. \n\nWe wish to create the (inductive) Nash breakpoint policy F_V(w) from the sets W_ℓ and the F_{U_i} policies. The idea is that if w ∈ W_ℓ for some breakpoint index ℓ, then by definition of W_ℓ, if W plays w and the U_i play according to the setting determined by the F_{U_i} policies (including a fixed setting for the free parent of V), any play by V is a best response; so in particular, V may play the breakpoint value v_ℓ, and thus extend the Nash solution constructed, as the U_i can also all be best responses. For b ∈ {0, 1}, we define W^b as the set of values w such that if W = w and the U_i are set according to their breakpoint policies for V = b, then V = b is a best response. To create F_V(w) as a total function, we must first show that every w ∈ [0, 1] is contained in some W_ℓ, W^0, or W^1. \n\nLemma 5 Let v_0 = 0 < v_1 < ... < v_s = 1 be the ordered union of the breakpoints of the F_{U_i}(v) policies. 
Then for any value w ∈ [0, 1], either w ∈ W^b for some b ∈ {0, 1}, or there exists an index ℓ such that w ∈ W_ℓ. \n\nProof: Consider any fixed value of w, and for each open interval (v_j, v_{j+1}) determined by adjacent breakpoints, label this interval by V's best response (0 or 1) to W = w and U set according to the F_{U_i} policies for this interval. If either the leftmost interval [0, v_1] is labeled with 0 or the rightmost interval [v_{s-1}, 1] is labeled with 1, then w is included in W^0 or W^1, respectively (V playing 0 or 1 is a best response to what the U_i will play in response to a 0 or 1). Otherwise, since the labeling starts at 1 on the left and ends at 0 on the right, there must be a breakpoint v_ℓ such that V's best response changes over this breakpoint. Let U_i be the free parent for this breakpoint. By continuity, there must be a value of U_i in its allowed interval for which V is indifferent to playing 0 or 1, so w ∈ W_ℓ. This completes the proof of Lemma 5. \n\nArmed with Lemmas 4 and 5, we can now describe the construction of F_V(w). Since every w is contained in some W_ℓ (Lemma 5), and since every W_ℓ is the union of at most two intervals (Lemma 4), we can uniquely identify the set W_{ℓ1} that covers the largest (leftmost) interval containing w = 0; let [0, a] be this interval. Continuing in the same manner to the right, we can identify the unique set W_{ℓ2} that contains \n\nFigure 1: Example of the inductive construction of F_V(w). 
The dashed horizontal lines show the v_ℓ-breakpoints determined by the parent policies F_{U_i}(v). The solid intervals along these breakpoints are the sets W_ℓ. As shown in Lemma 4, each of these sets consists of either a single (possibly floating) interval, or two non-floating intervals. As shown in Lemma 5, each value of w is covered by some W_ℓ. The construction of F_V(w) (represented by a thick line) begins on the left, and always next \"jumps\" to the interval allowing greatest progress to the right. \n\nw = a and extends farthest to the right of a. Any overlap between W_{ℓ1} and W_{ℓ2} can be arbitrarily assigned coverage by W_{ℓ1}, and W_{ℓ2} \"trimmed\" accordingly; see Figure 1. This process results in a Nash breakpoint policy F_V(w). \n\nFinally, we bound the number of breakpoints in the F_V(w) policy. By construction, each of its breakpoints must be the rightmost portion of some interval in W^0, W^1, or some W_ℓ. After the first breakpoint, each of these sets contributes at most one new breakpoint (Lemma 4). The final breakpoint is at w = 1 and does not contribute to the count (Definition 1). There is at most one W_ℓ for each breakpoint in each F_{U_i}(v) policy, plus W^0 and W^1, plus the initial leftmost interval and minus the final breakpoint, so the total number of breakpoints in F_V(w) can be no more than two plus the total number of breakpoints in the F_{U_i}(v) policies. Therefore, the root of a tree of size n will have a Nash breakpoint policy with no more than 2n breakpoints. \n\nThis completes the proof of Theorem 2. \n\n3.2 Upstream Pass \n\nThe downstream pass completes when each vertex in the tree has had its Nash breakpoint policy computed. For simplicity of description, imagine that the root of the tree includes a dummy child with constant payoffs and no influence on the root, so the root's breakpoint policy has the same form as the others in the tree. 
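The left-to-right construction of F_V(w) in the downstream pass can be viewed as a greedy cover of [0, 1] by intervals. The Python sketch below is our own simplification with made-up interval data (the labels and endpoints are hypothetical, not from the paper): at each frontier it takes the covering interval reaching farthest right, trimming overlaps as in Figure 1.

```python
def greedy_cover(intervals):
    """Greedy left-to-right cover of [0, 1] by labeled intervals (label, a, b),
    mirroring the construction of F_V(w): start with the interval covering
    w = 0 that reaches farthest right, then repeatedly jump to the interval
    allowing greatest progress.  Returns the chosen pieces, overlaps trimmed.
    Assumes the intervals do cover [0, 1], as Lemma 5 guarantees."""
    reach, pieces = 0.0, []
    while reach < 1.0:
        # Among intervals covering the current frontier, take the farthest-reaching.
        label, a, b = max((iv for iv in intervals if iv[1] <= reach <= iv[2]),
                          key=lambda iv: iv[2])
        pieces.append((label, reach, b))  # trim overlap with the previous piece
        reach = b
    return pieces

# Three W_l sets covering [0, 1]: the cover uses l=1, then jumps to l=3.
sets = [("l=1", 0.0, 0.5), ("l=2", 0.3, 0.6), ("l=3", 0.4, 1.0)]
print(greedy_cover(sets))  # [('l=1', 0.0, 0.5), ('l=3', 0.5, 1.0)]
```

The breakpoint bound in Theorem 2 corresponds to each input interval contributing at most one right endpoint to the output.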
\n\nTo produce a Nash equilibrium, our algorithm performs an upstream pass over \nthe tree, starting from the root. Each vertex is told by its child what value to \nplay, as well as the value the child itself will play. The algorithm ensures that all \ndownstream vertices are Nash (playing best response to their neighbors). Given \nthis information, each vertex computes a value for each of its parents so that its \n\n\fown assigned action is a best response. This process can be initiated by the dummy \nvertex picking an arbitrary value for itself, and selecting the root's value according \nto its Nash breakpoint policy. \nInductively, we have a vertex V connected to parents U1 , ... , Uk (or no parents if \nV is a leaf) and child W. The child of V has informed V to chose V = v and that \nit will play W = w. To decide on values for V's parents to enforce V playing a best \nresponse, we can look at the Nash breakpoint policies FUi (v), which provide a value \n(or range of values) for Ui as a function of v that guarantee an upstream Nash. The \nvalue v can be a breakpoint for at most one Ui . For each Ui , if v is not a breakpoint \nin FUi (v) , then Ui should be told to select Ui = FUi (v). If v is a breakpoint in \nFUi (v), then Ui's value can be computed by solving ~V(Ul \"'\" Ui,\"\" Uk, w) = 0; \nthis is the value of Ui that makes V indifferent. The equation is linear in Ui and has \na solution by the construction of the Nash breakpoint policies on the downstream \npass. Parents are passed their assigned values as well as the fact that V = v. \nWhen the upstream pass completes, each vertex has a concrete choice of action such \nthat jointly they have formed a Nash equilibrium. \n\nThe total running time of the algorithm can be bounded as follows. Each vertex \nis involved in a computation in the downstream pass and in the upstream pass. \nLet t be the total number of breakpoints in the breakpoint policy for a vertex V \nwith k parents. 
Sorting the breakpoints, computing the W_ℓ sets, and computing the new breakpoint policy can be completed in O(t log t + t·2^k). In the upstream pass, only one breakpoint is considered, so O(log t + 2^k) is sufficient for passing breakpoints to the parents. By Theorem 2, t ≤ 2n, so the entire algorithm executes in time O(n^2 log n + n^2·2^k), where k is the largest number of neighbors of any vertex in the network. \n\nThe algorithm can be implemented to take advantage of local game matrices provided in a parameterized form. For example, if each vertex's payoff is solely a function of the number of 1s played by the vertex's neighbors, the algorithm takes O(n^2 log n + n^2·k), eliminating the exponential dependence on k. \n\n4 Conclusion \n\nThe algorithm presented in this paper finds a single Nash equilibrium for a game represented by a tree-structured network. By building representations of all equilibria, our earlier algorithm (Kearns et al. 2001) was able to select equilibria efficiently according to criteria like maximizing the total expected payoff for all players. The polynomial-time algorithm described in this paper throws out potential equilibria at many stages, most significantly during the construction of the Nash breakpoint policies. An interesting area for future work is to manipulate this process to produce equilibria with particular properties. \n\nReferences \n\nMichael Kearns, Michael L. Littman, and Satinder Singh. Graphical models for game theory. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI), pages 253-260, 2001. \n\nDaphne Koller and Brian Milch. Multi-agent influence diagrams for representing and solving games. Submitted, 2001. \n\nPierfrancesco La Mura. Game networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pages 335-342, 2000. \n\nJ. F. Nash. Non-cooperative games. 
Annals of Mathematics, 54:286-295, 1951. \n", "award": [], "sourceid": 2101, "authors": [{"given_name": "Michael", "family_name": "Littman", "institution": null}, {"given_name": "Michael", "family_name": "Kearns", "institution": null}, {"given_name": "Satinder", "family_name": "Singh", "institution": null}]}