{"title": "Solving Large Sequential Games with the Excessive Gap Technique", "book": "Advances in Neural Information Processing Systems", "page_first": 864, "page_last": 874, "abstract": "There has been tremendous recent progress on equilibrium-finding algorithms for zero-sum imperfect-information extensive-form games, but there has been a puzzling gap between theory and practice. First-order methods have significantly better theoretical convergence rates than any counterfactual-regret minimization (CFR) variant. Despite this, CFR variants have been favored in practice. Experiments with first-order methods have only been conducted on small- and medium-sized games because those methods are complicated to implement in this setting, and because CFR variants have been enhanced extensively for over a decade they perform well in practice. In this paper we show that a particular first-order method, a state-of-the-art variant of the excessive gap technique---instantiated with the dilated entropy distance function---can efficiently solve large real-world problems competitively with CFR and its variants. We show this on large endgames encountered by the Libratus poker AI, which recently beat top human poker specialist professionals at no-limit Texas hold'em. We show experimental results on our variant of the excessive gap technique as well as a prior version. We introduce a numerically friendly implementation of the smoothed best response computation associated with first-order methods for extensive-form game solving. We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games. We present comparisons of several excessive gap technique and CFR variants.", "full_text": "SolvingLargeSequentialGameswiththeExcessiveGapTechniqueChristianKroer,GabrieleFarina,andTuomasSandholmDepartmentofComputerScienceCarnegieMellonUniversityPittsburgh,PA15213{ckroer,gfarina,sandholm}@cs.cmu.eduAbstractTherehasbeentremendousrecentprogressonequilibrium-\ufb01ndingalgorithmsforzero-sumimperfect-informationextensive-formgames,buttherehasbeenapuz-zlinggapbetweentheoryandpractice.First-ordermethodshavesigni\ufb01cantlybettertheoreticalconvergenceratesthananycounterfactual-regretminimization(CFR)variant.Despitethis,CFRvariantshavebeenfavoredinpractice.Experimentswith\ufb01rst-ordermethodshaveonlybeenconductedonsmall-andmedium-sizedgamesbecausethosemethodsarecomplicatedtoimplementinthissetting,andbecauseCFRvariantshavebeenenhancedextensivelyforoveradecadetheyperformwellinpractice.Inthispaperweshowthataparticular\ufb01rst-ordermethod,astate-of-the-artvariantoftheexcessivegaptechnique\u2014instantiatedwiththedilatedentropydistancefunction\u2014canef\ufb01cientlysolvelargereal-worldproblemscompetitivelywithCFRanditsvariants.WeshowthisonlargeendgamesencounteredbytheLibratuspokerAI,whichrecentlybeattophumanpokerspecialistprofessionalsatno-limitTexashold\u2019em.Weshowexperimentalresultsonourvariantoftheexcessivegaptechniqueaswellasapriorversion.Weintroduceanumericallyfriendlyimplementationofthesmoothedbestresponsecomputationassociatedwith\ufb01rst-ordermethodsforextensive-formgamesolving.Wepresent,toourknowledge,the\ufb01rstGPUimplementationofa\ufb01rst-ordermethodforextensive-formgames.WepresentcomparisonsofseveralexcessivegaptechniqueandCFRvariants.1IntroductionTwo-playerzero-sumextensive-formgames(EFGs)areageneralrepresentationthatenablesonetomodelamyriadofsettingsrangingfromsecuritytobusinesstomilitarytorecreational.TheNashequilibriumsolutionconcept[22]prescribesasoundnotionofrationalplayforthissetting.Itisalsorobustinthisclassofgame:iftheopponentplayssomeotherstrategythananequilibriumstrategy,thatcanonlyhelpus.Therehasbeentremendousrecentprogressonequilibrium-\ufb01ndingalgorithmsforextensive-formzero-sumgames.However,therehasbeenavexinggapbetweenthetheoryandpracticeofequilibrium-\ufb01ndingalgorithms.Inthispaperwewillhelpclosethatgap.Itiswell-knownthatthestrategyspacesofanextensive-formgamecanbetransformedintoconvexpolytopesthatallowabilinearsaddle-pointformulation(BSPP)oftheNashequilibriumproblemasfollows[26,28,16].minx\u2208Xmaxy\u2208Yhx,Ayi=maxy\u2208Yminx\u2208Xhx,Ayi(1)32ndConferenceonNeuralInformationProcessingSystems(NeurIPS2018),Montr\u00e9al,Canada.\fProblem(1)canbesolvedinanumberofways.Earlyon,vonStengel[28]showedthatitcanbesolvedwithalinearprogram(LP)\u2014bytakingthedualoftheoptimizationproblemfacedbyoneplayer(saytheyplayer)whenholdingthestrategyofthexplayer\ufb01xed,andinjectingtheprimalx-playerconstraintsintothedualLP.Thisapproachwasusedinearlyworkonextensive-formgamesolving,uptogamesofsize105[15].GilpinandSandholm[10]coupleditwithlosslessabstractioninordertosolveRhodeIslandhold\u2019emwhichhas109nodesinthegametree.Sincethen,LPapproacheshavefallenoutoffavor.TheLPisoftentoolargeto\ufb01tinmemory,andevenwhenitdoes\ufb01ttheiterationsofthesimplexorinterior-pointmethodsusedtosolvetheLPtaketoolong\u2014evenifonlymodestaccuracyisrequired.Instead,modernworkonsolvingthisgameclassinthelargefocusesoniterativemethodsthatconvergetoaNashequilibriuminthelimit.Twotypesofalgorithmshavebeenpopularinparticular:regret-minimizationalgorithmsbasedoncounterfactualregretminimization(CFR)[29,20,1,5,21,4],and\ufb01rst-ordermethods(FOMs)basedoncombiningafastbilinearsaddle-pointproblem(BSPP)solversuchastheexcessivegaptechnique(EGT)[24]withanappropriatedistance-generatingfunction(DGF)forEFGstrategies[11,17,19,18].TheCFRfamilyhasbeenmostpopularinpracticesofar.TheCFR+variant[27]wasusedtonear-optimallysolveheads-uplimitTexashold\u2019em[1],agamethathas1013decisionpointsafterlosslessabstraction.CFR+wasalsousedforsubgamesolvinginLibratus[4],anAIthatbeatfourtopprofessionalheads-upno-limitTexashold\u2019empoker(HUNL)players\u2014agamethathas10161decisionpoints(beforeabstraction)[12].AvariantofCFRwasalsousedbyLibratustocomputethewhole-gamestrategy(alsoknownas\u201cblueprint\u201dstrategy)[4].AhybridofCFRandCFR+wasutilizedbyDeepStacktobeatHUNLprofessionals[21].CFR-basedalgorithmsconvergeatarateof1\u221aT,whereassomealgorithmsbasedonFOMsconvergeatarateof1T.Despitethistheoreticallysuperiorconvergencerate,FOMshavehadrelativelylittleadoptioninpractice.ComparisonsofCFR-basedalgorithmsandFOMswereconductedbyKroeretal.[17]andKroeretal.[19],wheretheyfoundthataheuristicvariantofEGTinstantiatedwithanappropriatedistancemeasureissuperiortoCFRregretmatching(RM)andCFRwithregret-matching+(RM+)forsmall-to-medium-sizedgames.Inthispaper,wepresentthe\ufb01rstexperimentsonalargegame\u2014arealgameplayedbyhumans\u2014showingthatanaggressivevariantofEGTinstantiatedwiththeDGFofKroeretal.[19]iscompetitivewiththeCFRfamilyinpractice.ItoutperformsCFRwithRM+,althoughCFR+isstillslightlyfaster.Thisisthe\ufb01rsttimethataFOMhasbeenshownsuperiortoanyCFRvariantonareal-worldproblem.WeshowthisonsubgamesencounteredbyLibratus.TheLibratusagentsolvedanabstractionofthefullgameofno-limitTexashold\u2019emaheadoftimeinordertoobtaina\u201cblueprint\u201dstrategy.Duringplay,Libratusthenre\ufb01nedthisblueprintstrategybysolvingsubgameswithsigni\ufb01cantlymoredetailedabstractionsinrealtime[4,3].OurexperimentsareonsolvingendgamesencounteredbyLibratusinthebeginningofthefourth(\u201criver\u201dinpokerlingo)bettinground,withthefull\ufb01ne-grainedabstractionactuallyusedbyLibratus.Thisabstractionhasnoabstractionofcards,thatis,themodelcapturesallaspectsofthecards.Thereisabstractionofbetsizestokeepthebranchingfactorreasonable;inourexperimentsweusetheexactfull\ufb01ne-grainedbettingabstractionthatwasusedbyLibratus.ThusweshowthatitispossibletogetthetheoreticallysuperiorguaranteeofFOMswhilealsogettingstrongpracticalperformance.Inordertomakeourapproachpractical,weintroduceanumberofpracticaltechniquesforrunningFOMsonEFGs.Inparticular,wederiveef\ufb01cientandnumericallyfriendlyexpressionsforthesmoothed-bestresponse(SBR)andproxmapping,twooptimizationsubproblemsthatEGTsolvesateveryiteration.Furthermore,weintroduceaGPU-basedvariantoftheseoperationswhichallowsustoparallelizeEGTiterations.WeshowexperimentsforseveralvariantsofbothEGTandCFR.ForEGT,weconsidertwopracticalvariants,onethathastheinitialsmoothingparametersetoptimistically,andonethatadditionallyperformsaggressivestepsizing.ForCFR,weshowexperimentalresultsforCFRwithRM,RM+,andCFR+(i.e.,CFRwithlinearaveragingandRM+).Wewilldescribethesevariantsindetailinthebodyofthepaper.WeconductedalltheexperimentsonparallelizedGPUcode.2\f2BilinearSaddle-PointProblemsThecomputationofaNashequilibriuminazero-sumimperfect-informationEFGcanbeformulatedasthefollowingbilinearsaddle-pointproblem:minx\u2208Xmaxy\u2208Yhx,Ayi=maxy\u2208Yminx\u2208Xhx,Ayi,(2)whereX,Yareconvex,compactsetsinEuclideanspacesEx,Ey.Aisthesequence-formpayoffmatrixandX,Yarethesequence-formstrategyspacesofPlayer1and2,respectively.SeveralFOMswithattractiveconvergencepropertieshavebeenintroducedforBSPPs[25,24,23,8].ThesemethodsrelyonhavingsomeappropriatedistancemeasureoverXandY,calledadistance-generatingfunction(DGF).Generally,FOMsusetheDGFtochoosesteps:givenagradientandascalarstepsize,aFOMmovesinthenegativegradientdirectionby\ufb01ndingthepointthatminimizesthesumofthegradientandoftheDGFevaluatedatthenewpoint.Inotherwords,thenextstepcanbefoundbysolvingaregularizedoptimizationproblem,wherelonggradientstepsarediscouragedbytheDGF.ForEGTonEFGs,theDGFcanbeinterpretedasasmoothingfunctionappliedtothebest-responseproblemsfacedbytheplayers.De\ufb01nition1.Adistance-generatingfunctionforXisafunctiond(x):X\u2192RwhichisconvexandcontinuousonX,admitscontinuousselectionofsubgradientsonthesetX\u25e6={x\u2208X:\u2202d(x)6=\u2205},andhasstrongconvexitymodulus\u03d5w.r.t.k\u00b7k.Distance-generatingfunctionsforYarede\ufb01nedanalogously.GivenDGFsdX,dYforX,Ywithstrongconvexitymoduli\u03d5Xand\u03d5Yrespectively,wenowdescribeEGT[24]appliedto(1).EGTformstwosmoothedfunctionsusingtheDGFsf\u00b5y(x)=maxy\u2208Yhx,Ayi\u2212\u00b5ydY,\u03c6\u00b5x(y)=minx\u2208Xhx,Ayi+\u00b5xdX.(3)Thesefunctionsaresmoothedapproximationstotheoptimizationproblemfacedbythexandyplayer,respectively.Thescalars\u00b5x,\u00b5y>0aresmoothnessparametersdenotingtheamountofsmoothingapplied.Lety\u00b5y(x)andx\u00b5x(y)refertotheyandxvaluesattainingtheoptimain(3).Thesecanbethoughtofassmoothedbestresponses.Nesterov[25]showsthatthegradientsofthefunctionsf\u00b5y(x)and\u03c6\u00b5x(y)existandareLipschitzcontinuous.ThegradientoperatorsandLipschitzconstantsare\u2207f\u00b5y(x)=a1+Ay\u00b5y(x),\u2207\u03c6\u00b5x(y)=a2+A>x\u00b5x(y),L1(cid:0)f\u00b5y(cid:1)=kAk2\u03d5Y\u00b5y,L2(\u03c6\u00b5x)=kAk2\u03d5X\u00b5x,wherekAkisthe\u20181-normoperatornorm.LettheconvexconjugateofdX:X\u2192Rbedenotedbyd\u2217X(g)=maxx\u2208XgTx\u2212d(x).Thegradient\u2207d\u2217(g)oftheconjugatethengivesthesolutiontothesmoothed-best-responseproblem.Basedonthissetup,EGTminimizesthefollowingsaddle-pointresidual,whichisequaltothesumofregretsfortheplayers.\u0001sad(xt,yt)=maxy\u2208Y(xt)TAy\u2212minx\u2208XxTAytTheideabehindEGTistomaintaintheexcessivegapcondition(EGC),EGV(x,y):=\u03c6\u00b5x(y)\u2212f\u00b5y(x)>0.TheEGCimpliesaboundonthesaddle-pointresidual:\u0001sad(xt,yt)\u2264\u00b5x\u2126X+\u00b5y\u2126Y,where\u2126X=maxx,x0dX(x)\u2212dX(x0),and\u2126Yde\ufb01nedanalogously.WeformallystateEGT[24]asAlgorithm1.TheEGTalgorithmalternatesbetweentakingstepsfocusedonXandY.Algorithm2showsasinglestepfocusedonX.Stepsfocusedonyareanalogous.Algorithm1showshowthealternatingstepsandstepsizesarecomputed,aswellashowinitialpointsareselected.Supposetheinitialvalues\u00b5x,\u00b5ysatisfy\u00b5x=\u03d5XL1(f\u00b5y).Then,ateveryiterationt\u22651ofEGT,thecorrespondingsolutionzt=[xt;yt]satis\ufb01esxt\u2208X,yt\u2208Y,theexcessivegapconditionismaintained,and\u0001sad(xT,yT)\u22644kAkT+1s\u2126X\u2126Y\u03d5X\u03d5Y.Consequently,EGThasaconvergencerateofO(1T)[24].3\fAlgorithm1EGT(DGF-centerx\u03c9,DGFweights\u00b5x,\u00b5y,and\u0001>0)1:x0=\u2207d\u2217X(cid:0)\u00b5\u22121x\u2207f\u00b5y(x\u03c9)(cid:1)2:y0=y\u00b5y(x\u03c9)3:t=04:while\u0001sad(xt,yt)>\u0001do5:\u03c4t=2t+36:iftiseventhen7:(\u00b5t+1x,xt+1,yt+1)=STEP(\u00b5tx,\u00b5ty,xt,yt,\u03c4)8:else9:(\u00b5t+1y,yt+1,xt+1)=STEP(\u00b5ty,\u00b5tx,yt,xt,\u03c4)10:t=t+111:returnxt,ytAlgorithm2STEP(\u00b5x,\u00b5y,x,y,\u03c4)1:\u02c6x=(1\u2212\u03c4)x+\u03c4x\u00b5x(y)2:y+=(1\u2212\u03c4)y+\u03c4y\u00b5y(\u02c6x)3:\u02dcx=\u2207d\u2217X(cid:16)\u2207dX(x\u00b5x(y))\u2212\u03c4(1\u2212\u03c4)\u00b5x\u2207f\u00b5y(\u02c6x)(cid:17)4:x+=(1\u2212\u03c4)x+\u03c4\u02dcx5:\u00b5+x=(1\u2212\u03c4)\u00b5x6:return\u00b5+x,x+,y+3TreeplexesHodaetal.[11]introducedthetreeplex,aclassofconvexpolytopesthatcapturesthesequence-formofthestrategyspacesinperfect-recallEFGs.De\ufb01nition2.Treeplexesarede\ufb01nedrecursively:1.Basicsets:Thestandardsimplex\u2206misatreeplex.2.Cartesianproduct:IfQ1,...,Qkaretreeplexes,thenQ1\u00d7\u00b7\u00b7\u00b7\u00d7Qkisatreeplex.3.Branching:GivenatreeplexP\u2286[0,1]p,acollectionoftreeplexesQ={Q1,...,Qk}whereQj\u2286[0,1]nj,andl={l1,...,lk}\u2286{1,...,p},thesetde\ufb01nedbyPlQ:=n(x,y1,...,yk)\u2208Rp+Pjnj:x\u2208P,y1\u2208xl1\u00b7Q1,...,yk\u2208xlk\u00b7Qkoisatreeplex.WesayxljisthebranchingvariableforthetreeplexQj.Oneinterpretationofthetreeplexisasasetofsimplexes,whereeachsimplexisweightedbythevalueofthevariableaboveitintheparentbranchingoperation(or1ifthereisnobranchingoperationprecedingthesimplex).Thusthesimplexesgenerallysumtothevalueoftheparentratherthan1.ForatreeplexQ,wedenotebySQtheindexsetofthesetofsimplexescontainedinQ(inanEFGSQisthesetofinformationsetsbelongingtotheplayer).Foreachj\u2208SQ,thetreeplexrootedatthej-thsimplex\u2206jisreferredtoasQj.Givenvectorq\u2208Qandsimplex\u2206j,weletIjdenotethesetofindicesofqthatcorrespondtothevariablesin\u2206jandde\ufb01neqjtobethesubvectorofqcorrespondingtothevariablesinIj.Foreachsimplex\u2206jandbranchi\u2208Ij,thesetDijrepresentsthesetofindicesofsimplexesreachedimmediatelyafter\u2206jbytakingbranchi(inanEFG,Dijisthesetofpotentialnext-stepinformationsetsfortheplayer).Givenavectorq\u2208Q,simplex\u2206j,andindexi\u2208Ij,eachchildsimplex\u2206kforeveryk\u2208Dijisscaledbyqi.Foragivensimplex\u2206j,weletpjdenotetheindexinqoftheparentbranchingvariableqpjscaling\u2206j.Weusetheconventionthatqpj=1ifQissuchthatnobranchingoperationprecedes\u2206j.Foreachj\u2208SQ,djisthemaximumdepthofthetreeplexrootedat\u2206j,thatis,themaximumnumberofsimplexesreachablethroughaseriesofbranchingoperationsat\u2206j.ThendQgivesthedepthofQ.WeusebjQtoidentifythenumberofbranchingoperationsprecedingthej-thsimplexinQ.WesaythatasimplexjsuchthatbjQ=0isarootsimplex.Figure1illustratesanexampletreeplexQ.ThistreeplexQisconstructedfromninetwo-to-three-dimensionalsimplexes\u22061,...,\u22069.Atlevel1,wehavetworootsimplexes,\u22061,\u22062,obtainedbyaCartesianproductoperation(denotedby\u00d7).Wehavemaximumdepthsd1=2,d2=1beneaththem.Sincetherearenoprecedingbranchingoperations,theparentvariablesforthesesimplexes\u22061and\u22062areqp1=qp2=1.For\u22061,thecorrespondingsetofindicesinthevectorqisI1={1,2},whilefor\u22062wehaveI2={3,4,5}.Atlevel2,wehavethesimplexes\u22063,...,\u22067.Theparentvariableof\u22063isqp3=q1;therefore,\u22063isscaledbytheparentvariableqp3.Similarly,eachofthesimplexes4\f\u22063,...,\u22067isscaledbytheirparentvariablesqpjthatthebranchingoperationwasperformedon.Soonfor\u22068and\u22069aswell.Thenumberofbranchingoperationsrequiredtoreachsimplexes\u22061,\u22063and\u22068isb1Q=0,b3Q=1andb8Q=2,respectively.\u22061q2\u00b7\u22064q7q8q1\u00b7\u22063q6\u00b7\u22068q16q17q6\u00b7\u22067q13q14q15q5q6q1q2\u22062q4\u00b7\u22066q11q12q3\u00b7\u22065q9q10q3q4\u00d7\u00d7Figure1:Anexampletreeplexconstructedfrom9simplexes.Cartesianproductoperationisdenotedby\u00d7.4SmoothedBestResponsesLetdj(x)=Pi\u2208Ijxilogxi+lognbetheentropyDGFforthen-dimensionalsimplex\u2206n,wherenisthedimensionofthej\u2019thsimplexinQ.Kroeretal.[19]introducedthefollowingDGFforQbydilatingdsforeachsimplexinSQandtaketheirsum:d(q)=Pj\u2208SQ\u03b2jqpjdj(cid:16)qjqpj(cid:17),where\u03b2j=2+Pk\u2208Dj2\u03b2k.OtherdilatedDGFsfortreeplexeswereintroducedbyHodaetal.[11]andwerealsostudiedbyKroeretal.[17].Kroeretal.[19]provedthatthisDGFisstronglyconvexmodulus1MwhereMisthemaximumvalueofthe\u20181normoverQ.EGTinstantiatedwiththisDGFconvergesatarateofLM22dlognTwhereListhemaximumentryinthepayoffmatrix,disthedepthofthetreeplex,andnisthemaximumdimensionofanyindividualsimplex.Wenowshowhowtosolve(3)forthisparticularDGF.WhileitisknownthatthisDGFhasaclosed-formsolution,thisisthe\ufb01rsttimetheapproachhasbeenshowninapaper.Furthermore,webelievethatourparticularsolutionisnovel,andleadstobettercontrolovernumericalissues.Theproblemwewishtosolveisthefollowing.argminXj\u2208SQhqj,gji+\u03b2jqpjdj(qj/qpj)=argminXj\u2208SQqpj(h\u00afqj,gji+\u03b2jdj(\u00afqj))(4)wheretheequalityfollowsbythefactthatqi=qpj\u00afqi.Foraleafsimplexj,itscorrespondingterminthesummationhasnodependenceonanyotherpartofthegametreeexceptforthemultiplicationbyxpj(becausenoneofitsvariablesareparenttoanyothersimplex).Becauseofthislackofdependence,theexpressionh\u00afqj/qpj,gji+\u03b2jdj(qj/qpj)canbeminimizedindependentlyasifitwereanoptimizationproblemoverasimplexwithvariables\u00afqj=xj/qpj(thiswasalsopointedoutinProposition3.4inHodaetal.[11]).Weshowhowtosolvetheoptimizationproblemataleaf:min\u00afqj\u2208\u2206jh\u00afqj,gji+\u03b2jdj(\u00afqj).WritingtheLagrangianwithrespecttothesimplexconstraintandtakingthederivativewrt.\u00afqigivesmin\u00afqjh\u00afqj,gji+\u03b2jdj(\u00afqj)+\u03bb(1\u2212Xi\u2208Ij\u00afqi)\u21d2gi+\u03b2j(1+log\u00afqi)=\u03bb\u21d2\u00afqi\u221de\u2212gi/\u03b2jThisshowshowtosolvethesmoothed-best-responseproblemataleaf.Foraninternalsimplexj,Proposition3.4ofHodaetal.[11]saysthatwecansimplycomputethevalueatallsimplexesbelowj,addthevaluetogj(thisiseasilyseenfrom(4);eachqiactsasascalaronthevalueofallsimplexes5\fafteri),andproceedbyinduction.Letting|Ij|=n,wenowsimplifytheobjectivefunction:h\u00afqj,gji+\u03b2j(Xi\u2208Ij(\u00afqilog\u00afqi)+logn)=Xi(\u00afqi(gi+\u03b2jlog\u00afqi))+\u03b2jlogn=Xi(\u00afqi(\u03bb\u2212\u03b2j))+\u03b2jlogn=\u03bb\u2212\u03b2j+\u03b2jlogn,wherethelasttwoequalitiesfollow\ufb01rstbyapplyingourderivationfor\u03bbandthenthefactthat\u00afqjsumstoone.Thisshowsthatwecanchooseanarbitraryindexi\u2208Ijandpropagatethevaluegi+\u03b2jlog\u00afqi+\u03b2jlogn.Inparticular,fornumericalreasonswechoosetheonethatmaximizes\u00afqi.Inadditiontosmoothedbestresponses,fastFOMsusuallyalsorequirecomputationofproximalmappings,whicharesolutionstoargminq\u2208Qhq,gi+D(qkq0),whereD(qkq0)=d(q)\u2212d(q0)\u2212h\u2207d(q0),q\u2212q0iistheBregmandivergenceassociatedwiththechosenDGFd.Unlikethesmoothedbestresponse,weareusuallyonlyinterestedintheminimizingsolutionandnottheassociatedvalue.Thereforewecandroptermsthatdonotdependonqandtheproblemreducestoargminq\u2208Qhq,gi+d(q)\u2212h\u2207d(q0),qi,whichcanbesolvedwithoursmoothedbestresponseapproachbyusingtheshiftedgradient\u02dcg=g\u2212\u2207d(q0).Thishasonepotentialnumericalpitfall:theDGF-gradient\u2207d(q0)maybeunstableneartheboundaryofQ,forexamplebecausetheentropyDGF-gradientrequirestakinglogarithms.Itispossibletoderiveaseparateexpressionfortheproximalmappingthatissimilartowhatwedidforthesmoothedbestresponse;thisexpressioncanhelpavoidthisissue.However,becauseweonlycareaboutgettingtheoptimalsolution,notthevalueassociatedwithit,thisisnotnecessary.Thelargegradientsneartheboundaryonlyaffectthesolutionbysettingbadactionstooclosetozero,whichdoesnotseemtoaffectperformance.5PracticalEGTRatherthantheoverlyconservativestepsizeand\u00b5parameterssuggestedinthetheoryforEGTweusemorepracticalvariantscombiningpracticaltechniquesfromKroeretal.[19]andHodaetal.[11].ThepseudocodeisshowninAlgorithm3.AsinKroeretal.[19]weuseapractically-tunedinitialchoicefortheinitialsmoothingparameters\u00b5.Furthermore,ratherthanalternatingthestepsonplayers1and2,wealwayscallSTEPontheplayerwithahigher\u00b5value(thischoiceissomewhatreminiscentofthe\u00b5-balancingheuristicemployedbyHodaetal.[11]althoughourapproachavoidsanadditional\ufb01ttingstep).TheEGTalgorithmwithapractically-tuned\u00b5andthis\u00b5balancingheuristicwillbedenotedEGTinourexperiments.Inaddition,weuseanEGTvariantthatemploystheaggressive\u00b5reductiontechniqueintroducedbyHodaetal.[11].Aggressive\u00b5reductionusestheobservationthattheoriginalEGTstepsizechoices,whichare\u03c4=23+t,arechosentoguaranteetheexcessivegapcondition,butmaybeoverlyconservative.Instead,aggressive\u00b5reductionsimplymaintainssomecurrent\u03c4,initiallysetto0.5,andtriestoapplythesamestepsize\u03c4repeatedly.Aftereverystep,wecheckthattheexcessivegapconditionstillholds;ifitdoesnotholdthenwebacktrack,\u03c4isdecreased,andwerepeattheprocess.A\u03c4thatmaintainstheconditionisalwaysguaranteedtoexistbyTheorem2ofNesterov[24].ThepseudocodeforthisisgiveninAlgorithm4.EGTwithaggressive\u00b5reduction,apracticallytunedinitial\u00b5,and\u00b5balancing,willbedenotedEGT/ASinourexperiments.6AlgorithmImplementationTocomputesmoothedbestresponses,weuseaparallelizationscheme.WeparallelizeacrosstheinitialCartesianproductoftreeplexesattheroot.AslongasthisCartesianproductiswideenough,thesmoothedbestresponsecomputationwilltakefulladvantageofparallelization.Thisisacommonstructureinreal-worldproblems,forexamplerepresentingthestartinghandinpoker,orsomestochasticprivatestateofeachplayerinotherapplications.Thisparallelizationschemealsoworksforgradientcomputationbasedontreetraversal.However,inthispaperwedogradientcomputationbywritingdownasparsepayoffmatrixusingCUDA\u2019ssparselibraryandletCUDAparallelizethegradientcomputation.Forpoker-speci\ufb01capplications(andcertainothergameswhereutilitiesdecomposenicelybasedonprivateinformation)itispossibletospeedupthegradientcomputationsubstantiallybyemployingthe6\fAlgorithm3EGT/AS(DGF-centerx\u03c9,DGFweights\u00b5x,\u00b5y,and\u0001>0)1:x0=\u2207d\u2217X(cid:0)\u00b5\u22121x\u2207f\u00b5y(x\u03c9)(cid:1)2:y0=y\u00b5y(x\u03c9)3:t=04:\u03c4=125:while\u0001sad(xt,yt)>\u0001do6:if\u00b5x>\u00b5ythen7:(\u00b5t+1x,xt+1,yt+1,\u03c4)=DECR(\u00b5tx,\u00b5ty,xt,yt,\u03c4)8:else9:(\u00b5t+1y,yt+1,xt+1,\u03c4)=DECR(\u00b5ty,\u00b5tx,yt,xt,\u03c4)10:t=t+111:returnxt,ytAlgorithm4DECR(\u00b5x,\u00b5y,x,y,\u03c4)1:(\u00b5+x,x+,y+)=STEP(\u00b5x,\u00b5y,x,y,\u03c4)2:whileEGV(x,y)<0do3:\u03c4=12\u03c44:(\u00b5+x,x+,y+)=STEP(\u00b5x,\u00b5y,x,y,\u03c4)5:return\u00b5+xxt,yt,\u03c4acceleratedtreetraversalofJohansonetal.[13].Wedidnotusethistechnique.Inourexperiments,themajorityoftimeisspentingradientcomputation,sothisaccelerationislikelytoaffectallthetestedalgorithmsequally.Furthermore,sincethetechniqueisspeci\ufb01ctogameswithcertainstructures,ourexperimentsgiveabetterestimateofgeneralEFG-solvingperformance.7ExperimentsWenowpresentexperimentalresultsonrunningallthepreviouslydescribedalgorithmsonaGPU.AllexperimentswererunonaGoogleCloudinstancewithanNVIDIATeslaK80GPUwith12GBavailable.AllcodewasimplementedinC++usingCUDAforGPUoperations,andcuSPARSEforthesparsepayoffmatrix.WecompareagainstseveralCFRvariants.1WerunCFRwithRM(CFR(RM)),RM+(CFR(RM+)),andCFR+whichisCFRwithRM+andalinearaveragingscheme.Wenowdescribethesevariants.DetaileddescriptionscanalsobefoundinZinkevichetal.[29]andTammelinetal.[27].Ourexperimentsareconductedonreallarge-scale\u201criver\u201dendgamesfacedbytheLibratusAI[4].Libratuswascreatedforthegameofheads-upno-limitTexashold\u2019em.Libratuswasconstructedby\ufb01rstcomputinga\u201cblueprint\u201dstrategyforthewholegame(basedonabstractionandMonte-CarloCFR[20]).Then,duringplay,Libratuswouldsolveendgamesthatarereachedusingasigni\ufb01cantly\ufb01ner-grainedabstraction.Inparticular,thoseendgameshavenocardabstraction,andtheyhavea\ufb01ne-grainedbettingabstraction.Forthebeginningofthesubgame,theblueprintstrategygivesaconditionaldistributionoverhandsforeachplayer.ThesubgameisconstructedbyhavingaChancenodedealouthandsaccordingtothisconditionaldistribution.2Asubgameisstructuredandparameterizedasfollows.Thegameisparameterizedbytheconditionaldistributionoverhandsforeachplayer,currentpotsize,boardstate(5cardsdealttotheboard),andabettingabstraction.First,Chancedealsouthandstothetwoplayersaccordingtotheconditionalhanddistribution.Then,Libratushasthechoiceoffolding,checking,orbettingbyanumberofmultipliersofthepotsize:0.25x,0.5x,1x,2x,4x,8x,andall-in.IfLibratuschecksandtheotherplayerbetsthenLibratushasthechoiceoffolding,calling(i.e.matchingthebetandendingthebetting),orraisingbypotmultipliers0.4x,0.7x,1.1x,2x,andall-in.IfLibratusbetsandtheotherplayerraisesLibratuscanfold,call,orraiseby0.4x,0.7x,2x,andall-in.FinallywhenfacingsubsequentraisesLibratuscanfold,call,orraiseby0.7xandall-in.Whenfacedwithaninitialcheck,theopponentcanfold,check,orraiseby0.5x,0.75x,1x,andall-in.Whenfacedwithaninitialbettheopponentcanfold,call,orraiseby0.7x,1.1x,andall-in.Whenfacedwithsubsequentraisestheopponentcanfold,call,orraiseby0.7xandall-in.Thegameendswheneveraplayerfolds(theotherplayerwinsallmoneyinthepot),calls(ashowdownoccurs),orbothplayerscheckastheir\ufb01rstactionofthegame(ashowdownoccurs).Inashowdowntheplayerwiththebetterhandswinsthepot.Thepotissplitincaseofatie.(ForourexperimentsweusedendgameswhereitisLibratus\u2019sturntomove\ufb01rst.)1Allvariantsusethealternatingupdatesscheme.2Libratususedtwodifferentsubgame-solvingtechniques,one\u201cunsafe\u201dandone\u201csafe\u201d[3].Thecomputationalprobleminthetwoisessentiallyidentical.Weexperimentwiththe\u201cunsafe\u201dversion,whichusesthepriordistributionsdescribedhere.7\f10010110210310410\u2212310\u2212210\u22121100101102103Iteration\u0001(regretsum)[mbb]Endgame2CFR+EGTEGT/ASCFR(RM)CFR(RM+)10010110210310410\u2212310\u2212210\u22121100101102103Iteration\u0001(regretsum)[mbb]Endgame7CFR+EGTEGT/ASCFR(RM)CFR(RM+)Figure2:Solutionqualityasafunctionofthenumberofiterationsforallalgorithmsontworiversubgames.Thesolutionqualityisgivenasthesumofregretsfortheplayersinmilli-big-blinds.WeconductedexperimentsontworiverendgamesextractedfromLibratusplay:Endgame2andEndgame7.Endgame2hasapotofsize2100atthebeginningoftheriverendgame.Ithasdimension140kand144kforLibratusandtheopponent,respectively,and176Mleavesinthegamestree.Endgame7hasapotofsize$3750atthebeginningoftheriversubgame.Ithasdimension43kand86kfortheplayers,and54Mleaves.Inthe\ufb01rstsetofexperimentswelookattheper-iterationperformanceofeachalgorithm.TheresultsareshowninFigure2.They-axisshowsthesumoftheregretsforeachplayer,thatis,howmuchutilitytheycangainbyplayingabestresponseratherthantheircurrentstrategy.Theunitismilli-big-blinds(mbb);atthebeginningoftheoriginalpokergame,Libratus,asthe\u201cbigblind\u201d,putin$100andtheopponentputin$50,inordertoinducebetting.Mbbisathousandthofthebigblindvalue,thatis,10cents.Thisisastandardunitusedinresearchthatusespokergamesforevaluation.Onembbisoftenconsideredtheconvergencegoal.CFR+andEGT/ASperformthebest;bothreachthegoalof1mbbafterabout400iterationsinbothEndgame2and7.EGT,CFR(RM),andCFR(RM+)alltakeabout3000iterationstoreach1mbbinEndgame7.InEndgame2,EGTisslowest,althoughtheslopeissteeperthanforCFR(RM)andCFR(RM+).WesuspectthatbetterinitializationofEGTcouldleadtoitbeatingbothalgorithms.NotealsothatEGTwasshownbetterthanCFR(RM)andCFR(RM+)byKroeretal.[19]inthesmallergameofLeduchold\u2019emwithanautomated\u00b5-tuningapproach.Theirresultsfurthersuggestthatbetterinitializationmayhelpenhanceconvergespeedsigni\ufb01cantly.Oneissuewithper-iterationconvergenceratesisthatthealgorithmsdonotperformthesameamountofworkperiteration.AllCFRvariantsinourexperimentscompute2gradientsperiteration,whereasEGTcomputes3,andEGT/AScomputes4(theadditionalgradientcomputationisneededinordertoevaluatetheexcessivegap).Furthermore,EGT/ASmayuseadditionalgradientcomputationsiftheexcessivegapcheckfailsandasmaller\u03c4istried(inourexperimentsabout15adjustmentswereneeded).Inoursecondsetofplots,weshowtheconvergencerateasafunctionofthetotalnumberofgradientcomputationsperformedbythealgorithm.ThisisshowninFigure3.Bythismeasure,EGT/ASandEGTperformslightlyworserelativetotheirperformanceasmeasuredbyiterationcount.Inparticular,CFR+takesabout800gradientcomputationsinordertoreach1mbbineithergame,whereasEGT/AStakesabout1800.InourexperimentsCFR+vastlyoutperformsitstheoreticalconvergencerate(infact,everyCFRvariantdoessigni\ufb01cantlybetterthanthetheorypredicts,butCFR+especiallyso).However,CFR+isknowntoeventuallyreachapointwhereitslowsdownandperformsworsethan1T.InourexperimentswestarttoseeCFR+slowingdowntowardstheendofEndgame7.EGT,incontrast,isguaranteedtomaintainarateof1T,andsomaybepreferableifaguaranteeagainstslowdownisdesiredorhighprecisionisneeded.8ConclusionsandFutureResearchWeintroducedapracticalvariantoftheEGTalgorithmthatemploysaggressivestepsizes,\u00b5balancing,anumerically-friendlysmoothed-best-responsealgorithm,parallelizationviaCartesianproductoperationsattherootofthestrategytreeplex,andaGPUimplementation.Weshowedforthe\ufb01rsttime,viaexperimentsonreallarge-scaleLibratusendgames,thatFOMs(withthedilatedentropy8\f10110210310410510\u2212310\u2212210\u22121100101102103Gradientcomputations\u0001(regretsum)[mbb]Endgame2CFR+EGTEGT/ASCFR(RM)CFR(RM+)10110210310410510\u2212310\u2212210\u22121100101102103Gradientcomputations\u0001(regretsum)[mbb]Endgame7CFR+EGTEGT/ASCFR(RM)CFR(RM+)Figure3:Solutionqualityasafunctionofthenumberofgradientcomputationsforallalgorithmsontworiversubgames.Thesolutionqualityisgivenasthesumofregretsfortheplayersinmilli-big-blinds.DGF)arecompetitivewiththeCFRfamilyofalgorithms.Speci\ufb01cally,theyoutperformtheotherCFRvariantsandarecloseinef\ufb01ciencytoCFR+.OurbestvariantofEGTcansolvesubgamestothedesiredaccuracyataspeedthatiswithinafactoroftwoofCFR+.OurresultssuggestthatitmaybepossibletomakeFOMsfasterthanCFR+.Forexample,wedidnotspendmuchefforttuningtheparametersofEGT,andtuningthemwouldmakethealgorithmevenmoreef\ufb01cient.Second,weonlyinvestigatedEGT,whichhasbeenmostpopularFOMinEFGsolving.However,itispossiblethatotherFOMssuchasmirrorprox[23]ortheprimal-dualalgorithmbyChambolleandPock[8]couldbemadeevenfaster.Furthermore,stochasticFOMs(i.e.,oneswherethegradientisapproximatedbysamplingtomakethegradientcomputationdramaticallyfaster)couldbeinvestigatedaswell.Kroeretal.[17]triedthisusingstochasticmirrorprox[14]withoutpracticalsuccess,butitislikelythatthisapproachcouldbemadebetterwithmoreengineering.ItwouldalsobeinterestingtocompareourEGTapproachtoCFRalgorithmsforcomputingequi-libriumre\ufb01nements,forexampleintheapproximateextensive-formperfectequilibriummodelinvestigatedbyKroeretal.[18]andFarinaetal.[9].Pruningtechniques(fortemporarilyskippingpartsofthegametreeonsomeiterations)havebeenshowneffectiveforbothCFRandEGT-likealgorithms,andcouldpotentiallybeincorporatedaswell[20,6,2].Finally,whileEGT,aswellasotherFOM-basedapproachestocomputingzero-sumNashequilibria,arenotapplicabletothecomputationofgeneral-sumNashequilibriaintheorytheycouldstillbeappliedtothecomputationofstrategiesinpractice(gradientscanstillbecomputed,andsothesmoothedbestresponsesandcorrespondingstrategyupdatesarestillwell-de\ufb01ned).ForCFRtheanalogousapproachseemstoperformreasonablywell[7],andyoumightexpectthesamefromFOMssuchasEGT.AcknowledgmentsThismaterialisbasedonworksupportedbytheNationalScienceFoundationundergrantsIIS-1718457,IIS-1617590,andCCF-1733556,andtheAROunderawardW911NF-17-1-0082.ChristianKroerissupportedbyaFacebookFellowship.References[1]M.Bowling,N.Burch,M.Johanson,andO.Tammelin.Heads-uplimithold\u2019empokerissolved.Science,347(6218),Jan.2015.[2]N.BrownandT.Sandholm.Reducedspaceandfasterconvergenceinimperfect-informationgamesviapruning.InInternationalConferenceonMachineLearning(ICML),2017.[3]N.BrownandT.Sandholm.Safeandnestedsubgamesolvingforimperfect-informationgames.InProceedingsoftheAnnualConferenceonNeuralInformationProcessingSystems(NIPS),pages689\u2013699,2017.[4]N.BrownandT.Sandholm.SuperhumanAIforheads-upno-limitpoker:Libratusbeatstopprofessionals.Science,pageeaao1733,Dec.2017.9\f[5]N.Brown,S.Ganzfried,andT.Sandholm.Hierarchicalabstraction,distributedequilibriumcomputation,andpost-processing,withapplicationtoachampionno-limitTexasHold\u2019emagent.InInternationalConferenceonAutonomousAgentsandMulti-AgentSystems(AAMAS),2015.[6]N.Brown,C.Kroer,andT.Sandholm.Dynamicthresholdingandpruningforregretminimiza-tion.InAAAIConferenceonArti\ufb01cialIntelligence(AAAI),2017.[7]J.\u02c7Cerm\u00e1k,B.Bo\u0161ansk`y,andN.Gatti.Strategyeffectivenessofgame-theoreticalsolutionconceptsinextensive-formgeneral-sumgames.InAutonomousAgentsandMulti-AgentSystems,pages1813\u20131814.InternationalFoundationforAutonomousAgentsandMultiagentSystems,2015.[8]A.ChambolleandT.Pock.A\ufb01rst-orderprimal-dualalgorithmforconvexproblemswithapplicationstoimaging.JournalofMathematicalImagingandVision,2011.[9]G.Farina,C.Kroer,andT.Sandholm.Regretminimizationinbehaviorally-constrainedzero-sumgames.InInternationalConferenceonMachineLearning(ICML),2017.[10]A.GilpinandT.Sandholm.Losslessabstractionofimperfectinformationgames.JournaloftheACM,54(5),2007.[11]S.Hoda,A.Gilpin,J.Pe\u00f1a,andT.Sandholm.SmoothingtechniquesforcomputingNashequilibriaofsequentialgames.MathematicsofOperationsResearch,35(2),2010.[12]M.Johanson.Measuringthesizeoflargeno-limitpokergames.Technicalreport,UniversityofAlberta,2013.[13]M.Johanson,K.Waugh,M.Bowling,andM.Zinkevich.Acceleratingbestresponsecalculationinlargeextensivegames.InProceedingsoftheInternationalJointConferenceonArti\ufb01cialIntelligence(IJCAI),2011.[14]A.Juditsky,A.Nemirovski,andC.Tauvel.Solvingvariationalinequalitieswithstochasticmirror-proxalgorithm.StochasticSystems,1(1):17\u201358,2011.[15]D.KollerandA.Pfeffer.Representationsandsolutionsforgame-theoreticproblems.Arti\ufb01cialIntelligence,94(1):167\u2013215,July1997.[16]D.Koller,N.Megiddo,andB.vonStengel.Ef\ufb01cientcomputationofequilibriaforextensivetwo-persongames.GamesandEconomicBehavior,14(2),1996.[17]C.Kroer,K.Waugh,F.K\u0131l\u0131n\u00e7-Karzan,andT.Sandholm.Faster\ufb01rst-ordermethodsforextensive-formgamesolving.InProceedingsoftheACMConferenceonEconomicsandComputation(EC),2015.[18]C.Kroer,G.Farina,andT.Sandholm.Smoothingmethodforapproximateextensive-formper-fectequilibrium.InProceedingsoftheInternationalJointConferenceonArti\ufb01cialIntelligence(IJCAI),2017.[19]C.Kroer,K.Waugh,F.K\u0131l\u0131n\u00e7-Karzan,andT.Sandholm.Theoreticalandpracticaladvancesonsmoothingforextensive-formgames.arXivpreprintarXiv:1702.04849,2017.[20]M.Lanctot,K.Waugh,M.Zinkevich,andM.Bowling.MonteCarlosamplingforregretmini-mizationinextensivegames.InProceedingsoftheAnnualConferenceonNeuralInformationProcessingSystems(NIPS),2009.[21]M.Morav\u02c7c\u00edk,M.Schmid,N.Burch,V.Lis\u00fd,D.Morrill,N.Bard,T.Davis,K.Waugh,M.Johanson,andM.Bowling.Deepstack:Expert-levelarti\ufb01cialintelligenceinheads-upno-limitpoker.Science,356(6337),May2017.[22]J.Nash.Equilibriumpointsinn-persongames.ProceedingsoftheNationalAcademyofSciences,36:48\u201349,1950.10\f[23]A.Nemirovski.Prox-methodwithrateofconvergenceO(1/t)forvariationalinequalitieswithLipschitzcontinuousmonotoneoperatorsandsmoothconvex-concavesaddlepointproblems.SIAMJournalonOptimization,15(1),2004.[24]Y.Nesterov.Excessivegaptechniqueinnonsmoothconvexminimization.SIAMJournalofOptimization,16(1),2005.[25]Y.Nesterov.Smoothminimizationofnon-smoothfunctions.MathematicalProgramming,103,2005.[26]I.Romanovskii.Reductionofagamewithcompletememorytoamatrixgame.SovietMathematics,3,1962.[27]O.Tammelin,N.Burch,M.Johanson,andM.Bowling.Solvingheads-uplimitTexashold\u2019em.InProceedingsofthe24thInternationalJointConferenceonArti\ufb01cialIntelligence(IJCAI),2015.[28]B.vonStengel.Ef\ufb01cientcomputationofbehaviorstrategies.GamesandEconomicBehavior,14(2):220\u2013246,1996.[29]M.Zinkevich,M.Bowling,M.Johanson,andC.Piccione.Regretminimizationingameswithincompleteinformation.InProceedingsoftheAnnualConferenceonNeuralInformationProcessingSystems(NIPS),2007.11\f", "award": [], "sourceid": 466, "authors": [{"given_name": "Christian", "family_name": "Kroer", "institution": "Faceook, Core Data Science"}, {"given_name": "Gabriele", "family_name": "Farina", "institution": "Carnegie Mellon University"}, {"given_name": "Tuomas", "family_name": "Sandholm", "institution": "Carnegie Mellon University"}]}