{"title": "Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games", "book": "Advances in Neural Information Processing Systems", "page_first": 154, "page_last": 162, "abstract": "We study a general adversarial online learning problem, in which we are given a decision set X' in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from X', based on the observed sequence of previous rewards. Our goal is to minimize regret, defined as the gap between the realized reward and the reward of the best fixed action in hindsight. Using results from infinite dimensional convex analysis, we generalize the method of Dual Averaging (or Follow the Regularized Leader) to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either the set S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game. 
We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency.", "full_text": "Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

Maximilian Balandat, Walid Krichene, Claire Tomlin, Alexandre Bayen

Electrical Engineering and Computer Sciences, UC Berkeley

[balandat,walid,tomlin]@eecs.berkeley.edu, bayen@berkeley.edu

Abstract

We study a general adversarial online learning problem, in which we are given a decision set 𝒳 in a reflexive Banach space X and a sequence of reward vectors in the dual space of X. At each iteration, we choose an action from 𝒳, based on the observed sequence of previous rewards. Our goal is to minimize regret. Using results from infinite-dimensional convex analysis, we generalize the method of Dual Averaging to our setting and obtain upper bounds on the worst-case regret that generalize many previous results. Under the assumption of uniformly continuous rewards, we obtain explicit regret bounds in a setting where the decision set is the set of probability distributions on a compact metric space S. Importantly, we make no convexity assumptions on either S or the reward functions. We also prove a general lower bound on the worst-case regret for any online algorithm. We then apply these results to the problem of learning in repeated two-player zero-sum games on compact metric spaces. In doing so, we first prove that if both players play a Hannan-consistent strategy, then with probability 1 the empirical distributions of play weakly converge to the set of Nash equilibria of the game.
We then show that, under mild assumptions, Dual Averaging on the (infinite-dimensional) space of probability distributions indeed achieves Hannan-consistency.

1 Introduction

Regret analysis is a general technique for designing and analyzing algorithms for sequential decision problems in adversarial or stochastic settings (Shalev-Shwartz, 2012; Bubeck and Cesa-Bianchi, 2012). Online learning algorithms have applications in machine learning (Xiao, 2010), portfolio optimization (Cover, 1991), online convex optimization (Hazan et al., 2007) and other areas. Regret analysis also plays an important role in the study of repeated play of finite games (Hart and Mas-Colell, 2001). It is well known, for example, that in a two-player zero-sum finite game, if both players play according to a Hannan-consistent strategy (Hannan, 1957), their (marginal) empirical distributions of play almost surely converge to the set of Nash equilibria of the game (Cesa-Bianchi and Lugosi, 2006). Moreover, it can be shown that playing a strategy that achieves sublinear regret almost surely guarantees Hannan-consistency.

A natural question then is whether a similar result holds for games with infinite action sets. In this article we provide a positive answer. In particular, we prove that in a continuous two-player zero-sum game over compact (not necessarily convex) metric spaces, if both players follow a Hannan-consistent strategy, then with probability 1 their empirical distributions of play weakly converge to the set of Nash equilibria of the game. This in turn raises another important question: Do algorithms that ensure Hannan-consistency exist in such a setting? More generally, can one develop algorithms that guarantee sublinear growth of the worst-case regret?
We answer these questions affirmatively as well. To this end, we develop a general framework to study the Dual Averaging (or Follow the Regularized Leader) method on reflexive Banach spaces. This framework generalizes a wide range of existing results in the literature, including algorithms for online learning on finite sets (Arora et al., 2012) and finite-dimensional online convex optimization (Hazan et al., 2007).

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Given a convex subset 𝒳 of a reflexive Banach space X, the generalized Dual Averaging (DA) method maximizes, at each iteration, the cumulative past rewards (which are elements of X*, the dual space of X) minus a regularization term h. We show that under certain conditions, the maximizer in the DA update is the Fréchet gradient Dh* of the regularizer's conjugate function. In doing so, we develop a novel characterization of the duality between essential strong convexity of h and essential Fréchet differentiability of h* in reflexive Banach spaces, which is of independent interest.

We apply these general results to the problem of minimizing regret when the rewards are uniformly continuous functions over a compact metric space S. Importantly, we do not assume convexity of either S or the rewards, and show that it is possible to achieve sublinear regret under a mild geometric condition on S (namely, the existence of a locally Q-regular Borel measure). We provide explicit bounds for a class of regularizers, which guarantee sublinear worst-case regret. We also prove a general lower bound on the regret for any online algorithm and show that DA asymptotically achieves this bound up to a √(log t) factor.

Our results are related to work by Lehrer (2003) and Sridharan and Tewari (2010); Srebro et al. (2011).
Lehrer (2003) gives necessary geometric conditions for Blackwell approachability in infinite-dimensional spaces, but no implementable algorithm guaranteeing Hannan-consistency. Sridharan and Tewari (2010) derive general regret bounds for Mirror Descent (MD) under the assumption that the strategy set is uniformly bounded in the norm of the Banach space. We do not make such an assumption here. In fact, this assumption does not hold in general for our applications in Section 3.

The paper is organized as follows: In Section 2 we introduce and provide a general analysis of Dual Averaging in reflexive Banach spaces. In Section 3 we apply these results to obtain explicit regret bounds on compact metric spaces with uniformly continuous reward functions. We use these results in Section 4 in the context of learning Nash equilibria in continuous two-player zero-sum games, and provide a numerical example. All proofs are given in the supplementary material.

2 Regret Minimization on Reflexive Banach Spaces

Consider a sequential decision problem in which we are to choose a sequence (x_1, x_2, ...) of actions from some feasible subset 𝒳 of a reflexive Banach space X, and seek to maximize a sequence (u_1(x_1), u_2(x_2), ...) of rewards, where the u_τ : X → ℝ are elements of a given subset U ⊂ X*, with X* the dual space of X. We assume that x_t, the action chosen at time t, may only depend on the sequence of previously observed reward vectors (u_1, ..., u_{t−1}). We call any such algorithm an online algorithm. We consider the adversarial setting, i.e., we do not make any distributional assumptions on the rewards. In particular, they could be picked maliciously by some adversary.

The notion of regret is a standard measure of performance for such a sequential decision problem. For a sequence (u_1, ..., u_t) of reward vectors, and a sequence of decisions (x_1, ..., x_t) produced by an algorithm, the regret of the algorithm w.r.t. a (fixed) decision x ∈ 𝒳 is the gap between the realized reward and the reward under x, i.e., R_t(x) := Σ_{τ=1}^t u_τ(x) − Σ_{τ=1}^t u_τ(x_τ). The regret is defined as R_t := sup_{x∈𝒳} R_t(x). An algorithm is said to have sublinear regret if for any sequence (u_t)_{t≥1} in the set of admissible reward functions U, the regret grows sublinearly, i.e. lim sup_t R_t/t ≤ 0.

Example 1. Consider a finite action set S = {1, ..., n}, let X = X* = ℝ^n, and let 𝒳 = Δ_{n−1}, the probability simplex in ℝ^n. A reward function can be identified with a vector u ∈ ℝ^n, such that the i-th element u_i is the reward of action i. A choice x ∈ 𝒳 corresponds to a randomization over the n actions in S. This is the classic setting of many regret-minimizing algorithms in the literature.

Example 2. Suppose S is a compact metric space with μ a finite measure on S. Consider X = X* = L²(S, μ) and let 𝒳 = {x ∈ X : x ≥ 0 a.e., ‖x‖_1 = 1}. A reward function is an L²-integrable function on S, and each choice x ∈ 𝒳 corresponds to a probability distribution (absolutely continuous w.r.t. μ) over S. We will explore a more general variant of this problem in Section 3.

In this section, we prove a general bound on the worst-case regret for DA. DA was introduced by Nesterov (2009) for (finite-dimensional) convex optimization, and has also been applied to online learning, e.g. by Xiao (2010). In the finite-dimensional case, the method solves, at each iteration, the optimization problem x_{t+1} = arg max_{x∈𝒳} ⟨η_t Σ_{τ=1}^t u_τ, x⟩ − h(x), where h is a strongly convex regularizer defined on 𝒳 ⊂ ℝ^n and (η_t)_{t≥0} is a sequence of learning rates.
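As a concrete finite-dimensional illustration of this update (the setting of Example 1), the following sketch implements the DA rule with the entropic regularizer h(x) = Σ_i x_i log x_i, for which the arg max has the closed-form softmax solution. This is only an illustrative sketch under our own choice of learning-rate schedule; the function names are ours, not the paper's.

```python
import numpy as np

def dual_averaging_simplex(rewards, eta=None):
    """Dual Averaging (Follow the Regularized Leader) on the simplex.

    With the entropic regularizer h(x) = sum_i x_i log x_i, the DA update
    x_{t+1} = argmax_{x in simplex} <eta_t * U_t, x> - h(x) has the
    closed-form softmax solution x_{t+1} proportional to exp(eta_t * U_t).
    Returns the sequence of plays and the regret sup_x R_t(x).
    """
    rewards = np.asarray(rewards, dtype=float)
    t_max, n = rewards.shape
    if eta is None:
        # a standard schedule giving O(sqrt(t log n)) worst-case regret
        eta = lambda t: np.sqrt(np.log(n) / t)
    x = np.full(n, 1.0 / n)           # x_1 is uniform since U_0 = 0
    U = np.zeros(n)                   # cumulative reward vector U_t
    realized = 0.0
    plays = []
    for t in range(t_max):
        plays.append(x.copy())
        realized += rewards[t] @ x    # expected reward of the mixed play
        U += rewards[t]
        z = eta(t + 1) * U
        z -= z.max()                  # stabilize the exponentials
        x = np.exp(z)
        x /= x.sum()
    # sup over the simplex of a linear functional is attained at a vertex
    regret = U.max() - realized
    return np.array(plays), regret
```

With rewards bounded by M this is the classic Hedge (exponential weights) rule, whose worst-case regret grows as O(M √(t log n)).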
The regret analysis of the method relies on the duality between strong convexity and smoothness (Nesterov, 2009, Lemma 1). In order to generalize DA to our Banach space setting, we develop an analogous duality result in Theorem 1. In particular, we show that the correct notion of strong convexity is (uniform) essential strong convexity. Equipped with this duality result, we analyze the regret of the Dual Averaging method and derive a general bound in Theorem 2.

2.1 Preliminaries

Let (X, ‖·‖) be a reflexive Banach space, and denote by ⟨·, ·⟩ : X × X* → ℝ the canonical pairing between X and its dual space X*, so that ⟨x, ξ⟩ := ξ(x) for all x ∈ X, ξ ∈ X*. By the effective domain of an extended real-valued function f : X → [−∞, +∞] we mean the set dom f = {x ∈ X : f(x) < +∞}. A function f is proper if f > −∞ and dom f is non-empty. The conjugate or Legendre-Fenchel transform of f is the function f* : X* → [−∞, +∞] given by

f*(ξ) = sup_{x∈X} ⟨x, ξ⟩ − f(x)   (1)

for all ξ ∈ X*. If f is proper, lower semicontinuous and convex, its subdifferential ∂f is the set-valued mapping ∂f(x) = {ξ ∈ X* : f(y) ≥ f(x) + ⟨y − x, ξ⟩ for all y ∈ X}. We define dom ∂f := {x ∈ X : ∂f(x) ≠ ∅}. Let Γ denote the set of all convex, lower semicontinuous functions γ : [0, ∞) → [0, ∞] such that γ(0) = 0, and let

Γ_L := {γ ∈ Γ : γ(r)/r → 0 as r → 0},   Γ_U := {γ ∈ Γ : γ(r) > 0 for all r > 0}.   (2)

We now introduce some definitions.
Additional results are reviewed in the supplementary material.

Definition 1 (Strömberg, 2011). A proper convex lower semicontinuous function f : X → (−∞, ∞] is essentially strongly convex if

(i) f is strictly convex on every convex subset of dom ∂f,
(ii) (∂f)^{−1} is locally bounded on its domain,
(iii) for every x_0 ∈ dom ∂f there exist ξ_0 ∈ X* and γ ∈ Γ_U such that

f(x) ≥ f(x_0) + ⟨x − x_0, ξ_0⟩ + γ(‖x − x_0‖) for all x ∈ X.   (3)

If (3) holds with γ independent of x_0, f is uniformly essentially strongly convex with modulus γ.

Definition 2 (Strömberg, 2011). A proper convex lower semicontinuous function f : X → (−∞, ∞] is essentially Fréchet differentiable if int dom f ≠ ∅, f is Fréchet differentiable on int dom f with Fréchet derivative Df, and ‖Df(x_j)‖* → ∞ for any sequence (x_j)_j in int dom f converging to some boundary point of dom f.

Definition 3. A proper Fréchet differentiable function f : X → (−∞, ∞] is essentially strongly smooth if for all x_0 ∈ dom ∂f there exist ξ_0 ∈ X* and κ ∈ Γ_L such that

f(x) ≤ f(x_0) + ⟨ξ_0, x − x_0⟩ + κ(‖x − x_0‖) for all x ∈ X.   (4)

If (4) holds with κ independent of x_0, f is uniformly essentially strongly smooth with modulus κ.

With this we are now ready to give our main duality result:

Theorem 1. Let f : X → (−∞, +∞] be proper, lower semicontinuous and uniformly essentially strongly convex with modulus γ ∈ Γ_U. Then

(i) f* is proper and essentially Fréchet differentiable with Fréchet derivative

Df*(ξ) = arg max_{x∈X} ⟨x, ξ⟩ − f(x).   (5)

(ii) f* is uniformly essentially smooth with modulus γ*.

If, in addition, γ̃(r) := γ(r)/r is strictly increasing, then

‖Df*(ξ_1) − Df*(ξ_2)‖ ≤ γ̃^{−1}(‖ξ_1 − ξ_2‖*/2).   (6)

In other words, Df* is uniformly continuous with modulus of continuity χ(r) = γ̃^{−1}(r/2).

Corollary 1. If γ(r) ≥ C r^{1+κ} for all r ≥ 0, then ‖Df*(ξ_1) − Df*(ξ_2)‖ ≤ (2C)^{−1/κ} ‖ξ_1 − ξ_2‖*^{1/κ}.

In particular, with γ(r) = (K/2) r², Definition 1 becomes the classic definition of K-strong convexity, and (6) yields the result familiar from the finite-dimensional case that the gradient Df* is 1/K-Lipschitz with respect to the dual norm (Nesterov, 2009, Lemma 1).

2.2 Dual Averaging in Reflexive Banach Spaces

We call a proper convex function h : X → (−∞, +∞] a regularizer function on a set 𝒳 ⊂ X if h is essentially strongly convex and dom h = 𝒳. We emphasize that we do not assume h to be Fréchet-differentiable.
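The duality in Theorem 1(i), namely that the maximizer defining f*(ξ) is the Fréchet gradient Df*(ξ), can be sanity-checked numerically in the finite-dimensional setting of Example 1: for the negative-entropy regularizer, the conjugate is the log-sum-exp function and its gradient is the softmax map. The helper names below are ours; this is an illustration, not the paper's construction.

```python
import numpy as np

def softmax(xi):
    """Gradient of log-sum-exp; the maximizer in the entropy conjugate."""
    z = np.exp(xi - xi.max())
    return z / z.sum()

def neg_entropy(x):
    """h(x) = sum_i x_i log x_i on the simplex, with 0 log 0 = 0."""
    x = x[x > 0]
    return float((x * np.log(x)).sum())

def conjugate_on_grid(xi, h, grid):
    """Brute-force h*(xi) = sup_x <x, xi> - h(x) over grid points of the
    simplex, returning the value and the maximizer (cf. Theorem 1(i))."""
    vals = grid @ xi - np.array([h(x) for x in grid])
    i = int(vals.argmax())
    return vals[i], grid[i]

# The two-dimensional simplex {(a, 1 - a)} discretized on a fine grid.
a = np.linspace(0.0, 1.0, 100001)
grid = np.stack([a, 1.0 - a], axis=1)

xi = np.array([0.3, -0.5])
val, xstar = conjugate_on_grid(xi, neg_entropy, grid)
# val should approximate h*(xi) = log(sum_i exp(xi_i)), and xstar the
# softmax of xi -- the maximizer is indeed the gradient of the conjugate.
```

The same identity is what makes the DA update computable in closed form for entropic regularizers, a point taken up again in Section 3.3.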
Definition 1 in conjunction with Lemma S.1 (supplementary material) implies that for any regularizer h, the supremum of any function of the form ⟨·, ξ⟩ − h(·) over X, where ξ ∈ X*, will be attained at a unique element of 𝒳, namely Dh*(ξ), the Fréchet gradient of h* at ξ. DA with regularizer h and a sequence of learning rates (η_t)_{t≥1} generates a sequence of decisions using the simple update rule x_{t+1} = Dh*(η_t U_t), where U_t = Σ_{τ=1}^t u_τ and U_0 := 0.

Theorem 2. Let h be a uniformly essentially strongly convex regularizer on 𝒳 with modulus γ and let (η_t)_{t≥1} be a positive non-increasing sequence of learning rates. Then, for any sequence of payoff functions (u_t)_{t≥1} in X* for which there exists M < ∞ such that sup_{x∈𝒳} |⟨u_t, x⟩| ≤ M for all t, the sequence of plays (x_t)_{t≥0} given by

x_{t+1} = Dh*(η_t Σ_{τ=1}^t u_τ)   (7)

ensures that

R_t(x) := Σ_{τ=1}^t ⟨u_τ, x⟩ − Σ_{τ=1}^t ⟨u_τ, x_τ⟩ ≤ (h(x) − h̲)/η_t + Σ_{τ=1}^t ‖u_τ‖* γ̃^{−1}((η_{τ−1}/2) ‖u_τ‖*),   (8)

where h̲ = inf_{x∈𝒳} h(x), γ̃(r) := γ(r)/r and η_0 := η_1.

It is possible to obtain a regret bound similar to (8) also in a continuous-time setting. In fact, following Kwon and Mertikopoulos (2014), we derive the bound (8) by first proving a bound on a suitably defined notion of continuous-time regret, and then bounding the difference between the continuous-time and discrete-time regrets.
This analysis is detailed in the supplementary material.

Note that the condition that sup_{x∈𝒳} |⟨u_t, x⟩| ≤ M in Theorem 2 is weaker than the one in Sridharan and Tewari (2010), as it does not imply a uniformly bounded strategy set (e.g., if X = L²(ℝ) and 𝒳 is the set of probability densities on ℝ, then 𝒳 is unbounded in L², but the condition may still hold).

Theorem 2 provides a regret bound for a particular choice x ∈ 𝒳. Recall that R_t := sup_{x∈𝒳} R_t(x). In Example 1 the set 𝒳 is compact, so any continuous regularizer h will be bounded, and hence taking the supremum over x in (8) poses no issue. However, this is not the case in our general setting, as the regularizer may be unbounded on 𝒳. For instance, consider Example 2 with the entropy regularizer h(x) = ∫_S x(s) log(x(s)) ds, which is easily seen to be unbounded on 𝒳. As a consequence, obtaining a worst-case bound will in general require additional assumptions on the reward functions and the decision set 𝒳. This will be investigated in detail in Section 3.

Corollary 2. Suppose that γ(r) ≥ C r^{1+κ} for all r ≥ 0, for some C > 0 and κ > 0. Then

R_t(x) ≤ (h(x) − h̲)/η_t + (2C)^{−1/κ} Σ_{τ=1}^t η_{τ−1}^{1/κ} ‖u_τ‖*^{1+1/κ}.   (9)

In particular, if ‖u_t‖* ≤ M for all t and η_t = η t^{−β}, then

R_t(x) ≤ ((h(x) − h̲)/η) t^β + (η/(2C))^{1/κ} (κ/(κ − β)) M^{1+1/κ} t^{1−β/κ}.   (10)

Assuming h is bounded, optimizing over β yields a rate of R_t(x) = O(t^{κ/(1+κ)}). In particular, if γ(r) = (K/2) r², which corresponds to the classic definition of strong convexity, then R_t(x) = O(√t). For non-vanishing u_τ we will need that η_t ↓ 0 for the sum in (9) to converge.
Thus we could get potentially tighter control over the rate of this term for κ < 1, at the expense of larger constants.

3 Online Optimization on Compact Metric Spaces

We now apply the above results to the problem of minimizing regret on compact metric spaces under the additional assumption of uniformly continuous reward functions. We make no assumptions on convexity of either the feasible set or the rewards. Essentially, we lift the non-convex problem of minimizing a sequence of functions over the (possibly non-convex) set S to the convex (albeit infinite-dimensional) problem of minimizing a sequence of linear functionals over a set 𝒳 of probability measures (a convex subset of the vector space of measures on S).

3.1 An Upper Bound on the Worst-Case Regret

Let (S, d) be a compact metric space, and let μ be a Borel measure on S. Suppose that the reward vectors u_τ are given by elements in L^q(S, μ), where q > 1. Let X = L^p(S, μ), where p and q are Hölder conjugates, i.e., 1/p + 1/q = 1. Consider 𝒳 = {x ∈ X : x ≥ 0 a.e., ‖x‖_1 = 1}, the set of probability measures on S that are absolutely continuous w.r.t. μ with p-integrable Radon-Nikodym derivatives. Moreover, denote by Z the class of non-decreasing χ : [0, ∞) → [0, ∞] such that lim_{r→0} χ(r) = χ(0) = 0. The following assumption will be made throughout this section:

Assumption 1. The reward vectors u_t have modulus of continuity χ on S, uniformly in t. That is, there exists χ ∈ Z such that |u_t(s) − u_t(s′)| ≤ χ(d(s, s′)) for all t and for all s, s′ ∈ S.

Let B(s, r) = {s′ ∈ S : d(s, s′) < r} and denote by ℬ(s, δ) ⊂ 𝒳 the elements of 𝒳 with support contained in B(s, δ). Furthermore, let D_S := sup_{s,s′∈S} d(s, s′). Then we have the following:

Theorem 3. Let (S, d) be compact, and suppose that Assumption 1 holds. Let h be a uniformly essentially strongly convex regularizer on 𝒳 with modulus γ, and let (η_t)_{t≥1} be a positive non-increasing sequence of learning rates. Then, under (7), for any positive sequence (ϑ_t)_{t≥1},

R_t ≤ (sup_{s∈S} inf_{x∈ℬ(s,ϑ_t)} h(x) − h̲)/η_t + t χ(ϑ_t) + Σ_{τ=1}^t ‖u_τ‖* γ̃^{−1}((η_{τ−1}/2) ‖u_τ‖*).   (11)

Remark 1. The sequence (ϑ_t)_{t≥1} in Theorem 3 is not a parameter of the algorithm, but rather a parameter in the regret bound. In particular, (11) holds true for any such sequence, and we will use this fact later on to obtain explicit bounds by instantiating (11) with a particular choice of (ϑ_t)_{t≥1}.

It is important to realize that the infimum over ℬ(s, ϑ_t) in (11) may be infinite, in which case the bound is meaningless. This happens for example if s is an isolated point of some S ⊂ ℝ^n and μ is the Lebesgue measure, in which case ℬ(s, ϑ_t) = ∅. However, under an additional regularity assumption on the measure μ we can avoid such degenerate situations.

Definition 4 (Heinonen et al., 2015). A Borel measure μ on a metric space (S, d) is (Ahlfors) Q-regular if there exist 0 < c_0 ≤ C_0 < ∞ such that for any open ball B(s, r),

c_0 r^Q ≤ μ(B(s, r)) ≤ C_0 r^Q.   (12)

We say that μ is r_0-locally Q-regular if (12) holds for all 0 < r ≤ r_0.

Intuitively, under an r_0-locally Q-regular measure, the mass in the neighborhood of any point of S is uniformly bounded from above and below. This will allow, at each iteration t, to assign sufficient probability mass around the maximizer(s) of the cumulative reward function.

Example 3. The canonical example of a Q-regular measure is the Lebesgue measure λ on ℝ^n.
If d is the metric induced by the Euclidean norm, then Q = n and the bound (12) is tight with c_0 = C_0, a dimensional constant. However, for general sets S ⊂ ℝ^n, λ need not be locally Q-regular. A sufficient condition for local regularity of λ is that S is v-uniformly fat (Krichene et al., 2015).

Assumption 2. The measure μ is r_0-locally Q-regular on (S, d).

Under Assumption 2, ℬ(s, ϑ_t) ≠ ∅ for all s ∈ S and ϑ_t > 0, hence we may hope for a bound on inf_{x∈ℬ(s,ϑ_t)} h(x) uniform in s. To obtain explicit convergence rates, we have to consider a more specific class of regularizers.

3.2 Explicit Rates for f-Divergences on L^p(S)

We consider a particular class of regularizers called f-divergences or Csiszár divergences (Csiszár, 1967). Following Audibert et al. (2014), we define ω-potentials and the associated f-divergence.

Definition 5. Let ω ≤ 0 and a ∈ (−∞, +∞]. A continuous increasing diffeomorphism φ : (−∞, a) → (ω, ∞) is an ω-potential if lim_{z→−∞} φ(z) = ω, lim_{z→a} φ(z) = +∞ and φ(0) ≤ 1. Associated to φ is the convex function f_φ : [0, ∞) → ℝ defined by f_φ(x) = ∫_1^x φ^{−1}(z) dz, and the f_φ-divergence, defined by h_φ(x) = ∫_S f_φ(x(s)) dμ(s) + ι_𝒳(x), where ι_𝒳 is the indicator function of 𝒳 (i.e. ι_𝒳(x) = 0 if x ∈ 𝒳 and ι_𝒳(x) = +∞ if x ∉ 𝒳).

A remarkable fact is that for regularizers based on ω-potentials, the DA update (7) can be computed efficiently.
More precisely, it can be shown (see Proposition 3 in Krichene (2015)) that the maximizer in this case has a simple expression in terms of the dual problem, and the problem of computing x_{t+1} = Dh*(η_t Σ_{τ=1}^t u_τ) reduces to computing a scalar dual variable ν*_t.

Proposition 1. Suppose that μ(S) = 1, and that Assumption 2 holds with constants r_0 > 0 and 0 < c_0 ≤ C_0 < ∞. Under the assumptions of Theorem 3, with h = h_φ the regularizer associated to an ω-potential φ, we have that, for any positive sequence (ϑ_t)_{t≥1} with ϑ_t ≤ r_0,

R_t/t ≤ min(C_0 ϑ_t^Q, μ(S)) f_φ(c_0^{−1} ϑ_t^{−Q}) / (t η_t) + χ(ϑ_t) + (1/t) Σ_{τ=1}^t ‖u_τ‖* γ̃^{−1}((η_{τ−1}/2) ‖u_τ‖*).   (13)

For particular choices of the sequences (η_t)_{t≥1} and (ϑ_t)_{t≥1}, we can derive explicit regret rates.

3.3 Analysis for Entropy Dual Averaging (The Generalized Hedge Algorithm)

Taking φ(z) = e^{z−1}, we have that f_φ(x) = ∫_1^x φ^{−1}(z) dz = x log x, and hence the regularizer is h_φ(x) = ∫_S x(s) log x(s) dμ(s). Then Dh*(ξ)(s) = exp ξ(s) / ‖exp ξ‖_1. This corresponds to a generalized Hedge algorithm (Arora et al., 2012; Krichene et al., 2015) or the entropic barrier of Bubeck and Eldan (2014) for Euclidean spaces. The regularizer h_φ can be shown to be essentially strongly convex with modulus γ(r) = (1/2) r².

Corollary 3. Suppose that μ(S) = 1, that μ is r_0-locally Q-regular with constants c_0, C_0, that ‖u_t‖* ≤ M for all t, and that χ(r) = C_α r^α for 0 < α ≤ 1 (that is, the rewards are α-Hölder continuous). Then, under Entropy Dual Averaging, choosing η_t = η √(log t / t) with η = (1/M)(C_0 Q / (2 c_0))^{1/2} and ϑ > 0, we have that

R_t/t ≤ 2M √(2C_0/(c_0 Q)) (log(c_0^{−1} ϑ^{−Q/α}) + (Q/2α) log t) / √(t log t) + C_α ϑ √(log t / t)   (14)

whenever √(log t / t) < r_0^α ϑ^{−1}.

One can now further optimize over the choice of ϑ to obtain the best constant in the bound. Note also that the case α = 1 corresponds to Lipschitz continuity.

3.4 A General Lower Bound

Theorem 4. Let (S, d) be compact, suppose that Assumption 2 holds, and let w : ℝ → ℝ be any function with modulus of continuity χ ∈ Z such that ‖w(d(·, s′))‖_q ≤ M for some s′ ∈ S for which there exists s ∈ S with d(s, s′) = D_S. Then for any online algorithm, there exists a sequence (u_τ)_{τ=1}^t of reward vectors u_τ ∈ X* with ‖u_τ‖* ≤ M and modulus of continuity χ_τ < χ such that

R_t ≥ (w(D_S) / (2√2)) √t.   (15)

Maximizing the constant in (15) is of interest in order to benchmark the bound against the upper bounds obtained in the previous sections. This problem is however quite challenging, and we will defer this analysis to future work. For Hölder-continuous functions, we have the following result:

Proposition 2. In the setting of Theorem 4, suppose that μ(S) = 1 and that χ(r) = C_α r^α for some 0 < α ≤ 1.
Then

R_t ≥ (min(C_α^{1/α} D_S^α, M) / (2√2)) √t.   (16)

Observe that, up to a √(log t) factor, the asymptotic rate of this general lower bound for any online algorithm matches that of the upper bound (14) of Entropy Dual Averaging.

4 Learning in Continuous Two-Player Zero-Sum Games

Consider a two-player zero-sum game G = (S_1, S_2, u), in which the strategy spaces S_1 and S_2 of players 1 and 2, respectively, are Hausdorff spaces, and u : S_1 × S_2 → ℝ is the payoff function of player 1 (as G is zero-sum, the payoff function of player 2 is −u). For each i, denote by P_i := P(S_i) the set of Borel probability measures on S_i. Denote S := S_1 × S_2 and P := P_1 × P_2. For a (joint) mixed strategy x ∈ P, we define the natural extension ū : P → ℝ by ū(x) := E_x[u] = ∫_S u(s_1, s_2) dx(s_1, s_2), which is the expected payoff of player 1 under x.

A continuous zero-sum game G is said to have value V if

sup_{x_1∈P_1} inf_{x_2∈P_2} ū(x_1, x_2) = inf_{x_2∈P_2} sup_{x_1∈P_1} ū(x_1, x_2) = V.   (17)

The elements x_1 × x_2 ∈ P at which (17) holds are the (mixed) Nash equilibria of G. We denote the set of Nash equilibria of G by N(G). In the case of finite games, it is well known that every two-player zero-sum game has a value. This is not true in general for continuous games, and additional conditions on strategy sets and payoffs are required, see e.g. (Glicksberg, 1950).

4.1 Repeated Play

We consider repeated play of the continuous two-player zero-sum game. Given a game G and sequences of plays (s^1_t)_{t≥1} and (s^2_t)_{t≥1}, we say that player i has sublinear (realized) regret if

lim sup_{t→∞} (1/t) ( sup_{s^i∈S_i} Σ_{τ=1}^t u_i(s^i, s^{−i}_τ) − Σ_{τ=1}^t u_i(s^i_τ, s^{−i}_τ) ) ≤ 0,   (18)

where we use −i to denote the other player.

A strategy σ^i for player i is, loosely speaking, a (possibly random) mapping from past observations to its actions. Of primary interest to us are Hannan-consistent strategies:

Definition 6 (Hannan, 1957). A strategy σ^i of player i is Hannan-consistent if, for any sequence (s^{−i}_t)_{t≥1}, the sequence of plays (s^i_t)_{t≥1} generated by σ^i has sublinear regret almost surely.

Note that the almost sure statement in Definition 6 is with respect to the randomness in the strategy σ^i. The following result is a generalization of its counterpart for discrete games (e.g. Corollary 7.1 in (Cesa-Bianchi and Lugosi, 2006)):

Proposition 3. Suppose G has value V and consider sequences of plays (s^1_t)_{t≥1}, (s^2_t)_{t≥1}, and assume that both players have sublinear realized regret. Then lim_{t→∞} (1/t) Σ_{τ=1}^t u(s^1_τ, s^2_τ) = V.

As in the discrete case (Cesa-Bianchi and Lugosi, 2006), we can also say something about convergence of the empirical distributions of play to the set of Nash equilibria. Since these distributions have finite support for every t, we can at best hope for convergence in the weak sense, as follows:

Theorem 5. Suppose that in a repeated two-player zero-sum game G that has a value both players follow a Hannan-consistent strategy, and denote by x̂^i_t = (1/t) Σ_{τ=1}^t δ_{s^i_τ} the marginal empirical distribution of play of player i at iteration t. Let x̂_t := (x̂^1_t, x̂^2_t).
Then \u02c6xt (cid:42) N (G) almost surely,\nthat is, with probability 1 the sequence (\u02c6xt)t\u22651 weakly converges to the set of Nash equilibria of G.\nCorollary 4. If G has a unique Nash equilibrium x\u2217, then with probability 1, \u02c6xt (cid:42) x\u2217.\n4.2 Hannan-Consistent Strategies\n\nt = 1\nt\nt , \u02c6x2\n\nt )t\u22651, (s2\n\nt )t\u22651 and\n\n\u03c4 ) = V .\n\n\u03c4 =1 u(s1\n\n\u03c4 , s2\n\n\u03c4 =1 \u03b4si\n\n\u03c4\n\nBy Theorem 5, if each player follows a Hannan-consistent strategy, then the empirical distributions\nof play weakly converge to the set of Nash equilibria of the game. But do such strategies exist?\nRegret minimizing strategies are intuitive candidates, and the intimate connection between regret\nminimization and learning in games is well studied in many cases, e.g. for \ufb01nite games (Cesa-\nBianchi and Lugosi, 2006) or potential games (Monderer and Shapley, 1996). Using our results from\nSection 3, we will show that, under the appropriate assumption on the information revealed to the\nplayer, no-regret learning based on Dual Averaging leads to Hannan consistency in our setting.\nSpeci\ufb01cally, suppose that after each iteration t, each player i observes a partial payoff function\nt : Si \u2192 R describing their payoff as a function of only their own action, si, holding the action\n\u02dcui\nplayed by the other player \ufb01xed. That is, \u02dcu1\nRemark 2. 
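To make this feedback model concrete, the following minimal sketch simulates the viewpoint of player 1, who observes the partial payoff functions $u(\,\cdot\,, s^2_\tau)$ and can evaluate the averaged realized regret of (18) against the best fixed action, with the supremum approximated on a finite grid. The payoff function, play sequences, and grid size are illustrative choices, not taken from the paper.

```python
import random

# Sketch of the full-information feedback model: after each round, player 1 has
# observed the partial payoff function u(., s2_t), so it can evaluate the
# averaged realized regret (18) against any fixed action. The payoff, the play
# sequences, and the grid used to approximate the sup are illustrative.

def u(s1, s2):
    """Toy payoff of player 1 in a zero-sum game on the unit square."""
    return s1 * s2 - 0.5 * s1

random.seed(0)
T = 2000
plays1 = [0.3] * T                             # player 1 repeats a fixed action
plays2 = [random.random() for _ in range(T)]   # player 2 plays i.i.d. uniform

realized = sum(u(s1, s2) for s1, s2 in zip(plays1, plays2))
grid = [k / 100 for k in range(101)]           # grid approximation of sup over S_1
best_fixed = max(sum(u(s, s2) for s2 in plays2) for s in grid)
avg_regret = (best_fixed - realized) / T
# Player 1's constant action 0.3 lies on the grid, so avg_regret >= 0 here.
```

Since the constant action played by player 1 is itself among the grid candidates, the averaged realized regret is nonnegative in this sketch; a Hannan-consistent strategy would drive it to zero against any opponent sequence.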
Remark 2. Note that we do not assume that the players have knowledge of the joint utility function $u$. However, we do assume that the players have full information feedback, in the sense that they observe the partial reward functions $u(\,\cdot\,, s^{-i}_\tau)$ on their entire action set, as opposed to only observing the reward $u(s^i_\tau, s^{-i}_\tau)$ of the action played (the latter corresponds to the bandit setting).

We denote by $\tilde{U}^i_t = (\tilde{u}^i_\tau)_{\tau=1}^{t}$ the sequence of partial payoff functions observed by player $i$. We use $\mathcal{U}^i_t$ to denote the set of all possible such histories, and define $\tilde{U}^i_0 := \emptyset$. A strategy $\sigma^i$ of player $i$ is a collection $(\sigma^i_t)_{t=1}^{\infty}$ of (possibly random) mappings $\sigma^i_t : \mathcal{U}^i_{t-1} \to S_i$, such that at iteration $t$, player $i$ plays $s^i_t = \sigma^i_t(\tilde{U}^i_{t-1})$. We make the following assumption on the payoff function:

Assumption 3. The payoff function $u$ is uniformly continuous in $s^i$, with modulus of continuity independent of $s^{-i}$, for $i = 1, 2$. That is, for each $i$ there exists $\chi_i \in Z$ such that $|u(s, s^{-i}) - u(s', s^{-i})| \le \chi_i(d_i(s, s'))$ for all $s^{-i} \in S_{-i}$.

It is easy to see that Assumption 3 implies that the game has a value (see the supplementary material). It also makes our setting compatible with that of Section 3. Suppose now that each player randomizes their play according to the sequence of probability distributions on $S_i$ generated by DA with regularizer $h_i$. That is, suppose that each $\sigma^i_t$ is a random variable with the following distribution:
$$\sigma^i_t \;\sim\; Dh_i^*\Bigl(\eta_{t-1}\sum_{\tau=1}^{t-1}\tilde{u}^i_\tau\Bigr). \tag{19}$$

Theorem 6. Suppose that player $i$ uses strategy $\sigma^i$ according to (19), and that the DA algorithm ensures sublinear regret (i.e. $\limsup_t R_t/t \le 0$). Then $\sigma^i$ is Hannan-consistent.

Corollary 5. If both players use strategies according to (19), with the respective Dual Averaging ensuring that $\limsup_t R_t/t \le 0$, then with probability 1 the sequence $(\hat{x}_t)_{t\ge 1}$ of empirical distributions of play weakly converges to the set of Nash equilibria of $G$.

Example. Consider a zero-sum game $G_1$ between two players on the unit interval, with payoff function $u(s^1, s^2) = s^1 s^2 - a_1 s^1 - a_2 s^2$, where $a_1 = \frac{e-2}{e-1}$ and $a_2 = \frac{1}{e-1}$. It is easy to verify that the pair
$$\bigl(x^1, x^2\bigr) = \Bigl(\tfrac{\exp(s)}{e-1},\; \tfrac{\exp(1-s)}{e-1}\Bigr)$$
is a mixed-strategy Nash equilibrium of $G_1$. For sequences $(s^1_\tau)_{\tau=1}^{t}$ and $(s^2_\tau)_{\tau=1}^{t}$, the cumulative payoff functions for a fixed action $s \in [0, 1]$ are given, respectively, by
$$U^1_t(s^1) = \Bigl(\sum_{\tau=1}^{t} s^2_\tau - a_1 t\Bigr)s^1 - a_2\sum_{\tau=1}^{t} s^2_\tau, \qquad U^2_t(s^2) = \Bigl(a_2 t - \sum_{\tau=1}^{t} s^1_\tau\Bigr)s^2 + a_1\sum_{\tau=1}^{t} s^1_\tau.$$
If each player $i$ uses the Generalized Hedge Algorithm with learning rates $(\eta_\tau)_{\tau=1}^{t}$, their strategy in period $t$ is to sample from the distribution $x^i_t(s) \propto \exp(\alpha^i_t s)$, where $\alpha^1_t = \eta_t\bigl(\sum_{\tau=1}^{t} s^2_\tau - a_1 t\bigr)$ and $\alpha^2_t = \eta_t\bigl(a_2 t - \sum_{\tau=1}^{t} s^1_\tau\bigr)$. Interestingly, in this case the sum of the opponent's past plays is a sufficient statistic, in the sense that it completely determines the mixed strategy at time $t$.

Figure 1: Normalized histograms of the empirical distributions of play in $G_1$ (100 bins)

Figure 1 shows normalized histograms of the empirical distributions of play at different iterations $t$. As $t$ grows, the histograms approach the equilibrium densities $x^1$ and $x^2$, respectively.
However, this does not mean that the individual strategies $x^i_t$ converge. Indeed, Figure 2 shows the $\alpha^i_t$ oscillating around the equilibrium parameters 1 and $-1$, respectively, even for very large $t$. We do, however, observe that the time-averaged parameters $\bar{\alpha}^i_t := \frac{1}{t}\sum_{\tau=1}^{t} \alpha^i_\tau$ converge to the equilibrium values 1 and $-1$.

Figure 2: Evolution of parameters $\alpha^i_t$ and $\bar{\alpha}^i_t$ in $G_1$

In the supplementary material we provide additional numerical examples, including one that illustrates how our algorithms can be utilized as a tool to compute approximate Nash equilibria in continuous zero-sum games on non-convex domains.

References

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.

Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31–45, 2014.

Sébastien Bubeck and Ronen Eldan. The entropic barrier: a simple and optimal universal self-concordant barrier. ArXiv e-prints, December 2014.

Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.

Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

Thomas M. Cover. Universal portfolios. Mathematical Finance, 1(1):1–29, 1991.

Imre Csiszár. Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2:299–318, 1967.

Irving L. Glicksberg. Minimax theorem for upper and lower semicontinuous payoffs. Research Memorandum RM-478, The RAND Corporation, Oct 1950.

James Hannan. Approximation to Bayes risk in repeated play. In Contributions to the Theory of Games, vol. III of Annals of Mathematics Studies 39. Princeton University Press, 1957.

Sergiu Hart and Andreu Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98(1):26–54, 2001.

Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.

Juha Heinonen, Pekka Koskela, Nageswari Shanmugalingam, and Jeremy T. Tyson. Sobolev Spaces on Metric Measure Spaces: An Approach Based on Upper Gradients. New Mathematical Monographs. Cambridge University Press, 2015.

Walid Krichene. Dual averaging on compactly-supported distributions and application to no-regret learning on a continuum. CoRR, abs/1504.07720, 2015.

Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The Hedge Algorithm on a Continuum. In 32nd International Conference on Machine Learning, pages 824–832, 2015.

Joon Kwon and Panayotis Mertikopoulos. A continuous-time approach to online optimization. ArXiv e-prints, January 2014.

Ehud Lehrer. Approachability in infinite dimensional spaces. International Journal of Game Theory, 31(2):253–268, 2003.

Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.

Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.

Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.

Nati Srebro, Karthik Sridharan, and Ambuj Tewari. On the universality of online mirror descent. In Advances in Neural Information Processing Systems 24 (NIPS), pages 2645–2653, 2011.

Karthik Sridharan and Ambuj Tewari. Convex games in Banach spaces. In COLT 2010 - The 23rd Conference on Learning Theory, pages 1–13, Haifa, Israel, June 2010.

Thomas Strömberg. Duality between Fréchet differentiability and strong convexity. Positivity, 15(3):527–536, 2011.

Lin Xiao. Dual averaging methods for regularized stochastic learning and online optimization. J. Mach. Learn. Res., 11:2543–2596, December 2010.