{"title": "Efficient Online Linear Optimization with Approximation Algorithms", "book": "Advances in Neural Information Processing Systems", "page_first": 627, "page_last": 635, "abstract": "We revisit the problem of Online Linear Optimization in the case where the set of feasible actions is accessible through an approximate linear optimization oracle with a multiplicative approximation guarantee of factor $\\alpha$. This setting is particularly interesting since it captures natural online extensions of well-studied offline linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the $\\alpha$-regret, which is the natural extension of the standard regret in online learning to this setting. We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem. Mainly, for both variants, we present $\\alpha$-regret bounds of $O(T^{-1/3})$, where $T$ is the number of prediction rounds, using only $O(\\log(T))$ calls to the approximation oracle per iteration, on average. These are the first results to obtain both average oracle complexity of $O(\\log(T))$ (or even poly-logarithmic in $T$) and $\\alpha$-regret bound $O(T^{-c})$ for a positive constant $c$, for both variants.", "full_text": "Efficient Online Linear Optimization with Approximation Algorithms\n\nDan Garber\n\nTechnion - Israel Institute of Technology\n\ndangar@technion.ac.il\n\nAbstract\n\nWe revisit the problem of online linear optimization in the case where the set of feasible actions is accessible through an approximate linear optimization oracle with a multiplicative approximation guarantee of factor $\\alpha$.
This setting is particularly interesting since it captures natural online extensions of well-studied offline linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the $\\alpha$-regret, which is the natural extension of the standard regret in online learning to this setting. We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem. Mainly, for both variants, we present $\\alpha$-regret bounds of $O(T^{-1/3})$, where $T$ is the number of prediction rounds, using only $O(\\log(T))$ calls to the approximation oracle per iteration, on average. These are the first results to obtain both average oracle complexity of $O(\\log(T))$ (or even poly-logarithmic in $T$) and $\\alpha$-regret bound $O(T^{-c})$ for a constant $c > 0$, for both variants.\n\n1 Introduction\n\nIn this paper we revisit the problem of Online Linear Optimization (OLO) [14], which is a special case of Online Convex Optimization (OCO) [12] with linear loss functions, in the case where the feasible set of actions is accessible through an oracle for approximate linear optimization with a multiplicative approximation guarantee. In the standard setting of OLO, a decision maker is repeatedly required to choose an action, a vector in some fixed feasible set in $\\mathbb{R}^d$. After choosing his action, the decision maker incurs a loss (or payoff) given by the inner product between his selected vector and a vector chosen by an adversary. This game between the decision maker and the adversary then repeats itself. In the full information variant of the problem, after the decision maker receives his loss (payoff) on a certain round, he gets to observe the vector chosen by the adversary. In the bandit version of the problem, the decision maker only observes his loss (payoff) and does not get to observe the adversary's vector.
The standard goal of the decision maker in OLO is to minimize a quantity known as regret. Regret measures the difference between the average loss of the decision maker on a game of $T$ consecutive rounds (where $T$ is fixed and known in advance) and the average loss of the best feasible action in hindsight (i.e., chosen with knowledge of all actions of the adversary throughout the $T$ rounds). In the case of payoffs this difference is reversed. The main concern when designing algorithms for choosing the actions of the decision maker is guaranteeing that the regret goes to zero as fast as possible as the length of the game $T$ increases (i.e., the rate of the regret in terms of $T$). Note that in this paper we focus on the case in which the adversary is oblivious (a.k.a. non-adaptive), which means the adversary chooses his entire sequence of actions for the $T$ rounds beforehand.\nWhile there exist well-known algorithms for choosing the decision maker's actions which guarantee optimal regret bounds in $T$, such as the celebrated Follow the Perturbed Leader (FPL) and Online Gradient Descent (OGD) algorithms [14, 17, 12], efficient implementation of these algorithms hinges on the ability to efficiently solve certain convex optimization problems over the feasible set (or the convex hull of feasible points), e.g., linear minimization for FPL or Euclidean projection for OGD. However, when the feasible set corresponds, for instance, to the set of all possible solutions to some NP-Hard optimization problem, no such efficient implementations are known (or even widely believed to exist), and thus these celebrated regret-minimizing procedures cannot be efficiently applied.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n
Luckily, many NP-Hard linear optimization problems (i.e., problems in which the objective function to be minimized or maximized is linear) admit efficient approximation algorithms with a multiplicative approximation guarantee. Some examples include MAX-CUT (factor 0.87856 approximation due to [9]), METRIC TSP (factor 1.5 approximation due to [6]), MINIMUM WEIGHTED VERTEX COVER (factor 2 approximation [4]), and WEIGHTED SET COVER (factor $(\\log n + 1)$ approximation due to [7]). It is thus natural to ask whether an efficient factor-$\\alpha$ approximation algorithm for an NP-Hard offline linear optimization problem could be used to construct, in a generic way, an efficient algorithm for the online version of the problem. Note that in this case, even efficiently computing the best fixed action in hindsight is not possible, and thus minimizing regret via an efficient algorithm does not seem likely. Given an approximation algorithm, however, we can compute in hindsight a decision whose average loss (payoff) is at most (at least) $\\alpha$ times that of the best fixed decision in hindsight.\nIn their paper [13], Kakade, Kalai and Ligett were the first to address this question in a fully generic way. They showed that using only an $\\alpha$-approximation oracle for the set of feasible actions, it is possible, at a high level, to construct an online algorithm which achieves vanishing (expected) $\\alpha$-regret. The $\\alpha$-regret is the difference between the average loss of the decision maker and $\\alpha$ times the average loss of the best fixed point in hindsight (for loss minimization problems and $\\alpha \\ge 1$; a corresponding definition exists for payoff maximization problems and $\\alpha < 1$).
Concretely, [13] showed that one can guarantee $O(T^{-1/2})$ expected $\\alpha$-regret in the full-information setting, which is optimal. They also showed $O(T^{-1/3})$ in the bandit setting, under the additional assumption that a Barycentric Spanner is available (which we discuss in the sequel).\nWhile the algorithm in [13] achieves an optimal $\\alpha$-regret bound (in terms of $T$) for the full information setting, in terms of computational complexity it requires, in the worst case, $O(T)$ calls to the approximation oracle on each round. This might be prohibitive and render the algorithm inefficient since, as discussed, $T$ is in general assumed to grow to infinity, and thus the dependence of the runtime on $T$ is of primary interest. Similarly, their algorithm for the bandit setting requires $O(T^{2/3})$ calls to the approximation oracle per iteration.\nThe main contribution of our work is in providing new low $\\alpha$-regret algorithms for the full information and bandit settings with significantly improved oracle complexities. A detailed comparison with [13] is given in Table 1. Concretely, for the full-information setting, we show it is possible to achieve $O(T^{-1/3})$ expected $\\alpha$-regret using only $O(\\log(T))$ calls to the approximation oracle per iteration, on average. This significantly improves over the $O(T)$ bound of [13]$^1$. We also show a bound of $O(T^{-1/2})$ on the expected $\\alpha$-regret (which is optimal) using only $O(\\sqrt{T}\\log(T))$ calls to the oracle per iteration, on average, which gives a nearly quadratic improvement over [13]. In the bandit setting we show it is possible to obtain an $O(T^{-1/3})$ bound on the expected $\\alpha$-regret (same as in [13]) using only $O(\\log(T))$ calls to the oracle per iteration, on average. This holds under the same assumption on the availability of a Barycentric Spanner (BS).
It is important to note that there exist algorithms for OLO with bandit feedback which guarantee $\\tilde{O}(T^{-1/2})$ expected regret [1, 11] (where the $\\tilde{O}(\\cdot)$ hides poly-logarithmic factors in $T$). However, these require on each iteration to either solve a convex optimization problem over the feasible set to arbitrarily small accuracy [1], or sample a point from the feasible set according to a specified distribution [11]. Neither of these can be implemented efficiently in our setting. On the other hand, as we formally show in the sequel, at a high level, using a BS (originally introduced in [2]) simply requires finding a single set of $d$ points from the feasible set which span the entire space $\\mathbb{R}^d$ (assuming this is possible; otherwise the set could be mapped to a lower-dimensional space). The process of finding these vectors can be viewed as a preprocessing step and thus can be carried out offline. Moreover, as discussed in [13], for many NP-Hard problems it is possible to compute a BS in polynomial time, and thus even this preprocessing step is efficient. Importantly, [13] shows that the approximation oracle by itself is not strong enough to guarantee non-trivial $\\alpha$-regret in the bandit setting, and hence this assumption on the availability of a BS seems reasonable. Since the best general regret bound known using a BS is $O(T^{-1/3})$, the $\\alpha$-regret bound of our bandit algorithm is the best achievable to date via an efficient algorithm.\n\n$^1$As we show in the appendix, even if we relax the algorithm of [13] to only guarantee $O(T^{-1/3})$ $\\alpha$-regret, it will still require $O(T^{2/3})$ calls to the oracle per iteration, on average.\n\n| Reference | Full information: $\\alpha$-regret | Full information: oracle complexity | Bandit information: $\\alpha$-regret | Bandit information: oracle complexity |\n|---|---|---|---|---|\n| KKL [13] | $T^{-1/2}$ | $T$ | $T^{-1/3}$ | $T^{2/3}$ |\n| This paper (Thm. 4.1, 4.2) | $T^{-1/3}$ | $\\log(T)$ | $T^{-1/3}$ | $\\log(T)$ |\n| This paper (Thm. 4.1) | $T^{-1/2}$ | $\\sqrt{T}\\log(T)$ | - | - |\n\nTable 1: Comparison of expected $\\alpha$-regret bounds and average number of calls to the approximation oracle per iteration. In all bounds we give only the dependence on the length of the game $T$ and omit all other dependencies, which we treat as constants. In the bandit setting we report the expected number of calls to the oracle per iteration.\n\nTechnically, the main challenge in the considered setting is that, as discussed, we cannot readily apply standard tools such as FPL and OGD. At a high level, [13] showed that it is possible to apply the OGD method by replacing the exact projection step of OGD with an iterative algorithm. This algorithm finds an infeasible point that both satisfies the projection property required by OGD and is dominated by a convex combination of feasible points for every relevant linear loss (payoff) function. Unfortunately, in the worst case, the number of queries to the approximation oracle required by this so-called projection algorithm per iteration is linear in $T$. Our online algorithms are also based on an application of OGD. However, our approach to computing the so-called projections is drastically different from that of [13]. It is based on a coupling of two cutting plane methods, one based on the Ellipsoid method and the other resembling Gradient Descent. This approach might be of independent interest and might prove useful for similar problems.\n\n1.1 Additional related work\nKalai and Vempala [14] showed that approximation algorithms which have a point-wise approximation guarantee, such as the celebrated MAX-CUT algorithm of [9], could be used to instantiate their Follow the Perturbed Leader framework to achieve low $\\alpha$-regret. However, this construction is far from generic and requires the oracle to satisfy additional non-trivial conditions. This approach was also used in [3].
In [14] it was also shown that FPL could be instantiated with an FPTAS to achieve low $\\alpha$-regret. However, the approximation factor of the FPTAS needs to be set to roughly $(1 + O(T^{-1/2}))$, which may result in prohibitive running times even if an FPTAS for the underlying problem is available. Similarly, [8] showed that this additional structure can be used with the FPL framework to achieve low $\\alpha$-regret efficiently, when the approximation algorithm is based on solving a convex relaxation of the original, possibly NP-Hard, problem. To conclude, all of the latter works consider specialized cases in which the approximation oracle satisfies additional non-trivial assumptions beyond its approximation guarantee. Here, similarly to [13], we are interested in as generic a conversion as possible from the offline problem to the online one, without imposing additional structure on the offline oracle.\n\n2 Preliminaries\n\n2.1 Online linear optimization with approximation oracles\nLet $\\mathcal{K}, \\mathcal{F}$ be compact sets of points in $\\mathbb{R}^d_+$ (the non-negative orthant in $\\mathbb{R}^d$) such that $\\max_{x\\in\\mathcal{K}}\\|x\\| \\le R$ and $\\max_{f\\in\\mathcal{F}}\\|f\\| \\le F$ for some $R > 0$, $F > 0$. Throughout this work we let $\\|\\cdot\\|$ denote the standard Euclidean norm. Furthermore, for all $x \\in \\mathcal{K}$, $f \\in \\mathcal{F}$ it holds that $C \\ge x \\cdot f \\ge 0$, for some $C > 0$.\nWe assume $\\mathcal{K}$ is accessible through an approximate linear optimization oracle $\\mathcal{O}_{\\mathcal{K}} : \\mathbb{R}^d_+ \\to \\mathcal{K}$ with parameter $\\alpha > 0$ such that:\n\n$$\\forall c \\in \\mathbb{R}^d_+: \\quad \\mathcal{O}_{\\mathcal{K}}(c) \\in \\mathcal{K} \\quad \\text{and} \\quad \\begin{cases} \\mathcal{O}_{\\mathcal{K}}(c) \\cdot c \\le \\alpha \\min_{x\\in\\mathcal{K}} x \\cdot c & \\text{if } \\alpha \\ge 1;\\\\ \\mathcal{O}_{\\mathcal{K}}(c) \\cdot c \\ge \\alpha \\max_{x\\in\\mathcal{K}} x \\cdot c & \\text{if } \\alpha < 1. \\end{cases}$$\n\nHere $\\mathcal{K}$ is the feasible set of actions for the player, and $\\mathcal{F}$ is the set of all possible loss/payoff vectors$^2$.\n\n$^2$We note that our assumptions that $\\mathcal{K} \\subset \\mathbb{R}^d_+$, $\\mathcal{F} \\subset \\mathbb{R}^d_+$ and that the oracle takes inputs from $\\mathbb{R}^d_+$ are made for ease of presentation and clarity. They also hold naturally for many NP-Hard optimization problems that are relevant to our setting. Nevertheless, these assumptions could be easily generalized, as done in [13].\n\nA factor $\\alpha > 1$ for the approximation oracle is naturally reasonable only for loss minimization problems, and a value $\\alpha < 1$ is reasonable only for payoff maximization problems. Throughout this work it will therefore be convenient to use the value of $\\alpha$ to differentiate between minimization problems and maximization problems.\nGiven a sequence of linear loss/payoff functions $\\{f_1, \\dots, f_T\\} \\in \\mathcal{F}^T$ and a sequence of feasible points $\\{x_1, \\dots, x_T\\} \\in \\mathcal{K}^T$, we define the $\\alpha$-regret of the sequence $\\{x_t\\}_{t\\in[T]}$ with respect to the sequence $\\{f_t\\}_{t\\in[T]}$ as\n\n$$\\alpha\\text{-regret}(\\{(x_t, f_t)\\}_{t\\in[T]}) := \\begin{cases} \\frac{1}{T}\\sum_{t=1}^T x_t \\cdot f_t - \\alpha \\cdot \\min_{x\\in\\mathcal{K}} \\frac{1}{T}\\sum_{t=1}^T x \\cdot f_t & \\text{if } \\alpha \\ge 1;\\\\ \\alpha \\cdot \\max_{x\\in\\mathcal{K}} \\frac{1}{T}\\sum_{t=1}^T x \\cdot f_t - \\frac{1}{T}\\sum_{t=1}^T x_t \\cdot f_t & \\text{if } \\alpha < 1. \\end{cases} \\qquad (1)$$\n\nWhen the sequences $\\{x_t\\}_{t\\in[T]}, \\{f_t\\}_{t\\in[T]}$ are obvious from context, we will simply write $\\alpha$-regret without stating these sequences. Also, when the sequence $\\{x_t\\}_{t\\in[T]}$ is randomized, we will use $\\mathbb{E}[\\alpha\\text{-regret}]$ to denote the expected $\\alpha$-regret.\n\n2.1.1 Online linear optimization with full information\nIn OLO with full information, we consider a repeated game of $T$ prediction rounds, for a fixed $T$, where on each round $t$ the decision maker is required to choose a feasible action $x_t \\in \\mathcal{K}$. After committing to his choice, a linear loss function $f_t \\in \\mathcal{F}$ is revealed, and the decision maker incurs a loss of $x_t \\cdot f_t$ (in the payoff version, the decision maker receives a payoff of $x_t \\cdot f_t$). The game then continues to the next round. The overall goal of the decision maker is to guarantee that $\\alpha\\text{-regret}(\\{(x_t, f_t)\\}_{t\\in[T]}) = O(T^{-c})$ for some $c > 0$, at least in expectation. In fact, using randomization is mandatory since $\\mathcal{K}$ need not be convex.
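\n\nEq. (1) can be evaluated directly on toy instances, which may help build intuition. The following Python/NumPy sketch is only an illustration and is not part of the paper's method. It assumes a finite feasible set $\\mathcal{K}$ given explicitly as the rows of a matrix, so that the best fixed point in hindsight can be found by brute force. That brute-force step is precisely what cannot be done efficiently for the NP-Hard sets considered here. The function name alpha_regret is hypothetical.

```python
import numpy as np


def alpha_regret(plays, losses, K, alpha):
    # Eq. (1) for a finite feasible set K whose points are the rows of a matrix.
    # The best fixed point in hindsight is found by brute force over K, which
    # is only viable for toy instances.
    plays = np.asarray(plays, dtype=float)    # shape (T, d): x_1, ..., x_T
    losses = np.asarray(losses, dtype=float)  # shape (T, d): f_1, ..., f_T
    K = np.asarray(K, dtype=float)            # shape (m, d): the points of K
    T = losses.shape[0]
    learner = np.sum(plays * losses) / T      # (1/T) sum_t x_t . f_t
    fixed = K @ losses.sum(axis=0) / T        # (1/T) sum_t x . f_t, for every x in K
    if alpha >= 1:                            # loss minimization
        return learner - alpha * fixed.min()
    return alpha * fixed.max() - learner      # payoff maximization (alpha < 1)


K = [[1.0, 0.0], [0.0, 1.0]]
plays = [[1.0, 0.0], [1.0, 0.0]]
losses = [[1.0, 2.0], [3.0, 1.0]]
print(alpha_regret(plays, losses, K, 1.0), alpha_regret(plays, losses, K, 2.0))
```

In this toy run, the learner's average loss is 2 and the best fixed point of $\\mathcal{K}$ averages 1.5. The function therefore returns 0.5 for $\\alpha = 1$ and -1.0 for $\\alpha = 2$. A larger $\\alpha$ gives a weaker benchmark, which can make the $\\alpha$-regret negative.\n\n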
Here we assume that the adversary is oblivious (a.k.a. non-adaptive). That is, the sequence of losses/payoffs $f_1, \\dots, f_T$ is chosen in advance (before the first round) and does not depend on the actions of the decision maker.\n\n2.1.2 Bandit feedback\nThe bandit version of the problem is identical to the full information setting with one crucial difference. On each round $t$, after making his choice, the decision maker does not observe the vector $f_t$, but only the value of his loss/payoff, given by $x_t \\cdot f_t$.\n\n2.2 Additional notation\nFor any two sets $\\mathcal{S}, \\mathcal{K} \\subset \\mathbb{R}^d$ and a scalar $\\lambda \\in \\mathbb{R}$ we define the sets $\\mathcal{S} + \\mathcal{K} := \\{x + y \\mid x \\in \\mathcal{S}, y \\in \\mathcal{K}\\}$ and $\\lambda\\mathcal{S} := \\{\\lambda x \\mid x \\in \\mathcal{S}\\}$. We also denote by $\\mathrm{CH}(\\mathcal{K})$ the convex hull of all points in a set $\\mathcal{K}$. For a convex and compact set $\\mathcal{S} \\subset \\mathbb{R}^d$ and a point $x \\in \\mathbb{R}^d$ we define $\\mathrm{dist}(x, \\mathcal{S}) := \\min_{z\\in\\mathcal{S}} \\|z - x\\|$. We let $\\mathcal{B}(c, r)$ denote the Euclidean ball of radius $r$ centered at $c$.\n\n2.3 Basic algorithmic tools\n\nWe now briefly describe two very basic ideas that are essential for constructing our algorithms: the extended approximation oracle and the online gradient descent without feasibility method. Both were already suggested in [13] to obtain their low $\\alpha$-regret algorithms. In the appendix we describe the approach of [13] in more detail and discuss its shortcomings in obtaining oracle-efficient algorithms.\n\n2.3.1 The extended approximation oracle\nAs discussed, a key difficulty of our setting prevents us from directly applying well-studied algorithms for OLO. Essentially all standard algorithms require solving some linear/convex optimization problem exactly (or up to arbitrarily small error) over the convexification of the feasible set, $\\mathrm{CH}(\\mathcal{K})$. However, our approximation oracle $\\mathcal{O}_{\\mathcal{K}}(\\cdot)$ cannot perform exact minimization. Moreover, even for $\\alpha = 1$ it is applicable only to inputs in $\\mathbb{R}^d_+$, and hence cannot optimize in all directions.
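\n\nTo make the oracle model concrete, consider MINIMUM WEIGHTED VERTEX COVER from the introduction. Here $\\mathcal{K}$ is the set of incidence vectors in $\\{0,1\\}^d$ of vertex covers of a fixed graph on $d$ vertices, and a factor-2 approximation algorithm serves as $\\mathcal{O}_{\\mathcal{K}}$ with $\\alpha = 2$. The Python sketch below uses the classical local-ratio (primal-dual) edge scan, in the spirit of [4]. The function name vertex_cover_oracle and the edge-list input format are assumptions made for illustration. In line with the remark above, the sketch only accepts non-negative weights $c \\in \\mathbb{R}^d_+$.

```python
def vertex_cover_oracle(edges, c):
    # Hypothetical O_K for K = incidence vectors of vertex covers of a graph
    # with vertices 0, ..., len(c) - 1 and the given list of edges.
    # Local-ratio (primal-dual) edge scan: the returned cover has weight at
    # most twice the minimum, i.e. this is an oracle with alpha = 2.
    if any(ci < 0 for ci in c):
        raise ValueError('the oracle only accepts non-negative weights')
    residual = [float(ci) for ci in c]
    for u, v in edges:
        # pay for edge (u, v) by lowering both endpoints by the same amount
        delta = min(residual[u], residual[v])
        residual[u] -= delta
        residual[v] -= delta
    # after the scan, every edge has an endpoint whose residual weight is 0
    return [1 if r == 0.0 else 0 for r in residual]


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
x = vertex_cover_oracle(edges, [3.0, 1.0, 4.0, 1.5])
print(x)
```

On this 4-cycle with one chord and weights (3, 1, 4, 1.5), the scan returns the cover [1, 1, 0, 1] of weight 5.5. That cover happens to be optimal here, but in general only the factor-2 guarantee holds.\n\n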
A natural approach, suggested in [13], to overcome the approximation error of the oracle $\\mathcal{O}_{\\mathcal{K}}(\\cdot)$ is to consider optimization with respect to the convex set $\\mathrm{CH}(\\alpha\\mathcal{K})$ instead of $\\mathrm{CH}(\\mathcal{K})$. This is the convex hull of all points in $\\mathcal{K}$ scaled by a factor of $\\alpha$. Indeed, consider for instance the case $\\alpha \\ge 1$. It is straightforward to see that for any $c \\in \\mathbb{R}^d_+$,\n\n$$\\mathcal{O}_{\\mathcal{K}}(c) \\cdot c \\le \\alpha \\min_{x\\in\\mathcal{K}} x \\cdot c = \\alpha \\min_{x\\in\\mathrm{CH}(\\mathcal{K})} x \\cdot c = \\min_{x\\in\\mathrm{CH}(\\alpha\\mathcal{K})} x \\cdot c.$$\n\nThus, in a certain sense, $\\mathcal{O}_{\\mathcal{K}}(\\cdot)$ can optimize with respect to $\\mathrm{CH}(\\alpha\\mathcal{K})$ for all directions in $\\mathbb{R}^d_+$, although the oracle returns points in the original set $\\mathcal{K}$.\nThe following lemma shows that one can easily extend the oracle $\\mathcal{O}_{\\mathcal{K}}(\\cdot)$ to optimize with respect to all directions in $\\mathbb{R}^d$.\nLemma 2.1 (Extended approximation oracle). Given $c \\in \\mathbb{R}^d$, write $c = c^+ + c^-$. Here $c^+$ equals $c$ on all non-negative coordinates of $c$ and zero everywhere else, and $c^-$ equals $c$ on all negative coordinates and zero everywhere else. The extended approximation oracle is a mapping $\\hat{\\mathcal{O}}_{\\mathcal{K}} : \\mathbb{R}^d \\to (\\mathcal{K} + \\mathcal{B}(0, (1+\\alpha)R), \\mathcal{K})$ defined as\n\n$$\\hat{\\mathcal{O}}_{\\mathcal{K}}(c) = (v, s) := \\begin{cases} \\left(\\mathcal{O}_{\\mathcal{K}}(c^+) - \\alpha R\\bar{c}^-,\\ \\mathcal{O}_{\\mathcal{K}}(c^+)\\right) & \\text{if } \\alpha \\ge 1;\\\\ \\left(\\mathcal{O}_{\\mathcal{K}}(-c^-) - R\\bar{c}^+,\\ \\mathcal{O}_{\\mathcal{K}}(-c^-)\\right) & \\text{if } \\alpha < 1, \\end{cases} \\qquad (2)$$\n\nwhere for any vector $v \\in \\mathbb{R}^d$ we denote $\\bar{v} = v/\\|v\\|$ if $\\|v\\| > 0$ and $\\bar{v} = 0$ otherwise. It satisfies the following three properties:\n1. $v \\cdot c \\le \\min_{x\\in\\alpha\\mathcal{K}} x \\cdot c$;\n2. $\\forall f \\in \\mathcal{F}$: $s \\cdot f \\le v \\cdot f$ if $\\alpha \\ge 1$, and $s \\cdot f \\ge v \\cdot f$ if $\\alpha < 1$;\n3. $\\|v\\| \\le (\\alpha + 2)R$.\n\nThe proof is given in the appendix for completeness.\nThe extended oracle provides solutions with values at least as low as those of any point in $\\mathrm{CH}(\\alpha\\mathcal{K})$. It is important to note, however, that in general the output point $v$ need not lie in either $\\mathcal{K}$ or $\\mathrm{CH}(\\alpha\\mathcal{K})$. This means that it is not a feasible point to play in our OLO setting, nor does it allow us to optimize over $\\mathrm{CH}(\\alpha\\mathcal{K})$.
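\n\nEq. (2) is simple enough to transcribe directly. The following NumPy sketch is an illustration, not a reference implementation. The helper names unit, extended_oracle, exact_min and exact_max are hypothetical. The toy instance assumes a small explicit set $\\mathcal{K}$ and uses an exact minimizer (resp. maximizer) over it as the oracle. This is a valid $\\mathcal{O}_{\\mathcal{K}}$ for any $\\alpha \\ge 1$ (resp. $\\alpha < 1$), since all inner products involved are non-negative.

```python
import numpy as np


def unit(v):
    # bar(v) in Lemma 2.1: v / ||v|| if ||v|| > 0, and the zero vector otherwise
    n = np.linalg.norm(v)
    return v / n if n > 0 else np.zeros_like(v)


def extended_oracle(oracle, c, alpha, R):
    # Transcription of Eq. (2). The oracle is only ever called on
    # non-negative vectors, mirroring O_K : R^d_+ -> K.
    c = np.asarray(c, dtype=float)
    c_plus = np.where(c >= 0, c, 0.0)   # c^+: non-negative coordinates of c
    c_minus = np.where(c < 0, c, 0.0)   # c^-: negative coordinates of c
    if alpha >= 1:                      # minimization: query c^+, correct along c^-
        s = oracle(c_plus)
        v = s - alpha * R * unit(c_minus)
    else:                               # maximization: query -c^- (which is >= 0)
        s = oracle(-c_minus)
        v = s - R * unit(c_plus)
    return v, s                         # v: used for optimization, s: feasible play


# Toy instance (assumption): K is a finite set of 20 points in R^3_+.
rng = np.random.default_rng(0)
K = rng.uniform(0.0, 1.0, size=(20, 3))
R = np.linalg.norm(K, axis=1).max()


def exact_min(c):
    # exact minimizer over K: a valid O_K for any alpha >= 1
    return K[np.argmin(K @ c)]


def exact_max(c):
    # exact maximizer over K: a valid O_K for any alpha < 1
    return K[np.argmax(K @ c)]


c = rng.normal(size=3)
v, s = extended_oracle(exact_min, c, 2.0, R)
print(v @ c <= 2.0 * (K @ c).min())   # property 1 of Lemma 2.1
```

The correction term is $-\\alpha R\\bar{c}^-$ if $\\alpha \\ge 1$ and $-R\\bar{c}^+$ if $\\alpha < 1$. Whenever it is non-zero, the returned v differs from the feasible point s, so v is in general not a point of $\\mathcal{K}$.\n\n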
This is why we also need the oracle to output the feasible point $s \\in \\mathcal{K}$, which dominates $v$ for any possible loss/payoff vector in $\\mathcal{F}$. We will use the outputs $v$ to solve a certain optimization problem involving $\\mathrm{CH}(\\alpha\\mathcal{K})$. The dominance relation will then be used to convert the solutions to these optimization problems into feasible plays for our OLO algorithms.\n\n2.3.2 Online gradient descent with and without feasibility\nAs in [13], our online algorithms will be based on the well-known Online Gradient Descent method (OGD) for online convex optimization, originally due to [17]. Consider a convex and compact set $\\mathcal{S} \\subset \\mathbb{R}^d$ and a sequence of loss vectors $\\{f_1, \\dots, f_T\\} \\subset \\mathbb{R}^d$. OGD produces a sequence of plays $\\{x_1, \\dots, x_T\\} \\subset \\mathcal{S}$ via the following updates:\n\n$$\\forall t \\ge 1: \\quad y_{t+1} \\leftarrow x_t - \\eta f_t, \\qquad x_{t+1} \\leftarrow \\arg\\min_{x\\in\\mathcal{S}} \\|x - y_{t+1}\\|^2,$$\n\nwhere $x_1$ is initialized to some arbitrary point in $\\mathcal{S}$ and $\\eta$ is some predetermined step size. The obvious difficulty in applying OGD to online linear optimization over $\\mathcal{S} = \\mathrm{CH}(\\alpha\\mathcal{K})$ lies in computing $x_{t+1}$ by projecting $y_{t+1}$ onto the feasible set $\\mathcal{S}$. As discussed, even with the extended approximation oracle one cannot exactly optimize over $\\mathrm{CH}(\\alpha\\mathcal{K})$. Instead, we will consider a variant of OGD which may produce infeasible points, i.e., points outside of $\\mathcal{S}$, but which guarantees low regret with respect to any point in $\\mathcal{S}$. This algorithm, which we refer to as online gradient descent without feasibility, is given below (Algorithm 1).\n\nAlgorithm 1 Online Gradient Descent Without Feasibility\n1: input: learning rate $\\eta > 0$\n2: $x_1 \\leftarrow$ some point in $\\mathcal{S}$\n3: for $t = 1 \\dots T$ do\n4: play $x_t$ and receive loss/payoff vector $f_t \\in \\mathbb{R}^d$\n5: $y_{t+1} \\leftarrow x_t - \\eta f_t$ for losses; $y_{t+1} \\leftarrow x_t + \\eta f_t$ for payoffs\n6: find $x_{t+1} \\in \\mathbb{R}^d$ such that\n$$\\forall z \\in \\mathcal{S}: \\quad \\|z - x_{t+1}\\|^2 \\le \\|z - y_{t+1}\\|^2 \\qquad (3)$$\n7: end for\n\nLemma 2.2 (Online gradient descent without feasibility). Fix $\\eta > 0$.
Suppose Algorithm 1 is applied for $T$ rounds, let $\\{f_t\\}_{t=1}^T \\subset \\mathbb{R}^d$ be the sequence of observed loss/payoff vectors, and let $\\{x_t\\}_{t=1}^T$ be the sequence of points played by the algorithm. Then for any $x \\in \\mathcal{S}$ it holds that\n\n$$\\frac{1}{T}\\sum_{t=1}^T x_t \\cdot f_t - \\frac{1}{T}\\sum_{t=1}^T x \\cdot f_t \\le \\frac{1}{2T\\eta}\\|x_1 - x\\|^2 + \\frac{\\eta}{2T}\\sum_{t=1}^T \\|f_t\\|^2 \\quad \\text{for losses;}$$\n$$\\frac{1}{T}\\sum_{t=1}^T x \\cdot f_t - \\frac{1}{T}\\sum_{t=1}^T x_t \\cdot f_t \\le \\frac{1}{2T\\eta}\\|x_1 - x\\|^2 + \\frac{\\eta}{2T}\\sum_{t=1}^T \\|f_t\\|^2 \\quad \\text{for payoffs.}$$\n\nThe proof is given in the appendix for completeness.\n\n3 Oracle-efficient Computation of (infeasible) Projections onto $\\mathrm{CH}(\\alpha\\mathcal{K})$\nIn this section we detail our main technical tool for obtaining oracle-efficient online algorithms: our algorithm for computing projections, in the sense of Eq. (3), onto the convex set $\\mathrm{CH}(\\alpha\\mathcal{K})$. Before presenting our projection algorithm, Algorithm 2, and detailing its theoretical guarantees, we first present its main algorithmic building block, which is described in the following lemma. Lemma 3.1 shows that for any point $x \\in \\mathbb{R}^d$ we can do one of two things. Either we can find a nearby point $p$ which is a convex combination of points output by the extended approximation oracle, or we can find a hyperplane that separates $x$ from $\\mathrm{CH}(\\alpha\\mathcal{K})$ with a sufficiently large margin. In the first case, $p$ is dominated by a convex combination of feasible points in $\\mathcal{K}$ for any vector in $\\mathcal{F}$, as discussed in Section 2.3.1. We achieve this by running the well-known Ellipsoid method [10, 5] in a very specialized way. This application of the Ellipsoid method is similar in spirit to those in [15, 16], which applied this idea to computing correlated equilibria in games and to algorithmic mechanism design. However, the implementation details and the way in which we apply this technique are quite different.\nThe proof of the following lemma is given in the appendix.\nLemma 3.1 (Separation-or-Decomposition via the Ellipsoid method).
Fix $x \\in \\mathbb{R}^d$, $\\epsilon \\in (0, (\\alpha+2)R]$, and a positive integer $N \\ge c d^2 \\ln\\left(\\frac{(\\alpha+1)R + \\|x\\|}{\\epsilon}\\right)$, where $c$ is a positive universal constant. Consider an attempt to apply the Ellipsoid method for $N$ iterations to the following feasibility problem:\n\n$$\\text{find } w \\in \\mathbb{R}^d \\text{ such that: } \\quad \\forall z \\in \\alpha\\mathcal{K}: \\ (x - z) \\cdot w \\ge \\epsilon \\quad \\text{and} \\quad \\|w\\| \\le 1, \\qquad (4)$$\n\nsuch that each iteration of the Ellipsoid method applies the following consecutive steps:\n1. $(v, s) \\leftarrow \\hat{\\mathcal{O}}_{\\mathcal{K}}(-w)$, where $w$ is the current iterate. If $(x - v) \\cdot w < \\epsilon$, use $v - x$ as a separating hyperplane for the Ellipsoid method and continue to the next iteration.\n2. If $\\|w\\| > 1$, use $w$ as a separating hyperplane for the Ellipsoid method and continue to the next iteration.\n3. Otherwise ($\\|w\\| \\le 1$ and $(x - v) \\cdot w \\ge \\epsilon$), declare Problem (4) feasible and return the vector $w$.\n\nThen, if the Ellipsoid method terminates declaring Problem (4) feasible, the returned vector $w$ is a feasible solution to Problem (4). Otherwise, the Ellipsoid method completes $N$ iterations without declaring Problem (4) feasible. In that case, let $(v_1, s_1), \\dots, (v_N, s_N)$ be the outputs of the extended approximation oracle gathered throughout the run of the algorithm, and let $(a_1, \\dots, a_N)$ be an optimal solution to the following convex optimization problem:\n\n$$\\min_{(a_1,\\dots,a_N)} \\frac{1}{2}\\Big\\|\\sum_{i=1}^N a_i v_i - x\\Big\\|^2 \\quad \\text{such that} \\quad \\forall i \\in \\{1,\\dots,N\\}: a_i \\ge 0, \\quad \\sum_{i=1}^N a_i = 1. \\qquad (5)$$\n\nThen the point $p = \\sum_{i=1}^N a_i v_i$ satisfies $\\|x - p\\| \\le 3\\epsilon$.\n\nWe are now ready to present our algorithm for computing projections onto $\\mathrm{CH}(\\alpha\\mathcal{K})$ (in the sense of Eq. (3)). Consider an attempt to project a point $y \\in \\mathbb{R}^d$, and note in particular that $y$ itself is a valid projection (again, in the sense of Eq. (3)). In general, however, $y$ is neither a feasible point nor dominated by a convex combination of feasible points. When attempting to project $y$, our algorithm repeatedly applies the separation-or-decomposition procedure described in Lemma 3.1. Suppose the procedure returns a decomposition. Then, by Lemma 3.1, we have a point that is sufficiently close to $y$ and is dominated, for any vector in $\\mathcal{F}$, by a convex combination (given explicitly) of feasible points in $\\mathcal{K}$. Otherwise, the procedure returns a separating hyperplane, which can be used to \u201cpull $y$ closer\u201d to $\\mathrm{CH}(\\alpha\\mathcal{K})$. The resulting point still satisfies the projection inequality given in Eq. (3), and the process then repeats itself. Each time we obtain a hyperplane separating our current iterate from $\\mathrm{CH}(\\alpha\\mathcal{K})$, we pull the current iterate sufficiently towards $\\mathrm{CH}(\\alpha\\mathcal{K})$, so this process must terminate. Lemma 3.2 gives exact bounds on the performance of the algorithm.\n\nAlgorithm 2 (infeasible) Projection onto $\\mathrm{CH}(\\alpha\\mathcal{K})$\n1: input: point $y \\in \\mathbb{R}^d$, tolerance $\\epsilon > 0$\n2: $\\tilde{y} \\leftarrow y / \\max\\{1, \\|y\\|/(\\alpha R)\\}$\n3: for $t = 1 \\dots$ do\n4: call the SEPARATION-OR-DECOMPOSITION procedure (Lemma 3.1) with parameters $(\\tilde{y}, \\epsilon)$\n5: if the procedure outputs a separating hyperplane $w$ then\n6: $\\tilde{y} \\leftarrow \\tilde{y} - \\epsilon w$\n7: else\n8: let $(a_1, \\dots, a_N)$, $\\{(v_1, s_1), \\dots, (v_N, s_N)\\}$ be the decomposition returned\n9: return $\\tilde{y}$, $(a_1, \\dots, a_N)$, $\\{(v_1, s_1), \\dots, (v_N, s_N)\\}$\n10: end if\n11: end for\n\nLemma 3.2. Fix $y \\in \\mathbb{R}^d$ and $\\epsilon \\in (0, (\\alpha+2)R]$. Algorithm 2 terminates after at most $\\lceil \\alpha^2R^2/\\epsilon^2 \\rceil$ iterations. It returns a point $\\tilde{y} \\in \\mathbb{R}^d$, a distribution $(a_1, \\dots, a_N)$ and a set $\\{(v_1, s_1), \\dots, (v_N, s_N)\\}$ output by the extended approximation oracle, where $N$ is as defined in Lemma 3.1, such that\n1. $\\forall z \\in \\mathrm{CH}(\\alpha\\mathcal{K})$: $\\|\\tilde{y} - z\\|^2 \\le \\|y - z\\|^2$;\n2. $\\|p - \\tilde{y}\\| \\le 3\\epsilon$ for $p := \\sum_{i\\in[N]} a_i v_i$.\nMoreover, if the for loop was entered a total number of $k$ times, then the final value of $\\tilde{y}$ satisfies\n\n$$\\mathrm{dist}^2(\\tilde{y}, \\mathrm{CH}(\\alpha\\mathcal{K})) \\le \\min\\{2\\alpha^2R^2,\\ \\mathrm{dist}^2(y, \\mathrm{CH}(\\alpha\\mathcal{K})) - (k-1)\\epsilon^2\\},$$\n\nand the overall number of queries to the approximation oracle is $O\\left(k d^2 \\ln((\\alpha+1)R/\\epsilon)\\right)$.\n\nThe worst-case iteration bound in Lemma 3.2 does not seem appealing for our purposes, since it depends polynomially on $1/\\epsilon$. In our online algorithms we will naturally need to take $\\epsilon = O(T^{-c})$ for some $c > 0$. This seems to contradict our goal of achieving oracle complexity poly-logarithmic in $T$, at least on average. However, as Lemma 3.2 shows, the more iterations Algorithm 2 performs, the closer it brings its final iterate to the set $\\mathrm{CH}(\\alpha\\mathcal{K})$. Consider calling Algorithm 2 sequentially, with each input being a small perturbation of the output of the previous call. As we will show when analyzing the oracle complexity of our online algorithms, a single call can be expensive, but the average number of iterations performed per call in such a sequence cannot be too high.\n\n4 Efficient Algorithms for the Full Information and Bandit Settings\n\nWe now turn to present our online algorithms for the full-information and bandit settings, together with their regret bounds and oracle-complexity guarantees.\n\n4.1 Algorithm for the full information setting\nOur algorithm for the full-information setting, Algorithm 3, is given below.\n\nAlgorithm 3 Online Gradient Descent with Infeasible Projections onto $\\mathrm{CH}(\\alpha\\mathcal{K})$\n1: input: learning rate $\\eta > 0$, projection error parameter $\\epsilon > 0$\n2: $s_1 \\leftarrow$ some point in $\\mathcal{K}$, $\\tilde{y}_1 \\leftarrow \\alpha s_1$\n3: for $t = 1 \\dots T$ do\n4: play $s_t$ and receive loss/payoff vector $f_t \\in \\mathcal{F}$\n5: $y_{t+1} \\leftarrow \\tilde{y}_t - \\eta f_t$ if $\\alpha \\ge 1$; $y_{t+1} \\leftarrow \\tilde{y}_t + \\eta f_t$ if $\\alpha < 1$\n6: call Algorithm 2 with inputs $(y_{t+1}, \\epsilon)$ to obtain an approximate projection $\\tilde{y}_{t+1}$, a distribution $(a_1, \\dots, a_N)$ and $\\{(v_1, s_1), \\dots, (v_N, s_N)\\} \\subseteq \\mathbb{R}^d \\times \\mathcal{K}$, for some $N \\in \\mathbb{N}$\n7: sample $s_{t+1} \\in \\{s_1, \\dots, s_N\\}$ according to the distribution $(a_1, \\dots, a_N)$\n8: end for\n\nTheorem 4.1 (Main Theorem). Fix $\\eta > 0$, $\\epsilon \\in (0, (\\alpha+2)R]$. Suppose Algorithm 3 is applied for $T$ rounds, let $\\{f_t\\}_{t=1}^T \\subseteq \\mathcal{F}$ be the sequence of observed loss/payoff vectors, and let $\\{s_t\\}_{t=1}^T$ be the sequence of points played by the algorithm. Then it holds that\n\n$$\\mathbb{E}\\left[\\alpha\\text{-regret}(\\{(s_t, f_t)\\}_{t\\in[T]})\\right] \\le \\alpha^2R^2T^{-1}\\eta^{-1} + \\eta F^2/2 + 3F\\epsilon,$$\n\nand the average number of calls to the approximation oracle of $\\mathcal{K}$ per iteration is upper bounded by\n\n$$K(\\eta, \\epsilon) := O\\left(\\left(1 + \\frac{\\eta\\alpha RF + \\eta^2F^2}{\\epsilon^2}\\right) d^2 \\ln((\\alpha+1)R/\\epsilon)\\right).$$\n\nIn particular, setting $\\eta = \\alpha RT^{-2/3}/F$, $\\epsilon = \\alpha RT^{-1/3}$ gives $\\mathbb{E}[\\alpha\\text{-regret}] = O(\\alpha RFT^{-1/3})$ and $K = O\\left(d^2 \\ln\\left(\\frac{\\alpha+1}{\\alpha}T\\right)\\right)$. Alternatively, setting $\\eta = \\alpha RT^{-1/2}/F$, $\\epsilon = \\alpha RT^{-1/2}$ gives $\\mathbb{E}[\\alpha\\text{-regret}] = O(\\alpha RFT^{-1/2})$ and $K = O\\left(\\sqrt{T} d^2 \\ln\\left(\\frac{\\alpha+1}{\\alpha}T\\right)\\right)$.\n\nThe proof is given in the appendix.\n\n4.2 Algorithm for the bandit information setting\n\nOur algorithm for the bandit setting follows from a very well-known reduction from the bandit setting to the full information setting, which was also applied in the bandit algorithm of [13].
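\n\nAlgorithm 5 is deferred to the appendix, so the following Python sketch shows only one standard way, in the spirit of [2, 13], to build estimates $\\hat{f}_t$ from bandit feedback. It uses a barycentric spanner $\\{q_1, \\dots, q_d\\}$ and the matrix $Q$ of Definition 4.1 below. The exploration probability gamma, the uniform choice of spanner element, and the function names are assumptions made for illustration, and the paper's algorithm may differ in its details. The estimator works as follows. With probability $\\gamma$ the learner plays a uniformly random $q_i$, observes only the scalar $q_i \\cdot f_t$, and sets $\\hat{f}_t = \\frac{d}{\\gamma}(q_i \\cdot f_t)Q^{-1}q_i$. Otherwise, $\\hat{f}_t = 0$. Averaging over this randomness gives $\\mathbb{E}[\\hat{f}_t] = Q^{-1}\\sum_{i=1}^d q_i q_i^\\top f_t = f_t$.

```python
import numpy as np


def spanner_matrix(spanner):
    # Q = sum_i q_i q_i^T for the rows q_1, ..., q_d of spanner (Definition 4.1)
    rows = np.asarray(spanner, dtype=float)
    return rows.T @ rows


def spanner_beta(spanner):
    # smallest beta for which the rows form a beta-BS: max_i ||Q^{-1} q_i||
    rows = np.asarray(spanner, dtype=float)
    Q = spanner_matrix(rows)
    return max(np.linalg.norm(np.linalg.solve(Q, q)) for q in rows)


def estimate_loss(spanner, i, observed, gamma):
    # Hypothetical exploration-round estimator: the learner played q_i and only
    # observed the scalar observed = q_i . f; returns (d / gamma) * observed * Q^{-1} q_i
    rows = np.asarray(spanner, dtype=float)
    d = rows.shape[0]
    return (d / gamma) * observed * np.linalg.solve(spanner_matrix(rows), rows[i])


def expected_estimate(spanner, f, gamma):
    # exact expectation: explore w.p. gamma, choose i uniformly, and
    # use the zero vector on non-exploration rounds
    rows = np.asarray(spanner, dtype=float)
    d = rows.shape[0]
    return sum((gamma / d) * estimate_loss(rows, i, rows[i] @ f, gamma) for i in range(d))


spanner = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # toy spanner (assumption)
f = np.array([0.3, 0.5, 0.2])
print(expected_estimate(spanner, f, gamma=0.1))   # recovers f up to rounding
print(spanner_beta(spanner))
```

Such an estimator is unbiased, but its norm can be as large as $d\\beta C/\\gamma$. This trade-off is consistent with the $\\gamma^{-1}$ and $\\gamma C$ terms in the bound of Theorem 4.2 below.\n\n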
The algorithm simply simulates the full-information algorithm, Algorithm 3, by providing it with estimated loss/payoff vectors $\\hat{f}_1, \\ldots, \\hat{f}_T$ instead of the true vectors $f_1, \\ldots, f_T$, which are not available in the bandit setting. This reduction relies on a Barycentric Spanner (defined next) for the feasible set $\\mathcal{K}$. As is standard, we assume the points in $\\mathcal{K}$ span the entire space $\\mathbb{R}^d$; otherwise we can reformulate the problem in a lower-dimensional space in which this assumption holds.\n\nDefinition 4.1 (Barycentric Spanner\u00b3). We say that a set of $d$ vectors $\\{q_1, \\ldots, q_d\\} \\subset \\mathbb{R}^d$ is a Barycentric Spanner with parameter $\\beta > 0$ for a set $\\mathcal{S} \\subset \\mathbb{R}^d$, denoted by $\\beta$-BS($\\mathcal{S}$), if $\\{q_1, \\ldots, q_d\\} \\subset \\mathcal{S}$ and the matrix $Q := \\sum_{i=1}^d q_i q_i^\\top$ is nonsingular and satisfies $\\max_{i\\in[d]} \\|Q^{-1} q_i\\| \\le \\beta$.\n\nImportantly, as discussed in [13], the assumption that such a set $\\beta$-BS($\\mathcal{K}$) is available seems reasonable, since (i) for many sets that correspond to the set of all possible solutions to some well-studied NP-hard optimization problem, one can still construct in poly($d$) time a barycentric spanner with $\\beta = \\mathrm{poly}(d)$, (ii) $\\beta$-BS($\\mathcal{K}$) needs to be constructed only once and then stored in memory (overall $d$ vectors in $\\mathbb{R}^d$), so its construction can be viewed as a pre-processing step, and (iii) as illustrated in [13], without further assumptions the approximation oracle by itself is not sufficient to guarantee nontrivial regret bounds in the bandit setting.\n\nThe algorithm and the proof of the following theorem are given in the appendix.\n\nTheorem 4.2. Fix $\\eta > 0$, $\\epsilon \\in (0, (\\alpha + 2)R]$, $\\gamma \\in (0, 1)$. Suppose Algorithm 5 is applied for $T$ rounds, let $\\{f_t\\}_{t=1}^T \\subseteq \\mathcal{F}$ be the sequence of loss/payoff vectors, and let $\\{\\hat{s}_t\\}_{t=1}^T$ be the sequence of points played by the algorithm.
Then it holds that\n\n$$\\mathbb{E}\\left[\\alpha\\text{-regret}\\left(\\{(\\hat{s}_t, f_t)\\}_{t\\in[T]}\\right)\\right] \\le \\alpha^2 R^2 \\eta^{-1} T^{-1} + \\eta d^2 C^2 \\beta^2 \\gamma^{-1}/2 + 3\\epsilon F + \\gamma C,$$\n\nand the expected number of calls to the approximation oracle of $\\mathcal{K}$ per iteration is upper bounded by\n\n$$\\mathbb{E}[K(\\eta, \\epsilon, \\gamma)] := O\\left(\\left(1 + \\frac{\\eta\\alpha d C \\beta R + (\\eta d C \\beta)^2/\\gamma}{\\epsilon^2}\\right) d^2 \\ln\\left(\\frac{(\\alpha + 1)R}{\\epsilon}\\right)\\right).$$\n\nIn particular, setting $\\eta = \\frac{\\alpha R}{d C \\beta} T^{-2/3}$, $\\epsilon = \\alpha R T^{-1/3}$ and $\\gamma = T^{-1/3}$ gives $\\mathbb{E}[\\alpha\\text{-regret}] = O((\\alpha d C \\beta R + \\alpha R F + C) T^{-1/3})$ and $\\mathbb{E}[K] = O\\left(d^2 \\ln\\left(\\frac{\\alpha+1}{\\alpha} T\\right)\\right)$.\n\n\u00b3 This definition is somewhat different from the classical one given in [2]; however, it is equivalent to a $C$-approximate barycentric spanner [2], with an appropriately chosen constant $C(\\beta)$.\n\nReferences\n[1] Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In COLT, pages 263\u2013274, 2008.\n\n[2] Baruch Awerbuch and Robert D. Kleinberg. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 45\u201353. ACM, 2004.\n\n[3] Maria-Florina Balcan and Avrim Blum. Approximation algorithms and online mechanisms for item pricing. In Proceedings of the 7th ACM Conference on Electronic Commerce, pages 29\u201335. ACM, 2006.\n\n[4] Reuven Bar-Yehuda and Shimon Even. A linear-time approximation algorithm for the weighted vertex cover problem. Journal of Algorithms, 2(2):198\u2013203, 1981.\n\n[5] S\u00e9bastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends\u00ae in Machine Learning, 8(3-4):231\u2013357, 2015.\n\n[6] Nicos Christofides. Worst-case analysis of a new heuristic for the travelling salesman problem. Technical report, DTIC Document, 1976.\n\n[7] V. Chv\u00e1tal.
A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):233\u2013235, 1979.\n\n[8] Takahiro Fujita, Kohei Hatano, and Eiji Takimoto. Combinatorial online prediction via metarounding. In ALT, pages 68\u201382. Springer, 2013.\n\n[9] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115\u20131145, 1995.\n\n[10] M. Gr\u00f6tschel, L. Lov\u00e1sz, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169\u2013197, 1981.\n\n[11] Elad Hazan, Zohar Shay Karnin, and Raghu Meka. Volumetric spanners: an efficient exploration basis for learning. In COLT, volume 35, pages 408\u2013422, 2014.\n\n[12] Elad Hazan and Haipeng Luo. Variance-reduced and projection-free stochastic optimization. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1263\u20131271, 2016.\n\n[13] Sham M. Kakade, Adam Tauman Kalai, and Katrina Ligett. Playing games with approximation algorithms. SIAM J. Comput., 39(3):1088\u20131106, 2009.\n\n[14] Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291\u2013307, 2005.\n\n[15] Christos H. Papadimitriou and Tim Roughgarden. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM), 55(3):14, 2008.\n\n[16] S. Matthew Weinberg. Algorithms for strategic agents. PhD thesis, Massachusetts Institute of Technology, 2014.\n\n[17] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent.
In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 928\u2013936, 2003.", "award": [], "sourceid": 434, "authors": [{"given_name": "Dan", "family_name": "Garber", "institution": "Technion - Israel Institute of Technology"}]}