Part of Advances in Neural Information Processing Systems 30 (NIPS 2017)
Dan Garber
We revisit the problem of Online Linear Optimization in case the set of feasible actions is accessible through an approximated linear optimization oracle with a factor $\alpha$ multiplicative approximation guarantee. This setting is in particular interesting since it captures natural online extensions of well-studied offline linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the $\alpha$-regret which is the natural extension of the standard regret in online learning to this setting. We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem. Mainly, for both variants, we present $\alpha$-regret bounds of $O(T^{-1/3})$, were $T$ is the number of prediction rounds, using only $O(\log(T))$ calls to the approximation oracle per iteration, on average. These are the first results to obtain both average oracle complexity of $O(\log(T))$ (or even poly-logarithmic in $T$) and $\alpha$-regret bound $O(T^{-c})$ for a positive constant $c$, for both variants.