Optimistic Bandit Convex Optimization

Part of Advances in Neural Information Processing Systems 29 (NIPS 2016)

Bibtex Metadata Paper Reviews Supplemental


Scott Yang, Mehryar Mohri


We introduce the general and powerful scheme of predicting information re-use in optimization algorithms. This allows us to devise a computationally efficient algorithm for bandit convex optimization with new state-of-the-art guarantees for both Lipschitz loss functions and loss functions with Lipschitz gradients. This is the first algorithm admitting both a polynomial time complexity and a regret that is polynomial in the dimension of the action space that improves upon the original regret bound for Lipschitz loss functions, achieving a regret of $\widetilde O(T^{11/16}d^{3/8})$. Our algorithm further improves upon the best existing polynomial-in-dimension bound (both computationally and in terms of regret) for loss functions with Lipschitz gradients, achieving a regret of $\widetilde O(T^{8/13} d^{5/3})$.