# NIPS Proceedingsβ

## Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

[PDF] [BibTeX]

### Abstract

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar $$\alpha$$. We prove that the regret of \newtron is $$O(\log T)$$ when $$\alpha$$ is a constant that does not vary with horizon $$T$$, and at most $$O(T^{2/3})$$ if $$\alpha$$ is allowed to increase to infinity with $$T$$. For $$\alpha$$ = $$O(\log T)$$, the regret is bounded by $$O(\sqrt{T})$$, thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when $$\alpha$$ is a small constant.