NIPS Proceedingsβ

Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

Part of: Advances in Neural Information Processing Systems 24 (NIPS 2011)

[PDF] [BibTeX]



We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar \(\alpha\). We prove that the regret of \newtron is \(O(\log T)\) when \(\alpha\) is a constant that does not vary with horizon \(T\), and at most \(O(T^{2/3})\) if \(\alpha\) is allowed to increase to infinity with \(T\). For \(\alpha\) = \(O(\log T)\), the regret is bounded by \(O(\sqrt{T})\), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when \(\alpha\) is a small constant.