Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)

*Michael Kapralov, Rina Panigrahy*

Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say 'predict 0' or 'predict 1', and our payoff is $+1$ if the prediction is correct and $-1$ otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting $0$ or always predicting $1$. For a sequence of length $T$ our algorithm has regret $14\epsilon T$ and loss $2\sqrt{T}e^{-\epsilon^2 T}$ in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of $N$ experts, where the related problem of trading off regret to the best expert for regret to the 'special' expert has been studied by Even-Dar et al. (COLT'07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al. (COLT'07) and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter-free algorithm for the experts problem that has optimal regret bounds with respect to $k$-shifting optima, i.e., bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for *any window of size $n$* the regret of our algorithm to any expert never exceeds $O(\sqrt{n(\log N+\log T)})$, where $N$ is the number of experts and $T$ is the time horizon, while maintaining the essentially zero loss property.
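
To make the loss and regret definitions above concrete, here is a minimal Python sketch of the bookkeeping only. It is not the paper's algorithm: the function names (`payoff`, `loss`, `regret_to_constant`) and the example strings are hypothetical, the predictions are a fixed deterministic list rather than a randomized strategy, and regret is taken here as the gap to the better of the two constant predictors, which is one reading of "regret with respect to always predicting $0$ or always predicting $1$".

```python
def payoff(predictions, bits):
    """Total payoff: +1 for each correct prediction, -1 for each wrong one."""
    return sum(1 if p == b else -1 for p, b in zip(predictions, bits))


def loss(predictions, bits):
    """Loss as defined in the abstract: wrong predictions minus right predictions."""
    return -payoff(predictions, bits)


def regret_to_constant(predictions, bits):
    """Gap between the better constant predictor (always 0 or always 1)
    and the given predictions. Assumed reading of 'regret' for this sketch."""
    T = len(bits)
    best_constant = max(payoff([0] * T, bits), payoff([1] * T, bits))
    return best_constant - payoff(predictions, bits)


# Hypothetical example data, chosen only for illustration.
bits = [1, 1, 0, 1, 1, 1, 0, 1]
predictions = [0, 1, 1, 0, 1, 1, 1, 1]

print(loss(predictions, bits))                # 4 wrong - 4 right = 0
print(regret_to_constant(predictions, bits))  # always-1 earns 4, so regret is 4
```

In this toy run the predictor breaks even (loss 0) yet trails the always-1 strategy by 4, which is the kind of tension between loss and regret that the abstract's tradeoff quantifies.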
