Beyond Bandit Feedback in Online Multiclass Classification

van der Hoeven, Dirk; Fusco, Federico; Cesa-Bianchi, Nicolò


Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)


Authors

Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi

Abstract

We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. While including bandit feedback as a special case, feedback graphs allow a much richer set of applications, including filtering and label-efficient classification. We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs. For this new algorithm, we prove surrogate regret bounds that hold, both in expectation and with high probability, for a large class of surrogate losses. Our bounds are of order $B\sqrt{\rho KT}$, where $B$ is the diameter of the prediction space, $K$ is the number of classes, $T$ is the time horizon, and $\rho$ is the domination number (a graph-theoretic parameter affecting the amount of exploration). In the full information case, we show that Gappletron achieves a constant surrogate regret of order $B^2K$. We also prove a general lower bound of order $\max\big\{B^2K,\sqrt{T}\big\}$, showing that our upper bounds are not significantly improvable. Experiments on synthetic data show that for various feedback graphs our algorithm is competitive against known baselines.
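
To make the role of the domination number $\rho$ more concrete, the following is a minimal Python sketch that computes it by brute force for two small feedback graphs over $K$ classes. The dictionary encoding of the graph, the function name `domination_number`, and the convention that a set dominates the graph when every class either belongs to the set or is an out-neighbor of one of its members are all assumptions made for illustration; they are not taken from the paper, and the search is exponential in $K$.

```python
from itertools import combinations


def domination_number(out_neighbors):
    """Size of a smallest dominating set, found by exhaustive search.

    `out_neighbors` maps each node to the set of nodes it points to
    (hypothetical encoding chosen for this sketch).
    """
    nodes = list(out_neighbors)
    all_nodes = set(nodes)
    # Try candidate sets in order of increasing size; the first hit is minimal.
    for size in range(1, len(nodes) + 1):
        for candidate in combinations(nodes, size):
            covered = set(candidate)
            for node in candidate:
                covered.update(out_neighbors[node])
            if covered == all_nodes:
                return size
    return len(nodes)


K = 4
# Assumed encoding of bandit feedback: each class only has a self-loop.
bandit = {k: {k} for k in range(K)}
# Assumed encoding of full information: every class points to every class.
full_information = {k: set(range(K)) for k in range(K)}

rho_bandit = domination_number(bandit)
rho_full = domination_number(full_information)
```

Under these assumptions the self-loop-only graph has $\rho = K$, while the complete graph has $\rho = 1$; this matches the intuition that sparser feedback graphs require more exploration.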
