Tracking the Best Expert in Non-stationary Stochastic Environments

Wei, Chen-Yu; Hong, Yi-Te; Lu, Chi-Jen

Tracking the Best Expert in Non-stationary Stochastic Environments

Part of Advances in Neural Information Processing Systems 29 (NIPS 2016)

Bibtex Metadata Paper Reviews Supplemental

Authors

Chen-Yu Wei, Yi-Te Hong, Chi-Jen Lu

Abstract

We study the dynamic regret of multi-armed bandit and experts problem in non-stationary stochastic environments. We introduce a new parameter $\W$ , which measures the total statistical variance of the loss distributions over $T$ rounds of the process, and study how this amount affects the regret. We investigate the interaction between $\W$ and $\Gamma$ , which counts the number of times the distributions change, as well as $\W$ and $V$ , which measures how far the distributions deviates over time. One striking result we find is that even when $\Gamma$ , $V$ , and $\Lambda$ are all restricted to constant, the regret lower bound in the bandit setting still grows with $T$ . The other highlight is that in the full-information setting, a constant regret becomes achievable with constant $\Gamma$ and $\Lambda$ , as it can be made independent of $T$ , while with constant $V$ and $\Lambda$ , the regret still has a $T^{1/3}$ dependency. We not only propose algorithms with upper bound guarantee, but prove their matching lower bounds as well.

Tracking the Best Expert in Non-stationary Stochastic Environments

Authors

Abstract

Name Change Policy