Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewers are lukewarm, after a discussion. Pros: achieving full-feedback-like regret under only partial feedback and some regularity assumptions is intellectually important. This result also forms an interesting counterpoint with [SALS]. Cons: insufficient motivation for the feedback model and for the kernel assumption. Also, the analysis appears fairly standard. [SALS] Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. NIPS 2015. --------------- Some additional points: The kernel assumption is a very strong assumption which has been used in a line of prior work on "Gaussian Process bandits" as a way to formalize the idea that "similar arms bring similar rewards". AFAIK, this work does not claim that this assumption has much to do with reality. Instead, they claim it is productive, in the sense that their algorithms work well in some semi-idealized experiments, where the data is *not* generated according to their model. It would serve the authors to explain the novelty of their technical approach. The main technical idea seems to be, to plug in UCBs into an EXP3-like algorithm, and analyze this algorithm by decomposing regret into the terms coming from the UCB estimators and the terms coming from multiplicative weights. However, these ideas have been used in several papers, and trace back to EXP3.P algorithm (the high-confidence version of EXP3 in the original paper). The application to "robust Bayesian optimization" could use some more motivation. To fit it into the main model, the impact of the "environment" is modeled as observable action delta_t such that the reward function is smooth in delta_t (in the sense of the kernel assumption). But it is really unclear whether/when anything like this happens. Also, only one paper is cited, so (smoothness assumption aside) it is unclear if this setting is standard / well-studied.