NeurIPS 2019
Sun Dec 8 through Sat Dec 14, 2019, at the Vancouver Convention Center
Paper ID 5689: Sequential Experimental Design for Transductive Linear Bandits

### Reviewer 1

The paper introduces a natural and interesting extension of pure exploration in linear stochastic bandits, called pure exploration in transductive linear bandits. This extension is well motivated through various examples in the introduction. Overall the paper is very well written, very well organized, and almost error free. The studied problem is presented precisely and clearly, the steps toward the design of the algorithm are described and motivated very clearly, and the numerical experiments are explained in detail. All of this makes the paper a pleasant read. On the technical side, a strength of the paper is that it provides a complete picture of the introduced problem: a problem-dependent lower bound on the sample complexity, as well as an algorithm whose performance matches the lower bound (up to logarithmic factors). I read the proofs (without checking all the details) and believe they are correct.

Detailed comments:
1. As far as I understood, both the lower and upper bounds follow straightforwardly from by-now-standard proof techniques in the linear bandit literature. Hence, despite the clear presentation and the matching upper and lower bounds, given the very high standard of NeurIPS I do not think these contributions are strong enough to pass the acceptance threshold. I would therefore ask the authors to highlight the main non-trivial steps in their proofs, i.e., the steps that make the proofs challenging or that call for new technical tools compared to existing work.
2. What is the effect of choosing d odd in Proposition 1?
3. Line 191: "… Algorithm 1 matches the lower bound …" --> "the sample complexity of Algorithm 1 matches …"

--- Updates after rebuttal ---

I have gone through the authors' response and appreciate their reply to my comments.
My impression of the paper has improved, and I raise my score to 6. Still, I think the described contribution does not strongly justify publication at NeurIPS.

Additional comment: Line 233: "where $i$ is ..." ==> I think you meant "where $\Delta_i$ is ..."
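For context on the "problem-dependent lower bound" discussed above: in standard pure-exploration linear bandit analyses, the lower bound is governed by an allocation-dependent quantity of the form $\max_{z \neq z^*} \|z^* - z\|^2_{A(\lambda)^{-1}} / \Delta_z^2$ with $A(\lambda) = \sum_{x \in X} \lambda_x x x^\top$. A minimal numpy sketch of that quantity for a fixed allocation (illustrative only, not the paper's code; the instance `X`, `Z`, `theta`, and the function name `rho` are hypothetical choices of mine):

```python
import numpy as np

# Toy transductive instance: measurement arms X, target arms Z, and a
# true parameter theta (unknown in practice). All values are hypothetical.
X = np.array([[1.0, 0.0], [0.0, 1.0]])   # arms we may pull
Z = np.array([[1.0, 0.0], [0.8, 0.6]])   # arms among which we must find the best
theta = np.array([1.0, 0.2])

def rho(lam, X, Z, theta):
    """Allocation-dependent lower-bound quantity for a fixed allocation lam:
    max over suboptimal z of ||z* - z||^2_{A(lam)^-1} / gap(z)^2."""
    A = sum(l * np.outer(x, x) for l, x in zip(lam, X))  # design matrix A(lam)
    Ainv = np.linalg.pinv(A)            # pseudo-inverse in case A is singular
    rewards = Z @ theta
    zstar = Z[np.argmax(rewards)]
    gaps = rewards.max() - rewards
    vals = []
    for z, gap in zip(Z, gaps):
        if gap > 0:
            d = zstar - z
            vals.append(d @ Ainv @ d / gap**2)
    return max(vals)

print(rho(np.array([0.5, 0.5]), X, Z, theta))  # roughly 125 for this toy instance
```

The optimal allocation then minimizes `rho` over the simplex of distributions on X, which is what makes the transductive distinction between X (what you can play) and Z (what you must rank) bite.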

### Reviewer 2

The paper proposes the general problem setting of pure exploration for transductive linear bandits, in which a set of arms X may be used for exploration; the reward of an arm x (a d-dimensional vector) is the dot product of x with an unknown latent vector \theta^*, observed after playing x. The goal, however, is to find the optimal arm z from a different set of arms Z, with the same reward definition. The authors motivate the problem with applications such as drug discovery and recommendation. The formulation generalizes the linear bandit case (where X = Z) and the combinatorial bandit case (where X is the set of d standard basis vectors). The problem looks interesting and useful.

The authors design an algorithm, SAGE, which combines existing techniques such as gap elimination and a rounding procedure. They claim the algorithm improves on existing ones; for example, compared with ALBA of [28], they argue that ALBA computes its optimal allocation over the arms in X directly rather than over differences of arms, which is less efficient. The authors provide a sample complexity upper bound for SAGE and a corresponding lower bound theorem. The upper bound looks complicated, but the authors do try to interpret it. The experiments compare SAGE with several linear bandit algorithms; the results show that SAGE is competitive and has advantages in various settings.

Overall, the paper provides a nice and useful general setting of the transductive linear bandit problem, with both analytical and empirical results. The paper is well organized and well written, with balanced intuitive explanations and technical results. I think the paper is worth publishing. However, there are still some (minor) issues, listed below.

- The paper never formally defines what an arm is. In some papers on linear bandits, an arm refers to one dimension of the action vector, while in others an arm is the vector itself, so a clarification is needed here. I believe in this paper an arm is a vector, but there are two sets of vectors, X and Z. Are vectors in both sets called arms? This actually leads to another issue below.
- Line 82: $(\sum_{x\in X} \lambda_x x x^\top)^{-1}$ first appears in this line, but there is no explanation of $\sum_{x\in X} \lambda_x x x^\top$. For example, is it full rank? What if it is not? I only found a relevant explanation in Footnote 2, two pages later. Please clarify this point up front.
- Algorithm 1: a) The computation of $\lambda^*_t$ needs more explanation. How efficient is it? Only in the experimental section do the authors mention that they use the Frank-Wolfe algorithm to find $\lambda^*_t$; I hope more explanation can be given in the algorithm section. b) What is $n_\min$? Should it be $r(\epsilon)$? c) What is $\lambda^*$ in the ROUND procedure? Should it be $\lambda^*_t$?
- Experiments: a) The "transductive" setting used is actually the pure exploration setting for combinatorial bandits, so baselines for that setting should be compared. Moreover, a separate, genuinely transductive setting (neither the linear bandit nor the combinatorial bandit special case) should be tested. b) The sub-Gaussian noise setting is not mentioned. What parameter is used?
- Proof of Theorem 1 in the supplementary material: line 488 explicitly defines $\nu_{\theta,i}$ as the normal distribution with mean $z_i^{\top}\theta$ and variance 1. However, in the formula after line 497, $x_i^{\top}$ appears, which really confuses me. Should this $x_i$ be $z_i$, or the other way round, i.e., should the $z_i$ mentioned above be $x_i$? Another indication is that the summations in the preceding formulas all run from 1 to n, suggesting they range over vectors in X, not in Z. Please clarify this. This is also related to the first issue I listed: in line 487, "arm i" is mentioned, but is it an arm in X or in Z? The distinction between X and Z is crucial in the transductive setting, yet it seems unclear in the proof of Theorem 1. Please check this distinction throughout the proofs, since I did not verify all the details of all the proofs.
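On the Frank-Wolfe question raised above: in the optimal-design literature, computing an allocation of this kind is typically done with a conditional-gradient loop on an objective such as $\min_\lambda \max_{z \in Z} \|z\|^2_{A(\lambda)^{-1}}$, where $A(\lambda) = \sum_{x} \lambda_x x x^\top$. A minimal sketch under that assumption (illustrative only, not the paper's implementation; the function name, step size, and iteration count are my choices):

```python
import numpy as np

def frank_wolfe_design(X, Z, iters=500):
    """Frank-Wolfe sketch for min_lam max_{z in Z} z^T A(lam)^{-1} z,
    with A(lam) = sum_x lam_x x x^T. X, Z: arrays of shape (n, d), (m, d)."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                 # start from the uniform allocation
    for k in range(iters):
        A = X.T @ (lam[:, None] * X)          # design matrix A(lam)
        Ainv = np.linalg.pinv(A)              # pseudo-inverse guards rank deficiency
        # Worst-measured target arm under the current allocation.
        z = Z[np.argmax(np.einsum('ij,jk,ik->i', Z, Ainv, Z))]
        # Gradient of z^T A(lam)^{-1} z wrt lam_x is -(x^T Ainv z)^2, so the
        # Frank-Wolfe vertex puts all mass on the arm maximizing (x^T Ainv z)^2.
        i = np.argmax((X @ Ainv @ z) ** 2)
        gamma = 1.0 / (k + 2)                 # a common diminishing step size
        lam = (1 - gamma) * lam
        lam[i] += gamma
    return lam
```

On the X = Z special case with X the standard basis in R^2, this converges to the uniform allocation with design value d = 2, matching the classical G-optimal design guarantee. Each iteration costs one d x d (pseudo-)inverse plus matrix-vector products over X and Z, which is what makes the per-round computation of the allocation tractable.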