NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:5909
Title:MaxGap Bandit: Adaptive Algorithms for Approximate Ranking

Reviewer 1

This paper introduces a new bandit problem where the goal is to find the largest gap between any two adjacent arms. It is framed similarly as the best-arm identification problem in the sense that the goal of the player aims to identify the max gap between two adjacent arms using as less samples as possible. The authors provide a lower bound on sample complexity for that setting. They also propose elimination and UCB-like algorithms for which they provide sample complexity analysis matching the lower bound. Strengths: [Originality] * The tackled setting is a new problem. * The algorithms are based on known bandit-strategies, but their application for solving the problem is new. [Quality] * The authors provide a lower bound on sample complexity for their new setting and matching upper bounds for the proposed algorithms. * The simulated experiment is useful for showing the sample complexity improvement of proposed algorithms compared with uniform sampling. [Clarity] * The authors provide a lot of intuition on their theoretical results, along with examples to understand their definitions and proofs. [Significance] * It is interesting how the authors transform the confidence intervals on arm means into confidence intervals on arm gaps. These theoretical perspectives (and Alg. 4) could be useful in other problems. Weaknesses: [Clarity] * What is the value of the c constant (MaxGapUCB algorithm) used in experiments? How was it determined? How does it impact the performance of MaxGapUCB? * The experiment results could be discussed more. For example, should we conclude from the Streetview experiment that MaxGapTop2UCB is better than the other ones? [Significance] * The real-world applications of this new problem setting are not clear. The authors mention applicability to sorting/ranking. It seems like this would require a recursive application of proposed algorithms to recover partial ordering. However, the procedure to find the upper bounds on gaps (Alg. 4) has complexity K^2, where K is the number of arms. How would that translate in computational complexity when solving a ranking problem? Minor details: * T_a(t) is used in Section 3.1, but only defined in Section 4. * The placement of Figure 2 is confusing. --------------------------------------------------------------------------- I have read the rebuttal. Though the theoretical contribution seems rather low given existing work on pure exploration, the authors have convinced me of the potential impacts of this work.

Reviewer 2

This problem studies a novel variation of pure-exploration in multi-armed bandits, where the objective is to discover the pair of adjacent arms with the maximum gap of expectations. The authors provide elimination and UCB-based algorithms and show their upper bounds. They also provide a minimax lower bound to show the optimality of the proposed algorithms. They conduct experiments on both synthetic and real world data sets to verify the effectiveness of their algorithms. The problem studied in this paper is a novel variation of MAB problem and has some meaningful real-world applications. It provides a preliminary theoretical foundation to efficient learning-to-rank tasks as well as approximate query processing tasks. The unique challenge in this problem is the unknown order of arms, which introduces a non-trivial twist that differs the problem from traditional top arm identification. On the other hand, the output only needs to find the maximum gap, which is easier than figuring out the full rank of all the arms. The proposed algorithms are presented in a clear way and seem quite intuitive. The algorithms generally maintain the confidence intervals of gaps between arms, as well as an active set of arms which are possibly related to the max gap. The arms in the active set will be sampled and played. The upper bound of the proposed algorithms is related to how close “the gap of a pair of arms” is to “the max gap”, as well as the gap between a pair of arms per se. The authors also provide an intuitive explanation of the upper bound. A more practical question here is that if there are multiple pairs of arms with similar gap values, the algorithm still needs to take a lot of samples to figure out. The authors may consider a PAC-setting to prevent this situation. I have minor concern on the experiment settings. The theoretical results presented are obtained in a fixed-confidence setting. However, the experiments shown seem to be evaluating performance in a fixed-budget setting. Is there a specific reason why a fixed-confidence experiment is not conducted? Another concern is: do authors have more real-world data set test cases other than the Borda one? Currently the authors primarily run experiments on only one single test case in real-world data. However, it would be interesting to see 1) what would be the hardness distribution in a real-world data set. 2) does the algorithm performance align well with the hardness value. Generally, I think the paper studies an interesting novel problem setting. The paper is well-written and provides a solid foundation for future studies. There are some minor issues with experiments, but overall the paper would still be a valuable contribution to the community. %===After rebuttal===% Thanks for the authors for their response. I think the figure illustrating hardness and #samples is useful and could be included in the paper. I remain my relatively positive opinion.

Reviewer 3

This paper considers a variant of pure-exploration bandit problems where the agent tries to divide the arms into two sets such that the highest mean of one set is smaller than the lowest mean of the other set, and the gap between these means is maximized. Algorithms based on elimination and UCB (with/without LCB) are proposed, and their high-probability sample complexity bounds are derived with a discussion on the lower bound in the fixed-confidence scenario. Simulation results are given for synthetic setting and a one based on a real-world dataset. One of the main concerns is that the motivation is quite unclear. In the proposed framework, a very large number of samples is required if the largest and the second largest gaps are close to each other, as illustrated in the discussion on the lower bound, even though the objective of the framework is just clustering. Therefore there should be detailed discussion on in what situation such an exact clustering becomes necessary. For example, in the simulation for the Streetview dataset, the resulting “correct” clusters seem quite unbalanced and I don’t understand why finding such clusters is essential. For the same reason, the discussion “Note that finding the safest image (best-arm identification) is hard as we need a lot of human responses to decide the larger mean between the two rightmost distributions” seems quite inappropriate since it seems to needs a lot of human responses to decide the larger gap between the highest and the second highest ones. Another concern is that the techniques used in this paper in the evaluation of the sample complexity seem to be a naive application of the original analysis of the base algorithms, successive elimination and LUCB algorithms. As a result, the derived bound is only order-optimal (with respect to Delta) and does not evaluate the expected number of samples even though the lower bound is given for the expected number of samples. Furthermore, the sample complexity bound is only correct with probability 1-delta, where delta is pre-specified by the required confidence level and cannot be controlled in the evaluation. There should be more modern analysis on the lower bound and matching upper bound following the line of work by Kaufmann for best arm identification, techniques of which seem also applicable to this setting. Section 1.1: The considered model must be clarified (Bernoulli, Normal or sub-Gaussian,...) to discuss confidence bounds. Algorithm 2: There seems to be no explanation on the choice of c, even though the algorithm with small c cannot be correct. Experiments: It is strange that the error probabilities are evaluated even though the theoretical framework considers the fixed-confidence setting. -------- Reply to the rebuttal: - choice of c The rebuttal clarified that c=5 is used in the experiments as in Jamieson+, but, unlike Jamieson+, there is no argument on what choice of c is theoretically guaranteed. In fact, c does not appear even once in the proof of Theorem 1. - "characterization of the "closest" alternate model does not hold in the MaxGap bandit problem" In general bandit problems, the lower bound is characterized by an optimization problem over a family of alternate models, and the closest model does not have to be explicitly available, though the resulting lower bound becomes not explicitly written. The analysis like those around Figure 8 will be useful (and interesting) to simplify the candidates of alternate models but the current result itself does not seem to be very useful because of the same drawback as LUCB as pointed out in the first review. The characteristic like "other arms may have to be sampled even if its own gap is small" also appears in other extensions such as linear bandits.