NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 2605 Making the Cut: A Bandit-based Approach to Tiered Interviewing

### Reviewer 1

This paper explores the problem of tiered interviewing and formulates the problem as a multi-stage combinatorial pure exploration bandit framework. The authors propose two algorithms, CACO and BRUTAS, that extends the algorithms by Chen et al., CLUCB and CSAR, to multiple stages. The authors provide theoretical bounds for the proposed algorithms and conduct simulations based on real data to evaluate the algorithms. On the positive side, I think the research question is interesting, and the proposed approach makes sense on a high level. However, there seems to be a gap between the bandit formulation and the motivating application. I also have some concerns on the technical content (mainly due to the exposition). I would lean towards rejecting the paper in its current form. == Motivation == I find the motivating problem, tiered-interviewing, really interesting. However, I have a hard time connecting the proposed bandit formulation to this problem. In particular, in the bandit formulation, at each stage, the learner can choose to pull each arm different number of times based on the responses of previous pulls. What does this arm pull mean in the interviewing process? As in typical interviewing scenarios, it seems the firm only interacts with each candidate once in each stage (with each stage being resume screening, phone interview, or on-site interview)? I think it would help to provide more discussion on how the formulation connects to the motivating example. == Technical content == While the paper is easy to follow on a high level, I have a couple of confusions and concerns on the technical content. In particular, - The definition of information gain s is only mentioned in the description of related work on page 3. Since it’s one of the more important notations, it would help to clearly define it in the setting. - What are the intuitions or interpretations on why there is a Cost^3 factor when calculating the confidence radius in Line 9 of Algorithm 1? Is the learner aiming to minimize cost or need to satisfy some budget constraint? Without explicitly stating the objective, it’s hard to interpret this choice of confidence radius. - What is T in Theorem 1? It’s not defined in the main text. Looking at Equation 11 to 13 in the appendix, it seems T is the cost divided by the information gain (this partially answers my question above, since Theorem 1 implies that the objective is to bound T). However, if this is the definition, it is not consistent with the notion T in algorithm 2, in which T is the total budget (not divided by information gain). Moreover, since T(a) is also used as the information gain for arm a, this set of notations is a bit confusing. - Algorithm 2 doesn’t seem to satisfy the budget constraint. If we set both \tilt{K}_1 and j_1 to 1, in the first stage, since no arms are in A or B yet, there will be n * (T_1 -1) arms pulls in stage 1 according to Line 6 and Line 8 of Algorithm 2, while the budget is only T_1. I might be mistaken, but it would help to formally demonstrate why Algorithm 2 satisfies the budget constraint since it’s not clear from the current description. - It seems in Algorithm 2, the learner not only needs to decide how much budget T_i to allocate to each stage, it also needs to determine \tilt{K}_i, i.e., how many arms to accept/reject. While the authors provide a brief discussion after Theorem 2, it’s not clear to me how the learner should choose these values simultaneously. - All the K in Theorem 2 should be \tilt{K}? == Experiments == The experiments are conducted under strong assumptions, which make the results less insightful. In particular, to compare the proposed algorithms with the committee decision, it is assumed the learned P(a) represent the true acceptance rate of each candidate. Moreover, it is not clear to me how to interpret the comparison with the decision by the committee, since the objective of the committee might not be the same as the learning algorithms.

### Reviewer 2

The multi-round interview scenario is interesting, and has potentially applications beyond hiring. The algorithmic and theoretical extension from [Chen 2014] is non-trivial, the two algorithms are sound, with clear intuitions and computationally feasible. The two settings are analogous to those in [Chen 2014], but with intuitions updated for multi-round setting. The writing is generally clear. ------ update after author feedback ----------- I read other reviews and author feedback, and feel that the feedback helps provide perspective and have addressed concerns raise in the reviews. Assuming the explanations and additional info in the feedback makes to the final version, I maintain an accept rating.