Paper ID: 1079
Title: Buy-in-Bulk Active Learning
Reviews

Submitted by Assigned_Reviewer_3

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The author(s) are essentially extending [Hanneke, Rates of Convergence in Active Learning; 2011] to batch-mode active learning. Specifically, there are two results of note: (1) that the label complexity of batch (buy-in-bulk) active learning interpolates between passive learning (seeing all of the labels at once) and standard sequential active learning (one queried example at a time) -- both for the realizable case and under Tsybakov noise; and (2) if the cost of buy-in-bulk active learning is sublinear in the size of the batch, the total cost may be smaller than that of sequential active learning achieving comparable accuracy (which makes intuitive sense).

I will address the {quality, clarity, originality, significance} components separately below.

Quality: After significant effort on my part, I am fairly certain that the proofs are correct. The work is well-contextualized, motivated, and answers meaningful theoretical questions. My only comment on this front is to suggest including other "batch mode" active learning works for context (as this is one place where you have more space), since practitioners may be interested in comparing this approach to more heuristic methods (which are what they will actually implement).

Clarity: Once I was able to internalize the notation and do a bit of background reading, the proofs were actually fairly straightforward (and very clear).

Originality: Once you read [Hanneke, 2011], it is fairly straightforward to get through this paper, so this represents more of a "finding good questions to ask" paper as opposed to introducing new methods of analysis. However, I haven't seen a good theoretical presentation of batch-mode active learning, so this will almost certainly further the understanding of the active learning community.

Significance: As previously stated, this is more of an "extending theoretical results to a related setting" paper as opposed to one developing new methods for understanding active learning (but this is the difference between a good paper and a "best" paper). I don't think this will have tremendous significance, but it is an important advancement in the literature that people in this community will find useful and interesting.

Minor comment
-----
- pg 1 - I assume "match maximum" is supposed to be "batch maximum", no? Otherwise, I am missing something
Q2: Please summarize your review in 1-2 sentences
This is the first good theoretical presentation of batch-mode ("buy-in-bulk") active learning with respect to label complexity. It analyzes buy-in-bulk active learning in the realizable case, under noisy labels, and under various label-cost models. It is well-motivated, sound, and interesting.

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary of Paper

The paper studies active learning problems in which labels can be acquired in batches, with sublinear labeling costs for labels within a batch. Specifically, the paper studies finite-sample risk bounds both for settings in which the model space is assumed to contain a model with risk zero (implying a well-specified model and zero noise) and for a relaxed version that allows for Tsybakov noise.


Evaluation

The presented setting, in which we assume sublinear labeling costs for labels within a batch, is interesting and relevant for active learning. This setting has been studied in the literature, e.g., by formulating label selection as an optimization problem (Chakraborty et al., 2010). In contrast, the current paper tackles the problem by modifying the PAC-based active learning algorithm CAL (Cohn et al., 1994), and is focused on proving finite-sample bounds in this setting. I am not aware of any other work that derives finite-sample bounds for batch-based active learning, thus this appears to be a novel type of analysis in an established setting.

The first part of the paper analyzes simple extensions of CAL that obtain labels in k equal-sized batches, showing that the resulting label complexity lies between passive learning and the original CAL algorithm. I feel that this part of the analysis is not closely related to the paper's main motivation of having a cost function on batches of labels (with sublinear costs). The main contribution of the paper is to adapt the CAL algorithm to such cost functions on batches. For this setting, a detailed proof is provided for the realizable case, and a proof sketch is provided for the case including Tsybakov noise. The analysis here is interesting but relies on certain assumptions about the relationship between D_{X,Y} and the hypothesis space (in the form of the Tsybakov noise model). To understand how well this matches real-world learning tasks, I think a careful empirical evaluation would be very helpful, which is unfortunately missing. For example, the proposed algorithm could be compared against standard active learning and some of the batch-optimized active learners recently proposed (e.g., Chakraborty et al., 2010).

Overall, I have the impression that the paper to some degree falls short of its own goals: the setting of buying labels with batch-level cost functions is well motivated by realistic application domains, but the resulting algorithms are never evaluated empirically on such domains.

The paper is mostly well-written and the general ideas presented are quite easy to follow. The theoretical analysis is spelled out in sufficient detail. Proofs of theorems could be moved to an (online) appendix if there is a need to free up space (e.g., for an empirical study).
Q2: Please summarize your review in 1-2 sentences
A paper that derives finite-sample bounds for an active learning setting in which labels can be acquired in batches with sublinear costs. Unfortunately, the theoretical analysis is not backed up by empirical results.

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper introduces a new aspect to the theory of active learning: querying-in-bulk. The setting is well motivated by practical applications and clearly defined. The authors then adapt a basic active learner (CAL) to this new setting in three variations: the realizable case, Tsybakov noise, and a specified cost function for batches of queries. The authors provide performance guarantees with query bounds in terms of the disagreement coefficient for all three cases. In the first two cases, their bound interpolates between passive learning and standard active learning for the respective learning setting.

The results are solid and well presented, and the setting is new. However, being an adaptation of an existing algorithm, presented with disagreement-coefficient-dependent bounds, it does not provide significant new insights into our understanding of active learning. So the progress seems rather incremental.
Q2: Please summarize your review in 1-2 sentences
This is a solid (but not overly exciting) work on the theory of active learning. It introduces a new setting of querying-in-bulk, adapts a well-known active learner to this setting and provides performance guarantees.

Submitted by Assigned_Reviewer_8

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
*Summary*

This paper discusses (disagreement-based) batch-mode active learning, and puts forward a theoretical framework bounding the sample complexity and the total cost of buy-in-bulk active learning (i.e., batch-mode active learning with cost functions sublinear in the size of a batch). In particular, the authors study batch-mode active learning from two aspects: (1) supposing the batch sizes are equal and fixing the number of rounds, how many labels does one have to request in total to achieve a certain error rate; and (2) given that the cost to obtain a batch of labels is sublinear in the size of the batch (referred to as the ``buy-in-bulk discount''), how does the total cost of the proposed batch algorithms compare with that of fully sequential active learning methods?

For the first aspect, the authors propose batch-based variants of the well-known CAL algorithm for sequential active learning, and provide upper bounds on the label complexity of k-batch active learning, for both the realizable case and the non-realizable case (with Tsybakov noise). For the second aspect, they provide a cost-adaptive modification of the CAL algorithm, and find that the total cost incurred by this algorithm may often be significantly smaller than that of the analogous methods in the fully sequential setting. (For concreteness, my reading of the k-batch variant is sketched below.)
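For concreteness, here is a minimal sketch of how I read the k-batch, realizable-case variant; the names, the callable hypotheses, and the finite version space are my own simplifications, not the authors' pseudocode:

    def k_batch_cal(stream, oracle, hypotheses, k, m):
        # Realizable case: request labels in k batches of size m,
        # querying only points on which the current version space
        # disagrees. The stream is assumed long enough to fill
        # every batch.
        version_space = list(hypotheses)
        labeled = []
        for _ in range(k):
            if len(version_space) <= 1:
                break  # no disagreement left to resolve
            batch = []
            while len(batch) < m:
                x = next(stream)
                # Points with unanimous predictions cost nothing:
                # their labels can be inferred without a query.
                if len({h(x) for h in version_space}) > 1:
                    batch.append(x)
            ys = oracle(batch)  # one bulk label purchase per round
            labeled.extend(zip(batch, ys))
            # Keep only hypotheses consistent with all labels seen
            # so far; realizability guarantees the target survives.
            version_space = [h for h in version_space
                             if all(h(x) == y for x, y in labeled)]
        return version_space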

*Quality*

This paper nicely extends the previous work (Hanneke ’10) to the batch-mode setting. The first part (Sections 3 and 4) derives upper bounds (depending on the batch size) on the label complexity of the proposed batch algorithms. It is technically sound and thoroughly explained. The second part (Section 5), however, is less clear. From lines 376 to 380, the authors directly compare two upper bounds that are obtained separately (potentially with different low-order terms hidden in the big-O notation), and conclude that buy-in-bulk active learning is usually more cost-efficient, which does not seem fair to me.

Besides, there is no empirical evaluation of the proposed algorithm. Given the various motivating applications mentioned earlier, it would be interesting to see how Cost-Adaptive CAL performs empirically under different cost models.

*Clarity*

The second result (Section 5) for buy-in-bulk active learning, as it appears in the paper, is striking, yet unconvincing. According to Theorem 5.1, if the cost function is linear in batch sizes, then the proposed batch-mode algorithm (Cost-Adaptive CAL) exhibits the same form of upper bound as the fully sequential CAL algorithm. Intuitively, under such a (unit) cost setting, the cost of Cost-Adaptive CAL should be higher than that of fully sequential CAL. However, according to the two upper bounds provided in lines 343 and 367, one cannot tell the difference. Further analysis of the difference (hidden constants or low-order terms) between the cost bounds would be helpful.
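For concreteness, the arithmetic behind my concern is simple (my notation, not the paper's). If the batch sizes are q_1, ..., q_k and the cost function is the unit cost c(q) = q, then

    \sum_{i=1}^{k} c(q_i) = \sum_{i=1}^{k} q_i = (total number of labels requested),

so any upper bound on label complexity is automatically an upper bound on total cost, and the batch and sequential bounds necessarily coincide in form at this level of granularity. Whatever extra cost the batch algorithm incurs under unit costs can only appear in the hidden constants or lower-order terms, which the current analysis does not expose.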

*Originality and Significance*
Active learning problems under the batch-mode setting are quite natural and interesting, but so far little theoretical work has been done (e.g., Chen & Krause, ICML ’13, formulate it as an adaptive optimization problem). This paper explores the theoretical aspects of mellow schemes for batch-mode active learning. Although the proof technique relies heavily on previous work on (sequential) active learning, it is one of the few attempts made along this line of research.

Editorial issue(s):
1. line 171, second term in the max bracket: $er(h)$ should not be a subscript.
2. the paper exceeds the submission page limit by 3 lines.

Update: I've considered the authors' rebuttal and have decided not to change my score. Specifically, I'm still not convinced that the cost of the batch algorithm is only a constant factor away from that of sequential CAL (as claimed in the rebuttal) -- there might be some lower-order terms in the bound for the batch algorithm, which matter when comparing with the sequential bound. This is one of the main reasons I would like to see some empirical evidence for the statement.
Q2: Please summarize your review in 1-2 sentences
Overall, I find the paper theoretically interesting. It generalizes the result of [Hanneke ’10] to the batch-mode setting. The results seem correct. However, the theoretical analysis for the buy-in-bulk solution (Section 5), which is most closely related to the theme of the paper, is less clear (as explained previously). Besides, no empirical study (especially for the setting in Section 5) is conducted, which I think is important given the practical importance of the problem setting.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank all of the reviewers for these helpful
comments. Below are our responses to a few specific
points raised in the reviews.

Assigned_Reviewer_5 mentions that the results for
sublinear cost functions are only given for the
realizable case. However, we note that the paper
does include such a result under Tsybakov noise on
page 8. We chose to abbreviate the proof due to the
space limitation, but the ideas are sketched there.
Once accepted, we will provide the full details of
the proof in an extended version of the paper for
the arXiv.

Assigned_Reviewer_8 wonders about the comparison
between the upper bounds for the fully sequential
and cost-adaptive variants of CAL. We acknowledge
that an improvement in an upper bound does not
always reflect an improvement in the actual cost
complexity; however, in this case, it is clear that
such an improvement actually does occur in certain
cases, especially when the cost is much smaller
than linear. For instance, if the cost is
c(x)=x^{1/2}, we reduce the total cost by a factor
proportional to (theta*d)^{1/2}. It is true that
the constant factors can be slightly larger in the
cost-adaptive bound (say, by a factor of 4), but
for large d or theta, we will still have a net
reduction in total cost.
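To make this concrete (schematically, suppressing
constants and polylogarithmic factors, and assuming
roughly log(1/epsilon) rounds, each requesting on
the order of theta*d labels):

  sequential:    theta*d*log(1/epsilon)*c(1)
                   = theta*d*log(1/epsilon)
  cost-adaptive: log(1/epsilon)*c(theta*d)
                   = (theta*d)^{1/2}*log(1/epsilon)

so the ratio of the two is (theta*d)^{1/2}, which
is the factor claimed above.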

As for the question of empirical comparisons to other
batch-based methods, this work is strictly
theoretical, and we do not plan on including an
empirical evaluation.