NeurIPS 2020

Estimating decision tree learnability with polylogarithmic sample complexity


Review 1

Summary and Contributions: This is a theoretical paper that shows that commonly used top-down decision tree learning heuristics are amenable to highly efficient learnability estimation. That is, for monotone target functions, it is possible to estimate the error the learned hypothesis will have on the training set S by observing just a small fraction of S.

Strengths: Nice, clean paper with very clear and interesting (but not surprising) results. Potentially interesting technique. Very well written

Weaknesses: The results are not surprising at all. That does not mean that it’s easy to prove, but it is still not surprising that it is possible to estimate how well a learning algorithm will do on a set S by observing only a small part of the set S. Also, the paper makes strong monotonicity assumptions, but does not discuss the implications of it on the strength (and relevance to application) of the results.

Correctness: Seems correct.

Clarity: Very well written.

Relation to Prior Work: Discussed well.

Reproducibility: Yes

Additional Feedback: The paper is not interesting enough for a competitive conference. It is good to have these results in the literature, but I suggest to send it to a journal. Having read the reviews, and following the discussion, I still think that this does not below in a competitive conference. Indeed, as the authors stress in their response, the power of the result is due to the specific algorithm developed here. Nevertheless, I cannot be excited by it, given the monotonicity assumption and the fact that it applies only to the uniform distribution setting. I agree that it's an interesting result, but I think that it's not interesting enough nor important enough for a top conference.


Review 2

Summary and Contributions: The paper presents applications of recent work on classical decision tree learning algorithms. Very efficient (sublinear) algorithms are given for estimating the error of the tree produced by a top-down learning algorithm (specified by an impurity function) for a monotone target function, using a given training set.

Strengths: The paper belongs to a flurry of beautiful recent work on the classical problem of learning decision trees using top-down algorithms. Results of Blanc et al. are strengthened and the strengthened result is applied to very natural new learning problem, estimating the quality of the hypothesis produced by a learning algorithm on a fixed training set. The proofs are interesting and elegant.

Weaknesses: There are a few comments for improving the presentation, given below.

Correctness: Yes.

Clarity: One point that would be useful to clarify is that most of the positive results on decision tree learning apply to the uniform distribution setting. This could be complemented by mentioning some general negative results for decision tree learning, and discussing whether there are possibilities to extend the results, for example, for product distributions. In the remark after Theorem 3 on arbitrary test distributions, the difference between the test distribution and the uniform distribution used to get T should be discussed.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: The author response addressed the comments in a satisfactory manner.


Review 3

Summary and Contributions: The paper studies the recently introduced "estimated learnability" framework in the context of decision trees. More specifically, they analyse top-down decision tree learning heuristic and show that their expected preformance (in terms of error probability) can be computed with exponetially fewer examples than it would take to learn the DT. On the way they introduce a minibatch version of the standard top-down decision tree learning heuristic that comes with the same error guarantees but can be learned more efficiently by having to consider only polylog examples at every step.

Strengths: The paper makes a significant contribution to the recently proposed framework of estimating learnability. The obtained analysis also provides a deeper understanding of the workings of well-known and prominent decision tree learning heuristics. The obtained results are non-trivial and surprising.

Weaknesses: Since estimating learnability is a very novel field, it is at this stage impossible to say how important it will be in the future. However, the field seems to have already gained some traction and the main results were published at very strong venues.

Correctness: yes

Clarity: Given that the paper is very theoretical and the proofs (in the appendix) are very technical, I was happy to see that the authors made a very good effort to explain the main ideas behind their results in a very intuitive manner in the main paper. So yes the paper is well-written.

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback:


Review 4

Summary and Contributions: The paper is a theoretical contribution in the domain of estimating learnability. It improves over results in [BLT20a] on learning monotone target functions with top-down decision tree algorithms. For this, they introduce a mini-batch top-down decision tree learning algorithm. They also provide additional results for active local learning and estimating learnability.

Strengths: A sound paper in the domain of computational learning theory for the recently introduced problem of learnability estimation.

Weaknesses: The focus is on sample complexity, more precisely on the number of examples to be labeled. My main concern is about the overall complexity. I would have liked to read more details on how the batch samples can be efficiently drawn at each leaf to be queried.

Correctness: The claims seem correct modulo the above question.

Clarity: The paper is well written.

Relation to Prior Work: The distinction with [BLT20a] could have been made more thorough.

Reproducibility: No

Additional Feedback: **** Thanks for the feedback. I do not change my evaluation on the paper. The main contribution is the mini-batch learning algorithm, but the batch sampling procedure needs to process the whole dataset. Therefore I am not convinced that the paper is enough for NeurIPS. I suggest to submit the paper to a conference or journal on computational learning theory.