NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
This paper contains several contributions. The authors propose a polynomial form of a tree ensemble. A tree is expressed as a sum (over leaf) of a weight (associated to the leaf) and a product of conditions. The authors provide some theoretical results linking trees and this representation. Based on this representation, they propose methods to compress tree ensembles, to prune them and to measure feature importance. They also provide some experimental results. Imho, this paper is rather original. While trees and ensembles have already been studied as a sum of weights over leafs, I do not think the methods derived by the authors have been considered before. Following the authors clarification, I think the paper is relatively clear and, with the modification the authors promised in the rebuttal, of a quality fit for publication. The authors make several contributions and back them with experimental results.I believe the details they make available about their experiments would make it reasonably easy to reproduce them. On the negative side, they do not compare their methods to baselines for the pruning experiment, they do not provide any code. In my opinion the paper would also benefit from additional proof-reading. My main concern was that the paper was often unclear. Here are the main points that bothered me initially, but the authored cleared them up in the rebuttal, promising to correct mistakes and pointed out some understanding mistake I made. I'm sorry for that. A) l74: I do not understand condition 2. A leaf can be bounded from the left and the right, so can't a feature be present twice? B) Theorem 1: Wouldn't it be possible to construct an ensemble H' from an ensemble H by splitting one leaf l and assigning the weigh of the old leaf in the new leaves in H'? Hence the ensemble would have the same value for all points but different set of conditions. C) equation 2 seems wrong with respect to right and left splits and the definition of c(x) equation 3: M does not seem to have been defined before. D) l 85: d does not seem to have been defined before. l105: In this section we (...) set up a task of tree shape change in ensemble. We represent a tree ensemble as a sum of trees of fixed shape. ---> unclear to me whether the shape is changed or not. Nevertheless, I believe the methods proposed to be interesting. The lack of comparison to existing approaches make it hard to assess the impact they can have on practitioners. However, the fact the authors derive various methods from the representation they study leads me to believe this impact could be high. some typos: 56: there are fixed number of rules 82: the stronger condition devours weaker 112: but proof will be longer 149: we took model --------------- I would like to thank the authors for pointing out misunderstandings on my side. The rebuttal was also very nice to understand the paper. I have updated my review above to take the rebuttal into account. I would like to stress again that I now think the paper to be worth publishing though there is still some room for improvement. I will update my score accordingly. I would encourage the authors to include the example included in l8-10 of the rebuttal in the paper. On a related note, I think there may be a mistake in these lines. Shouldn't the first equality on line 9 be "I(x > 0)(1 − I(x > 1)) = I( x >0 ) − I(x>0)I(x>1)" ? I also suggest improving the illustrative Figure 1 by including an additional step: the transformation of the sum into an expression containing only right split indicator functions. If the authors implement this and the changes they suggested in their answer, I think my complains would no longer be relevant.
Reviewer 2
This paper proposes using the polynomial representation for the trees. The main contribution of the authors can be summarized to the theoretical analysis, model pruning, and interpretation of the mentioned representation. In the theoretical analysis presented in the paper, the authors introduced a theorem to define the equivalence of the tree ensembles with the same set of conditions and polynomial representation. Also, an optimization problem is formalized for converting polynomial form back to tree ensemble. However, later for simplicity, a greedy algorithm is proposed to convert the polynomial form to the ensemble of symmetric oblivious trees. Based on PolyTree representation, a model reduction strategy is proposed in which the weight of the monomials with (defined) quality lower than a threshold is set to zero. Finally, an illustrative experiment and a theoretical justification are presented to show the ability of the presented model for feature importance. The main idea presented in this work, a novel polynomial representation form for the decision trees, is interesting. Mainly because it can open new avenues to understand and interpret the tree ensembles in general. The paper is easy to read and well organized. However, I have some concerns related to the experimental results presented in section 4.
Reviewer 3
In this paper, authors propose a new framework for tree ensemble representation while transforming each tree into a well-known polynomial form. The proposed repreentation is applied to three tasks namely theoretical analysis, model reduction, and interpretation. Experimentations are conducted ans some comparisons are made. The proposed idea is quite original by ensuring three tasks namely theorectical analysis, the model reduction and the interpretation. The contributions are somewhat interesting where authors propose an algorithm for tree ensemble conversion to polynomial form; a proof of uniqueness of the polynomial form for all shapes of trees, that have the same values is provided. Besides, author develop an algorithm for conversion of polynomial ensemble representation to sum of symmetric oblivious trees. They also deal with ensemble pruning, and also propose a method for global feature attribution. The results are in general sound, a proof of Theorem 1 is provided. Experiments are somewhat convincing but needed to be more detailed and more comparisons should be done. In general, the paper is well-organized with few typos to correct. However Figures 2 and 3 are not very clear and their titles are very long. Examples should be added make the article clearer. Even the paper presents somewhat interesting ideas, there are some comments that should be addressed : - The introduction needs more depth to better understand the motivation of the paper. - The paper lacks of examples which will better illustrate the proposed ideas. . - Why the level of dependency grows with the number of conditions in M. - For section 3 relative to theoretical analysis, experiments remain poor, testing on only one set (Higg dataset) is insufficient to make reliable conclusions. More data sets should be tested. - In Tabe 1, How to explain that there is no significant improvement In AUC criterion between trained ensembles and pruned ensembles. In fact, you get almost the same AUC for both methods - In Table 1, evaluating based only on AUC and model reduction is unsufficient other criteria should be added like the computational time, etc. - Authors do not mentionned if the cross validation is applied or not. This latter should be applied. - Why what is tested in Table 1 is only sets of binary classes. What about sets sets with multiple classes.