Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper is borderline. Before the rebuttal, one reviewer was very negative, but he changed his mind after the authors' response clarified some of his misunderstandings. Now all three reviewers are leaning towards acceptance, so I follow them and recommend acceptance. I think the proposed transformation is novel and offers an interesting new way of interpreting tree forests, which can lead to new algorithmic solutions (e.g., for pruning and model interpretability, as illustrated in the paper). The empirical evaluation is clearly weak, as highlighted by the reviewers, but I agree with the authors that the main contribution is the proposed transformation and that the experiments are provided merely as an illustration of the possibilities it offers.

Given the reviews and my reading of the paper, I nevertheless ask the authors to make the following mandatory changes in the final version of their paper:
- The authors should make all changes promised in their response and take into account all additional suggestions from the reviewers to make the paper clearer. In addition, the different kinds of trees used (oblivious, symmetric, balanced, etc.) should be clearly defined, as they are not so standard.
- Section 5 should be clarified. I have the following questions:
  * It's not clear how the models are constructed. Do you build one model for each class (versus the others) or a single multi-class model? How exactly is a figure obtained for each class in Figures 3 and 4?
  * How do you apply Breiman's permutation-based variable importance (VI)? This method originally estimates the loss increase on out-of-bag samples when features are randomly permuted. How do you apply it to boosting models that do not use bootstrap sampling? Which loss do you use? I'm surprised that you get so many pixels with a negative importance value with this method in Figure 3. I expected this variable importance measure to be mostly positive for all pixels.
  * Accuracies of the models should be reported. Are we assessing features from a good enough model?
  * Not enough intuition is given for the metric in (8). Given that the averaged term can be negative or positive depending on the example x, couldn't v(k) be very small (or zero) even when a feature is important? The authors should comment on that potential limitation in the paper.
- I think that the authors should consider extending the pseudocode in Algorithm 1 (in the supplementary material) to make the subroutines (SymmetricTree, IsSubset, and AddMonomialToTree) clearer.
- The authors should change the name of their representation. The name 'polytree' will lead to confusion, as 'polytree' is already used to refer to a specific graph structure, which has moreover been used in machine learning papers (in the field of graphical models).
- Overall, the paper needs to be carefully proofread, as there are several typos in the text and also in the formulas.
- Citations are not properly formatted: they should be numbered in the NeurIPS style, and fewer details should be given in the reference list (no URLs, no ISSN, etc.). Be sure to use the correct format.
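
To make the permutation-importance question in the Section 5 item concrete, the sketch below shows one common way to apply the measure without out-of-bag samples: permute each feature on a held-out validation set and record the mean increase in loss. This is only an illustration under stated assumptions and not the authors' procedure. The function name `permutation_importance`, the squared-error loss, the toy data, and the least-squares stand-in model are all hypothetical; any fitted model that exposes a prediction function, including a boosted ensemble, could be substituted.

```python
import numpy as np


def permutation_importance(predict, X, y, loss, n_repeats=10, seed=0):
    """Mean increase in held-out loss when each feature column is shuffled.

    Hypothetical helper: `predict` is any fitted model's prediction function.
    """
    rng = np.random.default_rng(seed)
    base = loss(y, predict(X))  # loss on the unpermuted held-out data
    importances = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column k only, breaking its link with the target.
            X_perm[:, k] = rng.permutation(X_perm[:, k])
            increases.append(loss(y, predict(X_perm)) - base)
        importances[k] = np.mean(increases)
    return importances


def mse(y_true, y_pred):
    """Squared-error loss (an assumption; the paper's loss is not stated)."""
    return float(np.mean((y_true - y_pred) ** 2))


# Toy data (assumption): the target depends on feature 0 only.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(400, 2))
y = 3.0 * X[:, 0] + 0.1 * data_rng.normal(size=400)

# Split into a training part and a held-out part that replaces out-of-bag samples.
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

# Stand-in model (assumption): ordinary least squares instead of boosting.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(A):
    return A @ w


vi = permutation_importance(predict, X_val, y_val, mse)
print(vi)
```

With a measure of this kind, an irrelevant feature typically receives a value close to zero that may be slightly negative by chance, whereas a relevant feature receives a clearly positive value. This is why many strongly negative pixels in Figure 3 would be unexpected.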