NeurIPS 2020

Margins are Insufficient for Explaining Gradient Boosting


Meta Review

R1 and R3 support acceptance, underlining the interestingness of the results. R2 supports rejection, noting that the results do not directly take into account some specificities of gradient boosting (GB) learning algorithms, in particular the problem of normalizing the regressors that have to be combined. That being said, the theory presented in the paper is fairly general, gives new insights into (gradient) boosting methods, and provides progress on margin bounds in both directions (lower and upper bounds) with respect to the current state of the art. The wide use of (gradient) boosting methods makes the paper interesting for the community. Based on these positive points, I recommend acceptance. However, the authors should consider revising their paper according to the following points:

- The theory provided is rather general, not specific to GB, and must be presented accordingly. Note that it is rather known/expected that existing bounds are not that tight; the discussion could be improved on this point.
- Consider adding a discussion that addresses the specificities of GB to answer the remarks raised by R2, and possibly the limits of the setup studied.
- Consider expanding the discussion of the meaning of the proposed results: how do they make sense in practice, and how can they be used to develop new learning algorithms?
- Consider adding the following references:
  * Lev Reyzin, Robert E. Schapire: How Boosting the Margin Can Also Boost Classifier Complexity. ICML 2006. (Check in particular the experimental evaluation of the meaning of margin bounds.)
  * Liwei Wang, Masashi Sugiyama, Zhaoxiang Jing, Cheng Yang, Zhi-Hua Zhou, Jufu Feng: A Refined Margin Analysis for Boosting Algorithms via Equilibrium Margin. Journal of Machine Learning Research, vol. 12, pages 1835-1863, 2011.
  * Robert E. Schapire, Yoram Singer: Improved Boosting Algorithms Using Confidence-rated Predictions. Machine Learning Journal, vol. 37(3), pages 297-336, 1999. (Some of the issues mentioned in the paper were already discussed there.)
  * Leo Breiman: Prediction Games and Arcing Algorithms. Neural Computation, vol. 11(7), pages 1493-1517, 1999. (In particular, make reference to the arc-gv algorithm, used in the experimental part of the Reyzin-Schapire ICML 2006 paper mentioned above.)