Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
The authors propose a non-uniform sampling strategy for stochastic gradient boosted decision trees (GBDT). In particular, the sampling probabilities of the training examples are optimized to maximize the estimation accuracy of the trees' split scores, and the optimization problem admits an approximate closed-form solution. Experimental results demonstrate the superior performance of the proposed strategy. The reviewers agree that the paper not only helps in understanding sampling within GBDT from a more rigorous perspective but can also improve GBDT implementations in practice. The reviewers' remaining concerns are the structural assumption (that the tree built on the sub-sampled data matches the tree built on the full data), the clarity of the writing in some parts, and the tuning time of the strategy. The authors are encouraged to improve on these points and to contribute their implementation to open-source packages.
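For intuition, here is a minimal sketch of a generic non-uniform subsampling scheme for GBDT, where each example's sampling probability is proportional to its gradient magnitude. This is an illustrative importance-style rule, not the paper's derived closed-form solution; the function names and the magnitude-proportional weighting are assumptions for the sake of the example.

```python
import random


def sampling_probabilities(gradients):
    """Illustrative non-uniform sampling rule: probability proportional
    to per-example gradient magnitude. (An assumed generic scheme, not
    the paper's optimized closed-form probabilities.)"""
    mags = [abs(g) for g in gradients]
    total = sum(mags)
    return [m / total for m in mags]


def subsample(indices, probs, k, seed=0):
    """Draw k training examples (with replacement) according to the
    non-uniform probabilities, as one round of stochastic GBDT might."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=probs, k=k)
```

Under such a rule, examples with large gradients (i.e., those currently fit poorly by the ensemble) are sampled more often, which is the general motivation behind non-uniform sampling in stochastic gradient boosting.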