NeurIPS 2020

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Meta Review

Knowledge distillation is not well understood, and the reviewers agree that there is value in studying this topic. The results rely on wide-network and NTK assumptions, which some reviewers (and this AC) regard as unrealistic; nevertheless, it is exciting to see a theoretical step forward on such an opaque issue.