NeurIPS 2020
### On the linearity of large non-linear models: when and why the tangent kernel is constant

### Meta Review

This paper clarifies the conditions under which the NTK (neural tangent kernel) remains constant during training. First, it points out that the NTK is constant if and only if the model is linear. Second, it shows that the NTK is nearly constant when the spectral norm of the Hessian is small, and it bounds the Hessian norm under certain conditions: linearity of the output layer, sparse dependence of the activation function, and the absence of bottleneck layers.
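The near-constancy claim above can be illustrated numerically. The sketch below (not from the paper or this review; the one-hidden-layer ReLU model, widths, and perturbation scale are my own illustrative choices) compares the tangent kernel K(x, x) of an NTK-parameterized network before and after an O(1)-norm parameter perturbation, at a small and a large width. The relative change in the kernel shrinks as the width grows, consistent with the constancy result:

```python
import numpy as np

rng = np.random.default_rng(0)

def tangent_kernel(W, v, x):
    """K(x, x) = ||grad_theta f(x)||^2 for f(x) = v . relu(W x) / sqrt(m)."""
    m = len(v)
    pre = W @ x                        # pre-activations
    act = np.maximum(pre, 0.0)         # relu
    gate = (pre > 0).astype(float)     # relu derivative
    grad_v = act / np.sqrt(m)
    grad_W = (v * gate)[:, None] * x[None, :] / np.sqrt(m)
    return grad_v @ grad_v + np.sum(grad_W * grad_W)

d = 5
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

for m in (10, 10_000):
    W = rng.standard_normal((m, d))
    v = rng.standard_normal(m)
    K0 = tangent_kernel(W, v, x)
    # Perturb parameters by a random direction of fixed Euclidean norm 1,
    # mimicking the O(1) parameter movement seen in NTK-regime training.
    dW = rng.standard_normal((m, d))
    dv = rng.standard_normal(m)
    norm = np.sqrt(np.sum(dW**2) + np.sum(dv**2))
    K1 = tangent_kernel(W + dW / norm, v + dv / norm, x)
    print(f"width {m}: relative kernel change {abs(K1 - K0) / K0:.4f}")
```

The relative change decays roughly like 1/sqrt(width), matching the intuition that a small Hessian norm (here, a flat parameter-to-gradient map at large width) keeps the tangent kernel nearly constant along the optimization path.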
Overall, this paper is well written. Clarifying the conditions under which the NTK is constant will be quite beneficial to a wide range of audiences, especially those working on infinite-width network training.
On the other hand, the reviewers made several comments on how to improve the paper. I encourage the authors to reflect them in the final version as much as possible.