NeurIPS 2020

A Generalized Neural Tangent Kernel Analysis for Two-layer Neural Networks

Meta Review

This paper extends neural tangent kernel (NTK) results to a two-layer, infinite-width neural network with a three-times differentiable activation function, weight decay regularization, and noisy gradient descent training, and shows a linear convergence rate. The paper received mixed reviews (marginally above, marginally below, accept, reject). On the positive side, R3 thinks the results are a new, nontrivial extension of the NTK results, and R1 finds the paper novel and well written. R1 had some technical concerns but was satisfied by the rebuttal. On the other hand, R2 raised technical issues regarding the effect of the scaling in the kernel, which I think are well addressed by the rebuttal. R4's main critique is that they are not convinced of the significance of using L2 regularization, since training algorithms already have implicit regularization. I am satisfied by the rebuttal, which essentially argues that both implicit and explicit regularization have value and should be studied.