Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
This paper studies deep neural networks in the regime where the layer widths grow to infinity. Its main contribution is to show that the dynamics of gradient descent on an infinite-width neural network are captured by the first-order Taylor expansion of the network around its initial parameters, whose evolution is governed by the Neural Tangent Kernel (NTK) of Jacot et al. The reviewers agreed that this is a valuable contribution to ongoing efforts to understand the behaviour of gradient descent on large neural networks and its role in generalisation. Despite some concerns about how well this regime explains the empirical performance of large deep networks, and about overlap with concurrent work (Chizat and Bach), the authors addressed these concerns successfully in the rebuttal, and the AC therefore recommends acceptance.
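To make the central claim concrete, a minimal sketch of the linearization in question is given below. It defines a small two-layer network and its first-order Taylor expansion around the initial parameters, f_lin(x; θ) = f(x; θ₀) + ∇_θ f(x; θ₀)·(θ − θ₀), computed with `jax.jvp`. The network architecture, width, and function names here are illustrative assumptions, not taken from the paper; the paper's result is that at infinite width, gradient-descent training of `f` and of `f_lin` coincide.

```python
import jax
import jax.numpy as jnp

def init_params(key, width=512):
    """Illustrative two-layer MLP parameters (architecture is an assumption)."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (1, width)),
        "w2": jax.random.normal(k2, (width, 1)),
    }

def f(params, x):
    """Forward pass, with 1/sqrt(width) output scaling as in NTK-style analyses."""
    h = jnp.tanh(x @ params["w1"])
    return h @ params["w2"] / jnp.sqrt(params["w2"].shape[0])

def f_lin(params, params0, x):
    """First-order Taylor expansion of f around params0:
    f_lin(x; params) = f(x; params0) + J_theta f(x; params0) . (params - params0),
    evaluated as a Jacobian-vector product."""
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    y0, tangent = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
    return y0 + tangent
```

At the initial parameters the two models agree exactly, and the paper's contribution is showing that, as width grows, they continue to agree throughout gradient-descent training, with the linear model's dynamics determined by the NTK.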