Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The paper studies the dynamics of discrete gradient descent for overparametrized two-layer neural networks and shows that, under certain conditions on the input/output covariance matrices and on the initialization, the components of the input-output map are learned sequentially. The reviewers appreciated the paper's contributions, both theoretical and experimental, and found the paper well written. At the same time, one reviewer feels the assumptions are too strong, and another (R4) feels that some claims are misleading (e.g., having "deep" in the title) and that the contributions relative to an uncited paper by Lampinen and Ganguli are incremental. Post-rebuttal, R4 concluded that the novelty of the paper is buried in the appendix and that a rewrite is needed to elucidate that novelty in the body of the paper.

This AC agrees with R4 that the contributions relative to Lampinen and Ganguli need to be clearly established in the body of the paper and that a citation needs to be added. This AC also agrees that the title, abstract, and body need to be changed to reflect that a shallow network with squared loss is being analyzed.
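For context, the phenomenon under review can be illustrated with a minimal sketch (not code from the paper itself): a two-layer *linear* network with small random initialization, trained by full-batch gradient descent on the squared loss, picks up the singular modes of the target map roughly in order of decreasing singular value. All dimensions, singular values, and thresholds below are illustrative choices, not values from the paper.

```python
import numpy as np

# Illustrative sketch: sequential learning of input-output modes in a
# two-layer linear network trained with gradient descent on squared loss.
rng = np.random.default_rng(0)

d = 3                                # input/output dimension (arbitrary)
# Target map with well-separated singular values -> distinct "modes".
s_true = np.array([3.0, 1.0, 0.3])
U, _ = np.linalg.qr(rng.normal(size=(d, d)))
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
W_target = U @ np.diag(s_true) @ V.T

h = 32                               # overparametrized hidden width
W1 = 1e-3 * rng.normal(size=(h, d))  # small init, as in the analyzed regime
W2 = 1e-3 * rng.normal(size=(d, h))

lr, steps = 0.05, 4000
learned_at = {}                      # step at which each mode reaches 90%
for t in range(steps):
    E = W2 @ W1 - W_target           # residual of the end-to-end map
    g2 = E @ W1.T                    # dL/dW2 for L = 0.5*||W2 W1 - W*||_F^2
    g1 = W2.T @ E                    # dL/dW1
    W2 -= lr * g2
    W1 -= lr * g1
    # Strength of each target mode in the current end-to-end map.
    s_learned = np.diag(U.T @ (W2 @ W1) @ V)
    for i in range(d):
        if i not in learned_at and s_learned[i] >= 0.9 * s_true[i]:
            learned_at[i] = t

print(learned_at)  # modes are picked up in order of decreasing singular value
```

Under these assumptions, the stronger modes of `W_target` cross the 90% threshold earlier, which is the sequential-learning behavior the summary above refers to.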