NeurIPS 2020

Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology


Meta Review

This paper shows global convergence of gradient descent for deep neural networks that have a wide first layer followed by layers with a pyramidal topology. It shows that, under an unconventional initialization, a first-layer width of N (the number of training samples) suffices for global convergence, which is much smaller than the width required under the usual Xavier initialization. Establishing global convergence with width N is a substantial improvement over existing results, and that is a valuable contribution. The authors are encouraged to add a more detailed discussion of the connection to existing NTK theory and of the possibilities for relaxing the assumptions made in the analysis.