This paper shows global convergence of gradient descent for deep neural networks that have a wide first layer followed by layers of pyramidal shape. It shows that an unconventional initialization with first-layer width N (the data size) suffices for global convergence, which is much smaller than the width required under the usual Xavier initialization. This greatly improves on existing results: global convergence with width N is a substantial improvement over prior width requirements, and it is a valuable contribution. The authors are encouraged to add a more detailed discussion of the connection to existing NTK theories and of the possibility of relaxing the assumptions made in the analysis.