#### Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

#### Authors

*Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová*

#### Abstract

<p>We study the dynamics of optimization and the generalization properties of one-hidden-layer neural networks with quadratic activation function in the overparametrized regime, where the hidden-layer width m is larger than the input dimension d.</p>
<p>We consider a teacher-student scenario where the teacher has the same structure as the student, with a hidden layer of smaller width m* <= m.</p>
<p>We describe how the empirical loss landscape is affected by the number n of data samples and the width m* of the teacher network. In particular, we determine how the probability that the empirical loss has no spurious minima depends on n, d, and m*, thereby establishing conditions under which the neural network can in principle recover the teacher.</p>
<p>We also show that, under the same conditions, gradient descent dynamics on the empirical loss converges and leads to small generalization error, i.e., it enables recovery in practice.</p>
<p>Finally, we characterize the convergence rate in time of gradient descent in the limit of a large number of samples.</p>
<p>These results are confirmed by numerical experiments.</p>
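The teacher-student setup described in the abstract can be sketched numerically. Below is a minimal illustration (not the paper's experimental code): both teacher and student are one-hidden-layer networks with quadratic activation and unit second-layer weights, the student width m exceeds the input dimension d, and plain gradient descent is run on the empirical square loss. All sizes, learning rate, and iteration counts are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper): input dim d,
# teacher width m*, overparametrized student width m > d, n samples.
d, m_star, m, n = 5, 1, 8, 200

def forward(W, X):
    # One-hidden-layer network with quadratic activation and unit
    # second-layer weights: f_W(x) = sum_j (w_j . x)^2
    return ((X @ W.T) ** 2).sum(axis=1)

def mse(W, X, y):
    return np.mean((forward(W, X) - y) ** 2)

W_teacher = rng.standard_normal((m_star, d)) / np.sqrt(d)
W = rng.standard_normal((m, d)) / np.sqrt(d)  # student initialization

X = rng.standard_normal((n, d))
y = forward(W_teacher, X)  # labels generated by the teacher

loss_init = mse(W, X, y)
lr = 0.01
for _ in range(3000):
    resid = forward(W, X) - y                              # shape (n,)
    # gradient of (1/2n) * sum_i resid_i^2 with respect to W
    grad = (2.0 / n) * ((resid[:, None] * (X @ W.T)).T @ X)
    W -= lr * grad
loss_final = mse(W, X, y)

# Generalization error, estimated on fresh samples from the same distribution
X_test = rng.standard_normal((2000, d))
y_test = forward(W_teacher, X_test)
gen_err = mse(W, X_test, y_test)
print(f"train MSE: {loss_init:.3f} -> {loss_final:.3g}, test MSE: {gen_err:.3g}")
```

In this regime the training loss drops by orders of magnitude and the test error tracks it, consistent with the recovery behavior the abstract describes; with n too small relative to d, gradient descent can instead get stuck at spurious minima of the empirical loss.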