Paper ID: 9023

Title: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

The paper focuses on the memorization capacity of ReLU networks. For 3-layer FNNs, the authors tighten existing bounds by carefully constructing parameters that fit the dataset perfectly. As they point out, some related works focus on shallow networks, and it can be hard to extend that analysis to more practical networks. The authors also provide a lower bound on the memorization capacity of deeper networks by using a set of 3-layer FNNs as building blocks in the proofs. To the best of my knowledge, the result is novel. The paper is structured clearly and easy to follow. I enjoyed reading the paper.

Clear presentation. The results are original and contribute to our understanding of ReLU networks, making a significant advance over the existing literature.

The paper investigates the expressive power of neural networks w.r.t. finite samples, and improves the current bound of O(N) to \Theta(\sqrt N) for a two-hidden-layer network, where N is the sample size (showing both a lower and an upper bound). The authors also show an upper bound for classification, a corollary of which is that a network with three hidden layers of sizes 2k-2k-4k can perfectly classify ImageNet. Moreover, they show that if the total number of hidden nodes in a ResNet is of order N/d_x, where d_x is the input dimension, then the network can again perfectly fit the data. Lastly, an analysis is given showing that batch SGD initialized close to a global minimum will come close to a point whose loss value is significantly smaller than the loss at initialization (though a convergence guarantee could not be given). The paper is clear and easy to follow for the most part, and conveys a feeling that the authors did their best to make the analysis as thorough and exhaustive as possible, providing results for various settings.
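As a rough back-of-the-envelope illustration of the classification corollary summarized above, the 2k-2k-4k sizing can be made concrete. The ImageNet figures used here (k = 1000 classes, N ≈ 1.28M training images) are standard public numbers, not stated in the review itself:

```python
import math

# Hypothetical illustration of the reviewed claim: a network with three
# hidden layers of widths 2k-2k-4k can perfectly classify a k-class dataset.
# k and N below are standard ImageNet-1k figures (an assumption of this
# sketch, not taken from the review).
k = 1000          # number of ImageNet classes (assumption)
N = 1_281_167     # ImageNet-1k training-set size (assumption)

hidden = (2 * k, 2 * k, 4 * k)
print("hidden layer widths:", hidden)  # (2000, 2000, 4000)

# The improved memorization bound for two-hidden-layer nets scales as
# Theta(sqrt(N)) hidden nodes, versus the earlier O(N):
print("sqrt(N) ~", round(math.sqrt(N)))
```

Note how far apart the two scalings are at ImageNet size: \Theta(\sqrt N) is on the order of a thousand hidden nodes, while O(N) would be over a million.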