Review for NeurIPS paper: On the Modularity of Hypernetworks

NeurIPS 2020

On the Modularity of Hypernetworks

Meta Review

This paper analyzes the theoretical complexity of embedding-based models and hypernetworks, two types of conditioning models for neural networks. The motivation is to explain recent empirical results suggesting that a hypernetwork needs significantly fewer trainable parameters overall than a traditional embedding-based neural network, while achieving similar or better results. The paper's contribution is a theoretical framework that first extends optimal nonlinear approximation theory to neural networks and conditioning models, and then uses it as a foundation to prove advantages of hypernetworks over embeddings in terms of the size of the primary network. The authors demonstrate that hypernetworks exhibit modularity, i.e. reduced complexity, although they acknowledge that SGD optimization is not guaranteed to achieve this modularity. They go on to show that, under common assumptions, the overall number of trainable parameters in a hypernetwork is much smaller than that of a standard neural network. They perform simple experiments (MNIST, CIFAR-10, and a synthetic toy dataset) to validate and complement their theoretical claims.

One issue with the paper is its presentation: because of its theoretical nature and the NeurIPS 8-page limit, most of the proofs are in the Appendix, although one reviewer points out that the main paper provides enough high-level detail to guide the reader through them intuitively. In their rebuttal, the authors took the feedback seriously and promised to tailor the introduction to a more general audience and to improve the clarity of the proofs.
I recommend that the authors also use the extra page allowed in the camera-ready version to move important details back into the main paper and to spend time optimizing the presentation. All the reviewers and I expect the work to have high impact, and we want to make sure that care goes into both the paper itself and its presentation, should it be selected for a spotlight or oral talk. The reviewers also suggested other improvements, which the authors acknowledged, so I expect these to appear in the camera-ready version as well. One discussion point was whether this paper would be better suited to a journal, given its length and rigour. Given the potential impact of this outstanding work, and its potential to change and improve the way practitioners and researchers view neural networks through the use of hypernetworks, most reviewers, including myself, would like to see it accepted at the conference despite these limitations. I am inclined to recommend strong acceptance at NeurIPS, as I am confident the paper will make a great addition to the conference. I leave it to the authors to decide whether to continue developing this work into a journal version afterwards, perhaps after presenting it and receiving further feedback at NeurIPS.