NeurIPS 2020

On the universality of deep learning

Meta Review

The paper shows that any functional class that can be learned in polynomial time by some algorithm can be learned in polynomial time by deep neural networks using stochastic gradient descent. This sheds light, in part, on the empirical success of deep learning, and makes an important contribution toward furthering our understanding of efficient learning of neural networks. Authors complement the result with several extensions, including (a) showing that the results hold even when polynomial noise is added to the gradients or when weights can be of polynomial precision, (b) showing that a network of size O(n^2) and depth O(log(n)) can learn parities using SGD in poly time, and (c) lower bounds for descent algorithms characterized in terms of novel properties of networks that may be of independent interest. The paper reads very well, and the results and insights in the paper are very compelling. Overall, a good paper. Accept!