
Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

Authors

Surbhi Goel, Sushrut Karmalkar, Adam Klivans

Abstract

We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathrm{opt} < 1$ be the population loss of the best-fitting ReLU. We prove:

- Finding a ReLU with square-loss $\mathrm{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.
- There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathrm{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss.

Prior work due to Soltanolkotabi \cite{soltanolkotabi2017learning} showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.
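
For reference, the population loss of the best-fitting ReLU, under the standard reading of the setup above, is

$$\mathrm{opt} \;=\; \min_{w \in \mathbb{R}^d} \; \mathbb{E}_{(x,\, y) \sim \mathcal{D}} \left[ \bigl( \mathrm{ReLU}(w \cdot x) - y \bigr)^2 \right], \qquad \mathrm{ReLU}(z) = \max(z, 0),$$

where $\mathcal{D}$ is a distribution over example-label pairs whose marginal on $x$ is the spherical Gaussian $\mathcal{N}(0, I_d)$ and the labels $y$ may be arbitrary.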
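
The gradient-descent claim concerns minimizing the empirical version of this loss. The following is a minimal, self-contained sketch (not the paper's construction) of that procedure: Gaussian examples, arbitrarily chosen labels, and plain gradient descent on the empirical square-loss of a single ReLU. The dimension, sample size, step size, and label rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 5000                            # illustrative dimension and sample size

X = rng.standard_normal((n, d))            # examples drawn from the spherical Gaussian N(0, I_d)
y = rng.uniform(-1.0, 1.0, size=n)         # labels are arbitrary; uniform noise is just one choice

w = rng.standard_normal(d) / np.sqrt(d)    # random initialization
lr = 0.1                                   # illustrative step size

for _ in range(500):
    z = X @ w
    pred = np.maximum(z, 0.0)              # ReLU(w . x)
    # gradient of the empirical square loss (1/n) * sum_i (ReLU(w . x_i) - y_i)^2
    grad = (2.0 / n) * (X.T @ ((pred - y) * (z > 0.0)))
    w -= lr * grad

final_loss = np.mean((np.maximum(X @ w, 0.0) - y) ** 2)
print(f"empirical square loss after gradient descent: {final_loss:.4f}")
```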