#### Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

#### Authors

*Surbhi Goel, Sushrut Karmalkar, Adam Klivans*

#### Abstract

We consider the problem of computing the best-fitting ReLU with
respect to square-loss on a training set when the examples have been
drawn according to a spherical Gaussian distribution (the labels can
be arbitrary). Let $\opt < 1$ be the population loss of the
best-fitting ReLU. We prove:
\begin{itemize}
\item Finding a ReLU with square-loss $\opt + \epsilon$ is as
hard as the problem of learning sparse parities with noise, widely thought
to be computationally intractable. This is the first hardness
result for learning a ReLU with respect to Gaussian marginals, and
our results imply --{\em unconditionally}-- that gradient descent cannot
converge to the global minimum in polynomial time.
\item There exists an efficient approximation algorithm for finding the
best-fitting ReLU that achieves error $O(\opt^{2/3})$. The
algorithm uses a novel reduction to noisy halfspace learning with
respect to $0/1$ loss.
\end{itemize}
Prior work due to Soltanolkotabi \cite{soltanolkotabi2017learning} showed that gradient descent {\em can} find the best-fitting ReLU with respect to Gaussian marginals, if the training set is {\em exactly} labeled by a ReLU.