Charles Fefferman, Scott Markel
We study feed-forward nets with arbitrarily many layers, using the standard sigmoid, $\tanh x$. Aside from technicalities, our theorems are: 1. Complete knowledge of the output of a neural net for arbitrary inputs uniquely specifies the architecture, weights and thresholds; and 2. There are only finitely many critical points on the error surface for a generic training problem.
Neural nets were originally introduced as highly simplified models of the nervous system. Today they are widely used in technology and studied theoretically by scientists from several disciplines. However, they remain little understood. Mathematically, a (feed-forward) neural net consists of:
(1) A finite sequence of positive integers $(D_0, D_1, \ldots, D_L)$;
(2) A family of real numbers $(w_{jk}^\ell)$ defined for $1 \le \ell \le L$, $1 \le j \le D_\ell$, $1 \le k \le D_{\ell-1}$;
(3) A family of real numbers $(\theta_j^\ell)$ defined for $1 \le \ell \le L$, $1 \le j \le D_\ell$.
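As a minimal sketch of items (1)-(3), the Python class below stores an architecture together with its weights and thresholds and checks that each family is defined exactly on the stated index ranges. The class name `FeedForwardNet`, its methods, and the 0-based list layout are hypothetical, illustrative choices rather than anything from the paper. The parameter counts follow directly from the index ranges: $\sum_{\ell=1}^{L} D_\ell D_{\ell-1}$ weights and $\sum_{\ell=1}^{L} D_\ell$ thresholds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeedForwardNet:
    """Hypothetical container for items (1)-(3); not taken from the paper.

    Indexing convention (our assumption): weights[l - 1][j - 1][k - 1] holds
    w_{jk}^l and thresholds[l - 1][j - 1] holds theta_j^l, i.e. the paper's
    1-based indices shifted to Python's 0-based lists.
    """
    architecture: Tuple[int, ...]      # (D_0, D_1, ..., D_L)
    weights: List[List[List[float]]]   # layer l: a D_l x D_{l-1} matrix
    thresholds: List[List[float]]      # layer l: a vector of length D_l

    def __post_init__(self) -> None:
        dims = self.architecture
        if not dims or any(not isinstance(d, int) or d <= 0 for d in dims):
            raise ValueError("architecture must be a nonempty sequence of positive integers")
        L = self.depth
        if len(self.weights) != L or len(self.thresholds) != L:
            raise ValueError("expected one weight family and one threshold family per layer 1..L")
        for l in range(1, L + 1):
            W, theta = self.weights[l - 1], self.thresholds[l - 1]
            # Weights are indexed by 1 <= j <= D_l and 1 <= k <= D_{l-1}.
            if len(W) != dims[l] or any(len(row) != dims[l - 1] for row in W):
                raise ValueError(f"layer {l}: weights must be {dims[l]} x {dims[l - 1]}")
            # Thresholds are indexed by 1 <= j <= D_l.
            if len(theta) != dims[l]:
                raise ValueError(f"layer {l}: thresholds must have length {dims[l]}")

    @property
    def depth(self) -> int:
        """L, the number of layers carrying weights and thresholds."""
        return len(self.architecture) - 1

    def parameter_counts(self) -> Tuple[int, int]:
        """(number of weights, number of thresholds) implied by the index ranges."""
        d = self.architecture
        n_weights = sum(d[l] * d[l - 1] for l in range(1, len(d)))
        n_thresholds = sum(d[l] for l in range(1, len(d)))
        return n_weights, n_thresholds
```

For example, architecture `(2, 3, 1)` with a 3 x 2 first-layer weight matrix, a 1 x 3 second-layer matrix, and threshold vectors of lengths 3 and 1 is accepted, giving depth 2 with 9 weights and 4 thresholds; any shape mismatch raises `ValueError`.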
The sequence $(D_0, D_1, \ldots, D_L)$ is called the architecture of the neural net, while the $w_{jk}^\ell$ are called weights and the $\theta_j^\ell$ thresholds. Neural nets are used to compute nonlinear maps from $\mathbb{R}^N$ to $\mathbb{R}^M$ by the following construction. We begin by fixing a nonlinear function $\sigma(x)$ of one variable. Analogy with the nervous system suggests that we take $\sigma(x)$ asymptotic to constants as $x$ tends to $\pm\infty$; a standard choice, which we adopt throughout this paper, is $\sigma(x) =$