We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions. The principal result is the following relationship (computed to second order) between the expected test set and training set errors:
$$\langle \mathcal{E}_{\mathrm{test}}(\lambda) \rangle_{\xi\xi'} \approx \langle \mathcal{E}_{\mathrm{train}}(\lambda) \rangle_{\xi} + 2\,\sigma^2_{\mathrm{eff}}\,\frac{p_{\mathrm{eff}}(\lambda)}{n} \qquad (1)$$

Here, $n$ is the size of the training sample $\xi$, $\sigma^2_{\mathrm{eff}}$ is the effective noise variance in the response variable(s), $\lambda$ is a regularization or weight decay parameter, and $p_{\mathrm{eff}}(\lambda)$ is the effective number of parameters in the nonlinear model. The expectations $\langle \cdot \rangle$ of training set and test set errors are taken over possible training sets $\xi$ and training and test sets $\xi'$, respectively. The effective number of parameters $p_{\mathrm{eff}}(\lambda)$ usually differs from the true number of model parameters $p$ for nonlinear or regularized models; this theoretical conclusion is supported by Monte Carlo experiments. In addition to the surprising result that $p_{\mathrm{eff}}(\lambda) \neq p$, we propose an estimate of (1) called the generalized prediction error (GPE), which generalizes well-established estimates of prediction risk such as Akaike's FPE and AIC, Mallows' $C_p$, and Barron's PSE to the nonlinear setting.¹
¹ GPE and $p_{\mathrm{eff}}(\lambda)$ were previously introduced in Moody (1991).
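As a rough illustration of how relationship (1) can be used, the sketch below assumes it takes the form of the expected training error plus a complexity penalty of $2\sigma^2_{\mathrm{eff}}\,p_{\mathrm{eff}}(\lambda)/n$. The function name and argument names are hypothetical and chosen only for this example. The effective noise variance and the effective number of parameters are treated as given inputs; how they are estimated is not covered here.

```python
def generalized_prediction_error(train_error, noise_variance, p_eff, n):
    """Sketch of a GPE-style estimate of expected test set error.

    Assumes the second-order relationship
        test_error ~ train_error + 2 * noise_variance * p_eff / n.
    All names are illustrative, not taken from the paper.

    train_error    -- observed (or expected) training set error
    noise_variance -- effective noise variance in the response
    p_eff          -- effective number of parameters at the chosen
                      regularization strength
    n              -- number of training examples
    """
    if n <= 0:
        raise ValueError("n must be a positive number of training examples")
    if p_eff < 0 or noise_variance < 0:
        raise ValueError("p_eff and noise_variance must be non-negative")
    # Complexity penalty: grows with the effective number of parameters
    # and shrinks as the training sample grows.
    penalty = 2.0 * noise_variance * p_eff / n
    return train_error + penalty


# Example: training error 0.10, noise variance 0.05,
# 20 effective parameters, 1000 training examples.
estimate = generalized_prediction_error(0.10, 0.05, 20.0, 1000)
```

In this example the penalty term is 2 × 0.05 × 20 / 1000 = 0.002, so the estimated test error sits slightly above the training error. A model with fewer effective parameters than nominal parameters would incur a correspondingly smaller penalty under this form.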