#### Authors

John Moody, Joachim Utans

#### Abstract

The notion of generalization ability can be defined precisely as the prediction risk, the expected performance of an estimator in predicting new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multi-layer perceptron networks and use it to select an optimal network architecture from a set of possible architectures. We also propose a heuristic search strategy to explore the space of possible architectures. The prediction risk is estimated from the available data; here we estimate it by v-fold cross-validation and by asymptotic approximations: generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is attractive as a case study because only limited data are available and because there is no complete a priori model that could be used to impose a structure on the network architecture.
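As an illustration of the estimators named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes squared-error loss and uses a one-dimensional least-squares line (`fit_line`, a hypothetical stand-in for a trained network). The functions `fpe` and `gcv` use the common textbook forms for a model with `p` parameters fitted to `n` observations; the exact expressions used in the paper may differ, and all function names here are illustrative.

```python
import random


def fit_line(xs, ts):
    """Closed-form least-squares fit t ~ a + b*x (stand-in for a network)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / sxx if sxx > 0 else 0.0
    a = mean_t - b * mean_x
    return lambda x: a + b * x


def average_squared_error(predict, xs, ts):
    """Mean squared error of a predictor on the given observations."""
    return sum((t - predict(x)) ** 2 for x, t in zip(xs, ts)) / len(xs)


def v_fold_cv(xs, ts, v, fit=fit_line, seed=0):
    """v-fold cross-validation estimate of the prediction risk.

    Each observation is held out exactly once; the model is refit on the
    remaining folds and scored on the held-out fold.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::v] for k in range(v)]
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        predict = fit([xs[i] for i in train], [ts[i] for i in train])
        total += sum((ts[i] - predict(xs[i])) ** 2 for i in held_out)
    return total / len(xs)


def fpe(ase_train, n, p):
    """Akaike's final prediction error (textbook form, assumed here)."""
    return ase_train * (n + p) / (n - p)


def gcv(ase_train, n, p):
    """Generalized cross-validation (textbook form, assumed here)."""
    return ase_train / (1.0 - p / n) ** 2


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [i / 10 for i in range(40)]
    ts = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]
    ase = average_squared_error(fit_line(xs, ts), xs, ts)
    print("training ASE:", ase)
    print("5-fold CV   :", v_fold_cv(xs, ts, v=5))
    print("FPE (p=2)   :", fpe(ase, len(xs), 2))
    print("GCV (p=2)   :", gcv(ase, len(xs), 2))
```

All three estimates correct the optimistic training error upward. Cross-validation does so by refitting on held-out splits, while FPE and GCV apply an analytic penalty that grows with the number of parameters relative to the sample size.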

#### 1 Generalization and Prediction Risk

The notion of generalization ability can be defined precisely as the prediction risk, the expected performance of an estimator in predicting new observations. Consider a set of observations D = {(X_j, t_j); j = 1 ... N} that are assumed to be generated
