Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Shun-ichi Amari, Hyeyoung Park, Tomoko Ozeki
Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates, and the Cramer-Rao paradigm does no more hold, implying that the classical model selection the(cid:173) ory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors both for the maximum likelihood estima(cid:173) tor and the Bayesian predictive distribution in terms of Gaussian random fields, by using simple models.