Part of Advances in Neural Information Processing Systems 8 (NIPS 1995)

*Siegfried Bös*

In this paper we examine a perceptron learning task. The task is realizable since it is provided by another perceptron with identical architecture. Both perceptrons have nonlinear sigmoid output functions. The gain of the output function determines the level of nonlinearity of the learning task. It is observed that a high level of nonlinearity leads to overfitting. We give an explanation for this rather surprising observation and develop a method to avoid the overfitting. This method has two possible interpretations: one is learning with noise, the other is cross-validated early stopping.
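The realizable teacher-student task described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method. The sigmoid is assumed to be $g(h) = \tanh(\gamma h)$ with gain $\gamma$. The local field $h$ is assumed to be the plain weighted sum $\sum_i w_i x_i$, the inputs are assumed Gaussian with variance $1/N$, and the student is trained by batch gradient descent on the squared error. Overfitting is countered here with generic cross-validated early stopping, which keeps the weights with the lowest error on a held-out validation set. The learning-with-noise interpretation is not implemented. All function names and parameter values are hypothetical.

```python
# Hypothetical sketch: names, parameters, and modelling choices are illustrative
# assumptions, not taken from the paper.
import math
import random


def g(h, gain):
    """Assumed sigmoid output function; the gain sets the level of nonlinearity."""
    return math.tanh(gain * h)


def output(w, x, gain):
    """Perceptron output z = g(h), taking h as the plain weighted sum of the inputs."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return g(h, gain)


def make_examples(teacher, count, gain, rng):
    """Label Gaussian inputs (variance 1/N, an assumption) with the teacher perceptron."""
    n = len(teacher)
    examples = []
    for _ in range(count):
        x = [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]
        examples.append((x, output(teacher, x, gain)))
    return examples


def error(w, examples, gain):
    """Halved mean squared error of weights w on a list of (x, y) pairs."""
    return sum((output(w, x, gain) - y) ** 2 for x, y in examples) / (2 * len(examples))


def train_early_stopping(train, valid, n, gain, lr=1.0, epochs=200, seed=1):
    """Batch gradient descent on the training error; returns the weights with
    the lowest validation error seen during training (early stopping)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(n)]
    best_w, best_val = list(w), error(w, valid, gain)
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in train:
            z = output(w, x, gain)
            # Per-example gradient: (z - y) * gain * (1 - z^2) * x_i
            delta = (z - y) * gain * (1.0 - z * z)
            for i in range(n):
                grad[i] += delta * x[i]
        w = [wi - lr * gi / len(train) for wi, gi in zip(w, grad)]
        val = error(w, valid, gain)
        if val < best_val:
            best_w, best_val = list(w), val
    return best_w, best_val


if __name__ == "__main__":
    rng = random.Random(0)
    n, gain = 20, 3.0
    teacher = [rng.gauss(0.0, 1.0) for _ in range(n)]
    train = make_examples(teacher, 40, gain, rng)
    valid = make_examples(teacher, 40, gain, rng)
    test = make_examples(teacher, 1000, gain, rng)
    w, val = train_early_stopping(train, valid, n, gain)
    print(f"validation error {val:.4f}, test error {error(w, test, gain):.4f}")
```

Raising `gain` pushes the teacher's output toward a step function. Comparing validation and test errors across several gains is one way to explore the effect the abstract describes, though the numbers this sketch prints are not results from the paper.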

1 Learning Rules from Examples

The property that makes feedforward neural nets interesting for many practical applications is their ability to approximate functions that are given only by examples. Feedforward networks with at least one hidden layer of nonlinear units can approximate any continuous function on an $N$-dimensional hypercube arbitrarily well. While the existence of neural function approximators is already established, there is still a lack of knowledge about their practical realization. Moreover, major problems that complicate a good realization, such as overfitting, need to be better understood.

In this work we study overfitting in a one-layer perceptron model. The model allows a good theoretical description while already exhibiting behavior qualitatively similar to that of the multilayer perceptron.

A one-layer perceptron has $N$ input units and one output unit. Between input and output it has one layer of adjustable weights $w_i$ ($i = 1, \dots, N$). The output $z$ is a possibly nonlinear function of the weighted sum of the inputs $x_i$, i.e.

$z = g(h)$, with
