Peter Sollich, Anders Krogh
We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data.
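The following is a minimal numerical sketch of the kind of model the abstract describes. It assumes each linear student is a ridge (weight-decay) regressor trained on its own random subset of the training data, with its own regularization parameter, and that the ensemble output is a weighted sum of the students' linear outputs. The function names (`train_linear_student`, `train_ensemble`, `ensemble_predict`) and the random-subset scheme are hypothetical illustrations. This is not the paper's exact analytic treatment, and it does not reproduce its results.

```python
import numpy as np


def train_linear_student(X, y, lam):
    """Hypothetical ridge-regularized linear student.

    Minimizes ||X w - y||^2 + lam * ||w||^2; lam plays the role of the
    student's regularization (weight-decay) parameter.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def train_ensemble(X, y, train_sizes, lams, rng):
    """Train one student per (training set size, lam) pair.

    Assumption: each student sees a random subset of the training data
    drawn without replacement, so students can differ both in how much
    data they see and in how strongly they are regularized.
    """
    student_weights = []
    for size, lam in zip(train_sizes, lams):
        idx = rng.choice(len(X), size=size, replace=False)
        student_weights.append(train_linear_student(X[idx], y[idx], lam))
    return np.array(student_weights)  # shape (K, n_features)


def ensemble_predict(student_weights, ensemble_weights, X):
    """Ensemble output: weighted sum of the K students' linear outputs.

    Because every student is linear, the weighted sum of their outputs
    equals a single linear map with the combined weight vector.
    """
    combined = ensemble_weights @ student_weights  # shape (n_features,)
    return X @ combined
```

Small values in `lams` give under-regularized students. `ensemble_weights` can be uniform (for example `np.full(K, 1 / K)`) or tuned, and tuning is where the abstract's optimization of ensemble weights would enter.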