Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Peter Sollich
Learning curves for Gaussian process regression are well understood when the 'student' model happens to match the 'teacher' (true data generation process). I derive approximations to the learning curves for the more generic case of mismatched models, and find very rich behaviour: For large input space dimensionality, where the results become exact, there are universal (student-independent) plateaux in the learning curve, with transitions in between that can exhibit arbitrarily many over-fitting maxima; over-fitting can occur even if the student estimates the teacher noise level correctly. In lower dimensions, plateaux also appear, and the learning curve remains dependent on the mismatch between student and teacher even in the asymptotic limit of a large number of training examples. Learning with excessively strong smoothness assumptions can be particularly dangerous: For example, a student with a standard radial basis function covariance function will learn a rougher teacher function only logarithmically slowly. All predictions are confirmed by simulations.
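The mismatched setting described above can be reproduced numerically. Below is a minimal sketch (not the paper's actual simulation code) of the kind of experiment that traces out such a learning curve: teacher functions are drawn from a rough Ornstein-Uhlenbeck GP, while the student does GP regression with a smooth RBF covariance. All names, length scales, and noise levels here are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariance functions on [0, 1). The teacher is rough (Ornstein-Uhlenbeck);
# the student assumes a smooth RBF covariance -- the mismatched case.
# Length scales and noise levels below are arbitrary illustrative choices.
def k_teacher(x, y, ell=0.1):
    return np.exp(-np.abs(x[:, None] - y[None, :]) / ell)

def k_student(x, y, ell=0.1):
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

noise_teacher = 0.01   # true observation noise variance
noise_student = 0.01   # student's assumed noise (matched here; vary to mismatch)

xs_test = np.linspace(0.0, 1.0, 200)

def gen_error(n, n_teachers=20):
    """Monte Carlo estimate of the student's generalization error
    (mean squared error of the GP posterior mean) with n training points."""
    errs = []
    for _ in range(n_teachers):
        x_train = rng.uniform(0.0, 1.0, n)
        x_all = np.concatenate([x_train, xs_test])
        # Draw one teacher function jointly at train and test inputs.
        K_t = k_teacher(x_all, x_all) + 1e-10 * np.eye(len(x_all))
        f = rng.multivariate_normal(np.zeros(len(x_all)), K_t)
        y = f[:n] + rng.normal(0.0, np.sqrt(noise_teacher), n)
        # Student's GP regression posterior mean at the test inputs.
        K = k_student(x_train, x_train) + noise_student * np.eye(n)
        k_star = k_student(xs_test, x_train)
        mean = k_star @ np.linalg.solve(K, y)
        errs.append(np.mean((mean - f[n:]) ** 2))
    return float(np.mean(errs))

# Learning curve: generalization error versus number of examples n.
for n in [4, 16, 64, 256]:
    print(n, gen_error(n))
```

Under this kind of setup, plotting the error against n should show the slow decay the abstract describes for a smooth student learning a rougher teacher; with a matched student covariance the decay is markedly faster.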