Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)
Todd Leen
Ideally pattern recognition machines provide constant output when the inputs are transformed under a group 9 of desired invariances. These invariances can be achieved by enhancing the training data to include examples of inputs transformed by elements of g, while leaving the corresponding targets unchanged. Alternatively the cost function for training can include a regularization term that penalizes changes in the output when the input is transformed un(cid:173) der the group.
This paper relates the two approaches, showing precisely the sense in which the regularized cost function approximates the result of adding transformed (or distorted) examples to the training data. The cost function for the enhanced training set is equivalent to the sum of the original cost function plus a regularizer. For unbiased models, the regularizer reduces to the intuitively obvious choice - a term that penalizes changes in the output when the inputs are transformed under the group. For infinitesimal transformations, the coefficient of the regularization term reduces to the variance of the distortions introduced into the training data. This correspon(cid:173) dence provides a simple bridge between the two approaches.