Part of Advances in Neural Information Processing Systems 4 (NIPS 1991)
Patrice Simard, Bernard Victorri, Yann LeCun, John Denker
In many machine learning applications, one has access, not only to training data, but also to some high-level a priori knowledge about the desired behavior of the system. For example, it is known in advance that the output of a character recognizer should be invariant with respect to small spatial distortions of the input images (translations, rotations, scale changes, etc.).

We have implemented a scheme that allows a network to learn the derivative of its outputs with respect to distortion operators of our choosing. This not only reduces the learning time and the amount of training data, but also provides a powerful language for specifying what generalizations we wish the network to perform.
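The core idea can be sketched in a few lines: a tangent vector encodes an infinitesimal distortion (here, horizontal translation, approximated by a finite difference of a shifted image), and a penalty term drives the directional derivative of the network output along that tangent toward zero. The toy network, its weights, and all function names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network (hypothetical; stands in for the recognizer).
W = rng.normal(scale=0.1, size=(4, 16))
b = np.zeros(4)

def net(x):
    return np.tanh(W @ x + b)

def tangent_vector(x):
    # Tangent of a horizontal translation, approximated by the finite
    # difference between the (flattened 4x4) image and a one-pixel shift.
    img = x.reshape(4, 4)
    shifted = np.roll(img, 1, axis=1)
    return (shifted - img).reshape(-1)

def tangent_penalty(x, eps=1e-4):
    # Directional derivative of the output along the tangent, estimated
    # with a central difference; the scheme adds 0.5*||d||^2 to the loss
    # so gradient descent pushes the derivative toward zero.
    t = tangent_vector(x)
    d = (net(x + eps * t) - net(x - eps * t)) / (2 * eps)
    return 0.5 * np.sum(d ** 2)

x = rng.normal(size=16)
print(tangent_penalty(x))
```

In practice this penalty would be added to the usual classification loss and minimized jointly, so the network learns both the labels and the specified invariance.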