Unsupervised learning procedures have been successful at low-level feature extraction and preprocessing of raw sensor data. So far, however, they have had limited success in learning higher-order representations, e.g., of objects in visual images. A promising approach is to maximize some measure of agreement between the outputs of two groups of units which receive inputs physically separated in space, time or modality, as in (Becker and Hinton, 1992; Becker, 1993; de Sa, 1993). Using the same approach, a much simpler learning procedure is proposed here which discovers features in a single-layer network consisting of several populations of units, and can be applied to multi-layer networks trained one layer at a time. When trained with this algorithm on image sequences of moving geometric objects, a two-layer network can learn to perform accurate position-invariant object classification.
1 LEARNING COHERENT CLASSIFICATIONS
A powerful constraint in sensory data is coherence over time, in space, and across different sensory modalities. An unsupervised learning procedure which can capitalize on these constraints may be able to explain much of perceptual self-organization in the mammalian brain. The problem is to derive an appropriate cost function for unsupervised learning which will capture coherence constraints in sensory signals; we would also like it to be applicable to multi-layer nets to train hidden as well as output layers. Our ultimate goal is for the network to discover natural object classes based on these coherence assumptions.