Michael C. Mozer
Learning structure in temporally-extended sequences is a difficult com(cid:173) putational problem because only a fraction of the relevant information is available at any instant. Although variants of back propagation can in principle be used to find structure in sequences, in practice they are not sufficiently powerful to discover arbitrary contingencies, especially those spanning long temporal intervals or involving high order statistics. For example, in designing a connectionist network for music composition, we have encountered the problem that the net is able to learn musical struc(cid:173) ture that occurs locally in time-e.g., relations among notes within a mu(cid:173) sical phrase-but not structure that occurs over longer time periods--e.g., relations among phrases. To address this problem, we require a means of constructing a reduced deacription of the sequence that makes global aspects more explicit or more readily detectable. I propose to achieve this using hidden units that operate with different time constants. Simulation experiments indicate that slower time-scale hidden units are able to pick up global structure, structure that simply can not be learned by standard back propagation.
Many patterns in the world are intrinsically temporal, e.g., speech, music, the un(cid:173) folding of events. Recurrent neural net architectures have been devised to accom(cid:173) modate time-varying sequences. For example, the architecture shown in Figure 1 can map a sequence of inputs to a sequence of outputs. Learning structure in temporally-extended sequences is a difficult computational problem because the in(cid:173) put pattern may not contain all the task-relevant information at any instant. Thus, 275