{"title": "Catastrophic Interference in Connectionist Networks: Can It Be Predicted, Can It Be Prevented?", "book": "Advances in Neural Information Processing Systems", "page_first": 1176, "page_last": 1177, "abstract": null, "full_text": "Catastrophic interference in connectionist networks: Can it be predicted, can it be prevented? \n\nRobert M. French \n\nComputer Science Department \n\nWillamette University \nSalem, Oregon 97301 \nfrench@willamette.edu \n\n1 OVERVIEW \n\nCatastrophic forgetting occurs when a connectionist network learns new information \nand, in so doing, forgets all previously learned information. This workshop focused \nprimarily on the causes of catastrophic interference, the techniques that have been \ndeveloped to reduce it, the effect of these techniques on the networks' ability to \ngeneralize, and the degree to which catastrophic forgetting can be predicted. The \nspeakers were Robert French, Phil Hetherington (Psychology Department, McGill \nUniversity, het@blaise.psych.mcgill.ca), and Stephan Lewandowsky (Psychology \nDepartment, University of Oklahoma, lewan@constellation.ecn.uoknor.edu). \n\n2 PROTOTYPE BIASING AND FORCED SEPARATION OF HIDDEN-LAYER REPRESENTATIONS \n\nFrench indicated that catastrophic forgetting is at its worst when high representation \noverlap at the hidden layer combines with significant teacher-output error. \nHe showed that techniques for reducing this overlap tended to decrease catastrophic \nforgetting. Activation sharpening, a technique that produces representations \nhaving a few highly active nodes and many low-activation nodes, was shown to be \neffective because it reduced representation overlap. However, this technique was \nineffective for large data sets because creating localized representations reduced the \nnumber of possible hidden-layer representations. Hidden-layer representations that \nwere more distributed but still highly separated were needed. 
French introduced \nprototype biasing, a technique that uses a separate network to learn a prototype \nfor each teacher pattern. Hidden-layer representations of new items are made to \nresemble their prototypes. Each representation is also \"separated\" from the \nrepresentation of the previously encountered pattern according to the difference between \nthe respective teachers. This technique produced hidden-layer representations that \nwere both distributed and well separated. The result was a significant decrease in \ncatastrophic forgetting. \n\n3 ELIMINATING CATASTROPHIC INTERFERENCE BY PRETRAINING \n\nHetherington presented a technique that consisted of pretraining the network on \na large body of items of the same type as the new items in the sequential learning \ntask. Hetherington, like all of the speakers, measured the degree of actual forgetting \nby the method of savings, i.e., by determining how long the network takes \nto relearn the original data set after it has been \"erased\" by learning the new data. \nHe showed that when networks are given the benefit of relevant prior knowledge, \nthe representations of the new items are constrained naturally and interference may \nbe virtually eliminated. The previously encoded knowledge causes new items to be \nencoded in a more orthogonal manner (i.e., with less mutual overlap) than in a naive \n(i.e., non-pretrained) network. The resulting decrease in representation overlap \nvirtually eliminated catastrophic forgetting. \n\nHetherington also presented another technique that substantially reduced \ncatastrophic interference in the sequential learning task. New items are learned \nin a windowed, or overlapping, fashion: as new items are learned, the network \ncontinues training on the most recently presented items. 
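The windowed learning scheme and the savings measure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Hetherington's implementation: the helper names windowed_schedule and savings, the window size, and the ratio form of the savings score are hypothetical choices for illustration. No network is trained here; the schedule only lists which items would be rehearsed while each new item is learned.

```python
from collections import deque


def windowed_schedule(items, window_size=3):
    # Hypothetical helper: for each new item, return the training set used
    # while that item is learned, i.e. the new item plus the most recently
    # presented items still inside the window (window_size is an assumption).
    if window_size < 1:
        raise ValueError('window_size must be at least 1')
    window = deque(maxlen=window_size)
    schedule = []
    for item in items:
        window.append(item)  # the oldest item drops out once the window is full
        schedule.append(list(window))
    return schedule


def savings(original_epochs, relearn_epochs):
    # Assumed ratio form of the savings score: the fraction of the original
    # training time saved when relearning the original data set.
    # 1.0 means nothing had to be relearned; 0.0 means relearning took as
    # long as the original learning did.
    if original_epochs <= 0:
        raise ValueError('original_epochs must be positive')
    return (original_epochs - relearn_epochs) / original_epochs


for step in windowed_schedule(['A', 'B', 'C', 'D'], window_size=2):
    print(step)
# ['A']
# ['A', 'B']
# ['B', 'C']
# ['C', 'D']

print(savings(20, 5))  # 0.75
```

A bounded deque keeps the window logic to one line; a larger window_size means more rehearsal of recent items per step, at the cost of more training per new item.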
\n\n4 THE RELATIONSHIP BETWEEN INTERFERENCE AND GENERALIZATION \n\nLewandowsky examined the hypothesis that generalization is compromised in \nnetworks that have been \"manipulated\" to decrease catastrophic interference by \ncreating semi-distributed (i.e., only partially overlapping) representations at the hidden \nlayer. He gave a theoretical analysis of the relationship between interference and \ngeneralization and then presented results from several different simulations using \nsemi-distributed representations. He concluded that semi-distributed \nrepresentations can significantly reduce catastrophic interference in backpropagation \nnetworks without diminishing their generalization abilities. This held, however, \nonly for techniques (e.g., activation sharpening) that reduced interference by \ncreating a more robust final weight pattern without changing the activation \nsurfaces of the hidden units. By contrast, in models where interference is \nreduced by eliminating overlap between the receptive fields of static hidden units (i.e., \nby altering their response surfaces), generalization abilities are impaired. \n\nIn addition, Lewandowsky presented a technique that relied on orthogonalizing the \ninput vectors to a standard backpropagation network by converting standard \nasymmetric input vectors (each input node at 0 or 1) to symmetric input vectors (each input \nnode at -1 or 1). This technique was also found to significantly reduce catastrophic \ninterference. \n\n", "award": [], "sourceid": 799, "authors": [{"given_name": "Robert", "family_name": "French", "institution": null}]}