{"title": "ART2/BP architecture for adaptive estimation of dynamic processes", "book": "Advances in Neural Information Processing Systems", "page_first": 169, "page_last": 175, "abstract": null, "full_text": "

ART2/BP architecture for adaptive estimation of dynamic processes

Einar Sørheim *
Department of Computer Science
UNIK, Kjeller
University of Oslo
N-2007 Norway

Abstract

The goal has been to construct a supervised artificial neural network that learns an unknown mapping incrementally. To this end a network consisting of a combination of ART2 and backpropagation, called an "ART2/BP" network, is proposed. The ART2 network is used to build and focus a supervised backpropagation network. The ART2/BP network has the advantage of being able to expand itself dynamically in response to input patterns containing new information. Simulation results show that the ART2/BP network outperforms a classical maximum likelihood method for the estimation of a discrete dynamic and nonlinear transfer function.

1 INTRODUCTION

Most current neural network architectures, such as backpropagation, require cyclic presentation of the entire training set to converge. They are thus not well suited for adaptive estimation tasks where the training vectors arrive one by one, and where the network may never see the same training vector twice. The ART2/BP network system is an attempt to construct a network that works well on these problems.
Main features of our ART2/BP are:

• implements incremental supervised learning
• dynamically self-expanding
• learning a novel training pattern does not wash away memory of previous training patterns
• short convergence time for learning a new pattern

* e-mail addresses: einar@tellus.unik.no or einars@ifi.uio.no

2 BACKGROUND

Adaptive estimation of nonlinear functions requires some basic features of the estimation algorithm.

1. Incremental learning
   The input/output pairs arrive at the estimation machine one by one. One could use a conventional method by accumulating the input/output pairs into a training set and rerunning the training procedure at every arrival of a new input/output pair. Obvious disadvantages, however, would be:
   • huge learning time as the size of the training set increases;
   • an upper limit, N, on the number of elements in the training set will have to be set. The training set then becomes a gliding horizon of the N last input/output pairs, and information prior to the N last input/output pairs is lost.

2. Plasticity
   Learning a new input/output pair should not wash away the memory of previously learned nonconflicting input/output pairs. With most existing feedforward supervised nets this is hard to accomplish, though some efforts have been made (Otwell 90). Some networks, like the ART family and RCN (Ryan 1988), are plastic, but they are self-organizing, not supervised.

To summarize: we need a supervised network that learns incrementally the mapping of an unknown system and that can be used to predict future outputs. The system in question maps analog vectors to analog vectors.

3 COMBINED ARCHITECTURE

In the proposed network architecture an ART2 network controls a BP network, see Figure 1.
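One way to picture this control structure is the following minimal sketch. It uses a plain Euclidean-distance categorizer as a stand-in for ART2 and a trivial nearest-neighbour memory as a stand-in for each BP subnetwork; the class and parameter names (`Art2Bp`, `rho`, `eta`) are illustrative, not from the paper.

```python
import numpy as np

class Art2Bp:
    """Sketch: an ART2-like categorizer gating a pool of per-category subnets."""

    def __init__(self, rho):
        self.rho = rho      # vigilance bound: largest acceptable distance
        self.ltm = []       # one LTM weight vector per category
        self.subnets = []   # one training buffer ("subnet") per category

    def _classify(self, x):
        # Winner is the category whose LTM weights are closest to x.
        if not self.ltm:
            return None, np.inf
        d = [np.linalg.norm(x - z) for z in self.ltm]
        j = int(np.argmin(d))
        return j, d[j]

    def learn(self, x, y, eta=0.5):
        # Classify; if the reset test fails, create a new category node.
        j, d = self._classify(x)
        if j is None or d > self.rho:
            self.ltm.append(x.copy())
            self.subnets.append([])
            j = len(self.ltm) - 1
        else:
            # Slow LTM update toward the new pattern.
            self.ltm[j] += eta * (x - self.ltm[j])
        # Hand the pair to the winning subnetwork; here the "subnet"
        # simply stores pairs and predicts by nearest neighbour.
        self.subnets[j].append((x.copy(), y))
        return j

    def predict(self, x):
        # Recall mode: route x to the winning subnetwork and interpolate.
        j, _ = self._classify(x)
        pairs = self.subnets[j]
        k = int(np.argmin([np.linalg.norm(x - xi) for xi, _ in pairs]))
        return pairs[k][1]
```

Feeding the stream one pair at a time gives the incremental behaviour described above: distant inputs spawn new categories, while nearby inputs refine an existing one.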
The BP network consists of many relatively small subnetworks, where each subnet is specialized on one particular domain of the input space. ART2 controls how the input space is divided among the subnets and the total number of subnets needed.

The ART2 network analyzes the input part of the input/output pairs as they arrive at the system. For a given input pattern i, ART2 finds the category C which has the closest resemblance to i. If this resemblance is good enough, i is of category C and the LTM weights of C are updated. The BP subnetwork BP_C connected to C is as a consequence activated, and relearning of BP_C is done. The learning set consists of a "representative" set of the neighbouring subnets' patterns and a small number of the previous patterns belonging to category C. To summarize, the algorithm goes as follows:

1. Send the input vector to the ART2 network.
2. ART2 classification.
3. If in learning mode, adjust the ART2 LTM weights of the winning node.
4. Send the input to the backpropagation network connected to the winning ART2 node.
5. If in learning mode:
   • find a representative training set;
   • do epoch learning on the training set.
   Otherwise:
   • compute the output of the selected backpropagation network.
6. Go to 1. for a new input vector.

The ART2/BP neural network can be used for adaptive estimation of nonlinear dynamic processes. The mapping to be estimated is then

y(t + δt) = f(u(t), y(t)),  u(t) ∈ R^m,  y(t) ∈ R^n   (1)

The input/output pairs will be io = [u(t), y(t), y(t + δt)]; denote the input part of io by i = [u(t), y(t)] and the output part of io by o = y(t + δt).

4 ART2 MODIFIED

ART2 was developed by Carpenter & Grossberg, see (Carpenter 1987) and (Carpenter 1988).
ART2 categorizes arbitrary sequences of analog input patterns, and the categories can be of arbitrary coarseness. For a detailed description of ART2, see (Carpenter 1987).

4.1 MODIFICATION

In the standard ART2 algorithm input vectors (patterns) are normalized. For this application it is not desired to classify parallel vectors of different magnitude as belonging to the same category. By adding an extra element to the input vector, where this element is simply

(2)

the new input vector becomes

(3)

From a scaled vector of i, î = a i, the original vector i can easily be found as

(4)

and by using the augmented i as the input to ART2 instead of i, one can at any point in F1 (representation layer) and F2 (categorization layer) generate the corresponding non-normalized vector. The F2 node competition is modified so that the node whose bottom-up LTM weights have the smallest distance (distance being the Euclidean norm) to the F1 layer pattern code wins the competition. The distance d_J of F2 node J is given by

d_J = ||v − z_J||   (5)

where ||·|| is the l2 norm, v is the F1 pattern code, and z_J are the bottom-up LTM weights of F2 node J.

Reset is done by calculating the distance d between the F1 layer pattern code v and z_J:

d = ||v − z_J||   (6)

and comparing it to a largest acceptable bound ρ. If d > ρ the winning node is inhibited and a new node is created. If d ≤ ρ the LTM patterns of the winning node J are modified (learning).

5 BACKPROPAGATION NETWORK

The backpropagation network used in this work is of the standard feedforward type, see (Rumelhart 1986). The number of hidden layers and nodes should be kept low in the subnetworks; for the problems in our simulations we used 1 hidden layer with 2 nodes.
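A subnetwork of this shape can be sketched directly in NumPy: a single hidden layer of 2 tanh units with a linear output, trained by plain gradient descent. The class name, learning rate, and weight initialization here are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class Subnet:
    """Sketch of one BP subnetwork: 1 hidden layer, tanh units, linear output."""

    def __init__(self, n_in, n_out, n_hidden=2):
        self.W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)
        return self.W2 @ h + self.b2, h

    def train_epoch(self, pairs, lr=0.1):
        # One epoch of standard backpropagation (SBP) over the training set.
        for x, y in pairs:
            out, h = self.forward(x)
            err = out - y                        # dE/dout for E = 0.5*||err||^2
            dW2 = np.outer(err, h)
            db2 = err
            dh = self.W2.T @ err * (1 - h ** 2)  # backprop through tanh
            dW1 = np.outer(dh, x)
            db1 = dh
            self.W2 -= lr * dW2; self.b2 -= lr * db2
            self.W1 -= lr * dW1; self.b1 -= lr * db1
```

Because each subnet only has to interpolate over one small region of the input space, a few epochs over its short "representative" training set are typically enough to relearn it after a new pattern arrives.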
As for training algorithms several different kinds have been tried: \n\n\u2022 Standard back propagation (SBP) \n\n\u2022 A modified back propagation (MBP) method similar to the one used in the \n\nBPS simulator from George Mason University. \n\n\u2022 Quickprop (Q). \n\n\u2022 A quasi-Newton method (BFGS). \n\nAll of these except SBP show similar performance in my test cases. \n\nThe BP-networks performs as an interpolator in this algorithm and any good inter(cid:173)\npolation algorithm can be used instead of BP. Approximation theory gives several \ninteresting techniques for approximation/interpolation of multidimensional func(cid:173)\ntions such as Radial Basis Functions and Hyper Basis Functions, for further detail \nsee (Poggio 90). These methods requires a representative training set where the \ninput part determines the location of centers in the input space. The ART2 alg