{"title": "Dynamic Cell Structures", "book": "Advances in Neural Information Processing Systems", "page_first": 497, "page_last": 504, "abstract": null, "full_text": "Dynamic Cell Structures \n\nJorg Bruske and Gerald Sommer \nDepartment of Cognitive Systems \n\nChristian Albrechts University at Kiel \n\n24105 Kiel- Germany \n\nAbstract \n\nDynamic Cell Structures (DCS) represent a family of artificial neural \narchitectures suited both for unsupervised and supervised learning. \nThey belong to the recently [Martinetz94] introduced class of Topology \nRepresenting Networks (TRN) which build perlectly topology pre(cid:173)\nserving feature maps. DCS empI'oy a modified Kohonen learning rule \nin conjunction with competitive Hebbian learning. The Kohonen type \nlearning rule serves to adjust the synaptic weight vectors while Hebbian \nlearning establishes a dynamic lateral connection structure between \nthe units reflecting the topology of the feature manifold. In case of super(cid:173)\nvised learning, i.e. function approximation, each neural unit implements \na Radial Basis Function, and an additional layer of linear output units \nadjusts according to a delta-rule. DCS is the first RBF-based approxima(cid:173)\ntion scheme attempting to concurrently learn and utilize a perfectly to(cid:173)\npology preserving map for improved performance. \nSimulations on a selection of CMU-Benchmarks indicate that the DCS \nidea applied to the Growing Cell Structure algorithm [Fritzke93] leads \nto an efficient and elegant algorithm that can beat conventional models \non similar tasks. \n\n1 \n\nIntroduction \n\nThe quest for smallest topology preserving maps motivated the introduction of growing \nfeature maps like Fritzke's Growing Cell Structures (GCS). In GCS, see [Fritzke93] for de(cid:173)\ntails, one starts with a k-dimensional simplex of N = k+ 1 neural units and (k + 1) . kl2 \nlateral connections (edges). 
Growing of the network is performed such that after insertion of a new unit the network again consists solely of k-dimensional simplices. Thus, like Kohonen's SOM, GCS can only learn a perfectly topology preserving feature map1 if k meets the actual dimension of the feature manifold. Assuming that the lateral connections do reflect the actual topology, the connections serve to define a neighborhood for a Kohonen-like adaptation of the synaptic vectors w_i and to guide the insertion of new units. Insertion happens incrementally and does not necessitate a retraining of the network. The principle is to insert new neurons in such a way that the expected value of a certain local error measure, which Fritzke calls the resource, becomes equal for all neurons. For instance, the number of times a neuron wins the competition, the sum of distances to stimuli for which the neuron wins or the sum of errors in the neuron's output can all serve as a resource and dramatically change the behavior of GCS. Using different error measures and guiding insertion by the lateral connections contributes much to the success of GCS. \nThe principle of DCS is to avoid any restriction of the topology of the network (lateral connection scheme between the neural units) but to concurrently learn and utilize a perfectly topology preserving map. This is achieved by adapting the lateral connection structure according to a competitive Hebbian learning rule2: \n\nC_ij(t+1) = max{y_i·y_j, C_ij(t)} if y_i·y_j ≥ y_k·y_l for all 1 ≤ k, l ≤ N; C_ij(t+1) = 0 if C_ij(t) < θ; and C_ij(t+1) = α·C_ij(t) otherwise, (1) \n\nwhere α, 0 < α < 1, is a forgetting constant, θ, 0 < θ < 1, serves as a threshold for deleting lateral connections, and y_i = R(||v − w_i||) is the activation of the i-th unit, with w_i the centre of its receptive field, on presentation of stimulus v. R(·) 
can be any positive, continuously monotonically decreasing function. For batch learning with a training set T of fixed size |T|, α = θ^(1/|T|) is a good choice. \nSince the isomorphic representation of the topology of the feature manifold M in the lateral connection structure is central to performance, in many situations a DCS algorithm may be the right choice. These situations are characterized by missing a priori knowledge of the topology of the feature manifold M, or by a topology of M which cannot be readily mapped to the existing models. Of course, if such a priori knowledge is available, then models like GCS or Kohonen's SOM, which allow one to incorporate such knowledge, have an advantage, especially if training data are sparse. \nNote that DCS algorithms can also aid in cluster analysis: in a perfectly topology preserving map, clusters which are bounded by regions of P(v) = 0 can be identified simply by a connected component analysis. However, without prior knowledge about the feature manifold M it is in principle impossible to check for perfect topology preservation of S. Noise in the input data may render perfect topology learning even more difficult. So what can perfect topology learning be used for? The answer is simply that for every set S of reference vectors, perfect topology learning yields maximum topology preservation with respect to this set.3 And connected components with respect to the lateral connection structure C may well serve as an initialization for postprocessing by hierarchical cluster algorithms. \n\n1. We use the term \"perfectly topology preserving feature map\" in accordance with its rigorous definition in [Martinetz93]. \n2. In his very recent and recommendable article [Martinetz94] the term Topology Representing Network (TRN) is coined for any network employing competitive Hebbian learning for topology learning. \n3. 
if topology preservation is measured by the topographic function as defined in [Villmann94]. \n\nThe first neural algorithm attempting to learn perfectly topology preserving feature maps is the Neural Gas algorithm of T. Martinetz [Martinetz92]. However, unlike DCS, the Neural Gas does not further exploit this information: in every step the Neural Gas computes the k nearest neighbors to a given stimulus and, in the supervised case, employs all of them for function approximation. DCS avoids this computational burden by utilizing the lateral connection structure (topology) learned so far, and it restricts interpolation between activated units to the submanifold of the current stimulus. \nApplying the principle of DCS to Fritzke's GCS yields our DCS-GCS algorithm. This algorithm sticks very closely to the basic structure of its ancestor GCS, except that the predefined k-dimensional simplex connection structure is replaced by perfect topology learning. Besides the conceptual advantage of perfect topology learning, DCS-GCS does decrease overhead (Fritzke has to handle quite sophisticated data structures in order to maintain the k-dimensional simplex structure after insertion/deletion of units) and can be readily implemented on any serial computer. \n\n2 Unsupervised DCS-GCS \n\nThe unsupervised DCS-GCS algorithm starts with initializing the network (graph) to two neural units (vertices) n_1 and n_2. Their weight vectors w_1, w_2 (centres of receptive fields) are set to points v_1, v_2 ∈ M which are drawn from M according to P(v). They are connected by a lateral connection of weight C_12 = C_21 = 1. Note that lateral connections in DCS are always bidirectional and have symmetric weights. \nNow the algorithm enters its outer loop, which is repeated until some stopping criterion is fulfilled. 
This stopping criterion could for instance be a test whether the quantization error has already dropped below a predefined accuracy. \nThe inner loop is repeated λ times. In off-line learning, λ can be set to the number of examples in the training set T; in this case the inner loop just represents an epoch of training. Within the inner loop, the algorithm first draws an input stimulus v ∈ M from M according to P(v) and then proceeds to calculate the two neural units whose weight vectors are first and second closest to v. \nIn the next step, the lateral connections between the neural units are modified according to eq. (1), the competitive Hebbian learning rule. As has already been mentioned, in off-line learning it is a good idea to set α = θ^(1/|T|). \nNow the weight vectors w_i of the best matching unit and its neighbors are adjusted in a Kohonen-like fashion: \n\nΔw_bmu = ε_B·(v − w_bmu) and Δw_i = ε_N·(v − w_i), i ∈ Nh(bmu), (2) \n\nwhere Nh(j) = {i | C_ij ≠ 0, 1 ≤ i ≤ N} is the neighborhood of unit j. \nThe inner loop ends with updating the resource value of the best matching unit. The resource of a neuron is a local error measure attached to each neural unit. As has been pointed out, one can choose alternative update functions corresponding to different error measures. For our experiments (section 2.1 and section 3.1) we used the accumulated squared distance to the stimulus, i.e. Δτ_bmu = ||v − w_bmu||². \nThe outer loop now proceeds by adding a new neural unit r to the network. This unit is located in-between the unit l with largest resource value and its neighbor n with second largest resource value:4 \n\nl = arg max_{1≤i≤N} τ_i, n = arg max_{i∈Nh(l)} τ_i. (3) \n
The exact location of its centre of receptive field w_r is calculated according to the ratio of the resource values τ_l, τ_n, and the resource values of units n and l are redistributed among r, n and l: \n\nw_r = w_l + γ·(w_n − w_l), τ_r = Δτ_l + Δτ_n, τ_l = τ_l − Δτ_l and τ_n = τ_n − Δτ_n. (4) \n\nThis gives an estimate of the resource values if the new unit had been in the network right from the start. Finally, the lateral connections are changed, \n\nC_rl = C_lr = 1, C_rn = C_nr = 1 and C_ln = C_nl = 0, (5) \n\nconnecting unit r to units l and n and disconnecting n and l. \nThis heuristic, guided by the lateral connection structure and the resource values, promises insertion of new units at good initial positions. It is responsible for the better performance of DCS-GCS and GCS compared to algorithms which do not exploit the neighborhood relation between existing units. \nThe outer loop closes by decrementing the resource values of all units, τ_i(t+1) = β·τ_i(t), 1 ≤ i ≤ N, where 0 < β < 1 is a constant. This last step just avoids overflow of the resource variables. For off-line learning, β = 0 is the natural choice. \n\n2.1 Unsupervised DCS simulation results \n\nLet us first turn to our simulation on artificial data. The training set T contains 2000 examples randomly drawn from a feature manifold M consisting of three squares, two of them connected by a line. The development of our unsupervised DCS-GCS network is depicted in Figure 1, with the initial situation of only two units shown in the upper left. Examples are represented by small dots, the centres of receptive fields by small circles and the lateral connections by lines connecting the circles. From left to right, the network is examined after 0, 9 and 31 epochs of training (i.e. after insertion of 2, 11 and 33 neural units). 
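For concreteness, the inner-loop updates (eqs. (1) and (2)) and the insertion step (eqs. (3)-(5)) can be sketched in Python. This is a minimal illustration, not the authors' implementation: setting the winning edge directly to 1 rather than max{y_i·y_j, C_ij(t)}, splitting the redistributed resources Δτ in half, and taking γ as the ratio τ_n/(τ_l + τ_n) are our simplifying assumptions. \n\n```python\nimport numpy as np\n\ndef dcs_gcs_epoch(W, C, tau, T, eps_b=0.1, eps_n=0.01, theta=0.1):\n    # One epoch of the inner loop: competitive Hebbian topology learning\n    # (eq. 1), Kohonen-like adaptation (eq. 2) and resource update.\n    alpha = theta ** (1.0 / len(T))          # forgetting constant for off-line learning\n    for v in T:\n        d = np.linalg.norm(W - v, axis=1)\n        bmu, second = np.argsort(d)[:2]      # first and second closest units\n        C[C < theta] = 0.0                   # eq. (1): delete weak connections,\n        C *= alpha                           # decay the rest,\n        C[bmu, second] = C[second, bmu] = 1.0  # and (re)create the winning edge\n        W[bmu] += eps_b * (v - W[bmu])       # eq. (2): adapt the bmu ...\n        nh = np.nonzero(C[bmu])[0]\n        W[nh] += eps_n * (v - W[nh])         # ... and its topological neighbors\n        tau[bmu] += d[bmu] ** 2              # resource: accumulated squared distance\n    return W, C, tau\n\ndef insert_unit(W, C, tau):\n    # Outer-loop insertion between the unit l with largest resource and its\n    # neighbor n with largest resource among l's neighbors (eqs. 3-5).\n    l = int(np.argmax(tau))\n    nh = np.nonzero(C[l])[0]\n    n = int(nh[np.argmax(tau[nh])])\n    gamma = tau[n] / (tau[l] + tau[n])       # assumed ratio of the resource values\n    w_r = W[l] + gamma * (W[n] - W[l])       # eq. (4): centre of the new unit r\n    dt_l, dt_n = 0.5 * tau[l], 0.5 * tau[n]  # assumed redistribution shares\n    r = len(W)\n    W = np.vstack([W, w_r])\n    tau = np.append(tau, dt_l + dt_n)\n    tau[l] -= dt_l\n    tau[n] -= dt_n\n    C = np.pad(C, ((0, 1), (0, 1)))\n    C[r, l] = C[l, r] = 1.0                  # eq. (5): connect r to l and n,\n    C[r, n] = C[n, r] = 1.0\n    C[l, n] = C[n, l] = 0.0                  # disconnect l and n\n    return W, C, tau\n```\n\nAlternating calls to dcs_gcs_epoch and insert_unit reproduce the outer-loop structure described above; the symmetric connection matrix C doubles as the data structure for Nh(·), which is what makes the algorithm straightforward on a serial computer. 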
\nAfter 31 epochs the network has built a perfectly topology preserving map of M, the lateral connection structure nicely reflecting the shape of M: where M is 2-dimensional, the lateral connection structure is 2-dimensional, and it is 1-dimensional where M is 1-dimensional. Note that a connected component analysis could recognize that the upper right square is separated from the rest of M. The accumulated squared distance to stimuli served as the resource. \nThe quantization error E_q = (1/|T|)·Σ_{v∈T} ||v − w_bmu(v)||² dropped from 100% (3 units) to 3% (33 units). \nThe second simulation deals with the two-spirals benchmark. Data were obtained by running the program \"two-spirals\" (provided by CMU) with parameters 5 (density) and 6.5 (spiral radius), resulting in a training set T of 962 examples. The data represent two distinct spirals in the x-y-plane. Unsupervised DCS-GCS at work is shown in Figure 2, after insertion of 80, 154 and, finally, 196 units. With 196 units a perfectly topology preserving map of M has emerged, and the two spirals are clearly separated. Note that the algorithm has learned the separation in a totally unsupervised manner, i.e. not using the labels of the data points (which are provided by CMU for supervised learning). Again, the accumulated squared distance to stimuli served as the resource. \n\n4. Fritzke inserts new units at a slightly different location, using not the neighbor with second largest resource but the most distant neighbor. \n\nFigure 1: Unsupervised DCS-GCS on artificial data \n\nFigure 2: Unsupervised learning of two spirals \n\n3 Supervised DCS-GCS \n\nIn supervised DCS-GCS, examples consist not only of an input vector v but also include an additional teaching output vector u. 
\nThe supervised algorithm actually works very similarly to its unsupervised version, except that: \n\n\u2022 when a neural unit n_i is inserted, an output vector o_i is attached to it, with o_i = u. \n\u2022 the output y of the network is calculated as a weighted sum of the best matching unit's output vector o_bmu and the output vectors of its neighbors o_i, i ∈ Nh(bmu), \n\ny = (Σ_{i ∈ {bmu} ∪ Nh(bmu)} a_i·o_i) / (Σ_{i ∈ {bmu} ∪ Nh(bmu)} a_i), (6) \n\nwhere a_i = 1/(σ·||v − w_i||² + 1) is the activation of neuron i on stimulus v, with σ, σ > 0, representing the size of the receptive fields. In our simulations, the size of the receptive fields has been equal for all units. \n\u2022 adaptation of output vectors by the delta-rule: a simple delta-rule is employed to adjust the output vectors of the best matching unit and its neighbors. \n\nMost important, the approximation (classification) error can be used for resource updating. This leads to insertion of new units in regions where the approximation error is worst, thus promising to outperform dynamic algorithms which do not employ such a criterion for insertion. In our simulations we used the accumulated squared distance of calculated and teaching output, Δτ_bmu = ||y − u||². \n\n3.1 Supervised DCS-GCS simulation results \n\nWe applied our supervised DCS-GCS algorithm to three CMU benchmarks: the supervised two-spiral problem, the speaker independent vowel recognition problem and the sonar mine/rock separation problem.5 \nThe two spirals benchmark contains 194 examples, each consisting of an input vector v ∈ R^2 and a binary label indicating to which spiral the point belongs. The spirals cannot be linearly separated. The task is to train on the examples until the learning system can produce the correct output for all of them, and to record the time. 
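The supervised output computation of eq. (6) and a delta-rule update can be sketched as follows. This is a minimal Python illustration under our own assumptions: the learning rate eta and the exact form of the update are not specified in the text, which only states that a simple delta-rule adjusts the output vectors of the best matching unit and its neighbors. \n\n```python\nimport numpy as np\n\ndef dcs_output(v, W, C, O, sigma=1.0):\n    # Eq. (6), sketched: normalized, activation-weighted sum of the output\n    # vectors attached to the best matching unit and its neighbors Nh(bmu).\n    bmu = int(np.argmin(np.linalg.norm(W - v, axis=1)))\n    idx = np.concatenate(([bmu], np.nonzero(C[bmu])[0]))   # {bmu} ∪ Nh(bmu)\n    a = 1.0 / (sigma * np.sum((v - W[idx]) ** 2, axis=1) + 1.0)\n    return (a[:, None] * O[idx]).sum(axis=0) / a.sum(), idx\n\ndef delta_rule_step(v, u, W, C, O, eta=0.1, sigma=1.0):\n    # Assumed delta-rule: move the contributing output vectors toward the\n    # teaching output u in proportion to the output error.\n    y, idx = dcs_output(v, W, C, O, sigma)\n    O[idx] += eta * (u - y)\n    return O, np.sum((y - u) ** 2)   # squared error, as used for the resource\n```\n\nBecause eq. (6) makes y a convex combination of the output vectors in {bmu} ∪ Nh(bmu), shifting all of them by eta·(u − y) contracts the output error geometrically for a repeated stimulus, and the returned squared error is exactly the quantity Δτ_bmu used for resource updating. 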
\nThe decision regions learned by supervised DCS-GCS are depicted in Figure 3 after 110 and 135 epochs of training, where the classification error on the training set has dropped to 0%. Black indicates assignment to the first, white assignment to the second spiral. The network and the examples are overlaid. \n\nFigure 3: Supervised learning of two spirals \n\nResults reported by others are 20000 epochs of Backprop for an MLP by Lang and Witbrock [Lang89], 10000 epochs of Cross Entropy Backprop and 1700 epochs of Cascade-Correlation by Fahlman and Lebiere [Fahlman90], and 180 epochs of GCS training by Fritzke [Fritzke93]. \n\n5. For details of simulation, parameters and additional statistics for all of the reported experiments the reader is referred to [Bruske94], which is also available via ftp.informatik.uni-kiel.de in directory pub/kiel/publications/TechnicalReports/Ps.Z as 1994tr03.ps.Z. \n\nThe data for the speaker independent recognition of 11 vowels comprise a training set of 582 examples and a test set of 462 examples, see [Robinson89]. \nWe obtained 65% correctly classified test samples with only 108 neural units in the DCS-GCS network. This is superior to conventional models (including single and multi layer perceptron, Kanerva Model, Radial Basis Functions, Gaussian Node Network, Square Node Network and Nearest Neighbor), for which figures well below 57% have been reported by Robinson. It also qualitatively compares to GCS (jumps above the 60% margin), for which Fritzke reports best classification results of 61% (158 units) up to 67% (154 units) for a 3-dim GCS. On the other hand, our best DCS-GCS used much fewer units. Note that DCS-GCS did not rely on a pre-specified connection structure (but learned it!). \n\nOur last simulation concerns a data set used by Gorman and Sejnowski in their study of classification of sonar data, [Gorman88]. 
The training and the test set contain 104 examples each. \nGorman and Sejnowski report their best results of 90.4% correctly classified test examples for a standard BP network with 12 hidden units and 82.7% for a nearest neighbor classifier. Supervised DCS-GCS reached a peak classification rate of 95% after only 88 epochs of training. \n\n4 Conclusion \n\nWe have introduced the idea of RBF networks which concurrently learn and utilize perfectly topology preserving feature maps for adaptation and interpolation. This family of ANNs, which we termed Dynamic Cell Structures, offers a conceptual advantage compared to classical Kohonen type SOMs, since the emerging lateral connection structure maximally preserves topology. We have discussed the DCS-GCS algorithm as an instance of DCS. Compared to its ancestor GCS of Fritzke, this algorithm elegantly avoids computational overhead for handling sophisticated data structures. If connection updates (eq. (1)) are restricted to the best matching unit and its neighbors, DCS has linear (serial) time complexity6 and thus may also be considered an improvement of Martinetz's Neural Gas idea7. Space complexity of DCS is O(N²) in general and can be shown to become linear if the feature manifold M is two-dimensional. The simulations on CMU benchmarks indicate that DCS indeed has practical relevance for classification and approximation. \nThus encouraged, we look forward to applying DCS at various sites in our active computer vision project, including image compression by dynamic vector quantization, sensorimotor maps for the oculomotor system and hand-eye coordination, cartography and associative memories. A recent application can be found in [Bruske95], where a DCS network attempts to learn a continuous approximation of the Q-function in a reinforcement learning problem. \n\n6. 
Here we refer to the serial time a DCS algorithm needs to process a single stimulus (including response calculation and adaptation). \n7. The serial time complexity of the Neural Gas is O(N), approaching O(N·log N) for k → N, k the number of nearest neighbors. \n\nReferences \n\n[Bruske94] J. Bruske and G. Sommer, Dynamic Cell Structures: Radial Basis Function Networks with Perfect Topology Preservation, Inst. f. Inf. u. Prakt. Math., CAU zu Kiel, Technical Report 9403. \n[Bruske95] J. Bruske, I. Ahrns and G. Sommer, Heuristic Q-Learning, submitted to ECML 95. \n[Fahlman90] S.E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, Advances in Neural Information Processing Systems 2, Morgan Kaufmann, San Mateo, pp. 524-534. \n[Fahlman93] S.E. Fahlman, CMU Benchmark Collection for Neural Net Learning Algorithms, Carnegie Mellon Univ., School of Computer Science, machine-readable data repository, Pittsburgh. \n[Fritzke92] B. Fritzke, Growing Cell Structures - a self-organizing network in k dimensions, Artificial Neural Networks 2, I. Aleksander & J. Taylor, eds., North-Holland, Amsterdam, 1992. \n[Fritzke93] B. Fritzke, Growing Cell Structures - a self-organizing network for unsupervised and supervised training, ICSI Berkeley, Technical Report tr-93-026. \n[Gorman88] R.P. Gorman and T.J. Sejnowski, Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets, Neural Networks, Vol. 1, pp. 75-89. \n[Lang89] K.J. Lang and M.J. Witbrock, Learning to tell two spirals apart, Proc. of the 1988 Connectionist Models Summer School, Morgan Kaufmann, pp. 52-59. \n[Martinetz92] Thomas Martinetz, Selbstorganisierende neuronale Netzwerke zur Bewegungssteuerung, Dissertation, DIFKI-Verlag, 1992. \n[Martinetz93] Thomas Martinetz, Competitive Hebbian Learning Rule Forms Perfectly Topology Preserving Maps, Proc. of the ICANN 93, pp. 426-438, 1993. 
\n[Martinetz94] Thomas Martinetz and Klaus Schulten, Topology Representing Networks, Neural Networks, Vol. 7, No. 3, pp. 505-522, 1994. \n[Moody89] J. Moody and C.J. Darken, Fast Learning in Networks of Locally-Tuned Processing Units, Neural Computation, Vol. 1, No. 2, Summer 1989. \n[Robinson89] A.J. Robinson, Dynamic Error Propagation Networks, Cambridge Univ., Ph.D. thesis, Cambridge. \n[Villmann94] T. Villmann, R. Der and T. Martinetz, A Novel Approach to Measure the Topology Preservation of Feature Maps, Proc. of the ICANN 94, 1994. \n", "award": [], "sourceid": 975, "authors": [{"given_name": "J\u00f6rg", "family_name": "Bruske", "institution": null}, {"given_name": "Gerald", "family_name": "Sommer", "institution": null}]}