{"title": "A Growing Neural Gas Network Learns Topologies", "book": "Advances in Neural Information Processing Systems", "page_first": 625, "page_last": 632, "abstract": null, "full_text": "A Growing Neural Gas Network Learns Topologies \n\nBernd Fritzke \nInstitut für Neuroinformatik \nRuhr-Universität Bochum \nD-44780 Bochum \nGermany \n\nAbstract \n\nAn incremental network model is introduced which is able to learn the important topological relations in a given set of input vectors by means of a simple Hebb-like learning rule. In contrast to previous approaches like the \"neural gas\" method of Martinetz and Schulten (1991, 1994), this model has no parameters which change over time and is able to continue learning, adding units and connections, until a performance criterion has been met. Applications of the model include vector quantization, clustering, and interpolation. \n\n1 INTRODUCTION \n\nIn unsupervised learning settings only input data is available but no information on the desired output. What can the goal of learning be in this situation? \n\nOne possible objective is dimensionality reduction: finding a low-dimensional subspace of the input vector space containing most or all of the input data. Linear subspaces with this property can be computed directly by principal component analysis or iteratively with a number of network models (Sanger, 1989; Oja, 1982). The Kohonen feature map (Kohonen, 1982) and the \"growing cell structures\" (Fritzke, 1994b) allow projection onto non-linear, discretely sampled subspaces of a dimensionality which has to be chosen a priori. Depending on the relation between inherent data dimensionality and dimensionality of the target space, some information on the topological arrangement of the input data may be lost in the process.
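To make the linear case concrete: principal component analysis finds the dominant directions of a data set as the eigenvectors of its covariance matrix. The following minimal numpy sketch (synthetic data and variable names are ours, not part of the paper) projects noisy 3-D samples lying near a 1-D linear subspace onto their leading component:

```python
import numpy as np

# Synthetic data: 500 points in R^3 lying near a 1-D linear subspace
# spanned by the direction (2, 1, -1), plus small isotropic noise.
rng = np.random.default_rng(1)
t = rng.normal(size=(500, 1))
data = t @ np.array([[2.0, 1.0, -1.0]]) + 0.01 * rng.normal(size=(500, 3))

# PCA: eigendecomposition of the covariance matrix of the centered
# data, with components sorted by decreasing explained variance.
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(data) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

# Projection onto the leading component recovers the 1-D structure.
projected = centered @ components[:, :1]
explained = eigvals[order][0] / eigvals.sum()
```

Here nearly all variance falls on the first component; for data on a curved or locally varying-dimensional manifold, such a linear projection discards exactly the topological information the methods below try to preserve.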
\n\nThis is not astonishing, since a reversible mapping from high-dimensional data to lower-dimensional spaces (or structures) does not exist in general. \n\nAsking what structures must look like to allow reversible mappings leads directly to another possible objective of unsupervised learning, which can be described as topology learning: given some high-dimensional data distribution P(ξ), find a topological structure which closely reflects the topology of the data distribution. An elegant method to construct such structures is \"competitive Hebbian learning\" (CHL) (Martinetz, 1993). CHL requires the use of some vector quantization method. Martinetz and Schulten propose the \"neural gas\" (NG) method for this purpose (Martinetz and Schulten, 1991). \n\nWe will briefly introduce and discuss the approach of Martinetz and Schulten. Then we propose a new network model which also makes use of CHL. In contrast to the above-mentioned CHL/NG combination, this model is incremental and has only constant parameters. This leads to a number of advantages over the previous approach. \n\n2 COMPETITIVE HEBBIAN LEARNING AND NEURAL GAS \n\nCHL (Martinetz, 1993) assumes a number of centers in R^n and successively inserts topological connections among them by evaluating input signals drawn from a data distribution P(ξ). The principle of this method is: \n\nFor each input signal ξ connect the two closest centers (measured by Euclidean distance) by an edge. \n\nThe resulting graph is a subgraph of the Delaunay triangulation (fig. 1a) corresponding to the set of centers. This subgraph (fig. 1b), which is called the \"induced Delaunay triangulation\", is limited to those areas of the input space R^n where P(ξ) > 0. The \"induced Delaunay triangulation\" has been shown to optimally preserve topology in a very general sense (Martinetz, 1993).
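With the centers held fixed, the CHL principle can be sketched in a few lines (an illustration of ours, not code from the paper): each signal adds an undirected edge between its two nearest centers, and centers far from the data receive no edges at all.

```python
import numpy as np

def competitive_hebbian_learning(centers, signals):
    # For each input signal, connect the two centers closest to it
    # (Euclidean distance) by an undirected edge.
    edges = set()
    for x in signals:
        dists = np.linalg.norm(centers - x, axis=1)
        s1, s2 = np.argsort(dists)[:2]          # nearest, second-nearest
        edges.add((min(int(s1), int(s2)), max(int(s1), int(s2))))
    return edges

# Three centers on a line plus one far-away center; signals are drawn
# only from the segment between x = 0 and x = 2 on the line y = 0.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
rng = np.random.default_rng(0)
xs = rng.uniform(0.0, 2.0, size=200)
signals = np.stack([xs, np.zeros_like(xs)], axis=1)

edges = competitive_hebbian_learning(centers, signals)
print(edges)  # {(0, 1), (1, 2)} -- the far-away center stays edgeless
```

Given enough signals, the edge set approaches the induced Delaunay triangulation restricted to the region where signals actually occur; the isolated fourth center is an example of the \"dead units\" discussed next.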
\n\nOnly centers lying on the input data submanifold or in its vicinity actually develop any edges. The others are useless for the purpose of topology learning and are often called dead units. To make use of all centers they have to be placed in those regions of R^n where P(ξ) differs from zero. This could be done by any vector quantization (VQ) procedure. Martinetz and Schulten have proposed a particular kind of VQ method, the mentioned NG method (Martinetz and Schulten, 1991). The main principle of NG is the following: \n\nFor each input signal ξ adapt the k nearest centers, whereby k decreases from a large initial to a small final value. \n\nA large initial value of k causes adaptation (movement towards the input signal) of a large fraction of the centers. Then k (the adaptation range) is decreased until finally only the nearest center for each input signal is adapted. The adaptation strength underlies a similar decay schedule. To realize the parameter decay one has to define the total number of adaptation steps for the NG method in advance. \n\nFigure 1: Two ways of defining closeness among a set of points. a) The Delaunay triangulation (thick lines) connects points having neighboring Voronoi polygons (thin lines). Basically this reduces to points having small Euclidean distance w.r.t. the given set of points. b) The induced Delaunay triangulation (thick lines) is obtained by masking the original Delaunay triangulation with a data distribution P(ξ) (shaded).
Two centers are only connected if the common border of their Voronoi polygons lies at least partially in a region where P(ξ) > 0 (closely adapted from Martinetz and Schulten, 1994). \n\nFor a given data distribution one could now first run the NG algorithm to distribute a certain number of centers and then use CHL to generate the topology. It is, however, also possible to apply both techniques concurrently (Martinetz and Schulten, 1991). In this case a method for removing obsolete edges is required, since the motion of the centers may invalidate edges which have been generated earlier. Martinetz and Schulten use an edge aging scheme for this purpose. One should note that the CHL algorithm does not influence the outcome of the NG method in any way, since the adaptations in NG are based only on distance in input space and not on the network topology. On the other hand, NG does influence the topology generated by CHL since it moves the centers around. \n\nThe combination of NG and CHL described above is an effective method for topology learning. A problem in practical applications, however, may be to determine a priori a suitable number of centers. Depending on the complexity of the data distribution which one wants to model, very different numbers of centers may be appropriate. The nature of the NG algorithm requires a decision in advance and, if the result is not satisfying, one or several new simulations have to be performed from scratch. In the following we propose a method which overcomes this problem and offers a number of other advantages through a flexible scheme for center insertion. \n\n3 THE GROWING NEURAL GAS ALGORITHM \n\nIn the following we consider networks consisting of \n\n\u2022 a set A of units (or nodes). Each unit c ∈ A has an associated reference vector w_c ∈ R^n.
The reference vectors can be regarded as positions in input space of the corresponding units. \n\n\u2022 a set N of connections (or edges) among pairs of units. These connections are not weighted. Their sole purpose is the definition of topological structure. \n\nMoreover, there is a (possibly infinite) number of n-dimensional input signals obeying some unknown probability density function P(ξ). \n\nThe main idea of the method is to successively add new units to an initially small network by evaluating local statistical measures gathered during previous adaptation steps. This is the same approach as used in the \"growing cell structures\" model (Fritzke, 1994b) which, however, has a topology with a fixed dimensionality (e.g., two or three). \n\nIn the approach described here, the network topology is generated incrementally by CHL and has a dimensionality which depends on the input data and may vary locally. The complete algorithm for our model, which we call \"growing neural gas\", is given by the following: \n\n0. Start with two units a and b at random positions w_a and w_b in R^n. \n1. Generate an input signal ξ according to P(ξ). \n2. Find the nearest unit s1 and the second-nearest unit s2. \n3. Increment the age of all edges emanating from s1. \n4. Add the squared distance between the input signal and the nearest unit in input space to a local error variable: \n\nΔerror(s1) = ||w_s1 - ξ||^2 \n\n5. Move s1 and its direct topological neighbors¹ towards ξ by fractions ε_b and ε_n, respectively, of the total distance: \n\nΔw_s1 = ε_b (ξ - w_s1) \nΔw_n = ε_n (ξ - w_n) for all direct neighbors n of s1 \n\n6. If s1 and s2 are connected by an edge, set the age of this edge to zero. If such an edge does not exist, create it.² \n7. Remove edges with an age larger than a_max. If this results in units having no emanating edges, remove them as well.
\n\n8. If the number of input signals generated so far is an integer multiple of a parameter λ, insert a new unit as follows: \n\n\u2022 Determine the unit q with the maximum accumulated error. \n\u2022 Insert a new unit r halfway between q and its neighbor f with the largest error variable: \n\nw_r = 0.5 (w_q + w_f) \n\n\u2022 Insert edges connecting the new unit r with units q and f, and remove the original edge between q and f. \n\u2022 Decrease the error variables of q and f by multiplying them with a constant α. Initialize the error variable of r with the new value of the error variable of q. \n\n9. Decrease all error variables by multiplying them with a constant d. \n10. If a stopping criterion (e.g., net size or some performance measure) is not yet fulfilled, go to step 1. \n\n¹Throughout this paper the term neighbors denotes units which are topological neighbors in the graph (as opposed to units within a small Euclidean distance of each other in input space). \n²This step is Hebbian in its spirit since correlated activity is used to decide upon insertions. \n\nHow does the described method work? The adaptation steps towards the input signals (5.) lead to a general movement of all units towards those areas of the input space where signals come from (P(ξ) > 0). The insertion of edges (6.) between the nearest and the second-nearest unit with respect to an input signal generates a single connection of the \"induced Delaunay triangulation\" (see fig. 1b) with respect to the current position of all units. \n\nThe removal of edges (7.) is necessary to get rid of those edges which are no longer part of the \"induced Delaunay triangulation\" because their ending points have moved. This is achieved by local edge aging (3.) around the nearest unit combined with age re-setting of those edges (6.)
which already exist between nearest and second-nearest units. \n\nWith insertion and removal of edges the model tries to construct and then track the \"induced Delaunay triangulation\", which is a slowly moving target due to the adaptation of the reference vectors. \n\nThe accumulation of squared distances (4.) during the adaptation helps to identify units lying in areas of the input space where the mapping from signals to units causes much error. To reduce this error, new units are inserted in such regions. \n\n4 SIMULATION RESULTS \n\nWe will now give some simulation results to demonstrate the general behavior of our model. The probability distribution in fig. 2 was proposed by Martinetz and Schulten (1991) to demonstrate the non-incremental \"neural gas\" model. It can be seen that our model quickly learns the important topological relations in this rather complicated distribution by forming structures of different dimensionalities. \n\nFigure 2: The \"growing neural gas\" network adapts to a signal distribution which has different dimensionalities in different areas of the input space. Shown are the initial network consisting of two randomly placed units and the networks after 600, 1800, 5000, 15000 and 20000 input signals have been applied. The last network shown is not necessarily the \"final\" one, since the growth process could in principle be continued indefinitely. The parameters for this simulation were: λ = 100, ε_b = 0.2, ε_n = 0.006, α = 0.5, a_max = 50, d = 0.995. \n\nThe second example (fig. 3) illustrates the differences between the proposed model and the original NG network. Although the final topology is rather similar for both models, intermediate stages are quite different. Both models are able to identify the clusters in the given distribution. Only the \"growing neural gas\" model, however, could continue to grow to discover still smaller clusters (which are not present in this particular example, though). \n\nFigure 3: The NG/CHL network of Martinetz and Schulten (1991) and the author's \"growing neural gas\" model adapt to a clustered probability distribution. Left column: \"neural gas\" and \"competitive Hebbian learning\"; right column: \"growing neural gas\" (uses \"competitive Hebbian learning\"). Shown are the respective initial states (top row) and a number of intermediate stages. Both the number of units in the NG model and the final number of units in the \"growing neural gas\" model are 100. The bottom row shows the distribution of centers after 10000 adaptation steps (the edges are as in the previous row but not shown). The center distribution is rather similar for both models although the intermediate stages differ significantly. \n\n5 DISCUSSION \n\nThe \"growing neural gas\" network presented here is able to make explicit the important topological relations in a given distribution P(ξ) of input signals. An advantage over the NG method of Martinetz and Schulten is the incremental character of the model, which eliminates the need to pre-specify a network size. Instead, the growth process can be continued until a user-defined performance criterion or network size is met. All parameters are constant over time, in contrast to other models which heavily rely on decaying parameters (such as the NG method or the Kohonen feature map). \n\nIt should be noted that the topology generated by CHL is not an optional feature
\n\n\f632 \n\nBernd Fritzke \n\nof our method (as it is for the NG model) but an essential component since it is \nused to direct the (completely local) adaptation as well as insertion of centers. It is \nprobably the proper initialization of new units by interpolation from existing ones \nwhich makes it possible to have only constant parameters and local adaptations. \n\nPossible applications of our model are clustering (as shown) and vector quantization. \nThe network should perform particularly well in situations where the neighborhood \ninformation (in the edges) is used to implement interpolation schemes between \nneighboring units. By using the error occuring in early phases it can be determined \nwhere to insert new units to generate a topological look-up table of different density \nand different dimensionality in particular areas of the input data space. \n\nAnother promising direction of research is the combination with supervised learning. \nThis has been done earlier with the \"growing cell structures\" (Fritzke, 1994c) and \nrecently also with the \"growing neural gas\" described in this paper (Fritzke, 1994a). \nA crucial property for this kind of application is the possibility to choose an arbitrary \ninsertion criterion. This is a feature not present, e.g., in the original \"growing neural \ngas\". The first results of this new supervised network model, an incremental radial \nbasis function network, are very promising and we are further investigating this \ncurrently. \n\nReferences \n\nFritzke, B. (1994a). Fast learning with incremental rbf networks. Neural Processing \n\nLetters, 1(1):2-5. \n\nFritzke, B. (1994b). Growing cell structures - a self-organizing network for unsu(cid:173)\n\npervised and supervised learning. Neural Networks, 7(9):1441-1460. \n\nFritzke, B. (1994c). Supervised learning with growing cell structures. 
In Cowan, J., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 255-262. Morgan Kaufmann Publishers, San Mateo, CA. \n\nKohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69. \n\nMartinetz, T. M. (1993). Competitive Hebbian learning rule forms perfectly topology preserving maps. In ICANN'93: International Conference on Artificial Neural Networks, pages 427-434, Amsterdam. Springer. \n\nMartinetz, T. M. and Schulten, K. J. (1991). A \"neural-gas\" network learns topologies. In Kohonen, T., Mäkisara, K., Simula, O., and Kangas, J., editors, Artificial Neural Networks, pages 397-402. North-Holland, Amsterdam. \n\nMartinetz, T. M. and Schulten, K. J. (1994). Topology representing networks. Neural Networks, 7(3):507-522. \n\nOja, E. (1982). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15:267-273. \n\nSanger, T. D. (1989). An optimality principle for unsupervised learning. In Touretzky, D. S., editor, Advances in Neural Information Processing Systems 1, pages 11-19. Morgan Kaufmann, San Mateo, CA.", "award": [], "sourceid": 893, "authors": [{"given_name": "Bernd", "family_name": "Fritzke", "institution": null}]}