{"title": "Digital Realisation of Self-Organising Maps", "book": "Advances in Neural Information Processing Systems", "page_first": 728, "page_last": 738, "abstract": "", "full_text": "728 \n\nDIGITAL REALISATION OF SELF-ORGANISING MAPS \n\nNigel M. Allinson \n\nM~rtin J. Johnson \n\nDepartment of Electronics \n\nUniversity of York \n\nYork \n\nY015DD \nEngland \n\nABSTRACT \n\nKevin J. Moon \n\nThe method \n\nis presented. \n\nA digital realisation of two-dimensional self-organising feature \nmaps \nis based on subspace \ntechnique. Weight vector \nclassification using an n-tuple \napproximation and orthogonal projections to produce a winner(cid:173)\ntakes-all network are also discussed. Over one million effective \nbinary weights can be applied in 25ms using a conventional \nmicrocomputer. Details of a number of image recognition tasks, \nincluding character \nrecognition and object centring, are \ndescribed. \n\nINTRODUCTION \n\nBackground \n\nThe overall aim of our work is to develop fast and flexible systems for image \nrecognition, usually for commercial inspection tasks. There is an urgent need for \nautomatic learning systems in such applications, since at present most systems \nemploy heuristic classification techniques. This approach requires an extensive \ndevelopment effort for each new application, which exaggerates implementation \ncosts; and for many tasks, there are no clearly defined features which can be \nemployed for classification. Enquiring of a human expert will often only produce \n\"good\" and \"bad\" examples of each class and not the underlying strategies which \nhe may employ. Our approach is to model in a quite abstract way the perceptual \nnetworks found in the mammalian brain for vision. A back-propagation network \ncould be employed to generalise about the input pattern space, and it would find \nsome useful representations. 
However, there are many difficulties with this approach, since the network structure assumes nothing about the input space and it can be difficult to bound complicated feature clusters using hyperplanes. The mammalian brain is a layered structure, and so another model may be proposed which involves the application of many two-dimensional feature maps. Each map takes information from the output of the preceding one and performs some type of clustering analysis in order to reduce the dimensionality of the input information. For successful recognition, similar patterns must be topologically close, so that novel patterns are in the same general area of the feature map as the class they are most like. There is therefore a need for both global and local ordering processes within the feature map. The process of global ordering in a topological map is termed self-organisation by Kohonen (1984). \n\nIt is important to realise that all feedforward networks perform only one function, namely the labelling of areas in a pattern space. This paper concentrates on a technique for realising large, fast, two-dimensional feature maps using a purely digital implementation. \n\nFigure 1. Unbounded Feature Map of Local Edges \n\nSelf Organisation \n\nGlobal ordering needs to adapt the entire neural map, but local ordering needs only local information. Once the optimum global organisation has been found, then only more localised ordering can improve the topological organisation. This process is the basis of the Kohonen clustering algorithm, where the specified area of adaption decreases with time to give an increasing local ordering. It has been shown that this approach gives optimal ordering at global and local levels (Oja, 1983). It may be considered as a dimensionality reduction algorithm, and can be used as a vector quantiser. 
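The procedure described above, winner selection followed by adaption of a neighbourhood whose area decreases with time, can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the map size, epoch count and learning-rate schedule are assumptions, and the normalised dot product is used as the similarity measure.

```python
import numpy as np

def train_som(patterns, map_side=8, epochs=40, seed=0):
    """Minimal Kohonen self-organising map sketch: winner selection by
    normalised dot product, neighbourhood radius shrinking over time."""
    rng = np.random.default_rng(seed)
    dim = patterns.shape[1]
    # One unit-length weight vector per map element.
    w = rng.normal(size=(map_side, map_side, dim))
    w /= np.linalg.norm(w, axis=2, keepdims=True)
    ys, xs = np.mgrid[0:map_side, 0:map_side]
    for t in range(epochs):
        # Specified area of adaption decreases with time (global -> local ordering).
        radius = max(1.0, map_side / 2 * (1 - t / epochs))
        alpha = 0.5 * (1 - t / epochs)  # assumed decaying learning rate
        for p in patterns:
            p = p / (np.linalg.norm(p) + 1e-12)
            sim = w @ p  # normalised dot product with every unit
            wy, wx = np.unravel_index(np.argmax(sim), sim.shape)
            mask = (ys - wy) ** 2 + (xs - wx) ** 2 <= radius ** 2
            w[mask] += alpha * (p - w[mask])  # adapt winner's neighbourhood
            w /= np.linalg.norm(w, axis=2, keepdims=True)
    return w
```

After training, similar inputs excite topologically close units, so the map acts as a vector quantiser of the input space.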
\n\nAlthough Kohonen's self-organising feature maps have been successfully applied to speech recognition (Kohonen, 1988; Tattersall et al., 1988), there has been little investigation of their application to image recognition. Such feature maps can be used to extract various image primitives, such as textures, localised edges and terminations, at various scales of representation (Johnson and Allinson, 1988). \n\nAs a simple example, a test image of concentric circles is employed to construct a small feature map of localised edges (Figure 1). The distance measure used is the normalised dot product, since in general magnitude information is unimportant. Under these conditions, each neuron output can be considered a similarity measure of the directions between the input pattern and the synaptic weight vector. This map shows that similar edges have been grouped together and that inverses are as far from each other as possible. \n\nDIGITAL IMPLEMENTATION \n\nSub-Space Classification \n\nAlthough a conventional serial computer is normally thought of as performing only one operation at a time, there is one task it can perform which involves parallel computation. The action of addressing memory can be thought of as a highly parallel process, since it involves the comparison of a word, W, with a set of 2^N others, where N is the number of bits in W. It is, in effect, performing 2^N parallel computations, each being a single match. This can be exploited to speed up the simulation of a network by using a conversion between conventional pattern space labelling and binary addressing. \n\nFigure 2 shows how the labelling of two-dimensional pattern space is equivalent to the partitioning of the same space by the decision regions of a multiple layer perceptron. 
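The memory-addressing idea can be illustrated with a toy lookup table. This is a hedged sketch: the grid resolution and the class-1 region below are invented for illustration, not taken from the paper.

```python
# Label a quantised 2D pattern space once, then classify by direct
# memory addressing: the pattern itself is the address of its label.
BITS = 4                       # 4 bits per coordinate -> a 16 x 16 grid
table = [[0] * (1 << BITS) for _ in range(1 << BITS)]

# "Training": label an arbitrary, non-convex region as class 1.
for x in range(1 << BITS):
    for y in range(1 << BITS):
        if (x - 4) ** 2 + (y - 11) ** 2 < 20 or x > 12:
            table[x][y] = 1

def classify(x, y):
    # A single memory access, however complicated the labelled region is;
    # no layered decision boundaries are needed to combine sub-regions.
    return table[x][y]
```

The lookup costs the same whether the labelled region is a half-plane or a scatter of disconnected clusters, which is exactly the property the text exploits.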
If each quantised part of the space is labelled with a number for each class, then all that is necessary is for the pattern to be used as an address to give the stored label (i.e. the response) for each class. These labels may form a cluster of any shape, and so multiple layers are not required to combine regions. \nThe apparent flaw in the above suggestion is that, for anything other than a trivial problem, the labelling of every part of pattern space is impractical. For example, a 32 x 32 input vector would require a memory of 2^1024 words per unit! What is needed is a coding system which uses some basic assumptions about patterns in order to reduce the memory requirements. One assumption which can be made is that patterns will cluster together into various classes. As early as 1959, a method known as the n-tuple technique was used for pattern recognition (Bledsoe and Browning, 1959). This technique takes a number of subspaces of the pattern space and uses the sum of the resultant labels as the overall response. This gives a set of much smaller memories, and inherent in the coding method is that similar patterns will have identical labels. \n\nFigure 2. Comparison of Perceptron and Sub-Space Classification. The labelling of a quantised subspace (filled = Class 1, open = Class 2) is equivalent to the partitioning of pattern space by the multi-layer perceptron. \n\nFor example, assume a 16-bit pattern, 0101101001010100. Take a four-bit sample from this, say bits 0-3, giving 0100. This can be used to address a 16-word memory to produce a single bit. If this bit is set to 1, then it is in effect labelling all patterns with 0100 as their first four bits; that is, 4096 patterns of the form xxxxxxxxxxxx0100. Taking a second sample, namely bits 4-7 (0101), labels xxxxxxxx0101xxxx patterns, but when added to the first sample there will be 256 patterns labelled twice (namely, xxxxxxxx01010100) and 7936 (i.e. 8192 - 256) labelled once. The third four-bit sample produces 16 patterns (namely, xxxx101001010100) labelled three times. The fourth sample produces only one pattern, 0101101001010100, which has been labelled four times. If an input pattern is applied which differs from this by one bit, then it will now be labelled three times by the samples; if it differs by two bits, it will be labelled either two or three times, depending on whether the changes were in the same four-bit sample or not. Thus a distance measure is implicit in the coding method and reflects the assumed clustering of patterns. Applying this approach to the earlier problem of a 32 x 32 binary input vector and taking 128 eight-bit samples results in a distance measure between 0 and 128 and uses 32K bits of memory per unit. \n\nWeight Vector Approximation \n\nIt is possible to make an estimate of the approximate weight vector for a particular sample from the bit table. For simplicity, consider a binary image from which t samples are taken to form a word, w, where \n\nw = x0 + 2x1 + ... 
+ 2^(t-1) x(t-1) \n\nThis word can be used to address the memory for that sample. Every memory bit which is 1 in effect increases the estimate of each weight element whose respective bit in its address is set, and decreases it where that bit is clear. Hence, if BIT(w, b) is the bth bit of w and A[w] is the contents of the memory {0, 1}, then \n\nW[b] = SUM(w = 0 to 2^t - 1) A[w] (2 BIT(w, b) - 1) \n\nThis represents an approximate measure of the weight element. Table 1 demonstrates the principle for a four-bit sample memory. Given randomly distributed inputs, the binary memory is equivalent to the weight vector [2, 4, 0, -2]. \n\nIf there is a large number of set bits in the memory for a particular unit, then that unit will always give a high response - that is, it will become saturated. However, if there are too few bits set, the unit will not respond strongly to a general set of patterns. The number of set bits must, therefore, be fixed at the start of training, distributed randomly within the memory, and only redistribution of these bits allowed. Set bits could be taken from any other sample, but some samples will be more important than others. The proportion of 1's in an image should not be used as a measure, otherwise large uniform regions will be more significant than the pattern detail. This is a form of magnitude-independent operation, similar to the use of the normalised dot product applied in the analogue approach, and so bits may only be moved from addresses with the same number of set bits as the current address. \n\nTABLE 1. 
Weight Vector Approximation \n\nAddress      Weight change      Address      Weight change \nx3 x2 x1 x0  W3 W2 W1 W0        x3 x2 x1 x0  W3 W2 W1 W0 \n0  0  0  0   -  -  -  -         1  0  0  0   +  -  -  - \n0  0  0  1   -  -  -  +         1  0  0  1   +  -  -  + \n0  0  1  0   -  -  +  -         1  0  1  0   +  -  +  - \n0  0  1  1   -  -  +  +         1  0  1  1   +  -  +  + \n0  1  0  0   -  +  -  -         1  1  0  0   +  +  -  - \n0  1  0  1   -  +  -  +         1  1  0  1   +  +  -  + \n0  1  1  0   -  +  +  -         1  1  1  0   +  +  +  - \n0  1  1  1   -  +  +  +         1  1  1  1   +  +  +  + \n\nEquivalent weight vector: 2 4 0 -2 \n\nOrthogonal Projections \n\nIn order to speed up the simulation further, instead of representing each unit by a single bit in memory, each unit can be represented by a combination of bits. Hence many calculations can be effectively computed in parallel. The number of units which require a 1 for a particular sample will always be relatively small, and hence these can be coded. The coding method employed is to split the binary word, W, into x and y fields. These projection fields address a two-dimensional map and so provide a fast technique for approximating the true content of the memory. The x bits are summed separately from the y bits, and together they give a good estimate of the co-ordinates of the unit with the most bits set in x and in y. This map becomes, in effect, a winner-takes-all network. The reducing neighbourhood of adaption employed in the Kohonen algorithm can also be readily incorporated by applying an overall mask to this map during the training phase. \n\nThough only this output map is required during normal application of the system to image recognition tasks, it is possible to reconstruct the distribution of the two-dimensional weight vectors. 
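The weight estimate of Table 1 can be reproduced in a few lines. This is an illustrative sketch: the particular set of four memory addresses below is one choice that yields the equivalent weight vector [2, 4, 0, -2] quoted above, not necessarily the exact memory contents shown in Table 1.

```python
T = 4  # four-bit samples, as in Table 1

def approx_weights(memory):
    """Estimate the weight vector from an n-tuple bit memory:
    each set address contributes +1 to W[b] if bit b of that address
    is 1, and -1 if it is 0 (i.e. W[b] = sum A[w] (2 BIT(w, b) - 1))."""
    w = [0] * T
    for addr in range(1 << T):
        if memory[addr]:
            for b in range(T):
                w[b] += 1 if (addr >> b) & 1 else -1
    return w  # index 0 is W0 (least-significant address bit)

# Example memory with bits set at addresses 0110, 1100, 1101 and 1110.
memory = [0] * (1 << T)
for addr in (0b0110, 0b1100, 0b1101, 0b1110):
    memory[addr] = 1

w = approx_weights(memory)  # [W0, W1, W2, W3]
```

Reading the result most-significant bit first, (W3, W2, W1, W0), recovers the vector (2, 4, 0, -2).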
Figure 3, using the technique illustrated in Table 1, shows this weight vector map for the concentric circle test image applied previously in the conventional analogue approach. This is a small digitised map containing 32 x 32 elements, each with 16 x 16 input units, and can be applied, using a general-purpose desktop microcomputer running at 4 MIPS, in a few milliseconds. \n\nFigure 3. Reconstructed Feature Map of Local Edges \n\nAPPLICATION EXAMPLES \n\nCharacter Recognition \n\nThough a long-term objective remains the development of general-purpose computer vision systems, with many layers of interacting feature maps together with suitable pre- and post-processing, many commercial tasks require decisions based on a restricted range of objects - that is, their perceptual set is severely limited. However, ease of training and speed of application are paramount. An example of such an application involves the recognition of characters. \n\nFigures 4 and 5 show an input pattern of hand-drawn A's and B's. The network, using the above digital technique, was given no information concerning the input image, and the input window of 32 x 32 pixels was placed randomly on the image. The network took less than one minute to adapt and can be applied in 25 ms. This network is a 32 x 32 feature map of 32 x 32 elements, thus giving over one million effective weights. The output map forms two distinct clusters, one for A's in the top right corner of the map (Figure 4), and one for B's in the bottom left corner (Figure 5). If further characters are introduced in the input image, then the output map will, during the training phase, self-organise to incorporate them. \n\nFigure 4. Trained Network Response for 'A' in Input Window \n\nFigure 5. 
Trained Network Response for 'B' in Input Window \n\nCorrupted Images \n\nOnce the maximum response from the map is known, the parts of the input window which caused it can be reconstructed to provide a form of ideal input pattern. The reconstructed input pattern is shown in the figures beneath the input image. This reconstruction can be employed to recognise occluded patterns or to eliminate noise in subsequent input images. \n\nFigure 6. Trained Network Response for Corrupted 'A' in Input Window. Reconstructed Input Pattern Shown Below Test Image \n\nFigure 6 shows the response of the network, trained on the input image of Figures 4 and 5, to a corrupted image of A's and B's. It has still managed to recognise the input character as an A, but the reconstructed version shows that the extra noise has been eliminated. \n\nObject Centring \n\nCentring an object within the input window permits conformal mapping strategies, such as polar exponential grids, to be applied, yielding scale- and rotation-invariant recognition. The same network as employed in the previous example was used, but a target position for the maximum network response was specified, and the network was adapted half-way between this and the actual maximum response location. \n\nFigure 7. Trained Network Response for Off-Centred Character. Input Window is Low-Pass Filtered as Shown \n\nFigure 7 shows such a network. When the response is in the centre of the output map, an input object (character) is centred in the recognition window. In the example shown, there is an off-centred response of the trained network for an off-centred character. This deviation is used to change the position of the input window. Once centring has been achieved, object recognition can occur. 
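The centring behaviour can be sketched as a simple feedback loop. This is illustrative only: the proportional window update and the stand-in response function below are assumptions for the sketch, not the trained network itself.

```python
def centre_object(window_pos, peak_of, map_centre=(16, 16), steps=10):
    """Shift the input window until the map's maximum response sits at
    the centre of the output map; the deviation of the response peak
    from the centre drives the window position."""
    x, y = window_pos
    for _ in range(steps):
        rx, ry = peak_of(x, y)  # location of maximum map response
        dx, dy = rx - map_centre[0], ry - map_centre[1]
        if dx == 0 and dy == 0:
            break               # response centred: object is centred
        x += dx                 # move the window to cancel the deviation
        y += dy
    return x, y

# Stand-in "trained map": the response peak deviates from the map centre
# by the window's offset from a hypothetical object position at (20, 30).
obj = (20, 30)
peak = lambda x, y: (16 + obj[0] - x, 16 + obj[1] - y)
```

Once the loop terminates with the response at the map centre, the object sits in the middle of the recognition window and recognition (e.g. via a polar exponential grid) can proceed.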
\n\nCONCLUSIONS \n\nThe application of unsupervised feature maps for image recognition has been demonstrated. The digital realisation technique permits the application of large maps, which can be applied in real time using conventional microcomputers. The use of orthogonal projections to give a winner-takes-all network reduces memory requirements by approximately 30-fold and gives a computational cost of O(n^(1/2)), where n is the number of elements in the map. The general approach can be applied in any form of feedforward neural network. \n\nAcknowledgements \n\nThis work has been supported by the Innovation and Research Priming Fund of the University of York. \n\nReferences \n\nW. W. Bledsoe and I. Browning. Pattern Recognition and Reading by Machine. Proc. Eastern Joint Computer Conf., 225-232 (1959). \n\nM. J. Johnson and N. M. Allinson. An Advanced Neural Network for Visual Pattern Recognition. Proc. UKIT 88, Swansea, 296-299 (1988). \n\nT. Kohonen. Self-Organization and Associative Memory. Springer-Verlag, Berlin (1984). \n\nT. Kohonen. The 'Neural' Phonetic Typewriter. Computer 21, 11-22 (1988). \n\nE. Oja. Subspace Methods of Pattern Recognition. Research Studies Press, Letchworth (1983). \n\nG. D. Tattersall, P. W. Linford and R. Linggard. Neural Arrays for Speech Recognition. Br. Telecom Technol. J. 6, 140-163 (1988). \n", "award": [], "sourceid": 104, "authors": [{"given_name": "Nigel", "family_name": "Allinson", "institution": null}, {"given_name": "Martin", "family_name": "Johnson", "institution": null}, {"given_name": "Kevin", "family_name": "Moon", "institution": null}]}