{"title": "VLSI Implementation of TInMANN", "book": "Advances in Neural Information Processing Systems", "page_first": 1046, "page_last": 1052, "abstract": null, "full_text": "VLSI Implementation of TInMANN \n\nMatt Melton, Tan Phan, Doug Reeves, Dave Van den Bout \nElectrical and Computer Engineering Dept. \nNorth Carolina State University \nRaleigh, NC 27695-7911 \n\nAbstract \n\nA massively parallel, all-digital, stochastic architecture - TInMANN - is described which performs competitive and Kohonen types of learning. A VLSI design is shown for a TInMANN neuron which fits within a small, inexpensive MOSIS TinyChip frame, yet which can be used to build larger networks of several hundred neurons. The neuron operates at a speed of 15 MHz, which allows the network to process 290,000 training examples per second. Use of level-sensitive scan logic provides the chip with 100% fault coverage, permitting very reliable neural systems to be built. \n\n1 INTRODUCTION \n\nUniprocessor simulation of neural networks has been the norm, but benefiting from the parallelism in neural networks is impossible without specialized hardware. Most hardware-based neural network simulators use a single high-speed ALU or multiple DSP chips connected through communication buses. The first approach does not allow exploration of the effects of parallelism, while the complex processors used in the second approach hinder investigations into the minimal hardware needs of an implementation. Such knowledge can be gained only if an implementation possesses the same characteristics as a neural network - i.e., that it be built from many simple, cooperating processing elements. However, constructing and connecting large numbers of processing elements (or neurons) is difficult. Highly-connected, densely-packed analog neurons can be practically realized on a single VLSI chip, but interconnecting several such chips into a larger system would require many I/O pins. 
In addition, external parasitic capacitances and noise can affect the reliable transfer of data between the chips. These problems are avoided in neural systems based on noise-resistant digital signals that can be multiplexed over a small number of wires. \n\nThe next section of this paper describes the basic theory, algorithm, and architecture of the TInMANN digital neural network. The third section illustrates the VLSI design of a TInMANN neuron that operates at 15 MHz, is completely testable, and can be cascaded to form large Kohonen or competitive networks. \n\n2 TInMANN ALGORITHM AND ARCHITECTURE \n\nIn the competitive learning algorithm (Rumelhart, 1986), training vectors of length W, v = (v_1, v_2, ..., v_W), are presented to a winner-take-all network of N neurons. Each neuron i possesses a weight vector of length W, w_i = (w_i1, w_i2, ..., w_iW), and a winning neuron k is selected as the one whose weight vector is closest to the current training vector. Neuron k is then moved closer to the training vector by modifying its weights as follows \n\nw_kj ⇐ w_kj + ε·(v_j - w_kj),  0 < ε < 1,  1 ≤ j ≤ W. \n\nIf the network is trained with a set of vectors that are naturally clustered into N groups, then each neural weight vector will eventually reside in the center of a different group. Thereafter, an input vector applied to the network is encoded by the neuron that has been sensitized to the cluster containing the input. \n\nKohonen's self-organizing feature maps (Kohonen, 1982) are trained using a generalization of competitive learning where each neuron i is provided with an additional X-element vector, x_i = (x_i1, x_i2, ..., x_iX), that defines its topological position in relation to the other neurons in the network. 
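The basic competitive step can be sketched in Python. This is a hedged software illustration, not the paper's hardware: the squared-Euclidean winner metric and all names are assumptions, since the text only says the winner is "closest" to the training vector.

```python
def competitive_step(weights, v, eps):
    """One competitive-learning step: move the winning neuron toward v.

    weights: list of N weight vectors (lists of floats), modified in place
    v:       training vector of the same length W
    eps:     learning rate, 0 < eps < 1
    """
    # winner k: the weight vector closest to v (squared Euclidean distance assumed)
    k = min(range(len(weights)),
            key=lambda i: sum((vj - wj) ** 2 for vj, wj in zip(v, weights[i])))
    # w_kj <- w_kj + eps * (v_j - w_kj) for every component j
    weights[k] = [wj + eps * (vj - wj) for vj, wj in zip(v, weights[k])]
    return k
```

Repeated presentations of clustered training vectors move each winning weight vector toward the center of its cluster, as described above.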
As before, neuron k of the N neurons wins if it is the closest to the current training vector, but the weight adjustment now affects all neurons as determined by a decreasing function f of their topological distance from neuron k and a threshold distance d_T: \n\nw_ij ⇐ w_ij + ε·f(||x_k - x_i||, d_T)·(v_j - w_ij),  0 < ε < 1,  1 ≤ j ≤ W,  1 ≤ i ≤ N. \n\nThis function allows the winning neuron to drag its neighbors toward a given section of the input space so that topologically close neurons will eventually react similarly to closely spaced input vectors. \n\nThe integer Markovian learning algorithm of Figure 1 simplifies the Kohonen learning procedure by noting that the neuron weights slowly integrate the effects of stimuli. This integration can be done by stochastically updating the weights with a probability proportional to the neural input. The stochastic update of the neural weights is done by generating two uncorrelated random numbers, R1 and R2, on the interval [0, d_T] that each neuron compares to its distance from the current training vector and its topological distance from the winning neuron, respectively. A neuron will try to increment or decrement the elements of its weight vector closer to the training vector if the absolute value of the intervening distance is greater than R1, thus creating a total movement proportional to the distance when averaged over many cycles. This movement is inversely modulated by the topological distance to the winning neuron k via a comparison with R2. The total effect produced by these two stochastic processes is equivalent to that produced in Kohonen's original algorithm, but only simple additive operations are now needed. 
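The stochastic update described above can be sketched as a serial Python model of one training presentation. The function and variable names, the serial (rather than parallel) evaluation, and the inclusive random bounds are assumptions of this sketch.

```python
import random

def train_step(weights, topo, v, d_T, rng=random):
    """One presentation of training vector v (integer Markovian learning sketch).

    weights: N integer weight vectors of length W, modified in place
    topo:    N integer topology vectors giving each neuron's position
    v:       integer training vector
    d_T:     threshold distance; R1, R2 are drawn uniformly from [0, d_T]
    """
    N, W = len(weights), len(v)
    # Manhattan distance from every neuron's weights to v
    dist = [sum(abs(a - b) for a, b in zip(v, w)) for w in weights]
    # winner k: smallest distance (ties resolved toward the lowest index)
    k = min(range(N), key=lambda i: dist[i])
    # topological (Manhattan) distance of every neuron from the winner
    d_topo = [sum(abs(a - b) for a, b in zip(t, topo[k])) for t in topo]
    for j in range(W):
        R1 = rng.randint(0, d_T)  # inclusive bounds assumed
        R2 = rng.randint(0, d_T)
        for i in range(N):
            # move w_ij one unit toward v_j with probability proportional to
            # |v_j - w_ij|, damped by the topological distance to the winner
            if abs(v[j] - weights[i][j]) > R1 and d_topo[i] <= R2:
                weights[i][j] += 1 if v[j] > weights[i][j] else -1
    return k
```

Averaged over many presentations, each weight element drifts toward v_j in proportion to its distance, which is how the stochastic process approximates the Kohonen update using only additive operations.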
for( i ⇐ 1; i ≤ N; i ⇐ i + 1 ) \n  for( j ⇐ 1; j ≤ W; j ⇐ j + 1 ) \n    w_ij ⇐ random() \nfor( v ∈ {training set} ) \n  parallelfor( all neurons i ) \n    d_i ⇐ c_i \n    for( j ⇐ 1; j ≤ W; j ⇐ j + 1 ) \n      d_i ⇐ d_i + |v_j - w_ij| \n  k ⇐ 1 \n  for( i ⇐ 1; i ≤ N; i ⇐ i + 1 ) \n    if( d_i < d_k ) \n      k ⇐ i \n  parallelfor( all neurons i ) \n    d_i ⇐ 0 \n    for( j ⇐ 1; j ≤ X; j ⇐ j + 1 ) \n      d_i ⇐ d_i + |x_ij - x_kj| \n  for( j ⇐ 1; j ≤ W; j ⇐ j + 1 ) \n    R1 ⇐ random( d_T ) \n    R2 ⇐ random( d_T ) \n    parallelfor( all neurons i ) \n      /* stochastic weight update */ \n      if( |v_j - w_ij| > R1 and d_i ≤ R2 ) \n        w_ij ⇐ w_ij + sign(v_j - w_ij) \n\nFigure 1: The integer Markovian learning algorithm. \n\nFigure 2 shows that our simplified algorithm operates correctly on a problem that has often been solved using Kohonen networks. \n\nThe integer Markovian learning algorithm is practical to implement since only simple neurons are needed to do the additive operations and a single global bus can handle all the broadcast transmissions. The high-level architecture for such an implementation is shown in Figure 3. TInMANN consists of a global controller that coordinates the actions of a linear array of neurons. The neurons contain circuitry for comparing and updating their weights, and for enabling and disabling themselves during the conditional portions of the algorithm. The network topology is configured by arranging the neurons in an X-dimensional space rather than by storing a graph structure in the hardware. This allows the calculation of the topological distance between neurons using the same circuitry as is used in the weight calculations. TInMANN performs the following operations for each training vector: \n\n1. 
The global controller broadcasts the W elements of v while each neuron accumulates in A the absolute value of the difference between the elements of its weight vector (stored in the small, local RAM) and those of the training vector. \n\n2. The global controller does a binary search for the neuron closest to the training vector by broadcasting distance values bisecting the range containing the winning neuron. The neurons do a comparison and signal on the wired-OR status line if their distance is less than the broadcast value (i.e., the carry bit c is set). Neurons with distances greater than the broadcast value are disabled by resetting their e flags. However, if no neuron is left enabled, the controller restores the enable bits and adjusts its search region (this action is needed on approximately M/2 of the search steps, where M is the machine word length used by TInMANN). The last neuron left enabled is the winner of the competition (ties are resolved by the conditional logic in each neuron). \n\nFigure 2: The evolution of 100 TInMANN neurons when learning a two-dimensional vector quantization. \n\n3. The topological vector of the winning neuron is broadcast to the other neurons through gate G. The other neurons accumulate into A and store into T1 the absolute value of the difference between their topological vectors and that of the winning neuron. \n\n4. Random number R2 is broadcast by the global controller and those neurons having topological distances in T1 greater than R2 are disabled. The remaining neurons each compute the distance between a component of their weight vector and that of the training vector broadcast by the global controller. 
All neurons whose calculated distances are greater than random number R1 broadcast by the controller will increment or decrement their weight elements depending on the carry bits left in the c flags during the distance calculations. Then all neurons are re-enabled and this step is repeated for the remaining W - 1 elements of the training vector. \n\nA single training vector can be processed in 11W + X + 2.5M + 15 clock cycles (Van den Bout, 1989). A word width of 10 bits and a clock rate of 15 MHz would allow TInMANN to learn at a rate of 200,000 three-dimensional vectors per second or 290,000 one-dimensional vectors per second (for example, with W = 1, X = 1, and M = 10, a vector takes 11 + 1 + 25 + 15 = 52 cycles, and 15 MHz / 52 ≈ 290,000 vectors per second). \n\n3 THE VLSI IMPLEMENTATION OF TInMANN \n\nFigure 4 is a block diagram for the VLSI TInMANN neuron built from the components listed in Table 1. The design was driven by the following requirements: \n\nSize: The TInMANN neuron had to fit within a MOSIS TinyChip frame, so we used small, dense, ripple-carry adders. A 10-bit word size was selected. \n\nFigure 3: The TInMANN architecture. \n\nTable 1: Components of the VLSI TInMANN neuron. \n\nABDiff: 10-bit, two's-complement, ripple-borrow subtractor that calculates differences between data in the neuron and data broadcast on the global bus (B_Bus). \nP: 10-bit pipeline register that temporarily stores the difference output by ABDiff. \nCFLAG: Records the sign bit of the difference stored in P. \nPASum: 10-bit, two's-complement, ripple-carry adder/subtractor that adds or subtracts P from the accumulator depending on the sign bit in CFLAG. This implements the absolute value function. \nA: Accumulates the absolute values from PASum to form the Manhattan distance between a neuron and a training vector. 
8-word memory: Stores the weight and topology vectors, the conscience register (DeSieno, 1988), and one working register. \nMUX: Steers the output of A or the memory to the input of ABDiff. \nEFLAG: Stores the enable bit used for conditionally controlling the neuron function during the binary search and weight update phases. \n\nFigure 4: Block Diagram of the VLSI TInMANN neuron. \n\nThe 10-bit word size is a compromise between saving area and retaining numeric precision. The multiplexer was added so that A could be used as another temporary register. The neuron logic was built with the OASIS silicon compiler (Kedem, 1990), but the memory was hand-crafted to reduce its area. In the final TInMANN neuron, 4000 transistors are divided between the arithmetic logic (770µ x 1300µ) and the memory (710µ x 1160µ). \n\nExpandability: The use of broadcast communications reduces the total TInMANN chip I/O to only 35 pins. This low connectivity makes it practical to build large Kohonen networks. At the chip level, the use of a silicon compiler lets us expand the design if more silicon area becomes available. For example, the word size could be readily expanded and the layout automatically regenerated by changing a single statement in the hardware description. Also, higher-dimensional vector spaces could be supported by adding more memory. \n\nSpeed: In the worst case, the memory access time is 12 ns, each adder delay is 45 ns, and the write time for A is 10 ns. This would have limited TInMANN to a top speed of 9 MHz. P was added to break the critical path through the adders and bring the clock frequency to 15 MHz. At the board level, the ripple of status information through the OR gates is sped up by connecting the status lines through an OR-tree. 
\n\nTestability: To speed the diagnosis of system failures caused by defective chips, the TInMANN neuron was made 100% testable by building EFLAG, CFLAG, P, and A from level-sensitive scannable latches. Test patterns are shifted into the chip through the scan_in pin and the results are shifted out through scan_out. All faults are covered by only 27 test patterns. A 100% testable neural system is built by concatenating the scan_in and scan_out pins of all the chips. \n\nFigure 5: Layout of the TInMANN neuron. \n\nEach component of the TInMANN neuron was extensively simulated to check for correct operation. To test the chip I/O, we performed a detailed circuit simulation of two TInMANN neurons organized as a competitive network. The simulation demonstrated the movement of the two neurons towards the centroids of two data clusters used to provide training vectors. \n\nFour of the TInMANN neurons in Figure 5 were fabricated by MOSIS. Using the built-in scan path, each was found to function at 20 MHz (the maximum speed of our tester). These chips are now being connected into a linear neural array and attached to a global controller. \n\nReferences \n\nD. E. Van den Bout and T. K. Miller III. \"TInMANN: The Integer Markovian Artificial Neural Network\". In IJCNN, pages II:205-II:211, 1989. \n\nD. DeSieno. \"Adding a Conscience to Competitive Learning\". In IEEE International Conference on Neural Networks, pages I:117-I:124, 1988. \n\nG. Kedem, F. Brglez, and K. Kozminski. \"OASIS: A Silicon Compiler for Rapid Implementation of Semi-custom Designs\". In International Workshop on Rapid System Prototyping, June 1990. \n\nT. Kohonen. \"Self-Organized Formation of Topologically Correct Feature Maps\". Biological Cybernetics, 43:59-69, 1982. \n\nD. Rumelhart and J. McClelland. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, chapter 5. 
MIT Press, 1986.\n", "award": [], "sourceid": 364, "authors": [{"given_name": "Matt", "family_name": "Melton", "institution": null}, {"given_name": "Tan", "family_name": "Phan", "institution": null}, {"given_name": "Doug", "family_name": "Reeves", "institution": null}, {"given_name": "Dave", "family_name": "Van den Bout", "institution": null}]}