{"title": "A Parallel Analog CCD/CMOS Signal Processor", "book": "Advances in Neural Information Processing Systems", "page_first": 748, "page_last": 755, "abstract": null, "full_text": "A Parallel Analog CCD/CMOS Signal Processor \n\nCharles F. Neugebauer \n\nAmnon Yariv \n\nDepartment of Applied Physics \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\nAbstract \n\nA CCD-based signal processing IC that computes a fully parallel single-quadrant vector-matrix multiplication has been designed and fabricated with a 2 \u00b5m CCD/CMOS process. The device incorporates an array of charge-coupled devices (CCDs) that hold an analog matrix of charge encoding the matrix elements. Input vectors are digital with 1-8 bit accuracy. \n\n1 INTRODUCTION \n\nVector-matrix multiplication (VMM) is often used in neural network theories to describe the aggregation of signals by neurons. An input vector encoding the activation levels of input neurons is multiplied by a matrix encoding the synaptic connection strengths to create an output vector. The analog VLSI architecture presented here has been devised to perform the vector-matrix multiplication using CCD technology. The architecture calculates a VMM in one clock cycle, an improvement over previous semiparallel devices (Agranat et al., 1988; Chiang, 1990). This architecture is also useful for general signal processing applications where moderate resolution is required, such as image processing. \n\nAs most neural models behave robustly in the presence of noise and inaccuracies, analog VLSI offers the potential for highly compact neural circuitry. Analog multiplication circuitry can be made much smaller than its digital equivalent, offering substantial savings in power and IC size at the expense of limited accuracy and programmability. Digital I/O, however, is desirable because it allows the use of standard memory and control circuits at the system level. 
The device presented here has digital input and analog output; this paper reports all of its relevant performance characteristics, including the accuracy, speed, power dissipation and charge retention of the VMM. In practice, on-chip charge-domain A/D converters are used to convert the analog output signals, facilitating digital communication with off-chip devices. \n\nFigure 1: Simplified Schematic of CID Vector Matrix Multiplier (schematic labels: Matrix Charge, Column Gate, Input Vector) \n\n2 ARCHITECTURE DESCRIPTION \n\nThe vector-matrix multiplier consists of a matrix of CCD cells that resemble Charge Injection Device (CID) imager pixels: one of the cell's gates is connected vertically from cell to cell, forming a column electrode, while another gate is connected horizontally, forming a row electrode. The charge stored beneath the row and column gates encodes the matrix. A simplified schematic in Figure 1 shows the array organization. \n\n2.1 BINARY VECTOR MATRIX MULTIPLICATION \n\nIn its most basic configuration, the VMM circuit computes the product of a binary input vector, $u_j$, and an analog matrix of charge. The computation done by each CID cell in the matrix is a multiply-accumulate in which the charge, $Q_{ij}$, is multiplied by a binary input vector element, $u_j$, encoded on the column line, and this product is summed with the other products in the same row to form the vector product, $I_i$, on the row lines. Multiplication by a binary number is equivalent to adding or not adding the charge at a particular matrix element to its associated row line. \n\nThe matrix element operation is shown in Figure 2, which displays a cross-section of one of the rows with the associated potential wells at different times in the computation. 
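The multiply-accumulate described in Section 2.1, where each binary input element either adds a cell's charge to its row line or leaves it out, can be modeled behaviorally in a few lines. This is a hypothetical illustration rather than a circuit model: it ignores dark current, incomplete transfer and parasitics, uses arbitrary charge units, and the names binary_vmm and c_row (an assumed total row capacitance) are illustrative. It returns the row-voltage changes, i.e. the sum over j of Q_ij u_j divided by c_row.

```python
def binary_vmm(q, u, c_row):
    # q[i][j]: charge in the cell at row i, column j (arbitrary units, hypothetical)
    # u[j]: binary input bit on column line j (1 = column pulsed, charge moves to row)
    # c_row: assumed total capacitance of each row electrode (arbitrary units)
    if any(bit not in (0, 1) for bit in u):
        raise ValueError('inputs must be binary')
    dv = []
    for row in q:
        # Multiplying by a bit is just adding or not adding the cell charge.
        summed = sum(charge for charge, bit in zip(row, u) if bit)
        dv.append(summed / c_row)
    return dv
```

For example, binary_vmm([[1, 2, 3], [4, 5, 6]], [1, 0, 1], 2.0) returns [2.0, 5.0]: columns 0 and 2 are pulsed, so each row sums its first and last charges.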
\n\nFigure 2: CID Cell Operation (cross-section of one row showing the potential wells during steps (a) to (e); annotated levels include +10V, 0V, $V_{row}$ and $V_{row} + Q/C$ on the floating row line) \n\nIn the initial state, prior to the VMM computation, the matrix of charges $Q_{ij}$ is moved beneath the column electrodes by placing a positive voltage on all column lines, shown in Figure 2(a). A positive voltage creates a deep potential well for electrons. At this point, the row lines are reset to a reference voltage, $V_{row}$, by FETs $Q_1$ and then disconnected from the voltage source, shown in Figure 2(b). The computation occurs when the column lines are pulsed to a negative voltage corresponding to the input vector $u_j$, shown in Figure 2(c). The binary $u_j$ is represented by a negative pulse on the $j$th column line if the element $u_j$ is a binary 1; otherwise the column line is kept at the positive voltage. This causes the charges in the columns that correspond to binary 1s in the input vector to be transferred to their respective row electrodes, which thus experience a voltage change given by \n\n$$\\Delta V_i = \\sum_{j=0}^{N-1} \\frac{Q_{ij} u_j}{C_{row}}$$ \n\nwhere $N$ is the number of elements in the input vector and $C_{row}$ is the total capacitance of the row electrode. Once the charge has been transferred, the column lines are reset to their original positive voltages\u00b9, resulting in the potential diagram in Figure 2(d). The voltage changes on the row lines are then sampled, and the matrix of charges is returned to the column electrodes in preparation for the next VMM by pulsing the row electrodes negative as in Figure 2(e). 
In this manner, a complete binary vector is multiplied by an analog matrix of charge in one CCD clock cycle. \n\n3 DESIGN AND OPERATION \n\nThe implementation of this architecture contains facilities for electronic loading of the matrix. The architecture was originally proposed as an optically loaded device (Agranat et al., 1988), but the electronically loaded version has proven more reliable and consistent. \n\n3.1 LOADING THE CCD ARRAY WITH MATRIX ELEMENTS \n\nThe CCD matrix elements described above can be modified to operate as standard four-phase CCD shift registers by simply adding another gate. The matrix cell is shown in Figure 3. The fabricated single-quadrant cell size is 24 \u00b5m by 24 \u00b5m using a 2 \u00b5m minimum feature size CCD/CMOS process. More aggressive design rules in the same process can reduce this to 20 \u00b5m by 20 \u00b5m. These cells, when abutted with each other in a row, form a horizontal shift register that is used to load the matrix. Electronic loading of the matrix is accomplished in a fashion similar to CCD imagers. A fast vertical CCD shift register is added along one side of the matrix and is loaded with one column of matrix charges from a single external analog data source. Once the fast shift register is loaded, its contents are transferred into the array by clocking the matrix electrodes to act as an array of horizontal shift registers, shown in Figure 3(a). This process is repeated until the entire matrix has been filled with charge. \n\n\u00b9 Returning the column lines to their original voltage levels cancels the effect of stray capacitive coupling between the row and column lines, since the net column voltage change is zero. 
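The electronic loading sequence of Section 3.1 can be sketched as a behavioral model. The geometry is an assumption made for illustration only: the fast vertical register sits on the column-0 side, every transfer shifts each row one cell away from it, and the samples of one column enter in top-to-bottom row order. Under those assumptions the first column transferred ends up farthest from the register, so the columns must be fed last-to-first. The function load_matrix is hypothetical; it returns both the filled array and the order of samples presented at the single analog input.

```python
def load_matrix(target):
    # target[i][j]: desired charge for row i, column j (arbitrary units).
    # Assumed geometry (hypothetical): the fast vertical register sits on the
    # column-0 side and each transfer shifts every row one cell away from it.
    rows, cols = len(target), len(target[0])
    array = [[0] * cols for _ in range(rows)]
    serial_stream = []  # samples in the order they enter the single analog input
    # The first column transferred travels farthest, so feed the last column first.
    for j in reversed(range(cols)):
        # Fill the fast register with one column (assumed top row first).
        fast_register = [target[i][j] for i in range(rows)]
        serial_stream.extend(fast_register)
        # Clock the matrix as horizontal shift registers: every row moves one
        # cell to the right and takes the fast-register sample into column 0.
        for i in range(rows):
            array[i] = [fast_register[i]] + array[i][:-1]
    return array, serial_stream
```

With this geometry, load_matrix([[1, 2, 3], [4, 5, 6]]) reproduces the target array after three transfers, and the serial sample order is 3, 6, 2, 5, 1, 4.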
\n\nFigure 3: CID Cell Used to Load Matrix (labels: Phase 1 to Phase 4 clock lines, DC-biased gates, Column and Row electrodes) \n\nWhen the matrix has been loaded, the charge can be used for computation with two of the four gates at each matrix cell kept at constant potentials, shown in Figure 3(b). The computation process moves the charge repeatedly between two electrodes. Incomplete charge transfer, a problem with our previous architecture (Agranat et al., 1990), does not degrade performance, since any charge left behind under the column gates during computation is picked up on the next cycle, shown in Figure 2(e). Only dark current generation degrades the matrix charges during VMM, causing them to increase nonuniformly. To limit the effect of dark current generation on the matrix precision, the matrix charge must be refreshed periodically. \n\n3.2 FLOATING GATE ROW AMPLIFIERS \n\nTo achieve better linearity when sensing charge, a floating gate amplifier is often used in CCD circuits. In the scheme described above, the induced voltage change of the row electrode significantly modifies its parasitic capacitance, resulting in a nonlinear voltage versus charge characteristic. To alleviate this problem, an operational amplifier with a capacitor in the feedback loop is added to each row line, shown in Figure 4. When charge is moved underneath the row line in the course of a VMM operation, the row voltage is kept constant by the action of the op-amp, with an output voltage given by \n\n$$\\Delta V_i = \\sum_{j=0}^{N-1} \\frac{Q_{ij} u_j}{C_f}$$ \n\nwhere $C_f$ is the feedback capacitance. \n\nFigure 4: Linear Charge Sensing (schematic labels: Reset, $C_f$, Column Gate) \n\nThe feedback capacitor is a poly-poly structure with vastly improved linearity compared with the row capacitance. 
This enhancement also speeds up the row-line summation: because the op-amp holds the row line at a constant voltage, the signal is carried in current mode rather than as a voltage swing on the row capacitance. In addition, digitally selecting a feedback capacitor value by switching power-of-two sized capacitors into the feedback loops provides a practical means of controlling the gain of the output amplifiers, with the potential for significantly extending the dynamic range of the device. \n\n3.3 DIGITAL INPUT BUFFER AND DIVIDE-BY-TWO CIRCUITRY \n\nMany applications, such as image processing, require multilevel input capability. This can easily be implemented by using the VMM circuitry in a bit-serial mode. The operation of the device is identical to that described above, except that processing n-bit input precision requires n cycles of the device. Digital shift registers are added to each input column line that sequentially present the column lines with successively more significant bits of the input vector, shown in Figure 5. Using the notation $u_j^{(n-1)}$ for the binary vector formed by taking the $n$th bits of all the input elements, the first VMM done by the circuit is given by \n\n$$\\Delta V_i^{(0)} = \\sum_{j=0}^{N-1} \\frac{Q_{ij} u_j^{(0)}}{C_f}$$ \n\nwhere $\\Delta V_i^{(0)}$ is the output vector represented as voltage changes on the row lines. The row voltages are stored on large capacitors, $C_1$, which are allowed to share charge with another set of equally sized capacitors, $C_2$, effectively dividing the output vector by two. \n\nFigure 5: Switched Capacitor Divide-By-Two Circuit (schematic labels: Reset, Matrix Charge, Row Line, Column Gate) \n\nThe next most significant bit input vector, $u_j^{(1)}$, is then multiplied and creates another set of row voltage changes, which are stored and shared to add another charge to the previously divided charge, giving \n\n$$V_i^{out(1)} = \\sum_{j=0}^{N-1} \\frac{Q_{ij} u_j^{(1)}}{C_f} + \\frac{1}{2} \\sum_{j=0}^{N-1} \\frac{Q_{ij} u_j^{(0)}}{C_f}$$ \n\nwhere $V_i^{out(1)}$ is the voltage on $C_2$ after two clock cycles. The process is repeated n times, effectively weighting each successive bit's data by the proper power-of-two factor and giving a total output voltage of \n\n$$V_i^{out(n-1)} = \\frac{1}{C_f} \\sum_{j=0}^{N-1} Q_{ij} \\sum_{k=1}^{n} 2^{k-n} u_j^{(k-1)} = \\frac{1}{C_f} \\sum_{j=0}^{N-1} Q_{ij} D_j$$ \n\nafter n clock cycles, where $D_j$ now represents the multivalued digital input vector. In this manner, multivalued input of n-bit precision can be processed, where n is limited only by the analog accuracy of the components\u00b2. \n\n\u00b2 If 4-bit input is required, the device is simply clocked four times. Since the power-of-two scaling is divisive, the most significant bit is always given the same weighting regardless of the input word length. \n\n4 EXPERIMENTAL RESULTS \n\nA number of VMM circuits have been fabricated implementing the architecture described above in a 2 \u00b5m double-poly CCD/CMOS process. The largest circuit contains a 128x128 array of matrix elements. The matrix is loaded electronically through a single pin using the CCD shift register mode of the CID cell, shown in Figure 3. Matrix element mismatches due to threshold variations are avoided, since all matrix elements are created by the same set of electrodes. \n\nA list of relevant system characteristics is given in Table 1. The matrix of charge is loaded in 4 ms and needs to be refreshed every 20 ms to retain acceptable weight accuracy at room temperature, giving a refresh overhead of 20%. A simple linear filter bank was loaded with a sinusoidal matrix and multiplied with a slowly chirped input signal to determine the linearity and noise limits. 
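The bit-serial scheme of Section 3.3 can be checked numerically with a behavioral sketch. It assumes ideal components and unsigned integer inputs d_j of n_bits bits, presented least significant bit first, and applies the update V <- Delta V^(k) + V/2 after each cycle, which reproduces the weighting of the Section 3.3 equations: the result equals the sum over j of Q_ij d_j divided by C_f 2^(n-1), i.e. D_j = d_j / 2^(n-1). The function bit_serial_vmm and its arguments are hypothetical names.

```python
def bit_serial_vmm(q, d, n_bits, c_f):
    # q[i][j]: matrix charge (arbitrary units); d[j]: unsigned integer input.
    # c_f: feedback capacitance of the row amplifiers (arbitrary units).
    if any(x < 0 or x >= 2 ** n_bits for x in d):
        raise ValueError('every input must fit in n_bits unsigned bits')
    out = [0.0] * len(q)
    for k in range(n_bits):  # least significant bit first, as in the text
        bits = [(x >> k) & 1 for x in d]  # binary vector u^(k)
        # One binary VMM cycle through the feedback amplifiers.
        dv = [sum(qij * b for qij, b in zip(row, bits)) / c_f for row in q]
        # Divide-by-two step, modeled per the equations above:
        # halve the stored result, then add the new product.
        out = [new + old / 2 for new, old in zip(dv, out)]
    return out
```

Because the scaling is divisive, the most significant bit always carries weight 1/C_f: bit_serial_vmm([[1]], [8], 4, 1.0) and bit_serial_vmm([[1]], [2], 2, 1.0) both return [1.0], matching footnote 2.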
\n\nTable 1: Experimental Results \n\nCharge Transfer Efficiency: 0.99995 \nCell Size: 24 \u00b5m x 24 \u00b5m \nBit Rate: 4 MHz \nRefresh Time: 4 ms \nNoise Limits: 7 bits \nLinearity: 5 bits \nPower Consumption (excluding output drivers): \nConnections Per Second (binary input vectors): \n\n5 SUMMARY \n\n