{"title": "An Adaptive WTA using Floating Gate Technology", "book": "Advances in Neural Information Processing Systems", "page_first": 720, "page_last": 726, "abstract": null, "full_text": "An Adaptive WTA using Floating Gate Technology \n\nW. Fritz Kruger, Paul Hasler, Bradley A. Minch, and Christof Koch \n\nCalifornia Institute of Technology \n\nPasadena, CA 91125 \n\n(818) 395 - 2812 \n\nstretch@klab.caltech.edu \n\nAbstract \n\nWe have designed, fabricated, and tested an adaptive Winner-Take-All \n(WTA) circuit based upon the classic WTA of Lazzaro \net al. [1]. We have added a time dimension (adaptation) to this \ncircuit to make the input derivative an important factor in winner \nselection. To accomplish this, we have modified the classic WTA \ncircuit by adding floating gate transistors which slowly null their \ninputs over time. We present a simplified analysis and experimental \ndata of this adaptive WTA fabricated in a standard 2 \u03bcm CMOS \nprocess. \n\n1 Winner-Take-All Circuits \n\nIn a WTA network, each cell has one input and one output. For any set of inputs, the \noutputs will all be at zero except for the output of the cell with the maximum \ninput. One way to accomplish this is by a global nonlinear inhibition coupled with a \nself-excitation term [2]. Each cell inhibits all others while exciting itself; thus a cell \nwith even a slightly greater input than the others will excite itself up to its maximal \nstate and inhibit the others down to their minimal states. The WTA function is \nimportant for many classical neural nets that involve competitive learning, vector \nquantization and feature mapping. The classic WTA network characterized by \nLazzaro et al. [1] is an elegant, simple circuit that shares just one common line \namong all cells of the network to propagate the inhibition. \n\nOur motivation to add adaptation comes from the idea of saliency maps. 
Picture \na saliency map as a large number of cells, each of which encodes an analog value \nreflecting some measure of the importance (saliency) of its input. We would like \nto pay attention to the most salient cell, so we employ a WTA function to tell us \nwhere to look. But if the input doesn't change, we never look away from that one \ncell. We would like to introduce some concept of fatigue and refraction to each cell \nsuch that after winning for some time, it tires, allowing other cells to win, and then \nit must wait some time before it can win again. We call this circuit an adaptive \nWTA. \n\nFigure 1: The circuit diagram of a two-input winner-take-all circuit. \n\nIn this paper, we present an adaptive WTA based upon the classic WTA; Figure 1 \nshows a two-input, adaptive WTA circuit. The difference between the classic and \nadaptive WTA is that M4 and M5 are pFET single transistor synapses. A single \ntransistor synapse [3] is either an nFET or pFET transistor with a floating gate and \na tunneling junction. This enhancement results in the ability of each transistor to \nadapt to its input bias current. The adaptation is a result of the electron tunneling \nand hot-electron injection modifying the charge on the floating gate; equilibrium is \nestablished when the tunneling current equals the injection current. The circuit is \ndevised in such a way that these are negative feedback mechanisms; consequently, \nthe output voltage will always return to the same steady state voltage determined \nby its bias current regardless of the DC input level. Like the autozeroing amplifier \n[4], the adaptive WTA is an example of a circuit where the adaptation occurs as a \nnatural part of the circuit operation. 
\n\n2 pFET hot-electron injection and electron tunneling \n\nBefore considering the behavior of the adaptive WTA, we will review the processes of \nelectron tunneling and hot-electron injection in pFETs. In subthreshold operation, \nwe can describe the channel current of a pFET ($I_p$) for a differential change in gate \nvoltage, $\\Delta V_g$, around a fixed bias current $I_{so}$, as $I_p = I_{so} \\exp\\left(-\\kappa_p \\Delta V_g / U_T\\right)$, where $\\kappa_p$ is \nthe amount by which $\\Delta V_g$ affects the surface potential of the pFET, and $U_T$ is $kT/q$. \nWe will assume for this paper that all transistors are identical. \n\nFirst, we consider electron tunneling. We start with the classic model of electron \ntunneling through a silicon - SiO2 system [5]. As in the autozeroing amplifier [4], \nthe tunneling current will be only a weak function of the voltage swing on the \nfloating gate through the region of subthreshold currents; therefore we will \napproximate the tunneling junction as a current source supplying a current $I_{tun0}$ to \nthe floating gate. \n\nFigure 2: pFET Hot Electron Injection. \n(a) Band diagram of a subthreshold pFET \ntransistor for favorable conditions for hot-electron injection. (b) Measured data of pFET \ninjection efficiency versus the drain to channel voltage for four source currents. Injection \nefficiency is the ratio of injection current to source current. At $\\Phi_{dc}$ equal to 8.2V, the \ninjection efficiency increases by a factor of e for an increase in $\\Phi_{dc}$ of 250mV. \n\nSecond, we derive a simple model of pFET hot-electron injection. Figure 2a shows \nthe band diagram of a pFET operating at bias conditions which are favorable for \nhot-electron injection. 
Hot-hole impact ionization creates electrons at the drain edge \nof the depletion region. These secondary electrons travel back into the channel \nregion gaining energy as they go. When their energy exceeds that of the SiO2 \nbarrier, they can be injected through the oxide to the floating gate. The hole \nimpact ionization current is proportional to the source current, and is an exponential \nfunction of the voltage drop from channel to drain ($\\Phi_{dc}$). The injection current is \nproportional to the hole impact ionization current and is an exponential function \nof the voltage drop from channel to drain. We will neglect the dependence on the \nfloating-gate voltage for a given source current and $\\Phi_{dc}$ as we did in [4]. Figure \n2b shows measured injection efficiency for several source currents, where injection \nefficiency is the ratio of the injection current to source current. The injection \nefficiency is independent of source current and is approximately linear on a logarithmic scale over a 1 \n- 2V swing in $\\Phi_{dc}$; therefore we model the injection efficiency as proportional to \n$\\exp\\left(-\\Delta\\Phi_{dc} / V_{inj}\\right)$ within that 1 to 2V swing, where $V_{inj}$ is a measured device parameter \nwhich for our process is 250mV at a bias $\\Phi_{dc}$ = 8.2V, and $\\Delta\\Phi_{dc}$ is the change in \n$\\Phi_{dc}$ from the bias level. An increasing voltage input will increase the pFET surface \npotential by capacitive coupling to the floating gate. Increasing the pFET surface \npotential will increase the source current thereby decreasing $\\Phi_{dc}$ for a fixed output \nvoltage and lowering the injection efficiency. \n\nFigure 3: Illustration of the dynamics for the winning and losing input voltages. (a) \nMeasured V1 versus time due to an upgoing and a downgoing input current step. The \ninitial input voltage change due to the input step is much smaller than the voltage change \ndue to the adaptation. (b) Adaptation time of a losing input voltage, as a function of the input current step (% of bias current), for several tunneling \nvoltages. The adaptation time is the time from the start of the input current step to the \ntime the input voltage is within 10% of its steady state voltage. A larger tunneling voltage \ndecreases the adaptation time by increasing the tunneling current supplied to the floating \ngate. \n\n3 Two input Adaptive WTA \n\nWe outline the general procedure for deriving the equations that describe \nthe two input WTA shown in Fig. 1. We first observe that transistors M1, M2, \nand M3 make up a differential pair. Regardless of any adaptation, the middle V \nnode and output currents are set by the input voltages (V1 and V2), which are set \nby the input currents, as in the classic WTA [1]. The dynamics for high frequency \noperation are also similar to the classic WTA circuit. Next, we can write the two \nKirchhoff Current Law (KCL) equations at V1 and V2, which relate the changes in \nV1 and V2 to the two input currents and the floating gate voltages. \nFinally, we can write the two KCL equations at the two floating gates Vfg1 and \nVfg2, which relate the changes in the floating gate voltages to V1 and V2. \nThis procedure is directly extendable to multiple inputs. A full analysis of these \nequations is very difficult and will be described in another paper. 
\n\nFor this discussion, we present a simplified analysis to develop the intuition of the \ncircuit operation. At sufficiently high frequencies, the tunneling and injection currents \ndo not adapt the floating gate voltages sufficiently fast to keep the input \nvoltages at their steady state levels. At these frequencies, the adaptive WTA acts \nlike the classic WTA circuit with one small difference. A change in the input voltages, \nV1 or V2, is linearly related to V by the capacitive coupling ($\\Delta V_1 = -\\frac{C_1}{C_2} \\Delta V$), \nwhereas this relationship is exponential in the classic WTA. There is always some capacitance \nC2, even if not explicitly drawn, due to the overlap capacitance from the \nfloating gate to drain. This property gives the designer the added freedom to modify \nthe gain. We will assume the circuit operates in its intended operating regime \nwhere the floating gate transistors settle sufficiently fast such that their channel \ncurrent equals the input currents \n\n$$I_i = I_{so} \\exp\\left(-\\frac{\\kappa \\Delta V_{fgi}}{U_T}\\right) \\quad \\rightarrow \\quad \\frac{dI_i}{dt} = -\\frac{\\kappa I_i}{U_T} \\frac{dV_{fgi}}{dt} \\tag{1}$$ \n\nfor all inputs indexed by i, but not necessarily fast enough for the floating gates to \nsettle to their final steady state levels. \n\nFigure 4: Measured change in steady state input voltages as a function of bias current. \n(a) Change in the two steady state output voltages as a function of the bias current of the \nsecond input. The bias current of the first input was held fixed at 8.14nA. (b) Change in \nthe RMS noise of the two output voltages as a function of the bias current of the second \ninput. The RMS noise is much higher for the losing input than for the winning input. \nNote that where the two bias currents cross roughly corresponds to the location where the \nRMS noise on the two input voltages is equal. \n\nTo develop some initial intuition, we shall begin by considering one half of the two \ninput WTA: transistors M1, M2 and M4 of Figure 1. First, we notice that Iout1 is \nequal to Ib (the current through transistor M1); note that this is not true for the \nmultiple input case. By equating these two currents we get an equation for V as \n$V = \\kappa V_1 - \\kappa V_b$, where we will assume that Vb is a fixed bias voltage. Assuming the \ninput current equals the current through M4, V1 obeys the equation \n\n$$(\\kappa C_1 + C_2) \\frac{dV_1}{dt} = -\\frac{C_T U_T}{\\kappa I_1} \\frac{dI_1}{dt} + I_{tun0} \\left( \\frac{I_1}{I_{so}} \\exp\\left(-\\frac{\\Delta V_1}{V_{inj}}\\right) - 1 \\right) \\tag{2}$$ \n\nwhere $C_T$ is the total capacitance connected to the floating gate. The steady state \nof (2) is \n\n$$\\Delta V_1 = V_{inj} \\ln\\left(\\frac{I_1}{I_{so}}\\right) \\tag{3}$$ \n\nwhich is exactly the same expression for each input in a multiple input WTA. We get \na linear differential equation by making the substitution $X = \\exp(\\Delta V_1 / V_{inj})$ [4], and we \nget similar solutions to the behavior of the autozeroing amplifier. Figure 3a shows \nmeasured data for an upgoing and a downgoing current step. The input current \nchange results in an initial fast change in the input voltage, and the input voltage \nthen adapts to its steady state voltage which is a much greater voltage change. \nFrom the voltage difference between the steady states, we get that $V_{inj}$ is roughly \n500mV. \n\nFigure 5: Experimental time-trace measurements of the output current and voltage \nfor small differential input current steps. (a) Time traces for small differential current \nsteps around nearly identical bias currents of 8.6nA. (b) Time traces for small differential \ncurrent steps around two different bias currents of 8.7nA and 0.88nA. In the classic WTA, \nthe output currents would show no response to the input current steps. \n\nReturning to the two input case, we get two floating gate equations by assuming \nthat the currents through M4 and M5 are equal to their respective input currents \nand writing the KCL equations at each floating gate. If V1 and V2 do not cross \neach other in the circuit operation, then one can easily solve these KCL equations. \nAssume without loss of generality that V1 is the winning voltage, which implies that \n$\\Delta V = \\kappa \\Delta V_1$. The initial input voltage change before the floating gate adaptation \ndue to a step in the two input currents of $I_1 \\rightarrow I_1^+$ and $I_2 \\rightarrow I_2^+$, obtained from (1) and charge conservation at each floating gate, is \n\n$$\\Delta V_1 \\approx -\\frac{C_T}{\\kappa C_1} \\frac{U_T}{\\kappa} \\ln\\left(\\frac{I_1^+}{I_1}\\right), \\qquad \\Delta V_2 \\approx \\frac{C_T}{C_2} \\frac{U_T}{\\kappa} \\ln\\left(\\frac{I_1^+ I_2}{I_1 I_2^+}\\right) \\tag{4}$$ \n\nfor $C_2$ much less than $\\kappa C_1$. In this case, V1 moves on the order of the floating gate \nvoltage change, but V2 moves on the order of the floating gate change amplified up \nby $C_T / C_2$. The response of $\\Delta V_1$ is governed by an identical equation to (2) of the earlier \nhalf-analysis, and therefore results in a small change in V1. Also, any perturbation \nof V is only slightly amplified at V1 due to the feedback; therefore any noise at V \nwill only be slightly amplified into V1. The restoration of V2 is much quicker than \nthe V1 node if $C_2$ is much less than $\\kappa C_1$; therefore after the initial input step, one \ncan safely assume that V is nearly constant. The voltage at V is amplified by $-C_1 / C_2$ \nat V2; therefore any noise at V is amplified at the losing voltage, but not at the \nwinning voltage as the data in Fig. 4b shows. 
The losing dynamics are identical \nto the step response of an autozeroing amplifier [4]. Figure 3b shows the variation \nof the adaptation time versus the percent input current change for several values of \ntunneling voltages. \n\nThe main difficulty in exactly solving these KCL equations is the point in the \ndynamics where V1 crosses V2, since the behavior changes when the signals move \nthrough the crossover point. If we get more than a sufficient V1 decrease to reach \nthe starting V2 equilibrium, then the rest of the input change is manifested by an \nincrease in V2. If the voltage V2 crosses the voltage V1, then V will be set by the \nnew steady state, and V1 is governed by losing dynamics until $V_1 \\approx V_2$. At this \npoint V1 is nearly constant and V2 is governed by losing dynamics. This analysis is \ndirectly extendible to an arbitrary number of inputs. \n\nFigure 5 shows some characteristic traces from the two-input circuit. Recall that the \nwinning node is that with the lowest voltage, which is reflected in its corresponding \nhigh output current. In Fig. 5a, we see that as an input step is applied, the output \ncurrent jumps and then begins to adapt to a steady state value. When the inputs \nare nearly equal, the steady state outputs are nearly equal; but when the inputs \nare different, the steady state output is greater for the cell with the lesser input. \nIn general, the input current change that is the largest after reaching the previous \nequilibrium becomes the new equilibrium. This additional decrease in V1 would \nlead to an amplified increase in the other voltage since the losing stage roughly \nlooks like an autozeroing amplifier with the common node as the input terminal. \nThe extent to which the inputs do not equal this largest input is manifested as a \nproportionally larger input voltage. 
The other voltage would return to equilibrium \nby slowly, linearly decreasing in voltage due to the tunneling current. This process \nwill continue until V1 equals V2. Note in general that the inputs with lower bias \ncurrents have a slight starting advantage over the inputs with higher bias currents. \n\nFigure 5b illustrates the advantage of the adaptive WTA over the classic WTA. In \nthe classic WTA, the output voltage and current would not change throughout the \nexperiment, but the adaptive WTA responds to changes in the input. The second \ninput step does not evoke a response because there was not enough time to adapt \nto steady state after the previous step; but the next step immediately causes it to \nwin. Also note in both of these traces that the noise is very large in the losing \nnode and small in the winner because of the gain differences (see Figure 4b). \n\nReferences \n\n[1] J. Lazzaro, S. Ryckebusch, M.A. Mahowald, and C.A. Mead, \"Winner-Take-All \nNetworks of O(N) Complexity\", NIPS 1, Morgan Kaufmann Publishers, \nSan Mateo, CA, 1989, pp. 703-711. \n\n[2] S. Grossberg, \"Adaptive Pattern Classification and Universal Recoding: I. Parallel \nDevelopment and Coding of Neural Feature Detectors\", Biological Cybernetics, \nvol. 23, pp. 121-134, 1988. \n\n[3] P. Hasler, C. Diorio, B. A. Minch, and C. Mead, \"Single Transistor \nLearning Synapses\", NIPS 7, MIT Press, 1995, pp. 817-824. Also at \nhttp://www.pcmp.caltech.edu/anaprose/paul. \n\n[4] P. Hasler, B. A. Minch, C. Diorio, and C. Mead, \"An autozeroing amplifier \nusing pFET Hot-Electron Injection\", ISCAS, Atlanta, 1996, pp. III-325 - III-328. \nAlso at http://www.pcmp.caltech.edu/anaprose/paul. \n\n[5] M. Lenzlinger and E. H. Snow, \"Fowler-Nordheim tunneling into thermally \ngrown SiO2\", J. Appl. Phys., vol. 40, pp. 278-283, 1969. 
\n\n\f", "award": [], "sourceid": 1205, "authors": [{"given_name": "W.", "family_name": "Kruger", "institution": null}, {"given_name": "Paul", "family_name": "Hasler", "institution": null}, {"given_name": "Bradley", "family_name": "Minch", "institution": null}, {"given_name": "Christof", "family_name": "Koch", "institution": null}]}