{"title": "A Silicon Primitive for Competitive Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 713, "page_last": 719, "abstract": null, "full_text": "A silicon primitive for competitive learning \n\nDavid Usu \n\nMiguel Figueroa \n\nChris Diorio \n\nComputer Science and Engineering \n\nThe University of Washington \n\n114 Sieg Hall, Box 352350 \n\nSeattle, W A 98195-2350 USA \n\nhsud, miguel, diorio@cs.washington.edu \n\nAbstract \n\nCompetitive learning is a technique for training classification and \nclustering networks. We have designed and fabricated an 11-\ntransistor primitive, that we term an automaximizing bump circuit, \nthat implements competitive learning dynamics. The circuit per(cid:173)\nforms a similarity computation, affords nonvolatile storage, and \nimplements simultaneous local adaptation and computation. We \nshow that our primitive is suitable for implementing competitive \nlearning in VLSI, and demonstrate its effectiveness in a standard \nclustering task. \n\n1 Introduction \n\nCompetitive learning is a family of neural learning algorithms that has proved use(cid:173)\nful for training many classification and clustering networks [1]. In these networks, a \nneuron's synaptic weight vector typically represents a tight cluster of data points. \nUpon presentation of a new input to the network, the neuron representing the closest \ncluster adapts its weight vector, decreasing the difference between the weight vector \nand present input. Details on this adaptation vary for different competitive learning \nrules, but the general functionality of the synapse is preserved across various com(cid:173)\npetitive learning networks. These functions are weight storage, similarity computa(cid:173)\ntion, and competitive learning dynamics. \nMany VLSI implementations of competitive learning have been reported in the lit(cid:173)\nerature [2]. 
These circuits typically use digital registers or capacitors for weight storage. Digital storage is expensive in terms of die area and power consumption; capacitive storage typically requires a refresh scheme to prevent weight decay. In addition, these implementations require separate computation and weight-update phases, increasing complexity. More importantly, neural networks built with these circuits typically do not adapt during normal operation. \n\nSynapse transistors [3][4] address the problems raised in the previous paragraph. These devices use floating-gate technology to provide nonvolatile analog storage and local adaptation in silicon. The adaptation mechanisms do not perturb the operation of the device, thus enabling simultaneous adaptation and computation. Unfortunately, the adaptation mechanisms provide dynamics that are difficult to translate into existing neural-network learning rules. Allen et al. [5] proposed a silicon competitive-learning synapse that used floating-gate technology in the early 1990s. However, that approach suffers from asymmetric adaptation, due to separate mechanisms for increasing and decreasing weight values. In addition, they neither characterized the adaptation dynamics of their device nor demonstrated competitive learning with it. \nWe present a new silicon primitive, the automaximizing bump circuit, that uses synapse transistors to implement competitive learning in silicon. This 11-transistor circuit computes a similarity measure, provides nonvolatile storage, implements local adaptation, and performs simultaneous adaptation and computation. In addition, the circuit naturally exhibits competitive learning dynamics. 
In this paper, we derive the properties of the automaximizing bump circuit directly from the physics of synapse transistors, and corroborate our analysis with data measured from a chip fabricated in a 0.35 µm CMOS process. In addition, experiments on a competitive learning circuit, and software simulations of the learning rule, show that this device provides a suitable primitive for competitive learning. \n\n2 Synapse transistors \n\nThe automaximizing bump circuit's behavior depends on the storage and adaptation properties of synapse transistors, so this section briefly reviews these devices. A synapse transistor comprises a floating-gate MOSFET, with a control gate capacitively coupled to the floating gate, and an associated tunneling implant. The transistor uses floating-gate charge to implement a nonvolatile analog memory, and outputs a source current that varies with both the stored value and the control-gate voltage. The synapse uses two adaptation mechanisms: Fowler-Nordheim tunneling [6] increases the stored charge; impact-ionized hot-electron injection (IHEI) [7] decreases the charge. Because tunneling and IHEI can both be active during normal transistor operation, the synapse enables simultaneous adaptation and computation. \nA voltage difference between the floating gate and the tunneling implant causes electrons to tunnel from the floating gate, through the gate oxide, to the tunneling implant. We can approximate this current (with respect to fixed tunneling and floating-gate voltages, Vtun0 and Vg0) as [4]: \n\nItun = Itun0 e^((ΔVtun - ΔVg)/Vx)   (1) \n\nwhere Itun0 and Vx are constants that depend on Vtun0 and Vg0, and ΔVtun and ΔVg are the deviations of the tunneling and floating-gate voltages from these fixed levels. \nIHEI adds electrons to the floating gate, decreasing its stored charge. 
The IHEI current increases with the transistor's source current and drain-to-source voltage; over a small drain-voltage range, we model this dependence as [3][4]: \n\nIinj = Iinj0 (Is/Is0) e^(Vsd/Vy)   (2) \n\nwhere the constant Vy depends on the VLSI process, and Ut is the thermal voltage. \n\n3 Automaximizing bump circuit \n\nThe automaximizing bump circuit (Fig. 1) is an adaptive version of the classic bump-antibump circuit [8]. It uses synapse transistors to implement the three essential functions of a competitive learning synapse: storage of a weight value μ, computation of a similarity measure between the input and μ, and the ability to move μ closer to the input. Both circuits take two inputs, V1 and V2, and generate three currents. \n\nFigure 1. (a) Automaximizing bump circuit. M1-M5 form the classic bump-antibump circuit; we added M6-M11 and the floating gates. (b) Data showing that the circuit computes a similarity between the input, Vin, and the stored value, μ, for three different stored weights. Vin is represented as V1 = +Vin/2, V2 = -Vin/2. \n\nThe two outside currents, I1 and I2, are a measure of the dissimilarity between the two inputs; the center current, Imid, is a measure of their similarity: \n\nImid = Ib / (1 + λ cosh^2(κΔV/2Ut))   (3) \n\nwhere λ and κ are process- and design-dependent parameters, ΔV is the voltage difference between V1 and V2, and Ib is a bias current. Imid is symmetric with respect to the difference between V1 and V2, and approximates a Gaussian centered at ΔV = 0. \nWe augment the bump-antibump circuit by adding floating gates and tunneling junctions to M1-M5, turning them into synapse transistors; M1 and M3 share the same floating gate and tunneling junction, as do M2 and M4. We also add transistors M6-M11 to control IHEI. 
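As a rough illustration of the similarity computation in Eq. 3, the following sketch evaluates a bump-shaped Imid in software. The parameter values (i_b, lam, kappa) are illustrative assumptions, not values fitted to the fabricated circuit:

```python
import math

def bump_similarity(v_in, mu=0.0, i_b=1e-9, lam=0.25, kappa=0.7, u_t=0.0258):
    """Sketch of Eq. 3: a bell-shaped similarity between the input v_in
    and a stored weight mu. Parameter values are illustrative only."""
    delta_v = v_in - mu
    return i_b / (1.0 + lam * math.cosh(kappa * delta_v / (2.0 * u_t)) ** 2)

# The output peaks when v_in equals the stored weight, falls off on either
# side, and is symmetric in (v_in - mu), like the measured bumps in Fig. 1(b).
assert bump_similarity(0.0) > bump_similarity(0.05) > bump_similarity(0.1)
assert abs(bump_similarity(0.1) - bump_similarity(-0.1)) < 1e-15
```

Shifting mu moves the location of the peak without changing the shape of the bump, which is the behavior reported for the stored weights in Fig. 1(b).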
For convenience, we will refer to our new circuit simply as a bump circuit. The charges stored on the bump circuit's floating gates, Q1 and Q2, shift Imid's peak away from ΔV = 0 by an amount determined by their difference. We interpret this difference as the weight, μ, stored by the circuit, and interpret Imid as a similarity measure between the circuit's input and stored weight. \n\nTunneling and IHEI adapt the bump circuit's weight. The circuit is automaximizing because tunneling and IHEI naturally tune the peak of Imid to coincide with the present input. This high-level behavior coincides with the dynamics of competitive learning; both act to decrease the difference between a stored weight and the applied input. Therefore, no explicit computation of the direction or magnitude of weight updates is necessary; the circuit naturally performs these computations for us. Consequently, we only need to indicate when the circuit should adapt, not how it adapts. Applying ~10V to Vtun and ~0V to Vinj activates adaptation. Applying <8V to Vtun and >2V to Vinj deactivates adaptation. \n\n3.1 Weight storage \n\nThe bump circuit's weight value derives directly from the charge on its floating gates. A synapse transistor's floating-gate charge looks, for all practical purposes, like a voltage source, Vs, applied to the control gate. This voltage source has a value Vs = Q/Cin, where Cin is the control-gate to floating-gate coupling capacitance and Q is the floating-gate charge. We encode the input to the bump circuit, Vin, as a differential signal: V1 = Vin/2 and V2 = -Vin/2 (similar results follow for any symmetric encoding of Vin). As a result, Imid computes the similarity between the two floating-gate voltages: Vfg1 = Vs1 + Vin/2 and Vfg2 = Vs2 - Vin/2, where Vs1 and Vs2 are the voltages due to the charge stored on the floating gates. 
We define the bump circuit's weight, μ, as: \n\nμ = Vs2 - Vs1   (4) \n\nThis weight corresponds to the value of Vin that equalizes the two floating-gate voltages (and maximizes Imid). Part (b) of Fig. 1 shows the bump circuit's Imid output for three weight values, as a function of the differential input. We see that different stored values change the location of the peak, but do not change the shape of the bump. Because floating-gate charge is nonvolatile, the weight is also nonvolatile. \n\nThe differential encoding of the input makes the bump circuit's adaptation symmetric with respect to (Vin - μ). Without loss of generality, we can represent Vin as: \n\nVin = μ + (Vin - μ)   (5) \n\nIf we apply Vin/2 and -Vin/2 to the two input terminals, we arrive at the following two floating-gate voltages: \n\nVfg1 = (Vs2 + Vs1 + Vin - μ) / 2   (6) \n\nVfg2 = (Vs2 + Vs1 - Vin + μ) / 2   (7) \n\nBy reversing the sign of (Vin - μ), we obtain the same floating-gate voltages on the opposite terminals. Because the floating-gate voltages are independent of the sign of (Vin - μ), the bump circuit's learning rule is symmetric with respect to (Vin - μ). \n\n3.2 Adaptation \n\nWe now explore the bump circuit's adaptation dynamics. We define ΔVfg = Vfg1 - Vfg2. From Eqs. 4-7, we can see that Vin - μ = ΔVfg. Consequently, the learning rate, dμ/dt, is equivalent to -dΔVfg/dt. In our subsequent derivations, we consider only positive ΔVfg, because adaptation is symmetric (albeit with a change of sign). We show complete derivations of the equations in this section in [9]. \n\nTunneling causes adaptation by decreasing the difference between the floating-gate voltages Vfg1 and Vfg2. Electron tunneling increases the voltage of both floating gates but, because tunneling increases exponentially with smaller floating-gate voltages (see Eq. 1), tunneling decreases the difference. 
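The symmetry argument of Eqs. 6-7 is easy to check numerically. A minimal sketch, with illustrative stored voltages (the values of Vs1 and Vs2 below are arbitrary assumptions):

```python
def floating_gate_voltages(v_in, v_s1, v_s2):
    """Eqs. 6-7: floating-gate voltages under the differential encoding
    V1 = +Vin/2, V2 = -Vin/2, with the weight mu = Vs2 - Vs1 (Eq. 4)."""
    mu = v_s2 - v_s1
    v_fg1 = (v_s2 + v_s1 + v_in - mu) / 2.0   # equals Vs1 + Vin/2
    v_fg2 = (v_s2 + v_s1 - v_in + mu) / 2.0   # equals Vs2 - Vin/2
    return v_fg1, v_fg2

# Reversing the sign of (Vin - mu) swaps the two floating-gate voltages,
# so the learning rule is symmetric in (Vin - mu).
v_s1, v_s2 = 1.0, 1.3          # illustrative stored values; mu = 0.3
a1, a2 = floating_gate_voltages(0.3 + 0.05, v_s1, v_s2)
b1, b2 = floating_gate_voltages(0.3 - 0.05, v_s1, v_s2)
assert abs(a1 - b2) < 1e-12 and abs(a2 - b1) < 1e-12
```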
Assuming that M1's floating-gate voltage is lower than M2's, the change in ΔVfg due to electron tunneling is: \n\ndΔVfg/dt = -(Itun1 - Itun2)/Cfg   (8) \n\nWe substitute Eq. 1 into Eq. 8 and solve for the tunneling learning rule: \n\ndΔVfg/dt = -It0 e^((ΔVtun - ΔV0)/Vx) sinh((ΔVfg - φ)/2Vx)   (9) \n\nwhere It0 = Itun0/Cfg, Vx is a model constant, ΔV0 = (ΔVfg1 + ΔVfg2)/2, and φ models the tunneling mismatch between synapse transistors. This rule depends on three factors: a controllable learning rate, ΔVtun; the difference between Vin and μ, ΔVfg; and the average floating-gate voltage, ΔV0. \n\nFigure 2. (a) Measured adaptation rates, due to tunneling and IHEI, along with fits from Eqs. 9 and 11. (b) Composite adaptation rate, along with a fit from (12). We slowed the IHEI adaptation rate (by using a higher Vinj), compared with the data from part (a), to cause better matching between tunneling and IHEI. \n\nThe circuit also uses IHEI to decrease ΔVfg. We bias the bump circuit so that only transistors M1 and M2 exhibit IHEI. According to Eq. 2, IHEI depends linearly on a transistor's source current, but exponentially on its source-to-drain voltage. Consequently, we decrease ΔVfg by controlling the drain voltages of M1 and M2. Coupled current mirrors (M6-M7 and M8-M9) at the drains of M1 and M2 simultaneously raise the drain voltage of the transistor that is sourcing a larger current, and lower the drain voltage of the transistor that is sourcing a smaller current. 
The transistor with the smaller source current will experience a larger Vsd, and thus exponentially more IHEI, causing its source current to increase rapidly. Diodes (M10 and M11) further increase the drain voltage of the transistor with the larger current, further reducing its IHEI. The net effect is that IHEI acts to equalize the currents and, likewise, the floating-gate voltages. Recently, Hasler proposed a similar method for controlling IHEI in a floating-gate differential pair [4]. \nAssuming I1 > I2, the change in ΔVfg due to IHEI is: \n\ndΔVfg/dt = -(Iinj1 - Iinj2)/Cfg   (10) \n\nWe expand the learning rule by substituting Eq. 2 into Eq. 10. To compute values for the drain voltages of M1 and M2, we assume that all of I1 flows through M11 and all of I2 flows through M7. The IHEI learning rule is given below: \n\ndΔVfg/dt = -Ij0 e^(ωΔV0) (e^(γVinj) Φ1(ΔVfg) - e^(ηVinj) Φ2(ΔVfg))   (11) \n\nwhere Ij0 = Iinj0/Cfg, γ = -2/Vy, η = -1/Vy, and ξ = κ/Vy. Φ1 and Φ2 are given by: \n\nΦ1(ΔVfg) = ((Ib - Imid)/2 · cosh(κΔVfg/2Ut))^(1 - 2Ut/κVy) e^(-σΔVfg)   (12) \n\nΦ2(ΔVfg) = ((Ib - Imid)/2 · cosh(κΔVfg/2Ut)) e^(-σΔVfg) (1 - e^(-κΔVfg/Ut))^(-Ut/Vy)   (13) \n\nwhere σ = (1 - Ut/Vy)κ/2Ut and ω = κ/2Ut - κ/2Vy - 1/Vy. Like tunneling, the IHEI rule depends on three factors: a controllable learning rate, Vinj; the difference between Vin and μ, ΔVfg; and ΔV0. Part (a) of Fig. 2 shows measurements of dΔVfg/dt versus ΔVfg due to tunneling and IHEI, along with fits to Eqs. 9 and 11, respectively. \n\nIHEI and tunneling facilitate adaptation by adding and removing electrons from the floating gates, respectively. In isolation, either mechanism would eventually drive the bump circuit out of its operating range. To obtain useful adaptation, we need to activate both mechanisms at the same time. 
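To build intuition for the tunneling rule alone (Eq. 9), the following toy Euler integration shows ΔVfg relaxing toward the mismatch offset φ rather than toward zero. Every constant here is an illustrative assumption, not a measured or fitted value:

```python
import math

# Toy Euler integration of the tunneling rule (Eq. 9). All constants are
# illustrative assumptions; real values depend on Vtun, C_fg, and the process.
i_t0 = 1e-3      # I_tun0 / C_fg, in V/s (assumed)
v_x = 1.0        # tunneling model constant, in V (assumed)
dv_tun = 0.0     # deviation of the tunneling voltage, held fixed here
dv_0 = 0.0       # average floating-gate deviation, held fixed here
phi = 0.02       # tunneling mismatch between the two synapses (assumed)

dv_fg = 0.4      # initial (Vin - mu), in volts
dt = 1.0
for _ in range(20000):
    rate = (-i_t0 * math.exp((dv_tun - dv_0) / v_x)
            * math.sinh((dv_fg - phi) / (2.0 * v_x)))
    dv_fg += rate * dt

# Tunneling drives dv_fg toward the mismatch offset phi, not to zero;
# this is the residual error that the combined rule (with IHEI) reduces.
assert abs(dv_fg - phi) < 1e-3
```

The sinh term makes the adaptation rate grow rapidly with the weight error, which matches the qualitative shape of the tunneling curve in Fig. 2(a).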
There is an added benefit to combining tunneling and IHEI: Part (a) of Fig. 2 shows that tunneling acts more strongly for smaller values of ΔVfg, while IHEI shows the opposite behavior. The mechanisms complement each other, providing adaptation over more than a 1V range in ΔVfg. We combine Eqs. 9 and 11 to derive the bump learning rule: \n\n-dΔVfg/dt = It0 e^((ΔVtun - ΔV0)/Vx) sinh((ΔVfg - φ)/2Vx) + Ij0 e^(ωΔV0) (e^(γVinj) Φ1(ΔVfg) - e^(ηVinj) Φ2(ΔVfg))   (14) \n\nPart (b) of Fig. 2 illustrates the composite weight-update dynamics. When ΔVfg is small, adaptation is primarily driven by IHEI, while tunneling dominates for larger values of ΔVfg. \n\nThe bump learning rule is unlike any learning rule that we have found in the literature. Nevertheless, it exhibits several desirable properties. First, it naturally moves the bump circuit's weight towards the present input. Second, the weight update is symmetric with respect to the difference between the stored value and the present input. Third, we can vary the weight-update rate over many orders of magnitude by adjusting Vtun and Vinj. Finally, because the bump circuit uses synapse transistors to perform adaptation, the circuit can adapt during normal operation. \n\n4 Competitive learning with bump circuits \n\nBelow, we summarize results from simulations of the bump learning rule and from a competitive learning circuit fabricated in the TSMC 0.35 µm process; for further details, consult [9]. We first compared the performance of a software neural network on a standard clustering task, using the bump learning rule (fitted to data from Fig. 2), against a basic competitive learning rule (learning rate ρ = 0.01): \n\ndμ/dt = ρ (Vin - μ)   (15) \n\nWe trained both networks on data drawn from a mixture of 32 Gaussians in a 32-dimensional space. 
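The hard competitive learning baseline of Eq. 15 can be sketched as follows. The dimensionality, neuron count, and cluster parameters below are deliberately small illustrative choices, not the paper's 32-Gaussian experimental setup:

```python
import random

# Minimal sketch of hard competitive learning (Eq. 15): only the winning
# (closest) weight vector moves toward each input. All sizes are illustrative.
random.seed(0)
dim, n_neurons, rho = 2, 4, 0.01
centers = [[random.random() for _ in range(dim)] for _ in range(n_neurons)]

def sample():
    """Draw a point near a randomly chosen cluster center."""
    c = random.choice(centers)
    return [v + random.gauss(0.0, 0.05) for v in c]

def closest(weights, x):
    """Index of the weight vector nearest to x (the competition winner)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def coding_error(weights, data):
    """Sum of squared distances from each point to its closest neuron."""
    return sum(sum((w - v) ** 2 for w, v in zip(weights[closest(weights, x)], x))
               for x in data)

weights = [[random.random() for _ in range(dim)] for _ in range(n_neurons)]
test = [sample() for _ in range(200)]
before = coding_error(weights, test)

for _ in range(5000):
    x = sample()
    win = closest(weights, x)
    # Eq. 15: d(mu)/dt = rho * (Vin - mu), applied to the winner only.
    weights[win] = [w + rho * (v - w) for w, v in zip(weights[win], x)]

after = coding_error(weights, test)
assert after < before   # training reduces the quantization (coding) error
```

The coding error computed here is the same figure of merit used to compare the bump rule against Eq. 15 in Fig. 3(a).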
The Gaussian means were drawn from the interval [0,1], and the covariance matrix was the diagonal matrix 0.1·I. On each input presentation, the network updated the weight vector of the closest neuron using either the bump learning rule or Eq. 15. We measured the performance of the two learning rules by evaluating the coding error of each trained network on a test set drawn from the same distribution as the training data. The coding error is the sum of the squared distances between each test point and its closest neuron. Part (a) of Fig. 3 shows that the bump circuit's rule compares favorably with the hard competitive learning rule. \n\nOur VLSI circuit (Part (b) of Fig. 3) comprised two neurons with a one-dimensional input (each neuron was a single bump circuit), and a feedback network to control adaptation. The feedback network comprised a winner-take-all (WTA) circuit [10] that detected which bump was closest to the present input, and additional circuitry [9] that generated Vtun and Vinj from the WTA output. We tested this circuit on a clustering task: learning the centers of a mixture of two Gaussians. In part (c) of Fig. 3, we compare the performance of our circuit with a simulated neural network using Eq. 15. The VLSI circuit performed comparably with the neural network, demonstrating that our bump circuit, in conjunction with simple feedback mechanisms, can implement competitive learning in VLSI. We can generalize the circuitry to multiple dimensions (multiple bump circuits per neuron) and multiple neurons; each neuron requires only one Vtun and one Vinj signal. \n\nFigure 3. 
(a) Comparison of a neural network using the bump learning rule versus a standard competitive learning rule. We drew the training data from a mixture of thirty-two Gaussians, and averaged the results over ten trials. (b) A competitive learning circuit. (c) Performance of a competitive learning circuit versus a neural network for learning a mixture of two Gaussians. \n\nAcknowledgements \n\nThis work was supported by the NSF under grants BES 9720353 and ECS 9733425, and by a Packard Foundation Fellowship. \n\nReferences \n\n[1] M.A. Arbib (ed.), The Handbook of Brain Theory and Neural Networks, Cambridge, MA: The MIT Press, 1995. \n\n[2] H.C. Card, D.K. McNeill, and C.R. Schneider, \"Analog VLSI circuits for competitive learning networks\", Analog Integrated Circuits and Signal Processing, vol. 15, pp. 291-314, 1998. \n\n[3] C. Diorio, \"A p-channel MOS synapse transistor with self-convergent memory writes\", IEEE Transactions on Electron Devices, vol. 47, no. 2, pp. 464-472, 2000. \n\n[4] P. Hasler, \"Continuous-time feedback in floating-gate MOS circuits\", to appear in IEEE Transactions on Circuits and Systems II, Feb. 2001. \n\n[5] T. Allen et al., \"Electrically adaptable neural network with post-processing circuitry\", U.S. Patent No. 5,331,215, issued July 19, 1994. \n\n[6] M. Lenzlinger and E.H. Snow, \"Fowler-Nordheim tunneling into thermally grown SiO2\", Journal of Applied Physics, vol. 40, no. 1, pp. 278-283, 1969. \n\n[7] E. Takeda, C. Yang, and A. 
Miura-Hamada, Hot Carrier Effects in MOS Devices, San Diego, CA: Academic Press, 1995. \n\n[8] T. Delbruck, \"Bump circuits for computing similarity and dissimilarity of analog voltages\", CNS Memo 26, California Institute of Technology, 1993. \n\n[9] D. Hsu, M. Figueroa, and C. Diorio, \"A silicon primitive for competitive learning\", UW CSE Technical Report no. 2000-07-01, 2000. \n\n[10] J. Lazzaro, S. Ryckebusch, M.A. Mahowald, and C.A. Mead, \"Winner-take-all networks of O(n) complexity\", in Advances in Neural Information Processing Systems, vol. 1, San Mateo, CA: Morgan Kaufmann, pp. 703-711, 1989. \n", "award": [], "sourceid": 1905, "authors": [{"given_name": "David", "family_name": "Hsu", "institution": null}, {"given_name": "Miguel", "family_name": "Figueroa", "institution": null}, {"given_name": "Chris", "family_name": "Diorio", "institution": null}]}