{"title": "Neuron-MOS Temporal Winner Search Hardware for Fully-Parallel Data Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 685, "page_last": 691, "abstract": null, "full_text": "Neuron-MOS Temporal Winner Search Hardware for Fully-Parallel Data Processing \n\nTadashi SHIBATA, Tsutomu NAKAI, Tatsuo MORIMOTO, \nRyu KAIHARA, Takeo YAMASHITA, and Tadahiro OHMI \n\nDepartment of Electronic Engineering \n\nTohoku University \n\nAza-Aoba, Aramaki, Aobaku, Sendai 980-77 JAPAN \n\nAbstract \n\nA unique architecture of winner search hardware has been developed \nusing a novel neuron-like high functionality device called \nNeuron MOS transistor (or vMOS in short) [1,2] as a key circuit \nelement. The circuits developed in this work can find the location \nof the maximum (or minimum) signal among a number of input \ndata on the continuous-time basis, thus enabling real-time winner \ntracking as well as fully-parallel sorting of multiple input data. We \nhave developed two circuit schemes. One is an ensemble of self-loop-selecting \nvMOS ring oscillators finding the winner as an oscillating \nnode. The other is an ensemble of vMOS variable threshold \ninverters receiving a common ramp-voltage for competitive excitation \nwhere data sorting is conducted through consecutive winner \nsearch actions. Test circuits were fabricated by a double-polysilicon \nCMOS process and their operation has been experimentally verified. \n\n1 INTRODUCTION \n\nSearch for the largest (or the smallest) among a number of input data, i.e., the \nwinner-take-all (WTA) action, is an essential part of intelligent data processing \nsuch as data retrieval in associative memories [3], vector quantization circuits [4], \nKohonen's self-organizing maps [5] etc. 
In addition to the maximum or minimum \nsearch, data sorting also plays an essential role in a number of signal processing tasks \nsuch as median filtering in image processing, evolutionary algorithms in optimization \nproblems [6], and so forth. Usually such data processing is carried out by software \nrunning on general purpose computers, but the computation time increases explosively \nwith the increase in the volume of data. In order to build electronic systems \nhaving a real-time-response capability, the direct implementation of fully parallel \nalgorithms on integrated-circuit hardware is critically demanded. \nA variety of WTA circuits [4, 7, 8] have been implemented so far based on analog \ncurrent-mode circuit technologies. A number of cells, each composed of a current \nsource, competitively share the total current specified by a global current sink and \nthe winner is identified through the current concentration toward the cell via tacit \npositive feedback mechanisms. The circuit implementations using MOSFET's operating \nin the subthreshold regime [4, 7] are ideal for large scale integration due to \ntheir ultra low power nature. Although they are inherently slow at circuit levels, the \nperformance at a system level is far superior to digital counterparts owing to the \nflexible computing algorithms of analog. In order to achieve high speed operation, \nMOSFET's biased at strong inversion are also utilized in Ref. [8]. However, \nthis comes at the cost of increased power. \n\nWhat we are presenting in this paper is a unique WTA architecture implemented \nby vMOS technology [1,2]. 
In vMOS circuits the summation of multiples of voltage \nsignals is conducted on the vMOS floating gate (or better called \"temporary floating \ngate\" when used in a clocked scheme [9]) via charge sharing among capacitors, \nand the result of the summation controls the transistor action. The voltage-mode \nsummation capability of vMOS has been uniquely utilized to produce the WTA \naction. No DC current flows for the sum operation itself in contrast to the Kirchhoff \nsum. In vMOS transistors, however, DC current flows in a CMOS inverter \nconfiguration when the floating gate is biased in the transition region. Therefore \nthe power consumption is larger than in the subthreshold circuitries. However, the \nvMOS WTA's presented in this article will give an opportunity of high speed operation \nat much less power consumption than current-mode circuitries operating in the \nstrong inversion mode. In the following we present two kinds of winner search hardware \nfeaturing very fast operation. The winner can be tracked in a continuous-time \nregime with a detection delay time of about 100 psec, while the sorting of multiple \ndata is conducted in a fixed frame of time of about 100 nsec. \n\n
2 NEURON-MOS CONTINUOUS-TIME WTA \n\nFig. 1(a) shows a schematic circuit diagram of a vMOS continuous-time WTA \nfor four input signals. Each signal is fed to an input-stage vMOS inverter-A: a \nCMOS inverter in which the common gate is made floating and its potential \u03c6FA \nis determined via capacitance coupling with three input terminals. V1 (~V4) and \nVR are equally coupled to the floating gate and a small capacitance pulls down the \nfloating gate to ground. The vMOS inverter-B is designed to turn on when the \nnumber of 1's in its inputs (VA1 ~ VA4) is more than 1. When a feedback loop is \nformed as shown in the figure, it becomes a ring oscillator composed of odd-numbers \nof inverter stages. \n\nFigure 1: (a) Circuit diagram of vMOS continuous-time WTA circuit. (b)~(d) \nResponse of VA1 ~ VA4 as a function of the floating-gate potential of vMOS inverter-A. \n\nWhen V1 ~ V4 = 0, the circuit is stable with VR = 1 because inverter-A's do not \nturn on. This is because the small grounded capacitor pulls down the floating gate \npotential \u03c6FA a little below its inverting threshold (VDD/2) (see Fig. 1(b)). \nIf non-zero signals are given to input terminals, more-than-one inverter-A's turn on \n(see Fig. 1(c)) and the inverter-B also turns on, thus initiating the transition of VR \nfrom VDD to 0. According to the decrease in VR, some of the inverter-A's turn off \nbut the inverter-B (number 1 detector) still stays at on-state until the last inverter-A \nturns off. When the last inverter-A, the one receiving the largest voltage input, \nturns off, the inverter-B also turns off and VR begins to increase. As a result, ring \noscillation occurs only in the loop including the largest-input inverter-A (Fig. 1(d)). \nIn this manner, the winner is identified as an oscillating node. The inverter-B can \nbe altered to a number \"2\" detector or a number \"3\" detector etc. by just reducing \nthe input voltage to the largest coupling capacitor. Then it is possible for top two \nor top three to be winners. \n\n
Figure 2: (a) Measured wave forms of four-input WTA as depicted in Fig. 1(a) \n(bread board experiment). (b) Simulation results for non-oscillating WTA explained \nin Fig. 3. \n\nFig. 2(a) demonstrates the measured wave forms of a bread-board test circuit \ncomposed of discrete components for verifying the circuit idea. It is clearly seen that \nring oscillation occurs only at the temporal winner. However, the ring oscillation \nincreases the power dissipation, and therefore, non-oscillating circuitry would be \npreferred. An example of simulation results for such a non-oscillating circuit is \ndemonstrated in Fig. 2(b). \n
Fig. 3(a) gives the circuit diagram of a non-oscillating version of the vMOS \ncontinuous-time WTA. In order to suppress the oscillation, the loop gain is reduced \nby removing the two-stage CMOS inverters in front of the inverter-B and an RC delay \nelement is inserted in the feedback loop. The small grounded capacitors were removed \nin inverter-A's. The waveforms demonstrated in Fig. 2(b) are the HSPICE \nsimulation results with R = 0 and CEXT = 20Cgate (Cgate: input capacitance of \nelemental CMOS inverter = 5.16 fF). The circuit was simulated assuming a typical \ndouble-poly 0.5-\u03bcm CMOS process. Fig. 3(b) indicates the combinations of R and \nCEXT yielding the non-oscillating mode of operation obtained by HSPICE simulation. \nIt is important to note that if CEXT \u2265 15Cgate, non-oscillating mode appears \nwith R = 0. This means the output resistance of the inverter-B plays the role of \nR. When the number of inverter-A's is increased, the increased capacitance load \nserves as CEXT. Therefore, WTA having more than 19 input signals can operate in \nthe non-oscillating mode. Fig. 3(c) represents the detection delay as a function of \nCEXT. It is seen that the increase in CEXT, and therefore the increase in the number \nof input signals to the WTA, does not significantly increase the detection delay and \nthat the delay is only in the range of 100 to 200 psec. \n\nFigure 3: (a) Circuit diagram of non-oscillating-mode WTA. HSPICE simulation \nresults: (b) combinations of R and CEXT for non-oscillating mode; (c) winner \ndetection delay as a function of capacitance load. \n\nA photomicrograph of a test circuit of the non-oscillating mode WTA fabricated \nby Tohoku-University standard double-polysilicon CMOS process on 3-\u03bcm design \nrules, and the measurement results are shown in Fig. 4(a) and (b), respectively. \n\n
Figure 4: (a) Photomicrograph of a test circuit for 4-input continuous-time WTA. \nChip size is 800 \u03bcm \u00d7 500 \u03bcm including all peripherals (3-\u03bcm rules). The core circuit \nof Fig. 3(a) occupies approximately 0.12 mm\u00b2. (b) Measured wave forms. \n\n3 NEURON-MOS DATA SORTING CIRCUITRY \n\nThe elemental idea of this circuit was first proposed at ISSCC '93 [3] as an application \nof the vMOS WTA circuit. In the present work, a clocked-vMOS technique [9] \nwas introduced to enhance the accuracy and reliability of vMOS circuit operation, \nand test circuits were fabricated and their operation has been verified. \nFig. 5(a) shows the circuit diagram of a test circuit for sorting three analog data VA, \nVB, and VC, and a photomicrograph of a fabricated test circuit designed on 3-\u03bcm \nrules is shown in Fig. 5(b). Each input stage is a vMOS inverter: a CMOS inverter \nin which the common gate is made floating and its potential \u03c6F is determined \nby two input voltages via equally-weighted capacitance coupling, namely \u03c6F = \n(VA + VRAMP)/2. The reset signal forces the floating node to be grounded, thus \ncancelling the charge on the vMOS floating gate each time before sorting. This is \nquite essential in achieving long-term reliability of vMOS operation. In the second \nstage are flip-flop memory cells to store sorting results. The third stage is a circuit \nwhich counts the number of 1's at its three input terminals and outputs the result in \nbinary code. The concept of the vMOS A/D converter design [10] has been utilized \nin the circuit. \n\nFigure 5: (a) Circuit diagram of vMOS data-sorting circuit. 
(b) Photomicrograph \nof a test circuit fabricated by Tohoku Univ. standard double-polysilicon CMOS \nprocess (3-\u03bcm rules). Chip size is 1250 \u03bcm \u00d7 800 \u03bcm including all peripherals. \n\nThe sorting circuit is activated by ramping up VRAMP from 0V to VDD. Then the \nvMOS inverter receiving the largest input turns on first and the output data of the \ncounter at this moment (0,0) is latched in the respective memory cells. The counter \noutput changes to (0,1) after gate delays in the counter and this code is latched \nwhen the vMOS inverter receiving the second largest turns on. Then the counter \ncounts up to (1,0). In this manner, all the input data are numbered according to \nthe order of their magnitudes after a ramp voltage scan is completed. \n\nThe measurement results are demonstrated in Fig. 6(a) in comparison with the \nHSPICE simulation results. Simulation was carried out on the same architecture \ncircuit designed on 0.5-\u03bcm design rules and operated under 3V power supply. For \nthree analog input voltages: VA = 5V, VB = 4V, and VC = 2V, (0,0), (0,1), \nand (1,0) are latched, respectively, after the ramp voltage scan, thus accomplishing \ncorrect sorting. Slow operation of the test circuit is due to the loading effect caused \nby the direct probing of the node voltage without output buffer circuitries. The \nsimulation with a 0.5-\u03bcm-design-rule circuit indicates the sorting is accomplished \nwithin the scan time of 40 nsec. \n\nFigure 6: (a) Wave forms of the test circuit shown in Fig. 5(a) measured without \nbuffer circuitry (left) and simulation results of a circuit designed with 0.5-\u03bcm rules \n(right). (b) Minimum scan time vs. sorting accuracy for a three-input sorter. (c) \nMinimum scan time vs. sorting accuracy for a 15-input sorter. \n\n
In Fig. 6(b), the minimum scan time obtained by simulation is plotted as a function \nof the bit accuracy in sorting analog data. N-bit accuracy means the minimum \nvoltage difference required for winner discrimination is VDD/2^N. If the ramp rate \nis too fast, the vMOS inverter receiving the next largest data turns on before the \ncorrect counting results become available, leading to an erroneous operation. The \nscan time/accuracy relation in Fig. 6(b) is primarily determined by the response \ndelay in the counter. It should be noted that the number of inverter stages in the \ncounter (vMOS A/D converter) is always three irrespective of the number of output \nbits, namely, the delay would not increase significantly by the increase in the number \nof input data. In order to investigate this, a 15-input counter was designed and \nthe delay time was evaluated by HSPICE simulation. It was 312 psec in comparison \nwith 110 psec of the 3-input counter of Fig. 5(a). The scan time/accuracy relation \nfor the 15-input sorting circuit is shown in Fig. 6(c), indicating the sorting of 15 \ninput data can be accomplished in 100 nsec with 8-bit accuracy. \n\n4 CONCLUSIONS \nA novel neuron-like functional device vMOS has been successfully utilized in constructing \nintelligent electronic circuits which can carry out search for the temporal \nwinner. 
As a result, it has become possible to perform data sorting as well as \nwinner search in an instant, both requiring very time-consuming sequential data \nprocessing on a digital computer. The hardware algorithms presented here are typical \nexamples of the vMOS binary-multivalue-analog merged computation scheme, \nwhich would play an important role in the future flexible data processing. \n\nAcknowledgements \n\nThis work was partially supported by Grant-in-Aid for Scientific Research \n(06402038) from the Ministry of Education, Science, Sports, and Culture, Japan. A \npart of this work was carried out in the Super Clean Room of Laboratory for Electronic \nIntelligent Systems, Research Institute of Electrical Communication, Tohoku \nUniversity. \n\nReferences \n\n[1] T. Shibata and T. Ohmi, \"A functional MOS transistor featuring gate-level \nweighted sum and threshold operations,\" IEEE Trans. Electron Devices, Vol. 39, \nNo. 6, pp. 1444-1455 (1992). \n\n[2] T. Shibata, K. Kotani, T. Yamashita, H. Ishii, H. Kosaka, and T. Ohmi, \"Implementing \nintelligence on silicon using neuron-like functional MOS transistors,\" in \nAdvances in Neural Information Processing Systems 6 (San Francisco, CA: Morgan \nKaufmann 1994) pp. 919-926. \n\n[3] T. Yamashita, T. Shibata, and T. Ohmi, \"Neuron MOS winner-take-all circuit \nand its application to associative memory,\" in ISSCC Dig. Tech. Papers, Feb. 1993, \nFA 15.2, pp. 236-237. \n\n[4] G. Cauwenberghs and V. Pedroni, \"A charge-based CMOS parallel analog vector \nquantizer,\" in Advances in Neural Information Processing Systems 7 (Cambridge, \nMA: The MIT Press 1995) pp. 779-786. \n\n[5] T. Kohonen, Self-Organization and Associative Memory, 2nd ed. (New York: \nSpringer-Verlag 1988). \n\n[6] M. Kawamata, M. Abe, and T. 
Higuchi, \"Evolutionary digital filters,\" in Proc. \nInt. Workshop on Intelligent Signal Processing and Communication Systems, Seoul, \nOct. 1994, pp. 263-268. \n\n[7] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. A. Mead, \"Winner-Take-All \nnetworks of O(N) complexity,\" in Advances in Neural Information Processing \nSystems 1 (San Mateo, CA: Morgan Kaufmann 1989) pp. 703-711. \n\n[8] J. Choi and B. J. Sheu, \"A high-precision VLSI winner-take-all circuit for self-organizing \nneural networks,\" IEEE J. Solid State Circuits, Vol. 28, No. 5, pp. 576-584 \n(1993). \n\n[9] K. Kotani, T. Shibata, M. Imai, and T. Ohmi, \"Clocked-Neuron-MOS logic \ncircuits employing auto-threshold-adjustment,\" in ISSCC Dig. Technical Papers, \nFeb. 1995, FA 19.5, pp. 320-321. \n\n[10] T. Shibata and T. Ohmi, \"Neuron MOS binary-logic integrated circuits: Part \nII, Simplifying techniques of circuit configuration and their practical applications,\" \nIEEE Trans. Electron Devices, Vol. 40, No. 5, pp. 974-979 (1993). \n\n\f", "award": [], "sourceid": 1030, "authors": [{"given_name": "Tadashi", "family_name": "Shibata", "institution": null}, {"given_name": "Tsutomu", "family_name": "Nakai", "institution": null}, {"given_name": "Tatsuo", "family_name": "Morimoto", "institution": null}, {"given_name": "Ryu", "family_name": "Kaihara", "institution": null}, {"given_name": "Takeo", "family_name": "Yamashita", "institution": null}, {"given_name": "Tadahiro", "family_name": "Ohmi", "institution": null}]}