{"title": "The Effects of Circuit Integration on a Feature Map Vector Quantizer", "book": "Advances in Neural Information Processing Systems", "page_first": 226, "page_last": 231, "abstract": null, "full_text": "226 Mann\n\nThe Effects of Circuit Integration on a Feature Map Vector Quantizer\n\nJim Mann\n\nMIT Lincoln Laboratory\n244 Wood St.\nLexington, MA 02173\nemail: mann@vlsi.ll.mit.edu\n\nABSTRACT\n\nThe effects of parameter modifications imposed by hardware constraints on a self-organizing feature map algorithm were examined. Performance was measured by the error rate of a speech recognition system which included this algorithm as part of the front-end processing. System parameters which were varied included weight (connection strength) quantization, adaptation quantization, distance measures, and circuit approximations which include device characteristics and process variability. Experiments using the TI isolated word database for 16 speakers demonstrated degradation in performance when weight quantization fell below 8 bits. The competitive nature of the algorithm relaxes constraints on uniformity and linearity, which makes it an excellent candidate for a fully analog circuit implementation. Prototype circuits have been fabricated and characterized following the constraints established through the simulation efforts.\n\n1 Introduction\n\nThe self-organizing feature map algorithm developed by Kohonen [Kohonen, 1988] readily lends itself to the task of vector quantization for use in such areas as speech recognition. 
However, in considering practical implementations, it is necessary to understand the limitations imposed by circuitry on algorithm performance. In order to test the effects of these constraints on overall performance, a simulation was written which permits ready variation of critical system parameters.\n\n[Plot: error rate (%) versus number of weight bits, 0 to 13; curves: Euclidean, dot product.]\n\nFigure 1: Recognition performance of the Euclidean and dot product activity calculators plotted as a function of weight precision.\n\nThe feature map algorithm was placed in the front end of a discrete hidden Markov model (HMM) speech recognition program as the vector quantizer (VQ) in order to track the effects of feature map algorithm modifications by monitoring overall word recognition accuracy. The system was tested on TI's 20 isolated word database consisting of 16 speakers. Each speaker had 1 training session consisting of 10 repetitions of each word in the vocabulary and 8 test sessions consisting of 2 repetitions of each word.\n\nThe key parameters tested include quantization of both the weight coefficients and the learning rule; different activation computations, namely the dot product and the mean squared error (i.e. squared Euclidean distance); and the circuit approximations to these calculators. 
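The parameters listed above can be illustrated with a minimal Python sketch. It is not the simulation used in the paper: the uniform quantizer, the weight range of 0 to 1, and every function name here are assumptions chosen only for illustration.

```python
import math

def quantize(w, bits, lo=0.0, hi=1.0):
    # Assumed scheme: clip w to [lo, hi], then round to the nearest of
    # 2**bits uniformly spaced levels.
    levels = 2 ** bits - 1
    clipped = min(max(w, lo), hi)
    return lo + round((clipped - lo) / (hi - lo) * levels) * (hi - lo) / levels

def squared_euclidean(x, w):
    # Mean squared error activity, up to a constant factor.
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def dot_product(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def normalize(v):
    # Project v onto the unit hypersphere (zero vectors are left unchanged).
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v] if norm > 0 else list(v)

def winner_euclidean(x, weights):
    # The node with the smallest distance wins the competition.
    return min(range(len(weights)),
               key=lambda j: squared_euclidean(x, weights[j]))

def winner_dot(x, weights):
    # The node with the largest activity wins; input and weights are
    # normalized first because the dot product is sensitive to magnitude.
    xn = normalize(x)
    return max(range(len(weights)),
               key=lambda j: dot_product(xn, normalize(weights[j])))

# Hypothetical two-dimensional codebook and input vector.
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
x = [0.25, 0.75]
q4 = [[quantize(w, 4) for w in row] for row in weights]  # 4 bit weights
```

With these toy values both calculators select the same node, and the choice survives 4 bit quantization; the sketch says nothing about how recognition accuracy behaves on real speech data.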
\n\n2 Results\n\nA unique dependency between weight quantization and distance measure emerged from the simulations and is illustrated in the graph presented in Figure 1. The network equipped with the mean squared error activity calculator shows a \"knee\" in the word error rate at 6 bits of precision in the weight representation. The overall performance dropped only slightly between the essentially ideal floating point case, at 1.45% error rate, and the 6 bit case, at 2.99% error rate. At 4 bits, the error rate climbs to 7.62%. This still corresponds to a recognition accuracy of better than 92% but does show a marked degradation in performance.\n\nFigure 2: A circuit approximation to the dot product calculator.\n\nThe dot product does not degrade as gracefully with reduced precision in the weight representation as the mean squared error activity calculation. This is due to the normalization required on the input, and subsequently the weight vectors, which compresses the space onto the unit hypersphere. This step is necessary because of the inherent sensitivity of this metric to vector magnitude in making decisions of relative distance. Here the \"knee\" in the error curve occurs at 8 bits. Below 8 bits, performance drops off dramatically, reaching 40.6% error rate at 6 bits. The double precision floating point case starts off at 1.68% and is 3.44% at 8 bits.\n\nCircuit approximations to these activity calculators were also included in the simulations. 
An approximation to the dot product operation can be implemented with single transistors operating in the ohmic region at each connection, as illustrated in Figure 2. These area-related considerations can often overshadow the performance penalties associated with their implementation. The simulation results from this circuit approximation match the performance of the digital calculation of the dot product almost exactly, as seen in Figure 3. This indicates that the performance of the system depends more on the monotonicity of the product operation performed at each connection than on its linearity.\n\nEffects of process variations on transistor thresholds were also examined. There appears to be a gradual decrease in system performance with increasing variability in transistor thresholds, as seen in Figure 4. The cause of this phenomenon remains to be investigated.\n\nA weight adjustment rule which simplifies circuitry consists of quantizing the learning rate gain term. An integer step is added to or subtracted from the weight depending on the magnitude of the difference between it and the input. In the 
simplest case, a fixed increment or decrement operation is performed based only upon the sign of the difference between the two terms. Even in this simplest case, no degradation in performance was noted while using an 8 bit weight representation, as demonstrated in the graph shown in Figure 5. In fact, performance was often improved over the original learning rule. The error rates using an increment/decrement learning rule with 8 weight bits were 0.9i% and 2.0% for the mean squared error and the dot product, respectively.\n\n[Plot: error rate (%) versus number of weight bits, 0 to 13; curves: Kohonen learning rule and inc/dec learning rule, each for the transistor circuit and the dot product.]\n\nFigure 3: Similarity between the transistor circuit simulation and the digital calculation of the dot product.\n\n[Plot: error rate (%) versus threshold standard deviation (mV), 0 to 100.]\n\nFigure 4: The effects of transistor threshold variation on recognition performance (8 bit weight; Gaussian distributed, mean(Vth) = 0.75 volts).\n\n[Plot: error rate (%), 0.2 to 2.0, versus maximum weight adjust (+/-) of 2, 4, 6, 8 and double precision.]\n\nFigure 5: Word recognition error rate as a function of learning rate gain quantization.\n\nAn additional learning rule is being tested, targeted at a floating gate implementation which uses a \"flash\" EPROM memory structure at each synapse. 
Weight changes are restricted to positive adjustments locally, while all negative adjustments are made globally to all weights. This corresponds to a forgetting term, or constant weight decay, in the learning rule. This rule was chosen to be compatible with one technique in non-volatile charge storage which allows selective write but only block erase.\n\n3 Hardware\n\nA prototype synaptic array and weight adaptation circuit have been designed and fabricated [Mann, 1989]. A single transistor synapse computes its contribution to the dot product activity calculation. The weight is stored dynamically as charge on the gate of the synapse transistor. The input is represented as a voltage on the drain of the transistor. The current through the transistor is proportional to the product of the gate voltage (i.e. the weight) and the drain voltage (i.e. the input strength) with the source connected to a virtual ground (see Figure 2). The sources of several of these synapses connected together form the accumulation needed to realize the dot product. Circuitry for accessing stored weight information has also been included.\n\nThe synapse array works as expected except for circuitry used to read the weight contents. This circuit requires very high on-chip voltages, causing other circuits to latch up when the clocks are turned on.\n\nThe weight adaptation circuit performs the simple increment/decrement operation based on the comparison between the input and weight magnitudes. Both quantities are first converted to a digital representation by a flash A/D converter before comparison. This circuit also performs the required refresh operation on weight 
contents, much like that required for dynamic RAMs but requiring analog charge storage. This ensures that weight drift is constrained to lie within boundaries defined by the precision of the weight representation determined by the A/D conversion process. This circuit was functional in the refresh and increment modes, but would not decrement correctly.\n\nFurther tests are being conducted to establish the causes of the circuit problems detected thus far. Additional work is proceeding on a non-volatile charge storage version of this device. Some test structures have been fabricated and are currently being characterized for compatibility with this task.\n\nThis work was supported by the Department of the Air Force.\n\nReferences\n\nT. Kohonen. (1988) Self-Organization and Associative Memory, Berlin: Springer-Verlag.\n\nJ. Mann & S. Gilbert. (1989) An Analog Self-Organizing Neural Network Chip. In D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 1, 739-747. San Mateo, CA: Morgan Kaufmann.\n", "award": [], "sourceid": 248, "authors": [{"given_name": "Jim", "family_name": "Mann", "institution": null}]}