{"title": "An Analog Neural Network Inspired by Fractal Block Coding", "book": "Advances in Neural Information Processing Systems", "page_first": 795, "page_last": 802, "abstract": null, "full_text": "An Analog Neural Network Inspired by Fractal Block Coding\n\nFernando J. Pineda\nThe Applied Physics Laboratory\nThe Johns Hopkins University\nJohns Hopkins Road\nLaurel, MD 20723-6099\n\nAndreas G. Andreou\nDept. of Electrical & Computer Engineering\nThe Johns Hopkins University\n34th & Charles St.\nBaltimore, MD 21218\n\nAbstract\n\nWe consider the problem of decoding block-coded data using a physical dynamical system. We sketch out a decompression algorithm for fractal block codes and then show how to implement a recurrent neural network using physically simple but highly nonlinear analog circuit models of neurons and synapses. The nonlinear system has many fixed points, but we have at our disposal a procedure to choose the parameters in such a way that only one solution, the desired solution, is stable. As a partial proof of the concept, we present experimental data from a small system: a 16-neuron analog CMOS chip fabricated in a 2 micron analog p-well process. This chip operates in the subthreshold regime and, for each choice of parameters, converges to a unique stable state. Each state exhibits a qualitatively fractal shape.\n\n1. INTRODUCTION\n\nSometimes, a nonlinear approach is the simplest way to solve a linear problem. This is true when computing with physical dynamical systems whose natural operations are nonlinear. In such cases it may be expensive, in terms of physical complexity, to linearize the dynamics. For example, in neural computation active ion channels have highly nonlinear input-output behaviour (see Hille, 1984). Another example is 
subthreshold CMOS VLSI technology(1). In both examples the physics that governs the operation of the active devices gives rise to gain elements that have exponential transfer characteristics. These exponentials result in computing structures with nonlinear dynamics. It is therefore worthwhile, from both scientific and engineering perspectives, to investigate the idea of analog computation by highly nonlinear components.\n\nThis paper explores an approach for solving a specific linear problem with analog circuits that have nonlinear transfer functions. The computational task considered here is that of fractal block code decompression (see e.g. Jacquin, 1989).\n\nThe conventional approach to decompressing fractal codes is essentially an exercise in solving a high-dimensional sparse linear system of equations by using a relaxation algorithm. The relaxation algorithm is performed by iteratively applying an affine transformation to a state vector. The iteration yields a sequence of state vectors that converges to a vector of decoded data. The approach taken in this paper is based on the observation that one can construct a physically simple nonlinear dynamical system whose unique stable fixed point coincides with the solution of the sparse linear system of equations.\n\nIn the next section we briefly summarize the basic ideas behind fractal block coding. This is followed by a description of an analog circuit with physically simple nonlinear neurons. We show how to set the input voltages for the network so that we can program the position of the stable fixed point. Finally, we present experimental results obtained from a test chip fabricated in a 2 micron CMOS process.\n\n2. FRACTAL BLOCK CODING IN A NUTSHELL\n\nLet the N-dimensional state vector I represent a one-dimensional curve sampled on N points. 
An affine transformation of this vector is simply a transformation of the form I' = WI + B, where W is an N x N matrix and B is an N-component vector. This transformation can be iterated to produce a sequence of vectors I(0), ..., I(n). The sequence converges to a unique final state I* that is independent of the initial state I(0) if the maximum eigenvalue lambda_max of the matrix W satisfies |lambda_max| < 1. The uniqueness of the final state implies that to transmit the state I* to a receiver, we can either transmit I* directly, or we can transmit W and B and let the receiver perform the iteration to generate I*. In the latter case we say that W and B constitute an encoding of the state I*. For this encoding to be useful, the amount of data needed to transmit W and B must be less than the amount of data needed to transmit I*. This is the case when W and B are sparse and parameterized and when the total number of bits needed to transmit these parameters is less than the total number of bits needed to transmit the uncompressed state I*.\n\nFractal block coding is a special case of the above approach. It amounts to choosing a blocked structure for the matrix W. This structure forces large-scale features to be mapped into small-scale features. The result is a steady state I* that represents a curve with self-similar (actually self-affine) features.\n\n(1) We consider subthreshold analog VLSI (Mead, 1989; Andreou and Boahen, 1994). A simple subthreshold model is I_ds = I_0^(nfet) exp(kappa v_gb)(exp(-v_sb) - exp(-v_db)) for NFETs, where kappa ~ 0.67 and I_0^(nfet) = 9.7 x 10^-18 A. The voltage differences v_gb, v_sb and v_db are in units of the thermal voltage, V_th = 0.025 V. We use a corresponding expression for PFETs of the form I_ds = I_0^(pfet) exp(-kappa v_gb)(exp(v_sb) - exp(v_db)), where I_0^(pfet) = 3.8 x 10^-18 A.
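Sketched in NumPy (a hypothetical `decode` helper, not code from the paper), the receiver's relaxation reads:

```python
import numpy as np

# Minimal sketch of the relaxation decoder: iterate I <- W @ I + B.
# When the spectral radius of W is below 1, the iterates converge to
# the unique fixed point I* = (1 - W)^(-1) B from any initial state.
def decode(W, B, n_iter=100, I0=None):
    I = np.zeros(len(B)) if I0 is None else np.asarray(I0, dtype=float)
    for _ in range(n_iter):
        I = W @ I + B
    return I
```

For a sparse, parameterized W this loop is the receiver's entire decompression step.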
As a concrete example of such a structure, consider the following transformation of the state I:\n\n    I'_i = w_L I_(2i+1) + b_L    for 0 <= i <= N/2 - 1\n    I'_i = w_R I_(2i-N) + b_R    for N/2 <= i <= N - 1     (1)\n\nThis transformation has two blocks. The transformation of the first N/2 components of I depends on the parameters w_L and b_L, while the transformation of the second N/2 components depends on the parameters w_R and b_R. Consequently just four parameters completely specify this transformation. This transformation can be expressed as a single affine transformation I' = WI + B, where W is the sparse N x N matrix whose first N/2 rows place w_L in the odd-numbered columns (row i has w_L in column 2i+1) and whose last N/2 rows place w_R in the even-numbered columns (row i has w_R in column 2i-N), and where B = (b_L, ..., b_L, b_R, ..., b_R)^T.     (2)\n\nThe top and bottom halves of I' depend on the odd and even components of I respectively. This subsampling causes features of size l to be mapped into features of size l/2. A subsampled copy of the state I with transformed intensities is copied into the top half of I'. Similarly, a subsampled copy of the state I with transformed intensities is copied into the bottom half of I'. If this transformation is iterated, the sequence of transformed vectors will converge provided the eigenvalues determined by w_L and w_R are all less than one in magnitude (i.e. |w_L|, |w_R| < 1).\n\nAlthough this toy example has just four free parameters and is thus too trivial to be useful for actual compression applications, it does suffice to generate state vectors with fractal properties, since at steady state the top and bottom halves of I' differ from the entire curve by an affine transformation.\n\nIn this paper we will not describe how to solve the inverse problem, which consists of finding a parameterized affine transformation that produces a given final state I*. 
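The two-block map of equation (1) can be iterated directly; a minimal NumPy sketch follows (the parameter values w_L = 0.5, b_L = 0, w_R = 0.5, b_R = 0.5 are arbitrary illustrations, not values from the paper):

```python
import numpy as np

# One application of the two-block transformation of equation (1).
def two_block_step(I, w_L, b_L, w_R, b_R):
    N = len(I)
    out = np.empty(N)
    out[:N // 2] = w_L * I[1::2] + b_L   # top half <- odd components of I
    out[N // 2:] = w_R * I[0::2] + b_R   # bottom half <- even components of I
    return out

I = np.zeros(16)
for _ in range(50):                      # iterate to (near) convergence
    I = two_block_step(I, 0.5, 0.0, 0.5, 0.5)
```

At convergence each half of the resulting curve is an affine copy of the whole, which is the self-affinity described above.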
We note, however, that it is a special (and simpler) case of the recurrent network training problem, since the problem is linear, has no hidden units and has only one fixed point. The reader is referred to (Pineda, 1988) for a least-squares algorithm in the context of neural nets, or to (Monroe and Dudbridge, 1992) for a least-squares algorithm in the context of coding.\n\n3. A CMOS NEURAL NETWORK MODEL\n\nNow that we have described the salient aspects of the fractal decompression problem, we turn to the problem of implementing an analog neural network whose nonlinear dynamics converges to the same fixed point as the linear system. Nonlinearity arises because we make no special effort to linearize the gain elements (controlled conductances and transconductances) of the implementation medium. In this section we first describe a simple neuron. Then we analyze the dynamics of a network composed of such neurons. Finally we describe how to program the fixed point in the actual physical network.\n\n3.1 The analog neuron\n\nWe would like to create a neuron model that calculates the transformation I^(out) = a I^(in) + b. Consider the circuit shown in figure 1. This has three functional sections which compute by adding and subtracting currents, and where voltages are "log" coded; this is the essence of the "current-mode" approach in circuit design (Andreou et al., 1994). The first section receives an input voltage from a presynaptic neuron, converts it into a current I^(in), and multiplies it by a weight a. The second section adds and subtracts the bias current b. The last section converts the output current into an output voltage and transmits it to the next neuron in the network. Since the transistors have exponential transfer characteristics, this voltage is logarithmically coded. 
The parameters a and b are set by external voltages. The parameter a is set by a single external voltage v_a, while the bias parameter b = b^(-) - b^(+) is set by two external voltages v_b^(+) and v_b^(-). Two voltages are used for b to account for both positive and negative bias values, since b^(-) > 0 and b^(+) > 0.\n\nFigure 1. The analog neuron has three sections.\n\nTo derive the dynamical equations of the neuron, it is necessary to add up all the currents and invoke Kirchhoff's current law, which requires that\n\n    I^(out) - a I^(in) + b^(+) - b^(-) = I_C.     (3)\n\nIf we now assume a simple subthreshold model for the behavior of the NFETs and PFETs in the neuron, we can obtain the following expression for the current across the capacitor:\n\n    -(Q / I^(out)) dI^(out)/dt = I_C     (4)\n\nwhere Q = C V_th / kappa determines the characteristic time scale of the neuron(2). It immediately follows from the last two expressions that the dynamics of a single neuron is determined by the equation\n\n    Q dI^(out)/dt = -I^(out) (I^(out) - a I^(in) - b),     (5)\n\nwhere b = b^(-) - b^(+). This equation appears to have a quadratic nonlinearity on the r.h.s. In fact, the nonlinearity is even more complicated since the coefficients a, b^(+) and b^(-) are not constants, but depend on I^(out) (through v^(out)). 
Application of the simple subthreshold model results in a multiplier gain that is a function of v^(out) (and hence I^(out)) as well as v_a. It is given by\n\n    a(v_a, v^(out)) = 2 exp(-va_bar) ( sinh(va_bar - v_a) - sinh(va_bar - v^(out)) )     (6)\n\nSimilarly, the currents b^(+) and b^(-) are given by\n\n    b^(+) = I_0^(pfet) exp(kappa v_b^(+)) (1 - exp(-v^(out)))     (7.a)\n\nand\n\n    b^(-) = I_0^(nfet) exp(kappa v_b^(-)) (1 - exp(-v^(out)))     (7.b)\n\nrespectively, where va_bar = v_dd - v_a.\n\n3.2 Network dynamics and stability considerations\n\nWith these results we conclude that a network of neurons, in which each neuron receives input from only one other neuron, would have a dynamical equation of the form\n\n    Q dI_i/dt = -I_i (I_i - a_i I_j(i) - b_i)     (8)\n\nwhere the connectivity of the network is determined by the function j(i). The fixed points of these highly nonlinear equations occur when the r.h.s. of (8) vanishes. This can only happen if either I_i = 0 or (I_i - a_i I_j(i) - b_i) = 0 for each i. The local stability of each of these fixed points follows by examining the eigenvalues (lambda) of the corresponding Jacobian. The expression for the Jacobian at a general point I is\n\n    J_ik = dF_i/dI_k = -(1/Q) [ (I_i - a_i I_j(i) - b_i) delta_ik + I_i (1 - a'_i I_j(i) - b'_i) delta_ik - a_i I_i delta_j(i)k ]     (9)\n\nwhere the partial derivatives a'_i and b'_i are with respect to I_i. At a fixed point the Jacobian takes the form\n\n    J_ik = (1/Q) b_i delta_ik                                                     if I_i = 0\n    J_ik = -(1/Q) I_i [ (1 - a'_i I_j(i) - b'_i) delta_ik - a_i delta_j(i)k ]     if (I_i - a_i I_j(i) - b_i) = 0     (10)\n\n(2) C represents the total gate capacitance from all the transistors connected to the horizontal line of the neuron. 
For the 2 micron analog process, the gate capacitance is approximately 0.5 fF/micron^2, so a 10 micron x 10 micron FET has a characteristic charge of Q = 2.959 x 10^-14 Coulombs at room temperature.\n\nThere are two cases of interest. The first case is when no neurons have zero output. This is the "desired solution." In this case, the Jacobian specializes to\n\n    J_ik = -(1/Q) I_i [ (1 - a'_i I_j(i) - b'_i) delta_ik - a_i delta_j(i)k ].     (11)\n\nFrom (6) and (7) it can be shown that the partial derivatives a'_i and b'_i are both non-positive. It immediately follows, from Gerschgorin's theorem, that a sufficient condition for the eigenvalues to be negative, and hence for the fixed point to be stable, is that |a_i| < 1. The second case is when at least one of the neurons has zero output. We call these fixed points the "spurious solutions." In this case some of the eigenvalues are very easy to calculate, because terms of the form (b_i - lambda), where I_i = 0, can be factored from the expression for det(J - lambda*1). Thus some eigenvalues can be made positive by making some of the b_i positive. Accordingly, if all the b_i satisfy b_i > 0, some of the eigenvalues will necessarily be positive and the spurious solutions will be unstable. To summarize the above discussion, we have shown that by choosing b_i > 0 and |a_i| < 1 for all i, we can make the desired fixed point stable and the spurious fixed points unstable. Note that a sufficient condition for b_i > 0 is b_i^(+) = 0.\n\nIt remains to show that the system must converge to the desired fixed point, i.e. that the system cannot oscillate or wander chaotically. To do this we consider the connectivity of the network we implemented in our test chip. This is shown schematically in figure 2. 
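As a purely numerical sketch of these stability conditions (not a circuit simulation), one can Euler-integrate equation (8) using the two-block connectivity of equation (1) with N = 16; the coefficient values below are illustrative assumptions, and a_i, b_i are held constant although on the chip they depend on I^(out):

```python
import numpy as np

# Euler-integrate Q dI_i/dt = -I_i (I_i - a_i I_{j(i)} - b_i)
# with j(i) = 2i+1 for the top half and j(i) = 2i-N for the bottom half.
N = 16
j = np.array([2 * i + 1 if i < N // 2 else 2 * i - N for i in range(N)])
a = np.full(N, 0.5)            # |a_i| < 1: desired fixed point is stable
b = np.linspace(0.5, 1.5, N)   # b_i > 0: spurious fixed points are unstable
Q, dt = 1.0, 1e-3
I = np.ones(N)                 # start away from the spurious solution I = 0
for _ in range(100000):
    I += (dt / Q) * (-I * (I - a * I[j] - b))

# The steady state solves the *linear* system I_i - a_i I_{j(i)} - b_i = 0.
W = np.zeros((N, N))
W[np.arange(N), j] = a
I_star = np.linalg.solve(np.eye(N) - W, b)
```

The trajectory settles onto I_star, illustrating that the nonlinear dynamics and the linear relaxation share the same fixed point under these parameter constraints.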
The first eight neurons receive input from the odd-numbered neurons, while the second eight neurons receive input from the even-numbered neurons. The neurons on the left-hand side all share the weight w_L, while the neurons on the right share the weight w_R. By tracing the connections, we find that there are two independent loops of neurons: loop #1 = {0, 8, 12, 14, 15, 7, 3, 1} and loop #2 = {2, 9, 4, 10, 13, 6, 11, 5}.\n\nFigure 2. The connection topology for the test chip is determined by the matrix of equation (1). The neurons are labeled 0-15.\n\nBy inspecting each loop, we see that it passes through either the left- or right-hand half an even number of times. Hence, if there are any inhibitory weights in a loop, there must be an even number of them. This is the "even loop criterion", and it suffices to prove that the network is globally asymptotically stable (Hirsch, 1987).\n\n3.3 Programming the fixed point\n\nThe nonlinear circuit of the previous section converges to a fixed point which is the solution of the following system of transcendental equations\n\n    I*_i - a_i(I*_i, v_a) I*_j(i) - b_i^(-)(I*_i, v_b^(-)) = 0     (12)\n\nwhere the coefficients a_i and b_i are given by equations (6) and (7b) respectively. Similarly, the iterated affine transformations converge to the solution of the following linear equations\n\n    I*_i - A_i I*_j(i) - B_i = 0     (13)\n\nwhere the coefficients {A_i, B_i} and the connections j(i) are obtained by solving the approximate inverse problem with the additional constraints that b_i > 0 and |a_i| < 1 for all i. 
The requirement that the fixed points of the two systems be identical results in the conditions\n\n    A_i = a_i(I*_i, v_a)     and     B_i = b_i^(-)(I*_i, v_b^(-)).     (14)\n\nThese equations can be solved for the required input voltages v_a and v_b^(-). Thus we are able to construct a nonlinear dynamical system that converges to the same fixed point as a linear system. For this programming method to work, of course, the subthreshold model we have used to characterize the network must accurately model the physical properties of the neural network.\n\n4. PRELIMINARY RESULTS\n\nAs a first step towards realizing a working system, we fabricated a Tiny chip containing 16 neurons arranged in two groups of eight. The topology is the same as shown in figure 2. The neurons are similar to those in figure 1, except that the bias term in each block of 8 neurons has the form b = k b^(-) + (7 - k) b'^(-), where 0 <= k <= 7 is the label of a particular neuron within a block. This form increases the complexity of the neurons, but also allows us to represent ramps more easily (see figure 3).\n\nWe fabricated the chip through MOSIS in a 2 micron p-well CMOS process. A switching layer allows us to change the connection topology at run-time. One of the four possible configurations corresponds to the topology of figure 2. Six external voltages, {v_a, v_b^(-), v_b'^(-)} for each block, parameterize the fixed points of the network. These are controlled by potentiometers. There is multiplexing circuitry included on the chip that selects which neuron output is to be amplified by a sense-amp and routed off-chip. The neurons can be addressed individually by a 4-bit neuron address. The addressing and analog-to-digital conversion is performed by a Motorola 68HC11A1 microprocessor.\n\nWe have operated the chip at 5 volts and at 2.5 volts. Figure 3 
shows the scanned steady-state output of one of the test chips for a particular choice of input parameters with v_dd = 5 volts. The curve in figure 3 exhibits the qualitatively self-similar features of a recursively generated object. We are able to see three generations of a ramp. At 2.5 volts we see a very similar curve. We find that the chip draws 16.3 microamps at 2.5 volts. This corresponds to a steady-state power dissipation of 41 microwatts. Simulations indicate that the chip is operating in the subthreshold regime when v_dd = 2.5 volts. Simulations also indicate that the chip settles in less than one millisecond. We are unable to perform quantitative measurements with the first chip because of several layout errors. On the other hand, we have experimentally verified that the network is indeed stable and that the network produces qualitative fractals. We explored the parameter space informally. At no time did we encounter anything but the desired solutions.\n\nFigure 3. D/A output for chip #3 for a particular set of input voltages.\n\nWe have already fabricated a larger design without the layout problems of the prototype. This second design has 32 pixels and a richer set of permitted topologies. We expect to make quantitative measurements with this second design. In particular we hope to use it to decompress an actual block code.\n\nAcknowledgements\n\nThe work described here is funded by APL IR&D as well as a grant from the National Science Foundation, ECS9313934; Paul Werbos is the monitor. The authors would like to thank Robert Jenkins, Kim Strohbehn and Paul Furth for many useful conversations and suggestions.\n\nReferences\n\nAndreou, A.G. and Boahen, K.A., 
Neural Information Processing I: The Current-Mode Approach, in Analog VLSI: Signal and Information Processing (eds: M. Ismail and T. Fiez), McGraw-Hill Inc., New York, Chapter 6 (1994).\n\nHille, B., Ionic Channels of Excitable Membranes, Sunderland, MA, Sinauer Associates Inc. (1984).\n\nHirsch, M., Convergence in Neural Nets, Proceedings of the IEEE ICNN, San Diego, CA (1987).\n\nJacquin, A.E., A Fractal Theory of Iterated Markov Operators with Applications to Digital Image Coding, Ph.D. Dissertation, Georgia Institute of Technology (1989).\n\nMead, C., Analog VLSI and Neural Systems, Addison-Wesley (1989).\n\nMonroe, D.M. and Dudbridge, F., Fractal block coding of images, Electronics Letters, 28, pp. 1053-1055 (1992).\n\nPineda, F.J., Dynamics and Architecture for Neural Computation, Journal of Complexity, 4, 216-245 (1988).\n", "award": [], "sourceid": 957, "authors": [{"given_name": "Fernando", "family_name": "Pineda", "institution": null}, {"given_name": "Andreas", "family_name": "Andreou", "institution": null}]}