{"title": "The Hopfield Model with Multi-Level Neurons", "book": "Neural Information Processing Systems", "page_first": 278, "page_last": 289, "abstract": null, "full_text": "THE HOPFIELD MODEL WITH MULTI-LEVEL NEURONS\n\nMichael Fleisher\n\nDepartment of Electrical Engineering\n\nTechnion - Israel Institute of Technology\n\nHaifa 32000, Israel\n\nABSTRACT\n\nThe Hopfield neural network model for associative memory is generalized. The generalization replaces two-state neurons by neurons taking a richer set of values. Two classes of neuron input-output relations are developed guaranteeing convergence to stable states. The first is a class of \"continuous\" relations and the second is a class of allowed quantization rules for the neurons. The information capacity for networks from the second class is found to be of order $N^3$ bits for a network with $N$ neurons.\n\nA generalization of the sum of outer products learning rule is developed and investigated as well.\n\n\u00a9 American Institute of Physics 1988\n\nI. INTRODUCTION\n\nThe ability to perform collective computation in a distributed system of flexible structure without global synchronization is an important engineering objective. Hopfield's neural network [1] is such a model of associative content addressable memory.\n\nAn important property of the Hopfield neural network is its guaranteed convergence to stable states (interpreted as the stored memories). In this work we introduce a generalization of the Hopfield model by allowing the outputs of the neurons to take a richer set of values than Hopfield's original binary neurons. Sufficient conditions for preserving the convergence property are developed for the neuron input-output relations. Two classes of relations are obtained. 
The first introduces neurons which simulate multi-threshold functions; networks with such neurons will be called quantized neural networks (Q.N.N.). The second class introduces continuous neuron input-output relations, and networks with such neurons will be called continuous neural networks (C.N.N.).\n\nIn Section II, we introduce Hopfield's neural network and show its convergence property. C.N.N. are introduced in Section III and a sufficient condition on the continuous neuron input-output relations is developed for preserving convergence. In Section IV, Q.N.N. are introduced and their input-output relations are analyzed in the same manner as in Section III. In Section V we look further at Q.N.N. by using the definition of information capacity for neural networks of [2] to obtain a tight asymptotic estimate of the capacity for a Q.N.N. with $N$ neurons. Section VI presents a generalized sum of outer products learning rule for the Q.N.N., and Section VII is the discussion.\n\nII. THE HOPFIELD NEURAL NETWORK\n\nA neural network consists of $N$ pairwise connected neurons. The $i$'th neuron can be in one of two states: $X_i = -1$ or $X_i = +1$. The connections are fixed real numbers denoted by $W_{ij}$ (the connection from neuron $i$ to neuron $j$). Define the state vector $X$ to be a binary vector whose $i$'th component corresponds to the state of the $i$'th neuron. Randomly and asynchronously, each neuron examines its input and decides its next output in the following manner. Let $t_i$ be the threshold voltage of the $i$'th neuron. If the weighted sum of the present other $N-1$ neuron outputs (which compose the $i$'th neuron input) is greater than or equal to $t_i$, the next $X_i$ (denoted $X_i^+$) is $+1$; if not, $X_i^+$ is $-1$. This action is given in (1). 
\n\n$$X_i^+ = \\mathrm{sgn}\\left[ \\sum_{j=1}^{N} W_{ij} X_j - t_i \\right] \\qquad (1)$$\n\nWe give the following theorem.\n\nTheorem 1 (of [1])\n\nThe network described above, with a symmetric ($W_{ij} = W_{ji}$), zero diagonal ($W_{ii} = 0$) connection matrix $W$, has the convergence property.\n\nDefine the quantity\n\n$$E(X) = -\\frac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} W_{ij} X_i X_j + \\sum_{i=1}^{N} t_i X_i \\qquad (2)$$\n\nWe show that $E(X)$ can only decrease as a result of the action of the network. Suppose that $X_k$ changed to $X_k^+ = X_k + \\Delta X_k$; the resulting change in $E$ is given by\n\n$$\\Delta E = -\\Delta X_k \\left( \\sum_{j=1}^{N} W_{kj} X_j - t_k \\right) \\qquad (3)$$\n\n(Eq. (3) is correct because of the restrictions on $W$.) The term in brackets is exactly the argument of the sgn function in (1), and therefore $\\Delta X_k$ and the term in brackets have the same sign (or $\\Delta X_k = 0$), and we get $\\Delta E \\le 0$. Combining this with the fact that $E(X)$ is bounded shows that eventually the network will remain in a local minimum of $E(X)$. This completes the proof.\n\nThe technique used in the proof of Theorem 1 is an important tool in analyzing neural networks. A network with a particular underlying $E(X)$ function can be used to solve optimization problems with $E(X)$ as the object of optimization. Thus we see another use of neural networks.\n\nIII. THE C.N.N.\n\nWe ask ourselves the following question: How can we change the sgn function in (1) without affecting the convergence property? The new action rule for the $i$'th neuron is\n\n$$X_i^+ = f_i\\left[ \\sum_{j=1}^{N} W_{ij} X_j \\right] \\qquad (4)$$\n\nOur attention is focused on possible choices for $f_i(\\cdot)$. The following theorem gives a part of the answer.\n\nTheorem 2\n\nThe network described by (4) (with symmetric zero diagonal $W$) has the convergence property if the f_i(\u00b7
) are strictly increasing and bounded.\n\nDefine\n\n$$E(X) = -\\frac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} W_{ij} X_i X_j + \\sum_{i=1}^{N} g_i(X_i), \\qquad g_i(X_i) = \\int_0^{X_i} f_i^{-1}(u)\\,du \\qquad (5)$$\n\nWe show as before that $E(X)$ can only decrease, and since $E$ is bounded (because of the boundedness of the $f_i$'s) the theorem is proved.\n\nSuppose $X_k$ changes to $X_k + \\Delta X_k$. Using the definition of $g_k$ we have\n\n$$\\Delta E = -\\Delta X_k \\sum_{j=1}^{N} W_{kj} X_j + \\int_{X_k}^{X_k + \\Delta X_k} f_k^{-1}(u)\\,du \\qquad (6)$$\n\nUsing the intermediate value theorem we get\n\n$$\\Delta E = -\\Delta X_k \\left[ \\sum_{j=1}^{N} W_{kj} X_j - f_k^{-1}(C) \\right] \\qquad (7)$$\n\nwhere $C$ is a point between $X_k$ and $X_k + \\Delta X_k$. Now, if $\\Delta X_k > 0$ we have $C \\le X_k + \\Delta X_k \\Rightarrow f_k^{-1}(C) \\le f_k^{-1}(X_k + \\Delta X_k)$, and since $f_k^{-1}(X_k + \\Delta X_k) = \\sum_j W_{kj} X_j$ by (4), the term in brackets is greater than or equal to zero $\\Rightarrow \\Delta E \\le 0$. A similar argument holds for $\\Delta X_k < 0$ (of course $\\Delta X_k = 0 \\Rightarrow \\Delta E = 0$). This completes the proof.\n\nSome remarks:\n\n(a) Strictly increasing bounded neuron relations are not the whole class of relations conserving the convergence property. This is seen immediately from the fact that Hopfield's original model (1) is not in this class.\n\n(b) The $E(X)$ in the C.N.N. coincides with that of Hopfield's continuous neural network [3]. The difference between the two networks lies in the updating scheme. In our C.N.N. the neurons update their outputs at the moments they examine their inputs, while in [3] the updating takes the form of a set of differential equations describing the time evolution of the network outputs.\n\n(c) The boundedness requirement on the neuron relations results from the boundedness of $E(X)$. It is possible to impose further restrictions on $W$ resulting in unbounded neuron relations but keeping $E(X)$ bounded (from below). This was done in [4], where the neurons exhibit linear relations.\n\nIV. THE Q.N.N.\n\nWe develop the class of quantization rules for the neurons, keeping the convergence property. Denote the set of possible neuron outputs by $Y_0 < Y_1 < \\ldots < Y_n$ and the set of threshold values by $t_1 < t_2 < \\ldots < t_n$. The action of the neurons is given by\n\n$$X_i^+ = Y_l \\quad \\text{if} \\quad t_l < \\sum_{j=1}^{N} W_{ij} X_j \\le t_{l+1}, \\qquad l = 0, \\ldots, n \\qquad (8)$$\n\nwith the conventions $t_0 = -\\infty$ and $t_{n+1} = +\\infty$.\n\nThe following theorem gives a class of quantization rules with the convergence property.\n\nTheorem 3\n\nAny quantization rule for the neurons which is an increasing step function, that is\n\n$$Y_0 < Y_1 < \\ldots < Y_n, \\qquad t_1 < \\ldots < t_n \\qquad (9)$$\n\nyields a network with the convergence property (with $W$ symmetric and zero diagonal).\n\nWe proceed to the proof.\n\nDefine\n\n$$E(X) = -\\frac{1}{2} \\sum_{i=1}^{N} \\sum_{j=1}^{N} W_{ij} X_i X_j + \\sum_{i=1}^{N} G(X_i) \\qquad (10)$$\n\nwhere $G(X)$ is a piecewise linear convex (U-shaped) function defined by the relation\n\n$$\\frac{G(Y_i) - G(Y_{i-1})}{Y_i - Y_{i-1}} = t_i, \\qquad i = 1, \\ldots, n \\qquad (11)$$\n\nAs before we show $\\Delta E \\le 0$. Suppose a change occurred in $X_k$ such that $X_k = Y_{i-1}$, $X_k^+ = Y_i$. We then have\n\n$$\\Delta E = -\\Delta X_k \\sum_{j=1}^{N} W_{kj} X_j + G(Y_i) - G(Y_{i-1}) = -\\Delta X_k \\left[ \\sum_{j=1}^{N} W_{kj} X_j - t_i \\right] \\le 0 \\qquad (12)$$\n\nsince $\\Delta X_k = Y_i - Y_{i-1} > 0$ and, by (8), $X_k^+ = Y_i$ requires $\\sum_j W_{kj} X_j > t_i$. A similar argument follows when $X_k = Y_i$, $X_k^+ = Y_{i-1} < X_k$. Any bigger change in $X_k$ (from $Y_i$ to $Y_j$ with $|i - j| > 1$) yields the same result, since it can be viewed as a sequence of $|i - j|$ single-level changes leading from $Y_i$ to $Y_j$, each resulting in $\\Delta E \\le 0$. The proof is completed by noting that $\\Delta X_k = 0 \\Rightarrow \\Delta E = 0$ and that $E(X)$ is bounded.\n\nCorollary\n\nHopfield's original model is a special case of (9).\n\nV. INFORMATION CAPACITY OF THE Q.N.N.\n\nWe use the definition of [2] for the information capacity of the Q.N.N.\n\nDefinition 1\n\nThe information capacity of the Q.N.N. (in bits) is the log (base 2) of the number of distinguishable networks of $N$ neurons. Two networks are distinguishable if observing the state transitions of the neurons yields different observations.\n\nFor Hopfield's original model it was shown in [2] that the capacity $C$ of a network of $N$ neurons is bounded by $C \\le \\log_2 \\left(2^{(N-1)^2}\\right)^N = O(N^3)$ b. It was also shown that $C \\ge \\Omega(N^3)$ b, and thus $C$ is exactly of the order $N^3$ b. It is obvious that in our case (which contains the original model) we must have $C \\ge \\Omega(N^3)$ b as well (since the lower bound cannot decrease in this richer case). 
It is shown in the Appendix that the number of multi-threshold functions of $N-1$ variables with $n+1$ output levels is at most $(n+1)^{N^2+N+1}$. Since we have $N$ neurons there will be at most $\\left((n+1)^{N^2+N+1}\\right)^N$ distinguishable networks, and thus\n\n$$C \\le \\log_2 \\left((n+1)^{N^2+N+1}\\right)^N = O(N^3) \\text{ b} \\qquad (14)$$\n\nso, as before, $C$ is exactly of order $N^3$ b. In fact, the rise in $C$ is probably a factor of $O(\\log_2 n)$, as can be seen from the upper bound.\n\nVI. \"OUTER PRODUCT\" LEARNING RULE\n\nFor Hopfield's original network with two-state neurons (taking the values $\\pm 1$) a natural and extensively investigated learning rule is the so-called sum of outer products construction\n\n$$W_{ij} = \\frac{1}{N} \\sum_{l=1}^{K} X_i^l X_j^l \\qquad (15)$$\n\nwhere $X^1, \\ldots, X^K$ are the desired stable states of the network. A well-known result for (15) is that the asymptotic capacity $K$ of the network is\n\n$$K = \\frac{N-1}{4 \\log N} + 1 \\qquad (16)$$\n\nIn this section we introduce a natural generalization of (15) and prove a similar result for the asymptotic capacity. We first limit the possible quantization rules to those of the form (8) with thresholds midway between adjacent output levels:\n\n$$Y_0 < \\ldots < Y_n, \\qquad t_j = \\frac{1}{2}(Y_j + Y_{j-1}), \\quad j = 1, \\ldots, n \\qquad (17)$$\n\nwith\n\n(a) $n+1$ is even\n\n(b) $Y_i \\ne 0$ for all $i = 0, \\ldots, n$\n\n(c) $Y_i = -Y_{n-i}$, $i = 0, \\ldots, n$\n\nNext we state that the desired stable vectors $X^1, \\ldots, X^K$ are such that each component is picked independently at random from $\\{Y_0, \\ldots, Y_n\\}$ with equal probability. Thus, the $K \\cdot N$ components of the $X$'s are zero mean i.i.d. random variables. Our modified learning rule is\n\n$$W_{ij} = \\frac{1}{N} \\sum_{l=1}^{K} X_i^l \\left[ \\frac{1}{X_j^l} \\right] \\qquad (18)$$\n\nNote that for $X_i \\in \\{+1, -1\\}$ (18) is identical to (15). 
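\n\nAs a quick check of this reduction (a sketch in the notation of this section, not part of the original derivation): when every component satisfies $X_j^l \\in \\{+1, -1\\}$ we have $1/X_j^l = X_j^l$, so\n\n$$\\frac{1}{N} \\sum_{l=1}^{K} X_i^l \\frac{1}{X_j^l} = \\frac{1}{N} \\sum_{l=1}^{K} X_i^l X_j^l ,$$\n\nwhich is the sum of outer products construction for two-state neurons. For output levels with $|Y_i| \\ne 1$ the two expressions differ, and condition (b), $Y_i \\ne 0$, is what keeps the reciprocal well defined. 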
\n\nDefine\n\n$$\\Delta Y = \\min_{i \\ne j} |Y_i - Y_j|, \\qquad A = \\max_{i,j} \\frac{|Y_i|^2}{|Y_j|}$$\n\nWe state that\n\nPROPOSITION:\n\nThe asymptotic capacity of the above network is given by\n\n$$K = \\frac{(\\Delta Y)^2 N}{16 A^2 \\log N} \\qquad (19)$$\n\nPROOF:\n\nDefine\n\n$$P(K, N) = \\Pr\\{ K \\text{ vectors chosen randomly as described are stable states with the } W \\text{ of (18)} \\} = 1 - \\Pr\\Big\\{ \\bigcup_{i,j} A_{ij} \\Big\\} \\qquad (20)$$\n\nwhere $A_{ij}$ is the event that the $i$'th component of the $j$'th vector is in error. We concentrate on the event $A_{11}$ without loss of generality.\n\nThe input $u_1$ when $X^1$ is presented is given by\n\n$$u_1 = \\sum_{j=1}^{N} W_{1j} X_j^1 = X_1^1 + \\frac{K-1}{N} X_1^1 + \\frac{1}{N} \\sum_{l=2}^{K} \\sum_{j=2}^{N} X_1^l \\frac{X_j^1}{X_j^l} \\qquad (21)$$\n\nThe first term is mapped by (17) into itself and corresponds to the desired signal.\n\nThe last term is a sum of $(K-1)(N-1)$ i.i.d. zero mean random variables and corresponds to noise.\n\nThe middle term $\\frac{K-1}{N} X_1^1$ is disposed of by assuming $\\frac{K-1}{N} \\to 0$ as $N \\to \\infty$. (With a zero diagonal choice of $W$, using (18) with $i \\ne j$, this term does not appear.)\n\n$$\\Pr(A_{11}) = \\Pr\\{ \\text{noise gets us out of range} \\}$$\n\nDenoting the noise by $I$ we have\n\n$$\\Pr(A_{11}) \\le \\Pr\\left\\{ |I| \\ge \\frac{\\Delta Y}{2} \\right\\} \\le 2 \\exp\\left( -\\frac{2 (N \\Delta Y / 2)^2}{(K-1)(N-1)\\,4A^2} \\right) \\qquad (22)$$\n\nwhere the first inequality is from the definition of $\\Delta Y$ and the second uses the lemma of [6] p. 58. We thus get\n\n$$P(K, N) \\ge 1 - K \\cdot N \\cdot 2 \\exp\\left( -\\frac{(\\Delta Y)^2 N^2}{8 (K-1)(N-1) A^2} \\right) \\qquad (23)$$\n\nSubstituting (19) and taking $N \\to \\infty$ we get $P(K, N) \\to 1$, and this completes the proof.\n\nVII. DISCUSSION\n\nTwo classes of generalization of the Hopfield neural network model were presented. We give some remarks:\n\n(a) Any combination of neurons from the two classes will have the convergence property as well.\n\n(b) Our definition of the information capacity is useless for the C.N.N., since a full observation of the possible state transitions of the network is impossible.\n\nAPPENDIX\n\nWe prove the following theorem. 
\n\nTheorem\n\nAn upper bound on the number of multi-threshold functions with $N$ inputs and $M$ points in the domain (out of $(n+1)^N$ possible points), denoted $C_N^M$, is the solution of the recurrence relation\n\n$$C_N^M = C_N^{M-1} + n \\cdot C_{N-1}^{M-1} \\qquad (A.1)$$\n\nLet us look at the $N$ dimensional weight space $W$. Each input point $X$ divides the weight space into $n+1$ regions by $n$ parallel hyperplanes $\\sum_{i=1}^{N} W_i X_i = t_k$, $k = 1, \\ldots, n$. We keep adding points in such a way that the new $n$ hyperplanes corresponding to each added point partition the $W$ space into as many regions as possible. Assume $M-1$ points have made $C_N^{M-1}$ regions and we add the $M$'th point. Each hyperplane (out of $n$) is divided into at most $C_{N-1}^{M-1}$ regions (being itself an $N-1$ dimensional space divided by $(M-1)n$ hyperlines). We thus have after passing the $n$ hyperplanes:\n\n$$C_N^M = C_N^{M-1} + n \\cdot C_{N-1}^{M-1}$$\n\nwhose solution is\n\n$$C_N^M = (n+1) \\sum_{i=0}^{N-1} \\binom{M-1}{i} n^i$$\n\nand the theorem is proved.\n\nFrom the solution of the recurrence in the case $M = (n+1)^N$ (all possible points) we have a bound on the number of multi-threshold functions of $N$ variables equal to\n\n$$C_N^{(n+1)^N} = (n+1) \\sum_{i=0}^{N-1} \\binom{(n+1)^N - 1}{i} n^i \\le (n+1)^{N^2+N+1}$$\n\nand the result used is established.\n\nLIST OF REFERENCES\n\n[1] Hopfield J. J., \"Neural networks and physical systems with emergent collective computational abilities\", Proc. Nat. Acad. Sci. USA, Vol. 79 (1982), pp. 2554-2558.\n\n[2] Abu-Mostafa Y. S. and Jacques J. St., \"Information capacity of the Hopfield model\", IEEE Trans. on Info. Theory, Vol. IT-31 (1985), pp. 461-464.\n\n[3] Hopfield J. J., \"Neurons with graded response have collective computational properties like those of two state neurons\", Proc. Nat. Acad. Sci. USA, Vol. 81 (1984).\n\n[4] Fleisher M., \"Fast processing of autoregressive signals by a neural network\", to be presented at IEEE Conference, Israel 1987. 
\n\n[5] Levin, E., Private communication.\n\n[6] Petrov, \"Sums of independent random variables\".\n", "award": [], "sourceid": 60, "authors": [{"given_name": "Michael", "family_name": "Fleisher", "institution": null}]}