{"title": "Modern Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 253, "page_last": 259, "abstract": "", "full_text": "Modern Analytic Techniques to Solve the \nDynamics of Recurrent  Neural Networks \n\nA.C.C.  Coolen \n\nDept.  of Mathematics \nKing's College London \n\nS.N. Laughton \n\nStrand, London  WC2R 2LS,  U.K. \n\n1 Keble  Road,  Oxford  OX1  3NP,  U.K. \n\nDept.  of Physics - Theoretical Physics \n\nUniversity of Oxford \n\nD.  Sherrington .. \n\nCenter for  Non-linear  Studies \n\nLos  Alamos  National Laboratory \nLos  Alamos,  New  Mexico 87545 \n\nAbstract \n\nWe  describe the use of modern analytical techniques in  solving the \ndynamics  of  symmetric  and  nonsymmetric  recurrent  neural  net(cid:173)\nworks  near saturation.  These explicitly take into account the cor(cid:173)\nrelations between  the  post-synaptic  potentials,  and  thereby  allow \nfor  a  reliable prediction of transients. \n\n1 \n\nINTRODUCTION \n\nRecurrent  neural  networks  have  been  rather  popular  in  the  physics  community, \nbecause  they lend  themselves so  naturally  to  analysis with  tools  from  equilibrium \nstatistical  mechanics.  This  was  the  main  theme  of physicists  between,  say,  1985 \nand 1990.  Less  familiar  to the neural network community is  a  subsequent  wave of \ntheoretical physical studies,  dealing with  the dynamics of symmetric and nonsym(cid:173)\nmetric  recurrent  networks.  The  strategy  here  is  to  try  to  describe  the  processes \nat a  reduced  level  of an appropriate small set of dynamic macroscopic observables. \nAt  first,  progress  was  made  in  solving  the  dynamics  of extremely  diluted  models \n(Derrida  et  al,  1987)  and  of fully  connected  models  away  from  saturation  (for  a \nreview  see  (Coolen  and  Sherrington,  1993)).  This  paper  is  concerned  with  more \nrecent  approaches,  which  take  the  form  of  dynamical  replica  theories,  that  allow \nfor  a  reliable prediction of transients, even near saturation.  Transients provide the \nlink  between  initial  states  and  final  states  (equilibrium  calculations  only  provide \n\n\u00b7On leave from  Department of Physics - Theoretical  Physics, University of Oxford \n\n\f254 \n\nA. C. C. COOLEN, S.  N. LAUGHTON, D. SHERRINGTON \n\ninformation  on  the  possible  final  states).  In  view  of  the  technical  nature  of the \nsubject,  we  will describe only basic ideas and results for  simple models  (full  details \nand applications to more complicated models can be found  elsewhere). \n\n2  RECURRENT NETWORKS  NEAR SATURATION \n\nLet  us  consider  networks  of N  binary  neurons  ai  E {-I, I},  where  neuron  states \nare  updated  sequentially  and  stochastically,  driven  by  the  values  of post-synaptic \npotentials hi .  The probability to find the system at time t in state 0'  =  (a1,' .. , aN) \nis  denoted  by  Pt(O').  For  the  rates Wi(O')  of the  transitions  ai  -t -(7i  and for  the \npotentials hi (0')  we  make the usual choice \n\nWi (0')  =  - [1-ai tanh [,Bhi (0')]] \n\n1 \n2 \n\nhi(O')  = L Jijaj \n\nj:f:i \n\nThe parameter ,B  controls the degree of stochasticity:  the ,B  =  0 dynamics  is  com(cid:173)\npletely random, whereas for ,B  =  00  we find  the deterministic rule ai  -t  sgn[hi(O')]. 
The evolution in time of p_t(\sigma) is given by the master equation

\frac{d}{dt} p_t(\sigma) = \sum_{k=1}^{N} \left[ p_t(F_k\sigma)\, w_k(F_k\sigma) - p_t(\sigma)\, w_k(\sigma) \right]    (1)

with F_k\Phi(\sigma) = \Phi(\sigma_1,\ldots,-\sigma_k,\ldots,\sigma_N). For symmetric models, where J_{ij} = J_{ji} for all (ij), the dynamics (1) leads asymptotically to the Boltzmann equilibrium distribution p_{eq}(\sigma) \sim \exp[-\beta E(\sigma)], with the energy E(\sigma) = -\sum_{i<j} \sigma_i J_{ij} \sigma_j.

For associative memory models with Hebbian-type synapses, required to store a set of p random binary patterns \xi^\mu = (\xi_1^\mu,\ldots,\xi_N^\mu), the relevant macroscopic observable is the overlap m between the current microscopic state \sigma and the pattern to be retrieved (say, pattern 1): m = \frac{1}{N}\sum_i \xi_i^1 \sigma_i. Each post-synaptic potential can now be written as the sum of a simple signal term and an interference-noise term, e.g.

J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p=\alpha N} \xi_i^\mu \xi_j^\mu \qquad h_i(\sigma) = m\,\xi_i^1 + \frac{1}{N}\sum_{\mu>1} \xi_i^\mu \sum_{j\neq i} \xi_j^\mu \sigma_j    (2)

All complications arise from the noise terms.

The 'Local Chaos Hypothesis' (LCH) consists of assuming the noise terms to be independently distributed Gaussian variables. The macroscopic description then consists of the overlap m and the width of the noise distribution (Amari and Maginu, 1988). This, however, works only for states near the nominated pattern; see also (Nishimori and Ozeki, 1993). In reality the noise components in the potentials have far more complicated statistics(1). Due to the build-up of correlations between the system state and the non-nominated patterns, the noise components can be highly correlated and described by bi-modal distributions. Another approach involves a description in terms of correlation and response functions (with two time arguments). Here one builds a generating functional, which is a sum over all possible trajectories in state space, averaged over the distribution of the non-nominated patterns. One finds equations which are exact for N \to \infty, but, unfortunately, also rather complicated. For the typical neural network models solutions are known only in equilibrium (Rieger et al., 1988); information on transients has so far been obtained only through cumbersome approximation schemes (Horner et al., 1989). We now turn to a theory that takes into account the non-trivial statistics of the post-synaptic potentials, yet involves observables with one time argument only.

(1) Correlations are negligible only in extremely diluted (asymmetric) networks (Derrida et al., 1987), and in networks with independently drawn (asymmetric) random synapses.
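The decomposition (2) can be probed numerically. The following minimal sketch is ours, not from the paper, and uses zero-noise parallel updates as a crude stand-in for the sequential dynamics (1): it stores p = \alpha N random patterns, lets the state relax towards pattern 1, and inspects the empirical interference noise z_i = h_i - m\,\xi_i^1. The fourth-moment ratio printed at the end equals 1 for Gaussian noise, so its deviation from 1 signals the correlation effects just described.

    import numpy as np

    rng = np.random.default_rng(1)
    N, alpha = 2000, 0.1
    p = int(alpha * N)
    xi = rng.choice([-1, 1], size=(p, N))        # patterns xi^mu
    J = (xi.T @ xi) / N                          # Hebbian couplings, as in (2)
    np.fill_diagonal(J, 0.0)

    sigma = xi[0].copy()
    sigma[: N // 5] *= -1                        # start near pattern 1 (overlap m ~ 0.6)
    for _ in range(10):                          # zero-noise (beta = infinity) parallel
        sigma = np.where(J @ sigma >= 0, 1, -1)  # updates build up state-pattern correlations

    m = (xi[0] @ sigma) / N                      # overlap with the nominated pattern
    z = J @ sigma - m * xi[0]                    # interference noise z_i = h_i - m xi_i^1
    print(f"m = {m:.3f}, var(z) = {z.var():.3f}")
    print(f"<z^4> / 3<z^2>^2 = {np.mean(z**4) / (3 * z.var() ** 2):.3f}")  # 1 if Gaussian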
3 DYNAMICAL REPLICA THEORIES

The evolution of macroscopic observables \Omega(\sigma) = (\Omega_1(\sigma),\ldots,\Omega_K(\sigma)) can be described by the so-called Kramers-Moyal expansion for the corresponding probability distribution p_t(\Omega) (derived directly from (1)). Under certain conditions on the sensitivity of \Omega to single-neuron transitions \sigma_i \to -\sigma_i, one finds on finite time-scales and for N \to \infty that the macroscopic state \Omega evolves deterministically according to:

\frac{d}{dt}\Omega = \frac{\sum_\sigma p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)]\,\sum_i w_i(\sigma)\,[\Omega(F_i\sigma)-\Omega(\sigma)]}{\sum_\sigma p_t(\sigma)\,\delta[\Omega-\Omega(\sigma)]}    (3)

This equation depends explicitly on time through p_t(\sigma). However, there are two natural ways for (3) to become autonomous: (i) by the term \sum_i w_i(\sigma)\,[\Omega(F_i\sigma)-\Omega(\sigma)] depending on \sigma only through \Omega(\sigma) (as for attractor networks away from saturation), or (ii) by (1) allowing for solutions of the form p_t(\sigma) = f_t[\Omega(\sigma)] (as for extremely diluted networks). In both cases p_t(\sigma) drops out of (3). Simulations further indicate that for N \to \infty the macroscopic evolution usually depends only on the statistical properties of the patterns \{\xi^\mu\}, not on their microscopic realisation ('self-averaging'). This leads us to the following closure assumptions:

1. Probability equipartitioning in the \Omega subshells of the ensemble: p_t(\sigma) \sim \delta[\Omega_t-\Omega(\sigma)]. If \Omega indeed obeys closed equations, this assumption is safe.
2. Self-averaging of the \Omega flow with respect to the microscopic details of the non-nominated patterns: \frac{d}{dt}\Omega \to \langle \frac{d}{dt}\Omega \rangle_{patt}.

Our equations (3) are hereby transformed into the closed set:

\frac{d}{dt}\Omega = \left\langle \frac{\sum_\sigma \delta[\Omega-\Omega(\sigma)]\,\sum_i w_i(\sigma)\,[\Omega(F_i\sigma)-\Omega(\sigma)]}{\sum_\sigma \delta[\Omega-\Omega(\sigma)]} \right\rangle_{patt}

The final observation is that the tool for averaging fractions is replica theory:

\frac{d}{dt}\Omega = \lim_{n\to 0}\lim_{N\to\infty} \sum_{\sigma^1\cdots\sigma^n} \left\langle \sum_i w_i(\sigma^1)\,[\Omega(F_i\sigma^1)-\Omega(\sigma^1)]\,\prod_{\alpha=1}^{n}\delta[\Omega-\Omega(\sigma^\alpha)] \right\rangle_{patt}    (4)

The choice to be made for the observables \Omega(\sigma), crucial for the closure assumptions to make sense, is constrained by requiring the theory to be exact in specific limits:

exactness for \alpha \to 0:  \Omega = (m,\ldots)
exactness for t \to \infty:  \Omega = (E,\ldots)  (for symmetric models only)
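The step to (4) rests on a standard replica identity for averaging fractions, which we spell out here as a piece of bookkeeping (our own addition, for clarity; it is not derived in the text above). Writing the right-hand side of the closed flow equation as \langle A/B \rangle_{patt}, with A and B the numerator and denominator above, one continues analytically from integer n:

\left\langle \frac{A}{B} \right\rangle_{patt} = \lim_{n\to 0} \left\langle A\, B^{\,n-1} \right\rangle_{patt},
\qquad
A\,B^{\,n-1} = \sum_{\sigma^1\cdots\sigma^n} \sum_i w_i(\sigma^1)\,[\Omega(F_i\sigma^1)-\Omega(\sigma^1)]\,\prod_{\alpha=1}^{n}\delta[\Omega-\Omega(\sigma^\alpha)]

so the n-1 powers of B supply exactly the extra replicas \sigma^2,\ldots,\sigma^n appearing in (4).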
\n\nin  which  Dm,r[z]  is  the  distribution of 'interference-noise'  terms  in  the  PSP's,  for \nwhich the replica calculation gives the outcome  (in  so-called RS  ansatz): \n\n2  27rar \n\napr \n\napr \n\nDm,r[z]  =  e-~2 {l-jDY tanh  [>.y  [~] t+(~+Z)-~+{tl} \n+ e-~)2 {1-jDY  tanh [>.y  [~] t +(~-Z)~-{tl} \nthe remaining parameters {q, {t, p}  to be solved from  the coupled equations: \nwith  Dy = [27rj-t e- h2 dy,  ~ = apr->.2jp and>' = pyaq[l-p(l-q)]-l, and  with \n1-p(1-q)2 \nr = [1-p(1-q)]2 \n\nq =  Dy  tanh2 [>.y+{t] \n\n2  27rar \n\nm =  Dy  tanh[>'y+{tj \n\nj \n\napr \n\napr \n\nj \n\nHere  we  only  give  (partly  new)  results  of the  calculation;  details  can  be  found \nin  (Coolen  and  Sherrington,  1994).  The  noise  distribution  is  not  Gaussian  (in \nagreement with simulations, in contrast to LCH). Our simple two-parameter theory \nis  found  to be  exact  for  t  '\" 0,  t  -7  00  and  for  a  -7  O.  Solving  numerically  the \ndynamic  equations  leads  to  the  results  shown  in  figures  1  and  2.  We  find  a  nice \nagreement  with  numerical  simulations  in  terms  of  the  flow  in  the  (m, r)  plane. \nHowever,  for  trajectories  leading  away  from  the  recall  state  m  '\"  1,  the  theory \nfails  to reproduce an overall slowing down.  These deviations can  be quantified  by \ncomparing cumulants of the noise  distributions  (Ozeki and  Nishimori,  1994), or by \napplying  the  theory  to exactly  solvable  models  (Coolen  and  Franz,  1994).  Other \nrecent  applications  include  spin-glass  models  (Coolen  and  Sherrington,  1994)  and \nmore  general  classes  of  attractor  neural  network  models  (Laughton  and  Coolen, \n1995).  The simple two-parameter theory always predicts adequately the location of \nthe transients in the order parameter plane, but overestimates the relaxation speed. \nIn  fact,  figure  2  shows  a  remarkable  resemblance  to  the  results  obtained  for  this \nmodel  in  (Horner et al,  1989)  with  the functional  integral formalism;  the graphs of \nm(t) are almost identical, but here they are derived in  a  much  simpler way. \n\n\fModem Analytic Techniques to  Solve the Dynamics of Recurrent Neural Networks \n\n257 \n\n1 \n\n.8 \n\n2 \u00b76 \n~ \n\n.4 \n\n.2 \n\n10 \n\n--..., \n..., \n'-' \n!--\n\n5 \n\n--\n\n..... \n\n..... .....  ..... \n\n.... \n\n..... .... .... .... \n--\n\n.... ........ \n-----\n\n0 \n\n0 \n\n2 \n\n4 \n\n6 \n\nt \n\nB \n\n10 \n\n0 \n\n0 \n\n2 \n\n4 \n\n6 \n\nt \n\nB \n\n10 \n\nFigure 2:  Simulations  (N =  32000, dots)  versus simple RS  theory (RS  stable:  solid \nlines,  RS unstable:  dashed lines),  now as functions of time, for  Q;  = 0.1  and f3  = 00. \n\n5  ADVANCED VERSION  OF  THE THEORY \nImproving upon the simple theory means expanding the set n beyond n =  (m,E). \nAdding a finite  number of observables will  only  have a  minor impact; a  qualitative \nstep forward, on the other hand, results from introducing a dynamic order parameter \nfunction.  
5 ADVANCED VERSION OF THE THEORY

Improving upon the simple theory means expanding the set \Omega beyond \Omega = (m,E). Adding a finite number of observables will only have a minor impact; a qualitative step forward, on the other hand, results from introducing a dynamic order parameter function. Since the microscopic dynamics (1) is formulated entirely in terms of neuron states and post-synaptic potentials, we choose for \Omega(\sigma) the joint distribution:

D[\zeta,h](\sigma) = \frac{1}{N}\sum_i \delta[\zeta-\sigma_i]\,\delta[h-h_i(\sigma)]

This choice has the advantages that (a) both m and (for symmetric systems) E are integrals over D[\zeta,h], so the advanced theory automatically inherits the exactness at t = 0 and t = \infty of the simple one, (b) it applies equally well to symmetric and nonsymmetric models, and (c) as with the simple version, generalisation to models with continuous neural variables is straightforward. Here we show the result of applying the theory to a model of the type (1) with synaptic interactions:

J_{ij} = \frac{J_0}{N}\xi_i\xi_j + \frac{J}{\sqrt{N}}\left[\cos(\omega)\,x_{ij} + \sin(\omega)\,y_{ij}\right] \qquad x_{ij} = x_{ji}, \; y_{ij} = -y_{ji} \; (independent random Gaussian variables)

(describing a nominated pattern being stored on a 'messy' synaptic background). The parameter \omega controls the degree of synaptic symmetry (e.g. \omega = 0: symmetric, \omega = \frac{1}{2}\pi: anti-symmetric). Equation (4) applied to the observable D[\zeta,h](\sigma) gives:

\frac{\partial}{\partial t}D_t[\zeta,h] = J^2\left[1-\langle\sigma\tanh(\beta H)\rangle_{D_t}\right]\frac{\partial^2}{\partial h^2}D_t[\zeta,h] + \frac{\partial}{\partial h}A[\zeta,h;D_t]
 + \frac{\partial}{\partial h}\left\{ D_t[\zeta,h]\left[h - J_0\langle\tanh(\beta H)\rangle_{D_t}\right]\right\}
 + \frac{1}{2}\left[1+\zeta\tanh(\beta h)\right]D_t[-\zeta,h] - \frac{1}{2}\left[1-\zeta\tanh(\beta h)\right]D_t[\zeta,h]

with \langle f(\sigma,H)\rangle_D = \sum_\sigma \int\! dH\, D[\sigma,H]\, f(\sigma,H). All complications are concentrated in the kernel A[\zeta,h;D], which is to be solved from a nontrivial set of equations emerging from the replica formalism. Some results of solving these equations numerically are shown in figures 3 and 4 (for details of the calculations and more elaborate comparisons with simulations we refer to (Laughton, Coolen and Sherrington, 1995; Coolen, Laughton and Sherrington, 1995)). It is clear that the advanced theory quite convincingly describes the transients of the simulation experiments, including the hitherto unexplained slowing down, for symmetric and nonsymmetric models.

Figure 3: Comparison of simulations (N = 8000, solid line), simple two-parameter theory (RS stable: dotted line, RS unstable: dashed line) and advanced theory (solid line), showing E versus t for the \omega = 0 (symmetric background) model, with J_0 = 0, \beta = \infty. Note that the two solid lines are almost on top of each other at the scale shown.

Figure 4: Advanced theory versus N = 5600 simulations in the \omega = \frac{1}{2}\pi (asymmetric background) model, with \beta = \infty and J = 1. Solid: simulations; dotted: solving the RS diffusion equation.
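In simulations the order parameter function itself is directly measurable. A minimal sketch follows (our own illustration; the binning choices are arbitrary): it estimates D[\zeta,h] from a microscopic state as one normalised histogram of the potentials h_i for each neuron state \zeta = \pm 1. Applied after every few sweeps of the dynamics, it tracks the evolution of D_t[\zeta,h] that the diffusion equation above describes.

    import numpy as np

    def joint_distribution(sigma, J, bins=60, h_range=(-3.0, 3.0)):
        """Empirical D[zeta, h] = (1/N) sum_i delta[zeta - sigma_i] delta[h - h_i]:
        a histogram of the potentials h_i, split by neuron state zeta = +/-1.
        Potentials outside h_range are discarded; assumes diag(J) = 0."""
        h = J @ sigma                                  # h_i = sum_{j != i} J_ij sigma_j
        edges = np.linspace(h_range[0], h_range[1], bins + 1)
        width = edges[1] - edges[0]
        D = {}
        for zeta in (-1, 1):
            counts, _ = np.histogram(h[sigma == zeta], bins=edges)
            D[zeta] = counts / (sigma.size * width)    # sum_zeta int dh D[zeta, h] ~ 1
        return edges, D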
6 DISCUSSION

In this paper we have described novel techniques for studying the dynamics of recurrent neural networks near saturation. The simplest two-parameter theory (exact for t = 0, for t \to \infty and for \alpha \to 0), which employs as dynamic order parameters the overlap with the pattern to be recalled and the total 'energy' per neuron, already describes quite accurately the location of the transients in the order parameter plane. The price paid for simplicity is that it overestimates the relaxation speed. A more advanced version of the theory, which describes the evolution of the joint distribution of neuron states and post-synaptic potentials, is mathematically more involved, but predicts the dynamical data essentially perfectly, as far as present applications allow us to conclude. Whether this latter version is exact, or just a very good approximation, remains to be seen.

In this paper we have restricted ourselves to models with binary neural variables, for reasons of simplicity. The theories generalise in a natural way to models with analogue neurons (here, however, already the simple version will generally involve order parameter functions, as opposed to a finite number of order parameters). Ongoing work along these lines includes, for instance, the analysis of analogue and spherical attractor networks and of networks of coupled oscillators near saturation.

References

B. Derrida, E. Gardner and A. Zippelius (1987), Europhys. Lett. 4: 167-173
A.C.C. Coolen and D. Sherrington (1993), in J.G. Taylor (ed.), Mathematical Approaches to Neural Networks, 293-305. Amsterdam: Elsevier
S. Amari and K. Maginu (1988), Neural Networks 1: 63-73
H. Nishimori and T. Ozeki (1993), J. Phys. A 26: 859-871
H. Rieger, M. Schreckenberg and J. Zittartz (1988), Z. Phys. B 72: 523-533
H. Horner, D. Bormann, M. Frick, H. Kinzelbach and A. Schmidt (1989), Z. Phys. B 76: 381-398
A.C.C. Coolen and D. Sherrington (1994), Phys. Rev. E 49(3): 1921-1934
H. Nishimori and T. Ozeki (1994), J. Phys. A 27: 7061-7068
A.C.C. Coolen and S. Franz (1994), J. Phys. A 27: 6947-6954
A.C.C. Coolen and D. Sherrington (1994), J. Phys. A 27: 7687-7707
S.N. Laughton and A.C.C. Coolen (1995), Phys. Rev. E 51: 2581-2599
S.N. Laughton, A.C.C. Coolen and D. Sherrington (1995), J. Phys. A (in press)
A.C.C. Coolen, S.N. Laughton and D. Sherrington (1995), Phys. Rev. B (in press)
", "award": [], "sourceid": 1049, "authors": [{"given_name": "A.C.C.", "family_name": "Coolen", "institution": null}, {"given_name": "S.", "family_name": "Laughton", "institution": null}, {"given_name": "D.", "family_name": "Sherrington", "institution": null}]}