{"title": "Minimax and Hamiltonian Dynamics of Excitatory-Inhibitory Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 329, "page_last": 335, "abstract": null, "full_text": "Minimax and Hamiltonian Dynamics of Excitatory-Inhibitory Networks\n\nH. S. Seung, T. J. Richardson\nBell Labs, Lucent Technologies\nMurray Hill, NJ 07974\n{seung|tjr}@bell-labs.com\n\nJ. C. Lagarias\nAT&T Labs-Research\n180 Park Ave. D-130\nFlorham Park, NJ 07932\njcl@research.att.com\n\nJ. J. Hopfield\nDept. of Molecular Biology\nPrinceton University\nPrinceton, NJ 08544\njhopfield@watson.princeton.edu\n\nAbstract\n\nA Lyapunov function for excitatory-inhibitory networks is constructed. The construction assumes symmetric interactions within excitatory and inhibitory populations of neurons, and antisymmetric interactions between populations. The Lyapunov function yields sufficient conditions for the global asymptotic stability of fixed points. If these conditions are violated, limit cycles may be stable. The relations of the Lyapunov function to optimization theory and classical mechanics are revealed by minimax and dissipative Hamiltonian forms of the network dynamics.\n\nThe dynamics of a neural network with symmetric interactions provably converges to fixed points under very general assumptions [1, 2]. This mathematical result helped to establish the paradigm of neural computation with fixed point attractors [3]. But in reality, interactions between neurons in the brain are asymmetric. Furthermore, the dynamical behaviors seen in the brain are not confined to fixed point attractors, but also include oscillations and complex nonperiodic behavior. These other types of dynamics can be realized by asymmetric networks, and may be useful for neural computation. 
For these reasons, it is important to understand the global behavior of asymmetric neural networks.\n\nThe interaction between an excitatory neuron and an inhibitory neuron is clearly asymmetric. Here we consider a class of networks that incorporates this fundamental asymmetry of the brain's microcircuitry. Networks of this class have distinct populations of excitatory and inhibitory neurons, with antisymmetric interactions between populations and symmetric interactions within each population. Such networks display a rich repertoire of dynamical behaviors including fixed points, limit cycles [4, 5] and traveling waves [6].\n\nAfter defining the class of excitatory-inhibitory networks, we introduce a Lyapunov function that establishes sufficient conditions for the global asymptotic stability of fixed points. The generality of these conditions contrasts with the restricted nature of previous convergence results, which applied only to linear networks [5], or to nonlinear networks with infinitely fast inhibition [7].\n\nThe use of the Lyapunov function is illustrated with a competitive or winner-take-all network, which consists of an excitatory population of neurons with recurrent inhibition from a single neuron [8]. For this network, the sufficient conditions for global stability of fixed points also happen to be necessary conditions. In other words, we have proved global stability over the largest possible parameter regime in which it holds, demonstrating the power of the Lyapunov function. There exists another parameter regime in which numerical simulations display limit cycle oscillations [7].\n\nSimilar convergence proofs for other excitatory-inhibitory networks may be obtained by tedious but straightforward calculations. 
All the necessary tools are given in the first half of the paper. The rest of the paper explains what makes the Lyapunov function especially interesting beyond the convergence results it yields: its role in a conceptual framework that relates excitatory-inhibitory networks to optimization theory and classical mechanics.\n\nThe connection between neural networks and optimization [3] was established by proofs that symmetric networks could find minima of objective functions [1, 2]. Later it was discovered that excitatory-inhibitory networks could perform the minimax computation of finding saddle points [9, 10, 11], though no general proof of this was given at the time. Our Lyapunov function finally supplies such a proof, and one of its components is the objective function of the network's minimax computation.\n\nOur Lyapunov function can also be obtained by writing the dynamics of excitatory-inhibitory networks in Hamiltonian form, with extra velocity-dependent terms. If these extra terms are dissipative, then the energy of the system is nonincreasing, and is a Lyapunov function. If the extra terms are not purely dissipative, limit cycles are possible. Previous Hamiltonian formalisms for neural networks made the more restrictive assumption of purely antisymmetric interactions, and did not include the effect of dissipation [12].\n\nThis paper establishes sufficient conditions for global asymptotic stability of fixed points. The problem of finding sufficient conditions for oscillatory and chaotic behavior remains open. The perspectives of minimax and Hamiltonian dynamics may help in this task.\n\n1 EXCITATORY-INHIBITORY NETWORKS\n\nThe dynamics of an excitatory-inhibitory network is defined by\n\n$$\\tau_x \\dot{x} + x = f(u + Ax - By), \\qquad (1)$$\n$$\\tau_y \\dot{y} + y = g(v + B^T x - Cy). \\qquad (2)$$\n\nThe state variables are contained in two vectors $x \\in \\mathbb{R}^m$ and $y \\in \\mathbb{R}^n$, which represent the activities of the excitatory and inhibitory neurons, respectively.\n\nThe symbol f is used in both scalar and vector contexts. The scalar function $f: \\mathbb{R} \\to \\mathbb{R}$ is monotonic nondecreasing. The vector function $f: \\mathbb{R}^m \\to \\mathbb{R}^m$ is defined by applying the scalar function f to each component of a vector argument, i.e., $f(x) = (f(x_1), \\ldots, f(x_m))$. The symbol g is used similarly.\n\nThe symmetry of interaction within each population is imposed by the constraints $A = A^T$ and $C = C^T$. The antisymmetry of interaction between populations is manifest in the occurrence of $-B$ and $B^T$ in the equations. The terms \"excitatory\" and \"inhibitory\" are appropriate with the additional constraint that the entries of matrices A, B, and C are nonnegative. Though this assumption makes sense in a neurobiological context, the mathematics does not depend on it. The constant vectors u and v represent tonic input from external sources, or alternatively bias intrinsic to the neurons.\n\nThe time constants $\\tau_x$ and $\\tau_y$ set the speed of excitatory and inhibitory synapses, respectively. In the limit of infinitely fast inhibition, $\\tau_y = 0$, the convergence theorems for symmetric networks are applicable [1, 2], though some effort is required in applying them to the case $C \\neq 0$. If the dynamics converges for $\\tau_y = 0$, then there exists some neighborhood of $\\tau_y = 0$ in which it still converges [7]. Our Lyapunov function goes further, as it is valid for more general $\\tau_y$.\n\nThe potential for oscillatory behavior in excitatory-inhibitory networks like (1) has long been known [4, 7]. The origin of oscillations can be understood from a simple two-neuron model. 
Suppose that neuron 1 excites neuron 2 and receives inhibition back from neuron 2. The effect is that neuron 1 suppresses its own activity with an effective delay that depends on the time constant of inhibition. If this delay is long enough, oscillations result. However, these oscillations will die down to a fixed point, as the inhibition tends to dampen activity in the circuit. Only if neuron 1 also excites itself can the oscillations become sustained.\n\nTherefore, whether oscillations are damped or sustained depends on the choice of parameters. In this paper we establish sufficient conditions for the global stability of fixed points in (1). The violation of these sufficient conditions indicates parameter regimes in which there may be other types of asymptotic behavior, such as limit cycles.\n\n2 LYAPUNOV FUNCTION\n\nWe will assume that f and g are smooth and that their inverses $f^{-1}$ and $g^{-1}$ exist. If the function f is bounded above and/or below, then its inverse $f^{-1}$ is defined on the appropriate subinterval of $\\mathbb{R}$. Note that the set of $(x, y)$ lying in the range of $(f, g)$ is a positively invariant set under (1) and that its closure is a global attractor for the system.\n\nThe scalar function F is defined as the antiderivative of f, and $\\bar{F}$ as the Legendre transform $\\bar{F}(x) = \\max_p \\{px - F(p)\\}$. The derivatives of these conjugate convex functions are\n\n$$F'(x) = f(x), \\qquad \\bar{F}'(x) = f^{-1}(x). \\qquad (3)$$\n\nThe vector versions of these functions are defined componentwise, as in the definition of the vector version of f. The conjugate convex pair $G, \\bar{G}$ is defined similarly.\n\nThe Lyapunov function requires generalizations of the standard kinetic energies $\\tau_x |\\dot{x}|^2/2$ and $\\tau_y |\\dot{y}|^2/2$. These are constructed using the functions $\\Phi: \\mathbb{R}^m \\times \\mathbb{R}^m \\to \\mathbb{R}$ and $\\Gamma: \\mathbb{R}^n \\times \\mathbb{R}^n \\to \\mathbb{R}$, defined by\n\n$$\\Phi(p, x) = \\mathbf{1}^T F(p) - x^T p + \\mathbf{1}^T \\bar{F}(x), \\qquad (4)$$\n$$\\Gamma(q, y) = \\mathbf{1}^T G(q) - y^T q + \\mathbf{1}^T \\bar{G}(y). \\qquad (5)$$\n\nThe components of the vector $\\mathbf{1}$ are all ones; its dimensionality should be clear from context. The function $\\Phi(p, x)$ is lower bounded by zero, and vanishes on the manifold $f(p) = x$, by the definition of the Legendre transform. Setting $p = u + Ax - By$, we obtain the generalized kinetic energy $\\tau_x^{-1} \\Phi(u + Ax - By, x)$, which vanishes when $\\dot{x} = 0$ and is positive otherwise. It reduces to $\\tau_x |\\dot{x}|^2/2$ in the special case where f is the identity function.\n\nTo construct the Lyapunov function, a multiple of the saddle function\n\n$$S = -u^T x - \\tfrac{1}{2} x^T A x + v^T y - \\tfrac{1}{2} y^T C y + \\mathbf{1}^T \\bar{F}(x) + y^T B^T x - \\mathbf{1}^T \\bar{G}(y) \\qquad (6)$$\n\nis added to the kinetic energy. The reason for the name \"saddle function\" will be explained later. Then\n\n$$L = \\tau_x^{-1} \\Phi(u + Ax - By, x) + \\tau_y^{-1} \\Gamma(v + B^T x - Cy, y) + rS \\qquad (7)$$\n\nis a Lyapunov function provided that it is lower bounded, nonincreasing, and $\\dot{L}$ only vanishes at fixed points of the dynamics. Roughly speaking, this is enough to prove the global asymptotic stability of fixed points, although some additional technical details may be involved.\n\nIn the next section, the Lyapunov function will be applied to an example network, yielding sufficient conditions for the global asymptotic stability of fixed points. In this particular network, the sufficient conditions also happen to be necessary conditions. Therefore the Lyapunov function succeeds in delineating the largest possible parameter regime in which point attractors are globally stable. Of course, there is no guarantee of this in general, but the power of the Lyapunov function is manifest in this instance.\n\nBefore proceeding to the example network, we pause to state some general conditions for L to be nonincreasing. 
A lengthy but straightforward calculation shows that the time derivative of L is given by\n\n$$\\dot{L} = \\dot{x}^T A \\dot{x} - \\dot{y}^T C \\dot{y} - (\\tau_x^{-1} + r)\\, \\dot{x}^T [f^{-1}(\\tau_x \\dot{x} + x) - f^{-1}(x)] - (\\tau_y^{-1} - r)\\, \\dot{y}^T [g^{-1}(\\tau_y \\dot{y} + y) - g^{-1}(y)]. \\qquad (8)$$\n\nTherefore, L is nonincreasing provided that\n\n$$\\max_{a,b} \\frac{(a-b)^T A (a-b)}{(a-b)^T [f^{-1}(a) - f^{-1}(b)]} < 1 + r\\tau_x, \\qquad (9)$$\n$$\\min_{a,b} \\frac{(a-b)^T C (a-b)}{(a-b)^T [g^{-1}(a) - g^{-1}(b)]} > -(1 - r\\tau_y). \\qquad (10)$$\n\nThe quotients in these inequalities are generalizations of the Rayleigh-Ritz ratios of A and C. If f and g were linear with unit slope, the left-hand sides of these inequalities would be equal to the maximum eigenvalue of A and the minimum eigenvalue of C.\n\n3 AN EXAMPLE: COMPETITIVE NETWORK\n\nThe competitive or winner-take-all network is a classic example of an excitatory-inhibitory network [8, 7]. Its population of excitatory neurons $x_i$ receives self-feedback of strength a and recurrent feedback from a single inhibitory neuron y,\n\n$$\\tau_x \\dot{x}_i + x_i = f(u_i + a x_i - y), \\qquad (11)$$\n$$\\tau_y \\dot{y} + y = g\\Big(\\sum_i x_i\\Big). \\qquad (12)$$\n\nThis is a special case of (1), with $A = aI$, $B = \\mathbf{1}$, and $C = 0$.\n\nThe global inhibitory neuron mediates a competitive interaction between the excitatory neurons. If the competition is very strong, a single excitatory neuron \"wins,\" shutting off all the rest. If the competition is weak, more than one excitatory neuron can win, usually those corresponding to the larger $u_i$. Depending on the choice of f and g, self-feedback a, and time scales $\\tau_x$ and $\\tau_y$, this network exhibits a variety of dynamical behaviors, including a single point attractor, multiple point attractors, and limit cycles [5, 7].\n\nWe will consider the specific case where f and g are the rectification nonlinearity $[x]_+ \\equiv \\max\\{x, 0\\}$. 
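As a concrete illustration of the competitive network (11)-(12) with rectification, here is a minimal Python sketch. It is not the authors' code. It integrates the network with forward Euler, starting from the origin. The step size dt, the horizon t_end, the inputs u, and the helper names simulate_wta and fixed_point_stable are illustrative assumptions. The helper encodes two conditions. The first is a < 2. The second is a < 1 + tau_x/tau_y, which is what condition (9) gives for A = aI when r is chosen as 1/tau_y, since with rectification the quotient in (9) equals a on the positive orthant.

```python
# Minimal sketch (not the authors' code) of the competitive network (11)-(12)
# with f = g = rectification, integrated by forward Euler from the origin.


def relu(z):
    # Rectification nonlinearity [z]+ = max(z, 0), used for both f and g.
    return max(z, 0.0)


def simulate_wta(u, a, tau_x, tau_y, dt=0.01, t_end=100.0):
    # Integrate tau_x dx_i/dt + x_i = f(u_i + a x_i - y) and
    # tau_y dy/dt + y = g(sum_i x_i); returns the state at t_end.
    x = [0.0] * len(u)
    y = 0.0
    for _ in range(int(round(t_end / dt))):
        xdot = [(relu(ui + a * xi - y) - xi) / tau_x for ui, xi in zip(u, x)]
        ydot = (relu(sum(x)) - y) / tau_y
        x = [xi + dt * dxi for xi, dxi in zip(x, xdot)]
        y += dt * ydot
    return x, y


def fixed_point_stable(a, tau_x, tau_y):
    # a < 2: L bounded below; a < 1 + tau_x / tau_y: L nonincreasing
    # (condition (9) with A = aI, rectification and r = 1 / tau_y).
    return a < 2 and a < 1 + tau_x / tau_y


u = [1.0, 0.8, 0.3]  # illustrative tonic inputs (an assumption)
x_final, y_final = simulate_wta(u, a=0.5, tau_x=1.0, tau_y=2.0)
print(fixed_point_stable(0.5, 1.0, 2.0))
print([round(xi, 4) for xi in x_final], round(y_final, 4))
```

With these illustrative values, a = 0.5, tau_x = 1 and tau_y = 2, both conditions hold. The run should therefore settle at the soft winner-take-all fixed point. Worked out by hand, that point has neurons 1 and 2 active, with x_i = 2(u_i - y) and y = x_1 + x_2, which gives x = (0.56, 0.16, 0) and y = 0.72. Choosing a above 1 + tau_x/tau_y violates the second condition, for example a = 1.8 with tau_y = 4. The sketch can be rerun there to look for the oscillatory regime described in the text.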
The behavior of this network will be described in detail elsewhere; only a brief summary is given here. With either of two convenient choices for r, $r = \\tau_y^{-1}$ or $r = (a - 1)\\tau_x^{-1}$, it can be shown that the resulting L is bounded below for $a < 2$ and nonincreasing for $a < 1 + \\tau_x/\\tau_y$. These are sufficient conditions for the global stability of fixed points. They also turn out to be necessary conditions, as it can be verified that the fixed points are locally unstable if the conditions are violated. The behaviors in the parameter regime defined by these conditions can be divided into two rough categories. For $a < 1$, there is a unique point attractor, at which more than one excitatory neuron can be active, in a soft form of winner-take-all. For $a > 1$, more than one point attractor may exist. Only one excitatory neuron is active at each of these fixed points, a hard form of winner-take-all.\n\n4 MINIMAX DYNAMICS\n\nIn the field of optimization, gradient descent-ascent is a standard method for finding saddle points of an objective function. This section explains the close relationship between gradient descent-ascent and excitatory-inhibitory networks [9, 10]. Furthermore, it reviews existing results on the convergence of gradient descent-ascent to saddle points [13, 10], which are the precedents of the convergence proofs of this paper.\n\nThe similarity of excitatory-inhibitory networks to gradient descent-ascent can be seen by comparing the partial derivatives of the saddle function (6) to the velocities $\\dot{x}$ and $\\dot{y}$,\n\n$$\\dot{x} \\sim -\\frac{\\partial S}{\\partial x} = u + Ax - By - f^{-1}(x), \\qquad (13)$$\n$$\\dot{y} \\sim \\frac{\\partial S}{\\partial y} = v + B^T x - Cy - g^{-1}(y). \\qquad (14)$$\n\nThe notation $a \\sim b$ means that the vectors a and b have the same signs, component by component. Because f and g are monotonic nondecreasing functions, $\\dot{x}$ has the same signs as $-\\partial S/\\partial x$, while $\\dot{y}$ has the same signs as $\\partial S/\\partial y$. 
In other words, the dynamics of the excitatory neurons tends to minimize S, while that of the inhibitory neurons tends to maximize S.\n\nIf the sign relation $\\sim$ is replaced by equality in (13) and (14), we obtain a true gradient descent-ascent dynamics,\n\n$$\\tau_x \\dot{x} = -\\frac{\\partial S}{\\partial x}, \\qquad \\tau_y \\dot{y} = \\frac{\\partial S}{\\partial y}. \\qquad (15)$$\n\nSufficient conditions for convergence of gradient descent-ascent to saddle points are known [13, 10]. The conditions can be derived using a Lyapunov function constructed from the kinetic energy and the saddle function,\n\n$$L = \\tfrac{1}{2}\\tau_x |\\dot{x}|^2 + \\tfrac{1}{2}\\tau_y |\\dot{y}|^2 + rS. \\qquad (16)$$\n\nThe time derivative of L is given by\n\n$$\\dot{L} = -\\dot{x}^T \\frac{\\partial^2 S}{\\partial x^2} \\dot{x} + \\dot{y}^T \\frac{\\partial^2 S}{\\partial y^2} \\dot{y} - r\\tau_x |\\dot{x}|^2 + r\\tau_y |\\dot{y}|^2. \\qquad (17)$$\n\nWeak sufficient conditions can be derived with the choice $r = 0$, so that L includes only kinetic energy terms. Then L is obviously lower bounded by zero. Furthermore, L is nonincreasing if $\\partial^2 S/\\partial x^2$ is positive definite for all y and $\\partial^2 S/\\partial y^2$ is negative definite for all x. In this case, the existence of a unique saddle point is guaranteed, as S is convex in x for all y, and concave in y for all x [13, 10].\n\nIf there is more than one saddle point, the kinetic energy by itself is generally not a Lyapunov function. This is because the dynamics may pass through the vicinity of more than one saddle point before it finally converges, so that the kinetic energy behaves nonmonotonically as a function of time. In this situation, some appropriate nonzero r must be found.\n\nThe Lyapunov function (7) for excitatory-inhibitory networks is a generalization of the Lyapunov function (16) for gradient descent-ascent. 
This is analogous to the way in which the Lyapunov function for symmetric networks generalizes the potential function of gradient descent.\n\nIt should be noted that gradient descent-ascent is an unreliable way of finding a saddle point. It is easy to construct situations in which it leads to a limit cycle. The unreliability of gradient descent-ascent contrasts with the reliability of gradient descent at finding a local minimum of a potential function. Similarly, symmetric networks converge to fixed points, but excitatory-inhibitory networks can converge to limit cycles as well.\n\n5 HAMILTONIAN DYNAMICS\n\nThe dynamics of an excitatory-inhibitory network can be written in a dissipative Hamiltonian form. To do this, we define a phase space that is double the dimension of the state space, adding momenta $(p_x, p_y)$ that are canonically conjugate to $(x, y)$. The phase space dynamics\n\n$$\\tau_x \\dot{x} + x = f(p_x), \\qquad (18)$$\n$$\\tau_y \\dot{y} + y = g(p_y), \\qquad (19)$$\n$$\\Big(r + \\frac{d}{dt}\\Big)(u + Ax - By - p_x) = 0, \\qquad (20)$$\n$$\\Big(r + \\frac{d}{dt}\\Big)(v + B^T x - Cy - p_y) = 0 \\qquad (21)$$\n\nreduces to the state space dynamics (1) on the affine space $\\mathcal{A} = \\{(p_x, p_y, x, y) : p_x = u + Ax - By,\\ p_y = v + B^T x - Cy\\}$. Provided that $r > 0$, the affine space $\\mathcal{A}$ is an attractive invariant manifold.\n\nDefining the Hamiltonian\n\n$$H(p_x, x, p_y, y) = \\tau_x^{-1} \\Phi(p_x, x) + \\tau_y^{-1} \\Gamma(p_y, y) + rS(x, y), \\qquad (22)$$\n\nthe phase space dynamics (18)-(21) can be written as\n\n$$\\dot{x} = \\frac{\\partial H}{\\partial p_x}, \\qquad (23)$$\n$$\\dot{y} = \\frac{\\partial H}{\\partial p_y}, \\qquad (24)$$\n$$\\dot{p}_x = -\\frac{\\partial H}{\\partial x} + A\\dot{x} - B\\dot{y} - (\\tau_x^{-1} + r)[p_x - f^{-1}(x)], \\qquad (25)$$\n$$\\dot{p}_y = -\\frac{\\partial H}{\\partial y} + B^T \\dot{x} - C\\dot{y} - (\\tau_y^{-1} - r)[p_y - g^{-1}(y)] \\qquad (26)$$\n$$\\qquad + 2r(v + B^T x - Cy - p_y). \\qquad (27)$$\n\nOn the invariant manifold $\\mathcal{A}$, the Hamiltonian is identical to the Lyapunov function (7) defined previously.
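The rewriting of (18)-(21) in the Hamiltonian form (23)-(27) can be checked numerically. The sketch below treats a scalar case (m = n = 1) with f = g = tanh. For this choice F(p) = log cosh p, and its Legendre transform is x atanh(x) + log(1 - x^2)/2. The choice of tanh, the parameter values in PARAMS, and all function names are illustrative assumptions. The sketch computes the velocities twice: once directly from (18)-(21), and once from the Hamiltonian, taking partial derivatives of H by central finite differences.

```python
import math

# Scalar sketch (m = n = 1) of the phase-space dynamics (18)-(21) and of
# the Hamiltonian (22). Illustrative assumptions: f = g = tanh, so that
# F = G = log cosh and Fbar = Gbar, plus the parameter values in PARAMS.
PARAMS = dict(u=0.3, v=-0.1, a=1.5, b=2.0, c=0.5, tx=1.0, ty=2.0, r=0.4)


def F(p):
    # Antiderivative of f = tanh.
    return math.log(math.cosh(p))


def Fbar(x):
    # Legendre transform of F (valid for |x| < 1); its derivative is atanh(x).
    return x * math.atanh(x) + 0.5 * math.log(1.0 - x * x)


def Phi(p, x):
    # Generalized kinetic energy (4); Gamma in (5) has the same form here.
    return F(p) - x * p + Fbar(x)


def S(x, y, P):
    # Scalar saddle function (6) with A = a, B = b, C = c.
    return (-P['u'] * x - 0.5 * P['a'] * x * x + P['v'] * y
            - 0.5 * P['c'] * y * y + Fbar(x) + y * P['b'] * x - Fbar(y))


def H(px, x, py, y, P):
    # Hamiltonian (22).
    return Phi(px, x) / P['tx'] + Phi(py, y) / P['ty'] + P['r'] * S(x, y, P)


def direct_velocities(px, x, py, y, P):
    # (18)-(21), solved for the time derivatives.
    xdot = (math.tanh(px) - x) / P['tx']
    ydot = (math.tanh(py) - y) / P['ty']
    pxdot = (P['a'] * xdot - P['b'] * ydot
             + P['r'] * (P['u'] + P['a'] * x - P['b'] * y - px))
    pydot = (P['b'] * xdot - P['c'] * ydot
             + P['r'] * (P['v'] + P['b'] * x - P['c'] * y - py))
    return xdot, ydot, pxdot, pydot


def hamiltonian_velocities(px, x, py, y, P, h=1e-6):
    # (23)-(27), with partial derivatives of H by central differences.
    z = [px, x, py, y]

    def dH(i):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        return (H(*zp, P) - H(*zm, P)) / (2 * h)

    xdot, ydot = dH(0), dH(2)
    pxdot = (-dH(1) + P['a'] * xdot - P['b'] * ydot
             - (1 / P['tx'] + P['r']) * (px - math.atanh(x)))
    pydot = (-dH(3) + P['b'] * xdot - P['c'] * ydot
             - (1 / P['ty'] - P['r']) * (py - math.atanh(y))
             + 2 * P['r'] * (P['v'] + P['b'] * x - P['c'] * y - py))
    return xdot, ydot, pxdot, pydot


state = (0.2, -0.3, -0.5, 0.4)  # (px, x, py, y), a point off the manifold
direct = direct_velocities(*state, PARAMS)
via_h = hamiltonian_velocities(*state, PARAMS)
print(max(abs(p - q) for p, q in zip(direct, via_h)))
```

The two sets of velocities should agree up to finite-difference error at any point with |x| < 1 and |y| < 1, where atanh is defined. The evaluation point deliberately lies off the manifold. On the manifold the last term of (26)-(27) vanishes, so a check restricted to the manifold would not exercise that term.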
\n\nThe rate of change of the energy is given by\n\n$$\\dot{H} = \\dot{x}^T A \\dot{x} - (\\tau_x^{-1} + r)\\, \\dot{x}^T [p_x - f^{-1}(x)] - \\dot{y}^T C \\dot{y} - (\\tau_y^{-1} - r)\\, \\dot{y}^T [p_y - g^{-1}(y)] + 2r\\, \\dot{y}^T (v + B^T x - Cy - p_y). \\qquad (28)$$\n\nThe last term vanishes on the invariant manifold, leaving a result identical to (8). Therefore, if the noncanonical terms in the phase space dynamics (18)-(21) dissipate energy, then the Hamiltonian is nonincreasing. It is also possible that the velocity-dependent terms may pump energy into the system, rather than dissipate it, in which case oscillations or chaotic behavior may arise.\n\nAcknowledgments. This work was supported by Bell Laboratories. We would like to thank Eric Mjolsness for useful discussions.\n\nReferences\n[1] M. A. Cohen and S. Grossberg. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Syst., Man, Cybern., 13:815-826, 1983.\n[2] J. J. Hopfield. Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81:3088-3092, 1984.\n[3] J. J. Hopfield and D. W. Tank. Computing with neural circuits: a model. Science, 233:625-633, 1986.\n[4] H. R. Wilson and J. D. Cowan. A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 13:55-80, 1973.\n[5] Z. Li and J. J. Hopfield. Modeling the olfactory bulb and its neural oscillatory processings. Biol. Cybern., 61:379-392, 1989.\n[6] S. Amari. Dynamics of pattern formation in lateral-inhibition type neural fields. Biol. Cybern., 27:77-87, 1977.\n[7] B. Ermentrout. Complex dynamics in winner-take-all neural nets with slow inhibition. Neural Networks, 5:415-431, 1992.\n[8] S. Amari and M. A. Arbib. Competition and cooperation in neural nets. In J. 
Metzler, editor, Systems Neuroscience, pages 119-165. Academic Press, New York, 1977.\n[9] E. Mjolsness and C. Garrett. Algebraic transformations of objective functions. Neural Networks, 3:651-669, 1990.\n[10] J. C. Platt and A. H. Barr. Constrained differential optimization. In D. Z. Anderson, editor, Neural Information Processing Systems, page 55, New York, 1987. American Institute of Physics.\n[11] I. M. Elfadel. Convex potentials and their conjugates in analog mean-field optimization. Neural Computation, 7(5):1079-1104, 1995.\n[12] J. D. Cowan. A statistical mechanics of nervous activity. In Some mathematical questions in biology, volume III. AMS, 1972.\n[13] K. J. Arrow, L. Hurwicz, and H. Uzawa. Studies in linear and non-linear programming. Stanford University, Stanford, 1958.", "award": [], "sourceid": 1336, "authors": [{"given_name": "H. Sebastian", "family_name": "Seung", "institution": null}, {"given_name": "Tom", "family_name": "Richardson", "institution": null}, {"given_name": "J.", "family_name": "Lagarias", "institution": null}, {"given_name": "John J.", "family_name": "Hopfield", "institution": null}]}