{"title": "On the K-Winners-Take-All Network", "book": "Advances in Neural Information Processing Systems", "page_first": 634, "page_last": 642, "abstract": null, "full_text": "634 \n\nON THE K-WINNERS-TAKE-ALL NETWORK \n\nE. Majani \n\nJet Propulsion Laboratory \n\nCalifornia Institute of Technology \n\nR. Erlanson, Y. Abu-Mostafa \n\nDepartment of Electrical Engineering \n\nCalifornia Institute of Technology \n\nABSTRACT \n\nWe present and rigorously analyze a generalization of the Winner-Take-All Network: the K-Winners-Take-All Network. This network identifies the K largest of a set of N real numbers. The network model used is the continuous Hopfield model. \n\nI - INTRODUCTION \n\nThe Winner-Take-All Network is a network which identifies the largest of N real numbers. Winner-Take-All Networks have been developed using various neural network models (Grossberg-73, Lippmann-87, Feldman-82, Lazzaro-89). We present here a generalization of the Winner-Take-All Network: the K-Winners-Take-All (KWTA) Network. The KWTA Network identifies the K largest of N real numbers. The neural network model we use throughout the paper is the continuous Hopfield network model (Hopfield-84). If the states of the N nodes are initialized to the N real numbers and the gain of the sigmoid is large enough, then the network converges to the state with K positive real numbers in the positions of the nodes with the K largest initial states, and N - K negative real numbers everywhere else. \n\nConsider the following example: $N = 4$, $K = 2$. There are $6 = \\binom{4}{2}$ stable states: $(+,+,-,-)^T$, $(+,-,+,-)^T$, $(+,-,-,+)^T$, $(-,-,+,+)^T$, $(-,+,-,+)^T$, and $(-,+,+,-)^T$. If the initial state of the network is $(0.3, -0.4, 0.7, 0.1)^T$, then the network will converge to $(v_1, v_2, v_3, v_4)^T$ where $v_1 > 0$, $v_2 < 0$, $v_3 > 0$, $v_4 < 0$, i.e., to $(+,-,+,-)^T$. 
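As a point of reference, the decision the network is meant to reach can also be computed directly by ranking the inputs, with no dynamics at all. The short Python sketch below is not part of the paper, and the helper names stable_patterns and kwta_target are hypothetical. It enumerates the C(4,2) = 6 sign patterns for N = 4, K = 2 and picks the one selected by the initial state (0.3, -0.4, 0.7, 0.1). It assumes the inputs are distinct, since ties are not discussed in the example.

```python
from itertools import combinations
from math import comb


def stable_patterns(n, k):
    # Every sign vector with exactly k entries +1 and n - k entries -1.
    patterns = []
    for winners in combinations(range(n), k):
        patterns.append(tuple(1 if i in winners else -1 for i in range(n)))
    return patterns


def kwta_target(x, k):
    # +1 at the positions of the k largest inputs, -1 elsewhere.
    # Assumes distinct inputs (no ties).
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    winners = set(order[:k])
    return tuple(1 if i in winners else -1 for i in range(len(x)))


print(len(stable_patterns(4, 2)), comb(4, 2))
print(kwta_target([0.3, -0.4, 0.7, 0.1], 2))
```

Ranking costs a sort, whereas the network reaches the same sign pattern through its continuous dynamics.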
\n\nIn Section II, we define the KWTA Network (connection weights, external inputs). In Section III, we analyze the equilibrium states, and in Section IV, we identify all the stable equilibrium states of the KWTA Network. In Section V, we describe the dynamics of the KWTA Network. In Section VI, we give two important examples of the KWTA Network and comment on an alternate implementation of the KWTA Network. \n\n\fOn the K-Winners-Take-All Network \n\n635 \n\nII - THE K-WINNERS-TAKE-ALL NETWORK \n\nThe continuous Hopfield network model (Hopfield-84) (also known as the Grossberg additive model (Grossberg-88)) is characterized by a system of first-order differential equations which governs the evolution of the state of the network ($i = 1, \\ldots, N$): \n\n$$C\\frac{du_i}{dt} = -\\lambda u_i + \\sum_{j=1}^{N} T_{ij} g(u_j) + t_i.$$\n\nThe sigmoid function $g(u)$ is defined by $g(u) = f(G \\cdot u)$, where $G > 0$ is the gain of the sigmoid, and $f(u)$ satisfies: 1. for all $u$, $0 < f'(u) \\leq f'(0) = 1$; 2. $\\lim_{u \\to +\\infty} f(u) = 1$; 3. $\\lim_{u \\to -\\infty} f(u) = -1$. \n\nThe KWTA Network is characterized by mutually inhibitory interconnections $T_{ij} = -1$ for $i \\neq j$, a self-connection $T_{ii} = \\alpha$ ($|\\alpha| < 1$), and an external input (identical for every node) which depends on the number $K$ of winners desired and the size $N$ of the network: $t_i = 2K - N$. \n\nThe differential equations for the KWTA Network are therefore: \n\n$$\\text{for all } i, \\quad C\\frac{du_i}{dt} = -\\lambda u_i + (\\alpha+1) g(u_i) - \\left(\\sum_{j=1}^{N} g(u_j) - t\\right), \\tag{1}$$\n\nwhere $\\lambda = N - 1 + |\\alpha|$, $-1 < \\alpha < +1$, and $t = 2K - N$. Let us now study the equilibrium states of the dynamical system defined in (1). We already know from previous work (Hopfield-84) that the network is guaranteed to converge to a stable equilibrium state if the connection matrix $T$ is symmetric (and it is here). 
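To make equation (1) concrete, the sketch below integrates it numerically for the four-node example of Section I and checks that it agrees with the general Hopfield form written with the full N x N matrix T. None of this is from the paper: the choice f = tanh (which satisfies conditions 1-3 above), the gain G = 10, α = 0, C = 1, the forward Euler step, and the function names kwta_rhs, kwta_rhs_dense and simulate are all illustrative assumptions.

```python
import numpy as np


def kwta_rhs(u, k, alpha=0.0, gain=10.0, cap=1.0):
    # du/dt from equation (1). ASSUMPTION: f = tanh, so g(u) = tanh(gain * u);
    # any f meeting conditions 1-3 of Section II would do.
    n = u.size
    lam = n - 1 + abs(alpha)   # decay rate: lambda = N - 1 + |alpha|
    t = 2 * k - n              # common external input t = 2K - N
    g = np.tanh(gain * u)
    return (-lam * u + (alpha + 1) * g - (g.sum() - t)) / cap


def kwta_rhs_dense(u, k, alpha=0.0, gain=10.0, cap=1.0):
    # Same dynamics in Hopfield form: C du/dt = -lambda u + T g(u) + t,
    # with T_ij = -1 for i != j and T_ii = alpha (an N x N matrix).
    n = u.size
    T = (alpha + 1) * np.eye(n) - np.ones((n, n))
    lam = n - 1 + abs(alpha)
    return (-lam * u + T @ np.tanh(gain * u) + (2 * k - n)) / cap


def simulate(u0, k, dt=0.01, steps=5000, **params):
    # Forward Euler integration of equation (1); a simple sketch,
    # not a tuned ODE solver.
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = u + dt * kwta_rhs(u, k, **params)
    return u


u0 = np.array([0.3, -0.4, 0.7, 0.1])
print(np.allclose(kwta_rhs(u0, 2), kwta_rhs_dense(u0, 2)))
print(np.sign(simulate(u0, k=2)))
```

The dense form needs N x N weights, while kwta_rhs only uses the sum of the outputs, which is the idea behind the O(N) implementation mentioned in Section VI. For N = 4 and α = 0, the value λ/(α+1) equals 3, so G = 10 is well above the critical gain for the N/2 case.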
\n\nIII - EQUILIBRIUM STATES OF THE NETWORK \n\nThe equilibrium states $u^*$ of the KWTA Network are defined by \n\n$$\\text{for all } i, \\quad \\frac{du_i}{dt} = 0,$$\n\ni.e., \n\n$$\\text{for all } i, \\quad g(u_i^*) = \\frac{\\lambda}{\\alpha+1} u_i^* + \\frac{1}{\\alpha+1}\\left(\\sum_j g(u_j^*) - (2K - N)\\right). \\tag{2}$$\n\nLet us now develop necessary conditions for a state $u^*$ to be an equilibrium state of the network. \n\nTheorem 1: For a given equilibrium state $u^*$, every component $u_i^*$ of $u^*$ can take one of at most three distinct values. \n\n\f636 \n\nMajani, Erlanson and Abu-Mostafa \n\nProof of Theorem 1. \n\nIf we look at equation (2), we see that the last term of the right-hand side is independent of $i$; let us denote this term by $H(u^*)$. Therefore, the components $u_i^*$ of the equilibrium state $u^*$ must be solutions of the equation \n\n$$g(u_i^*) = \\frac{\\lambda}{\\alpha+1} u_i^* + H(u^*).$$\n\nSince the sigmoid function $g(u)$ is bounded and monotone increasing, and $\\lambda/(\\alpha+1) > 0$, the sigmoid and the line $\\frac{\\lambda}{\\alpha+1} u_i^* + H(u^*)$ intersect in at least one point and at most three (see Figure 1). Note that the constant $H(u^*)$ can be different for different equilibrium states $u^*$. $\\blacksquare$ \n\nThe following theorem shows that the sum of the node outputs is constrained to being close to $2K - N$, as desired. \n\nTheorem 2: If $u^*$ is an equilibrium state of (1), then we have: \n\n$$(\\alpha+1) \\max_{u_i^* < 0} g(u_i^*) < \\sum_{j=1}^{N} g(u_j^*) - 2K + N < (\\alpha+1) \\min_{u_i^* > 0} g(u_i^*). \\tag{3}$$\n\nProof of Theorem 2. \n\nLet us rewrite equation (2) in the following way: \n\n$$\\lambda u_i^* = (\\alpha+1) g(u_i^*) - \\left(\\sum_j g(u_j^*) - (2K - N)\\right).$$\n\nSince $u_i^*$ and $g(u_i^*)$ have the same sign, the term $\\left(\\sum_j g(u_j^*) - (2K - N)\\right)$ can be neither too large (if $u_i^* > 0$) nor too low (if $u_i^* < 0$). 
Therefore, we must have \n\n$$\\begin{cases} \\sum_j g(u_j^*) - (2K - N) < (\\alpha+1) g(u_i^*) & \\text{for all } u_i^* > 0, \\\\ \\sum_j g(u_j^*) - (2K - N) > (\\alpha+1) g(u_i^*) & \\text{for all } u_i^* < 0, \\end{cases}$$\n\nwhich yields (3). $\\blacksquare$ \n\nTheorem 1 states that the components of an equilibrium state can only take one of at most three distinct values. We will distinguish between two types of equilibrium states, for the purposes of our analysis: those which have one or more components $u_i^*$ such that $g'(u_i^*) \\geq \\frac{\\lambda}{\\alpha+1}$, which we categorize as type I, and those which do not (type II). We will show in the next section that, for a large enough gain $G$, no equilibrium state of type I is stable. \n\nIV - ASYMPTOTIC STABILITY OF EQUILIBRIUM STATES \n\nWe will first derive a necessary condition for an equilibrium state of (1) to be asymptotically stable. Then we will find the stable equilibrium states of the KWTA Network. \n\nIV-1. A NECESSARY CONDITION FOR ASYMPTOTIC STABILITY \n\nAn important necessary condition for asymptotic stability is given in the following theorem. \n\nTheorem 3: Given any asymptotically stable equilibrium state $u^*$, at most one of the components $u_i^*$ of $u^*$ may satisfy: \n\n$$g'(u_i^*) \\geq \\frac{\\lambda}{\\alpha+1}.$$\n\nProof of Theorem 3. \n\nTheorem 3 is obtained by proving the following three lemmas. \n\nLemma 1: Given any asymptotically stable equilibrium state $u^*$, we always have, for all $i$ and $j$ such that $i \\neq j$: \n\n$$\\lambda > \\alpha \\frac{g'(u_i^*) + g'(u_j^*)}{2} + \\frac{\\sqrt{\\alpha^2 \\left(g'(u_i^*) - g'(u_j^*)\\right)^2 + 4 g'(u_i^*) g'(u_j^*)}}{2}. \\tag{4}$$\n\nProof of Lemma 1. \n\nSystem (1) can be linearized around any equilibrium state $u^*$: \n\n$$C\\frac{d(u - u^*)}{dt} \\approx L(u^*)(u - u^*), \\quad \\text{where } L(u^*) = T \\cdot \\mathrm{diag}\\left(g'(u_1^*), \\ldots, g'(u_N^*)\\right) - \\lambda I.$$\n\nA necessary and sufficient condition for the asymptotic stability of $u^*$ is for $L(u^*)$ to be negative definite. 
A necessary condition for $L(u^*)$ to be negative definite is for all $2 \\times 2$ matrices $L_{ij}(u^*)$ of the type \n\n$$L_{ij}(u^*) = \\begin{pmatrix} \\alpha g'(u_i^*) - \\lambda & -g'(u_j^*) \\\\ -g'(u_i^*) & \\alpha g'(u_j^*) - \\lambda \\end{pmatrix} \\quad (i \\neq j)$$\n\nto be negative definite. This results from an infinitesimal perturbation of components $i$ and $j$ only. Any matrix $L_{ij}(u^*)$ has two real eigenvalues. Since the largest one has to be negative, we obtain: \n\n$$\\frac{1}{2}\\left(\\alpha g'(u_i^*) - \\lambda + \\alpha g'(u_j^*) - \\lambda + \\sqrt{\\alpha^2 \\left(g'(u_i^*) - g'(u_j^*)\\right)^2 + 4 g'(u_i^*) g'(u_j^*)}\\right) < 0,$$\n\nwhich is equivalent to (4). $\\blacksquare$ \n\nLemma 2: Equation (4) implies: \n\n$$\\min\\left(g'(u_i^*), g'(u_j^*)\\right) < \\frac{\\lambda}{\\alpha+1}. \\tag{5}$$\n\nProof of Lemma 2. \n\nConsider the function $h$ of three variables: \n\n$$h\\left(\\alpha, g'(u_i^*), g'(u_j^*)\\right) = \\alpha \\frac{g'(u_i^*) + g'(u_j^*)}{2} + \\frac{\\sqrt{\\alpha^2 \\left(g'(u_i^*) - g'(u_j^*)\\right)^2 + 4 g'(u_i^*) g'(u_j^*)}}{2}.$$\n\nIf we differentiate $h$ with respect to its third variable $g'(u_j^*)$, we obtain: \n\n$$\\frac{\\partial h\\left(\\alpha, g'(u_i^*), g'(u_j^*)\\right)}{\\partial g'(u_j^*)} = \\frac{\\alpha}{2} + \\frac{\\alpha^2 g'(u_j^*) + (2 - \\alpha^2) g'(u_i^*)}{2\\sqrt{\\alpha^2 \\left(g'(u_i^*) - g'(u_j^*)\\right)^2 + 4 g'(u_i^*) g'(u_j^*)}},$$\n\nwhich can be shown to be positive if and only if $\\alpha > -1$. But since $|\\alpha| < 1$, if $g'(u_i^*) < g'(u_j^*)$ (without loss of generality), we have: \n\n$$h\\left(\\alpha, g'(u_i^*), g'(u_j^*)\\right) > h\\left(\\alpha, g'(u_i^*), g'(u_i^*)\\right) = (\\alpha+1) g'(u_i^*),$$\n\nwhich, with (4), yields: \n\n$$g'(u_i^*) < \\frac{\\lambda}{\\alpha+1},$$\n\nwhich yields Lemma 2. $\\blacksquare$ \n\nLemma 3: If for all $i \\neq j$, \n\n$$\\min\\left(g'(u_i^*), g'(u_j^*)\\right) < \\frac{\\lambda}{\\alpha+1},$$\n\nthen there can be at most one $u_i^*$ such that: \n\n$$g'(u_i^*) \\geq \\frac{\\lambda}{\\alpha+1}.$$\n\nProof of Lemma 3. \n\nLet us assume there exists a pair $(u_i^*, u_j^*)$ with $i \\neq j$ such that $g'(u_i^*) \\geq \\frac{\\lambda}{\\alpha+1}$ and $g'(u_j^*) \\geq \\frac{\\lambda}{\\alpha+1}$; then (5) would be violated. $\\blacksquare$ \n\nIV-2. 
STABLE EQUILIBRIUM STATES \n\nFrom Theorem 3, all stable equilibrium states of type I have exactly one component $\\gamma$ (at least one and at most one) such that $g'(\\gamma) \\geq \\frac{\\lambda}{\\alpha+1}$. Let $N_+$ be the number of components $a$ with $g'(a) < \\frac{\\lambda}{\\alpha+1}$ and $a > 0$, and let $N_-$ be the number of components $b$ with $g'(b) < \\frac{\\lambda}{\\alpha+1}$ and $b < 0$ (note that $N_+ + N_- + 1 = N$). For a large enough gain $G$, $g(a)$ and $g(b)$ can be made arbitrarily close to $+1$ and $-1$ respectively. Using Theorem 2, and assuming a large enough gain, we obtain $-1 < N_+ - K < 0$. Since $N_+$ and $K$ are integers, there is therefore no stable equilibrium state of type I. \n\nFor the equilibrium states of type II, we have for all $i$, $u_i^* = a > 0$ or $u_i^* = b < 0$, where $g'(a) < \\frac{\\lambda}{\\alpha+1}$ and $g'(b) < \\frac{\\lambda}{\\alpha+1}$. For a large enough gain, $g(a)$ and $g(b)$ can be made arbitrarily close to $+1$ and $-1$ respectively. Using Theorem 2 and assuming a large enough gain, we obtain $-(\\alpha+1) < 2(N_+ - K) < (\\alpha+1)$, which, since $0 < \\alpha+1 < 2$, yields $N_+ = K$. \n\nLet us now summarize our results in the following theorem: \n\nTheorem 4: For a large enough gain, the only possible asymptotically stable equilibrium states $u^*$ of (1) must have $K$ components equal to $a > 0$ and $N - K$ components equal to $b < 0$, with \n\n$$\\begin{cases} g(a) = \\frac{\\lambda}{\\alpha+1} a + \\frac{K\\left(g(a) - g(b) - 2\\right) + N\\left(1 + g(b)\\right)}{\\alpha+1}, \\\\ g(b) = \\frac{\\lambda}{\\alpha+1} b + \\frac{K\\left(g(a) - g(b) - 2\\right) + N\\left(1 + g(b)\\right)}{\\alpha+1}. \\end{cases} \\tag{7}$$\n\nSince we are guaranteed to have at least one stable equilibrium state (Hopfield-84), and since any state whose components are a permutation of the components of a stable equilibrium state is clearly a stable equilibrium state, we have: \n\nTheorem 5: There exist at least $\\binom{N}{K}$ stable equilibrium states as defined in Theorem 4. They correspond to the $\\binom{N}{K}$ different states obtained by the $N!$ 
permutations of one stable state with $K$ positive components and $N - K$ negative components. \n\nV - THE DYNAMICS OF THE KWTA NETWORK \n\nNow that we know the characteristics of the stable equilibrium states of the KWTA Network, we need to show that the KWTA Network will converge to the stable state which has $a > 0$ in the positions of the $K$ largest initial components. This can be seen clearly by observing that for all $i \\neq j$: \n\n$$C\\frac{d(u_i - u_j)}{dt} = -\\lambda (u_i - u_j) + (\\alpha+1)\\left(g(u_i) - g(u_j)\\right).$$\n\nIf at some time $T$, $u_i(T) = u_j(T)$, then one can show (by uniqueness of the solutions of (1), which is symmetric under exchanging nodes $i$ and $j$) that $u_i(t) = u_j(t)$ for all $t$. Therefore, for all $i \\neq j$, $u_i(t) - u_j(t)$ always keeps the same sign. This leads to the following theorem. \n\nTheorem 6 (Preservation of order): For all nodes $i \\neq j$ and all $t \\geq 0$, $u_i(t) - u_j(t)$ has the same sign as $u_i(0) - u_j(0)$. \n\nWe shall now summarize the results of the last two sections. \n\nTheorem 7: Given an initial state $u(0)$ and a large enough gain $G$, the KWTA Network will converge to a stable equilibrium state with $K$ components equal to a positive real number ($a > 0$) in the positions of the $K$ largest initial components, and $N - K$ components equal to a negative real number ($b < 0$) in all other $N - K$ positions. \n\nThis can be derived directly from Theorems 4, 5 and 6: we know the form of all stable equilibrium states, the order of the initial node states is preserved through time, and there is guaranteed convergence to an equilibrium state. \n\nVI - DISCUSSION \n\nThe well-known Winner-Take-All Network is obtained by setting $K$ to 1. \n\nThe N/2-Winners-Take-All Network, given a set of N real numbers, identifies which numbers are above or below the median. This task is slightly more complex computationally than that of the Winner-Take-All: about $O(N \\log N)$ operations if done by sorting, versus $O(N)$. The 
The \nnumber of stable states is  much larger, \n\n( N) \n\n2N \n\nN/2  ~ J21rN' \n\ni.e.,  asymptotically exponential in the size of the network. \n\nAlthough  the number of connection  weights is  N2,  there exists an  alternate imple(cid:173)\nmentation of the KWTA Network which has O(N) connections (see Figure 2).  The \nsum of the outputs of all  nodes  and  the external input  is  computed,  then negated \nand  fed  back  to  all  the  nodes.  In  addition,  a  positive  self-connection  (a + 1)  is \nneeded  at every node. \n\nThe analysis was done for  a  \"large enough\"  gain G. In practice, the critical value of \nGis  a~i for  the N/2-Winners-Take-All  Network,  and slightly higher for  K  # N/2. \nAlso,  the  analysis  was  done  for  an  arbitrary  value  of the  self-connection  weight  a \n(Ial  < 1).  In  general,  if a is  close  to  +1,  this will  lead to faster  convergence  and  a \nsmaller  value  of the critical gain than if a  is  close  to -1. \n\n\fOn the K-Winners-Take-All Network \n\n641 \n\nVII - CONCLUSION \n\nThe  KWTA  Network  lets  all  nodes  compete  until  the  desired  number  of winners \n(K) is  obtained.  The competition is ibatained by  using mutual inhibition  between \nall  nodes,  while  the number of winners  K  is  selected by setting all external inputs \nto 2K - N. This paper illustrates the capability of the continuous Hopfield  Network \nto solve exactly an interesting decision problem, i.e., identifying the K  largest of N \nreal  numbers. \n\nAcknowledgments \n\nThe  authors  would  like  to  thank  John  Hopfield  and  Stephen  DeWeerth  from  the \nCalifornia  Institute  of Technology  and  Marvin  Perlman  from  the  Jet  Propulsion \nLaboratory for  insightful discussions about material presented in this paper.  Part of \nthe research described in this paper was performed at the Jet Propulsion Laboratory \nunder contract with  NASA. \n\nReferences \n\nJ .A. Feldman, D.H. 
Ballard, \"Connectionist Models and their properties,\" Cognitive Science, Vol. 6, pp. 205-254, 1982 \n\nS. Grossberg, \"Contour Enhancement, Short Term Memory, and Constancies in Reverberating Neural Networks,\" Studies in Applied Mathematics, Vol. LII (52), No. 3, pp. 213-257, September 1973 \n\nS. Grossberg, \"Non-Linear Neural Networks: Principles, Mechanisms, and Architectures,\" Neural Networks, Vol. 1, pp. 17-61, 1988 \n\nJ.J. Hopfield, \"Neurons with graded response have collective computational properties like those of two-state neurons,\" Proc. Natl. Acad. Sci. USA, Vol. 81, pp. 3088-3092, May 1984 \n\nJ. Lazzaro, S. Ryckebusch, M.A. Mahowald, C.A. Mead, \"Winner-Take-All Networks of O(N) Complexity,\" in this volume, 1989 \n\nR.P. Lippmann, B. Gold, M.L. Malpass, \"A Comparison of Hamming and Hopfield Neural Nets for Pattern Classification,\" MIT Lincoln Lab. Tech. Rep. TR-769, 21 May 1987 \n\nFigure 1: Intersection of sigmoid and line. \n\nFigure 2: An Implementation of the KWTA Network. ", "award": [], "sourceid": 157, "authors": [{"given_name": "E.", "family_name": "Majani", "institution": null}, {"given_name": "Ruth", "family_name": "Erlanson", "institution": null}, {"given_name": "Yaser", "family_name": "Abu-Mostafa", "institution": null}]}