{"title": "On the Non-Existence of a Universal Learning Algorithm for Recurrent Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 431, "page_last": 436, "abstract": null, "full_text": "On the Non-Existence of a Universal Learning Algorithm for Recurrent Neural Networks \n\nHerbert Wiklicky \n\nCentrum voor Wiskunde en Informatica \n\nP.O. Box 4079, NL-1009 AB Amsterdam, The Netherlands \n\ne-mail: herbert@cwi.nl \n\nAbstract \n\nWe prove that the so-called \"loading problem\" for (recurrent) neural networks is unsolvable. This extends several results which already demonstrated that training and related design problems for neural networks are (at least) NP-complete. Our result also implies that it is impossible to find or to formulate a universal training algorithm which, for any neural network architecture, could determine a correct set of weights. For the simple proof of this, we will just show that the loading problem is equivalent to \"Hilbert's tenth problem\", which is known to be unsolvable. \n\n1  THE NEURAL NETWORK MODEL \n\nThere seem to be relatively few commonly accepted general formal definitions of the notion of a \"neural network\". Although our results also hold if based on other formal definitions, we will try to stay very close to the original setting in which Judd's NP-completeness result was given [Judd, 1990]. But in contrast to [Judd, 1990], we will deal here with simple recurrent networks instead of feed-forward architectures. 
\n\nOur networks are constructed from three different types of units: Σ-units compute just the sum of all incoming signals; for Π-units the activation (node) function is given by the product of the incoming signals; and for Θ-units the output is zero or one, depending on whether the input signal is smaller or larger than a certain threshold parameter θ. Our units are connected or linked by real-weighted connections and operate synchronously. \n\nNote that we could also base our construction on just one general type of unit, namely what are usually called ΣΠ-units. Furthermore, one could replace the Π-units in the construction below by (recurrent) modules of simple linear threshold units which would have to perform unary integer multiplication. Thus, no higher-order elements are actually needed. \n\nAs we deal with recurrent networks, the behavior of a network is now not just given by a simple mapping from input space to output space (as with feed-forward architectures). In general, an input pattern is now mapped to an (infinite) output sequence. But note that if we consider as the output of a recurrent network a certain final, stable output pattern, we could return to a more static setting. \n\n2  THE MAIN RESULT \n\nThe question we will look at is how difficult it is to construct or train a neural network of the described type so that it actually exhibits a certain desired behavior, i.e. solves a given learning task. We will investigate this by means of the following decision problem: \n\nDecision 1  Loading Problem \nINSTANCE: A neural network architecture N and a learning task T. \nQUESTION: Is there a configuration C for N such that T is realized by C? \n\nBy a network configuration we just mean a certain setting of the weights in a neural network. Our main result concerning this problem states that it is undecidable or unsolvable. 
\n\nTheorem 1  There exists no algorithm which could decide for any learning task T and any (recurrent) neural network (consisting of Σ-, Π-, and Θ-units) if the given architecture can perform T. \n\nThe decision problem (as usual) gives a \"lower bound\" on the hardness of the related constructive problem [Garey and Johnson, 1979]. If we could construct a correct configuration for every instance that has one, it would be trivial to decide whether a correct configuration exists at all. Thus we have: \n\nCorollary 2  There exists no universal learning algorithm for (recurrent) neural networks. \n\n3  THE PROOF \n\nThe proof of the above theorem proceeds by constructing a class of neural networks for which it is impossible to decide (for all instances) if a certain learning task can be satisfied. For this we will refer to \"Hilbert's tenth problem\" and show that for each of its instances we can construct a neural network, so that solutions to the loading problem would lead to solutions to the original problem (and vice versa). But as we know that Hilbert's tenth problem is unsolvable, we also have to conclude that the loading problem we consider is unsolvable. \n\n3.1  HILBERT'S TENTH PROBLEM \n\nOur reference problem, which we know to be unsolvable, is closely related to several famous and classical mathematical problems including, for example, Fermat's last theorem. \n\nDefinition 1  A diophantine equation is a polynomial D in n variables with integer coefficients, that is \n\nD(x_1, x_2, ..., x_n) = Σ_i d_i(x_1, x_2, ..., x_n) \n\nwith each term d_i of the form d_i(x_1, x_2, ..., x_n) = c_i · x_{i_1} · x_{i_2} · ... · x_{i_m}, where the indices {i_1, i_2, ..., i_m} are taken from {1, 2, ..., n} and the coefficient c_i ∈ ℤ. 
\n\nThe concrete problem, first formulated in [Hilbert, 1900], is to develop a universal algorithm for finding the integer solutions of any D, i.e. a vector (x_1, x_2, ..., x_n) with x_i ∈ ℤ (or ℕ), such that D(x_1, x_2, ..., x_n) = 0. The corresponding decision problem therefore is the following: \n\nDecision 2  Hilbert's Tenth Problem \nINSTANCE: Given a diophantine equation D. \nQUESTION: Is there an integer solution for D? \n\nAlthough this problem might seem to be quite simple (its formulation is actually the shortest among D. Hilbert's famous 23 problems), it was not until 1970 that Y. Matijasevich could prove that it is unsolvable or undecidable [Matijasevich, 1970]. There is no recursive computable predicate for diophantine equations which holds if a solution in ℤ (or ℕ) exists and fails otherwise [Davis, 1973, Theorem 7.4]. \n\n3.2  THE NETWORK ARCHITECTURE \n\nThe construction of a neural network N for each diophantine D is now straightforward (see Fig. 1). It is just a three-step construction. \n\nFirst, each variable x_i of D is represented in N by a small sub-network. The structure of these modules is quite simple (left side of Fig. 1). Note that only the self-recurrent connection for the unit at the bottom of these modules is \"weighted\" by 0.0 < w < 1.0. All other connections transmit their signals unaltered (i.e. w = 1.0). \n\nSecond, the terms d_i in D are represented by Π-units in N (as shown in Fig. 1). The connections to these units from the sub-modules representing the variables x_i of D correspond to the occurrences of these variables in each term d_i. \n\nFinally, the output signals of all these Π-units are multiplied by the corresponding coefficients c_i and summed up by the Σ-unit at the top. 
\n\n3.3  THE SUB-MODULES \n\nThe fundamental property of the networks constructed in the above way is given by the simple fact that the behavior of such a neural network N corresponds uniquely to the evaluation of the original diophantine D. \n\nFirst, note that the behavior of N depends only on the weights w_i in each of the variable modules. Therefore, we will take a closer look at the behavior of these sub-modules. Suppose that at some initial moment a signal of value 1.0 is received by each variable module. After that the signal is reset again to 0.0. \n\nThe \"seed\" signal starts circling via w_i. With each update cycle this signal becomes a little bit smaller. On the other hand, the same signal is also sent to the central Θ-unit, which sends a signal 1.0 to the top accumulator unit as long as the \"circling\" activation of the bottom unit is larger than the (preset) threshold θ_i. The top unit (which also keeps track of its former activations via a recurrent connection) therefore just counts how many updates it takes before the activation of the bottom unit drops below θ_i. \n\nThe final, maximum value which is emitted by the accumulator unit is some integer x_i, for which we have: \n\nw_i^{x_i} ≥ θ_i > w_i^{x_i + 1} \n\nWe thus have a correspondence between w_i and the integer x_i = ⌊ln θ_i / ln w_i⌋, where ⌊x⌋ denotes the largest integer which is smaller than or equal to x. Given x_i, we can also construct an appropriate weight w_i by choosing it from the interval (exp(ln θ_i / x_i), exp(ln θ_i / (x_i + 1))). \n\n3.4  THE EQUIVALENCE \n\nTo conclude the proof, we now have to demonstrate the equivalence of Hilbert's tenth problem and the loading problem for the discussed class of recurrent networks and some learning task. 
\n\nThe learning task we will consider is the following: Map an input pattern with all signals equal to 1.0 (presented only once) to an output sequence which after a finite number of steps is constant and equal to 0.0. Note that, as discussed above, we could also consider a more static learning task where a final state, which determines the (single) output of the network, is determined by the condition that the outgoing signals of all Θ-units have to be zero. \n\nConsidering this learning task, and given what we said about the behavior of the sub-modules, it is now trivial to see that the constructed network just evaluates the diophantine polynomial for a set of variables x_i corresponding to the (final) output signals of the sub-modules (which are determined uniquely by the weight values w_i) if the input to the network is a pattern of all 1.0s. \n\nIf we had a solution x_i of the original diophantine equation D, and if we took the corresponding values w_i (according to the above relation) as weights in the sub-modules of N, then this would also solve the loading problem for this architecture. On the other hand, if we knew the correct weights w_i for any such network N, then the corresponding integers x_i would also solve the corresponding diophantine equation D. \n\nIn particular, if it were possible to decide if a correct set of weights w_i for N exists (for the above learning task), we could also decide if the corresponding diophantine D had a solution x_i ∈ ℕ (and vice versa). As the whole construction was trivial, we have shown that both problems are equivalent. \n\n4  CONCLUSIONS \n\nWe demonstrated that the loading problem is not only NP-complete - as shown for simple feed-forward architectures in [Judd, 1990], [Lin and Vitter, 1991], [Blum and Rivest, 1992], etc.  
- but actually unsolvable, i.e. that the training of (recurrent) neural networks is among those problems which \"indeed are intractable in an especially strong sense\" [Garey and Johnson, 1979, p. 12]. A related non-existence result concerning the training of higher-order neural networks with integer weights was shown in [Wiklicky, 1992, Wiklicky, 1994]. \n\nOne should stress once again that the fact that no general algorithm exists for higher-order or recurrent networks which could solve the loading problem (for all its instances) does not imply that all instances of this problem are unsolvable or that no solutions exist. One could hope that in most relevant cases (whatever that could mean), or when we restrict the problem to a sub-class of problems, things might become tractable. But the difference between solvable and unsolvable problems can often be very small. \n\nIn particular, it is known that the problem of solving linear diophantine equations (instead of general ones) is polynomially computable, while if we go to quadratic diophantine equations the problem already becomes NP-complete [Johnson, 1990]. And for general diophantine equations the problem is even unsolvable. Moreover, it is also known that this problem is unsolvable if we consider only diophantine equations of maximum degree 4, and there exists a universal diophantine equation with only 13 variables which is unsolvable [Davis et al., 1976]. \n\nBut we think that one should interpret the \"negative\" results on NP-completeness as well as on undecidability of the loading problem not as restrictions for neural networks, but as related to their computational power, as it was shown that concrete neural networks can be constructed so that they simulate a universal Turing machine [Siegelmann and Sontag, 1992, Cosnard et al., 1993].  
It is merely the complexity of the problem one attempts to solve which simply cannot disappear, and not some intrinsic intractability of the neural network approach. \n\nAcknowledgement \n\nThis work was started during the author's affiliation with the \"Austrian Research Institute for Artificial Intelligence\", Schottengasse 3, A-1010 Wien, Austria. Further work was supported by a grant from the Austrian \"Fonds zur Förderung der wissenschaftlichen Forschung\" as Projekt J0828-PHY. \n\nReferences \n\n[Blum and Rivest, 1992]  Avrim L. Blum and Ronald L. Rivest. Training a 3-node neural network is NP-complete. Neural Networks, 5:117-127, 1992. \n\n[Cosnard et al., 1993]  Michael Cosnard, Max Garzon, and Pascal Koiran. Computability properties of low-dimensional dynamical systems. In Symposium on Theoretical Aspects of Computer Science (STACS '93), pages 365-373, Springer-Verlag, Berlin - New York, 1993. \n\n[Davis, 1973]  Martin Davis. Hilbert's tenth problem is unsolvable. Amer. Math. Monthly, 80:233-269, March 1973. \n\n[Davis et al., 1976]  Martin Davis, Yuri Matijasevich, and Julia Robinson. Hilbert's tenth problem - diophantine equations: Positive aspects of a negative solution. In Felix E. Browder, editor, Mathematical developments arising from Hilbert, pages 323-378, American Mathematical Society, 1976. \n\n[Garey and Johnson, 1979]  Michael R. Garey and David S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. W. H. Freeman, New York, 1979. \n\n[Hilbert, 1900]  David Hilbert. Mathematische Probleme. Nachr. Ges. Wiss. Göttingen, math.-phys. Kl., :253-297, 1900. \n\n[Johnson, 1990]  David S. Johnson. A catalog of complexity classes.  
In Handbook of Theoretical Computer Science (Volume A: Algorithms and Complexity), chapter 2, pages 67-161, Elsevier - MIT Press, Amsterdam - Cambridge, Massachusetts, 1990. \n\n[Judd, 1990]  J. Stephen Judd. Neural Network Design and the Complexity of Learning. MIT Press, Cambridge, Massachusetts - London, England, 1990. \n\n[Lin and Vitter, 1991]  Jyh-Han Lin and Jeffrey Scott Vitter. Complexity results on learning by neural networks. Machine Learning, 6:211-230, 1991. \n\n[Matijasevich, 1970]  Yuri Matijasevich. Enumerable sets are diophantine. Dokl. Acad. Nauk., 191:279-282, 1970. \n\n[Siegelmann and Sontag, 1992]  Hava T. Siegelmann and Eduardo D. Sontag. On the computational power of neural nets. In Fifth Workshop on Computational Learning Theory (COLT 92), pages 440-449, 1992. \n\n[Wiklicky, 1992]  Herbert Wiklicky. Synthesis and Analysis of Neural Networks - On a Framework for Artificial Neural Networks. PhD thesis, University of Vienna - Technical University of Vienna, September 1992. \n\n[Wiklicky, 1994]  Herbert Wiklicky. The neural network loading problem is undecidable. In Euro-COLT '93 - Conference on Computational Learning Theory, page (to appear), Oxford University Press, Oxford, 1994.", "award": [], "sourceid": 726, "authors": [{"given_name": "Herbert", "family_name": "Wiklicky", "institution": null}]}