{"title": "Generalized Hopfield Networks and Nonlinear Optimization", "book": "Advances in Neural Information Processing Systems", "page_first": 355, "page_last": 362, "abstract": null, "full_text": "Generalized Hopfield Networks and Nonlinear Optimization \n\nGintaras V. Reklaitis \nDept. of Chemical Eng. \nPurdue University \nW. Lafayette, IN 47907 \n\nAthanasios G. Tsirukis^1 \nDept. of Chemical Eng. \nPurdue University \nW. Lafayette, IN 47907 \n\nManoel F. Tenorio \nDept. of Electrical Eng. \nPurdue University \nW. Lafayette, IN 47907 \n\nABSTRACT \n\nA nonlinear neural framework, called the Generalized Hopfield Network (GHN), is proposed, which is able to solve systems of nonlinear equations in a parallel, distributed manner. The method is applied to the general nonlinear optimization problem. We demonstrate GHNs implementing the three most important optimization algorithms, namely the Augmented Lagrangian, Generalized Reduced Gradient and Successive Quadratic Programming methods. The study results in a dynamic view of the optimization problem and offers a straightforward model for the parallelization of the optimization computations, thus significantly extending the practical limits of problems that can be formulated as an optimization problem and which can gain from the introduction of nonlinearities in their structure (e.g., pattern recognition, supervised learning, design of content-addressable memories). \n\n^1 To whom correspondence should be addressed. \n\n1 RELATED WORK \nThe ability of networks of highly interconnected simple nonlinear analog processors (neurons) to solve complicated optimization problems was demonstrated in a series of papers by Hopfield and Tank (Hopfield, 1984), (Tank, 1986). 
\nThe Hopfield computational model is almost exclusively applied to the solution of combinatorially complex linear decision problems (e.g., the Traveling Salesman Problem). Unfortunately, such problems cannot be solved with guaranteed quality (Bruck, 1987), because the network gets trapped in locally optimal solutions. \nJeffrey and Rosner (Jeffrey, 1986) extended Hopfield's technique to the nonlinear unconstrained optimization problem, using Cauchy dynamics. Kennedy and Chua (Kennedy, 1988) presented an analog implementation of a network solving a nonlinear optimization problem. The underlying optimization algorithm is a simple transformation method (Reklaitis, 1983), which is known to be relatively inefficient for large nonlinear optimization problems. \n\n2 LINEAR HOPFIELD NETWORK (LHN) \nThe computation in a Hopfield network is done by a collection of highly interconnected simple neurons. Each processing element, $i$, is characterized by its activation level, $U_i$, which is a function of the input received from the external environment, $I_i$, and the state of the other neurons. The activation level of $i$ is transmitted to the other processors after passing through a filter that converts $U_i$ to a 0-1 binary value, $V_i$. \nThe time behavior of the system is described by the following model: \n\n$$\\frac{dU_i}{dt} = \\sum_j T_{ij} V_j - \\frac{U_i}{R_i} + I_i$$ \n\nwhere $T_{ij}$ are the interconnection strengths. The network is characterized as linear because the neuron inputs appear linearly in the neuron's constitutive equation. The steady state of a Hopfield network corresponds to a local minimum of the corresponding quadratic Lyapunov function: \n\n$$E = -\\frac{1}{2} \\sum_i \\sum_j T_{ij} V_i V_j - \\sum_i I_i V_i + \\sum_i \\frac{1}{R_i} \\int_0^{V_i} s_i^{-1}(V) \\, dV$$ \n\nHere $s_i$ denotes the filter that maps $U_i$ to $V_i$. If the matrix $[T_{ij}]$ is symmetric, the steady-state values of $V_i$ are binary. These observations turn the Hopfield network into a very useful discrete optimization tool. 
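The relaxation described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical two-neuron network with mutual inhibition, a tanh-shaped filter with outputs in (0, 1), unit resistance, and explicit Euler integration. The names T, I, R, GAIN, s, energy and relax, as well as all numerical values, are illustrative assumptions.

```python
import numpy as np

# Illustrative two-neuron network; T, I, R and GAIN are assumed values.
T = np.array([[0.0, -2.0], [-2.0, 0.0]])  # symmetric interconnection strengths
I = np.array([1.0, 1.0])                  # external inputs
R = 1.0                                   # neuron resistance
GAIN = 5.0                                # steepness of the filter

def s(U):
    # Filter that maps the activation level U to an output V in (0, 1).
    return 0.5 * (1.0 + np.tanh(GAIN * U))

def energy(V):
    # Lyapunov function: quadratic term, input term and integral of s^{-1}.
    integral = (V * np.log(V) + (1.0 - V) * np.log(1.0 - V)) / (2.0 * GAIN)
    return -0.5 * V @ T @ V - I @ V + np.sum(integral) / R

def relax(U0, dt=0.01, steps=2000):
    # Explicit Euler integration of dU/dt = T V - U / R + I.
    U = np.array(U0, dtype=float)
    for _ in range(steps):
        U = U + dt * (T @ s(U) - U / R + I)
    return U

U_start = np.array([0.1, -0.1])
U_end = relax(U_start)
V_start, V_end = s(U_start), s(U_end)
print(V_end, energy(V_start), energy(V_end))
```

Under these assumptions the slightly more active neuron suppresses the other, so the outputs settle close to the binary corner (1, 0), and the energy at the steady state is lower than at the initial state.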
\nNonetheless, the linear structure poses two major limitations: the Lyapunov (objective) function can only take a quadratic form, and the feasible region can only have a hypercube geometry ($-1 \\le V_i \\le 1$). Therefore, the Linear Hopfield Network is limited to solving optimization problems with a quadratic objective function and linear constraints. The general nonlinear optimization problem requires arbitrarily nonlinear neural interactions. \n\n3 THE NONLINEAR OPTIMIZATION PROBLEM \nThe general nonlinear optimization problem consists of a search for the values of the independent variables $x_i$ that optimize a multivariable objective function so that some conditions (equality, $h_i$, and inequality, $g_j$, constraints) are satisfied at the optimum. \n\n$$\\text{optimize } f(x_1, x_2, \\ldots, x_N)$$ \nsubject to \n$$h_i(x_1, x_2, \\ldots, x_N) = 0, \\quad i = 1, 2, \\ldots, K, \\quad K < N$$ \n$$a_j \\le g_j(x_1, x_2, \\ldots, x_N) \\le b_j, \\quad j = 1, 2, \\ldots, M$$ \n$$x_k^L \\le x_k \\le x_k^U, \\quad k = 1, 2, \\ldots, N$$ \n\nThe influence of the constraint geometry on the shape of the objective function is described in a unified manner by the Lagrangian function: \n\n$$L = f - v^T h$$ \n\nThe $v_j$ variables, also known as Lagrange multipliers, are unknown weighting parameters to be specified. At the optimum, the following conditions are satisfied: \n\n$$\\nabla_x L = \\nabla f - v^T \\nabla h = 0 \\quad \\text{(N equations)} \\quad (1)$$ \n$$h(x) = 0 \\quad \\text{(K equations)} \\quad (2)$$ \n\nFrom (1) and (2) it is clear that the optimization problem is transformed into a nonlinear equation solving problem. In a Generalized Hopfield Network each neuron represents an independent variable. The nonlinear connectivity among them is determined by the specific problem at hand and the implemented optimization algorithm. The network is designed to relax from an initial state to a steady state that corresponds to a locally optimal solution of the problem. \n\nTherefore, 
the optimization algorithms must be transformed into a dynamic model (a system of differential equations) that will dictate the nonlinear neural interactions. \n\n4 OPTIMIZATION METHODS \nCauchy and Newton dynamics are the two most important unconstrained optimization (equation solving) methods, adopted by the majority of the existing algorithms. \n\n4.1 CAUCHY'S METHOD \nThis is the famous steepest descent algorithm, which tracks the direction of the largest change in the value of the objective function, $f$. The \"equation of motion\" for a Cauchy dynamic system is: \n\n$$\\frac{dx}{dt} = -\\nabla f, \\quad x(0) = x_0$$ \n\n4.2 NEWTON'S METHOD \nIf second-order information is available, a more rapid convergence is produced using Newton's approximation: \n\n$$\\frac{dx}{dt} = -(\\nabla^2 f)^{-1} \\nabla f, \\quad x(0) = x_0$$ \n\nThe steepest descent dynamics are very efficient initially, producing large objective-value changes, but close to the optimum the changes become very small, significantly increasing the convergence time. In contrast, Newton's method converges fast close to the optimum, but the optimization direction is uncontrollable. The Levenberg-Marquardt heuristic (Reklaitis, 1983) solves the problem by adopting Cauchy dynamics initially and switching to Newton dynamics near the optimum. Figure 1 shows the optimization trajectory of a Cauchy network. The algorithm converges to locally optimal solutions. \n\nFigure 1: Convergence to Local Optima \n\n5 CONSTRAINED OPTIMIZATION \nThe constrained optimization algorithms attempt to conveniently manipulate the equality and inequality constraints so that the problem is finally reduced to an unconstrained optimization, which is solved using Cauchy's or Newton's methods. The three most important constrained optimization algorithms are the Augmented Lagrangian, the Generalized Reduced Gradient (GRG) and the Successive Quadratic Programming (SQP) methods. Corresponding Generalized Hopfield Networks will be developed for all of them. \n\n5.1 TRANSFORMATION METHODS - AUGMENTED LAGRANGIAN \n\nAccording to the transformation methods, a measure of the distance from the feasible region is attached to the objective function and the problem is solved as an unconstrained optimization problem. A transformation method was employed by Hopfield. These algorithms have proved inefficient because of numerical difficulties implicitly embedded in their structure (Reklaitis, 1983). The Augmented Lagrangian is specifically designed to avoid these problems. The transformed unconstrained objective function becomes: \n\n$$P(x, \\sigma, \\tau) = f(x) + R \\sum_j \\left\\{ \\langle g_j(x) + \\sigma_j \\rangle^2 - \\sigma_j^2 \\right\\} + R \\sum_i \\left\\{ [h_i(x) + \\tau_i]^2 - \\tau_i^2 \\right\\}$$ \n\nwhere $R$ is a predetermined weighting factor, and $\\sigma_j$, $\\tau_i$ are the corresponding inequality and equality Lagrange multipliers. The operator $\\langle a \\rangle$ returns $a$ for $a \\le 0$. Otherwise it returns 0. \n\nThe design of an Augmented Lagrangian GHN requires (N + K) neurons, where N is the number of variables and K is the number of constraints. The neuron connectivity of a GHN with Cauchy performance is described by the following model: \n\n$$\\frac{dx}{dt} = -\\nabla_x P = -\\nabla f - 2R \\langle g + \\sigma \\rangle^T \\nabla g - 2R [h + \\tau]^T \\nabla h$$ \n$$\\frac{d\\sigma}{dt} = +\\nabla_\\sigma P = 2R \\langle g + \\sigma \\rangle - 2R \\sigma$$ \n\nwhere $\\nabla g$ and $\\nabla h$ are matrices, e.g., $\\nabla h = [\\nabla h_1, \\ldots, \\nabla h_K]$. 
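The Cauchy dynamics of the Augmented Lagrangian network can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy problem (minimize x1^2 + x2^2 subject to x1 + x2 - 1 = 0 and x1 - 0.6 >= 0), the value R = 1, the explicit Euler integrator, and the names bracket, grad_f, g, h, GRAD_G, GRAD_H and relax are all hypothetical. The update used for the equality multiplier tau (the gradient of P with respect to tau, which equals 2 R h) is an assumption made by analogy with the sigma equation.

```python
import numpy as np

R = 1.0  # weighting factor (assumed value)

def bracket(a):
    # The operator <a>: returns a where a <= 0 and 0 otherwise.
    return np.minimum(a, 0.0)

# Hypothetical toy problem: minimize x1^2 + x2^2
# subject to h(x) = x1 + x2 - 1 = 0 and g(x) = x1 - 0.6 >= 0.
def grad_f(x):
    return 2.0 * x

def g(x):
    return np.array([x[0] - 0.6])

def h(x):
    return np.array([x[0] + x[1] - 1.0])

GRAD_G = np.array([[1.0, 0.0]])  # rows are constraint gradients
GRAD_H = np.array([[1.0, 1.0]])

def relax(x, sigma, tau, dt=0.01, steps=5000):
    # Explicit Euler integration: descent in x, ascent in the multipliers.
    for _ in range(steps):
        gs = bracket(g(x) + sigma)
        ht = h(x) + tau
        dx = -grad_f(x) - 2.0 * R * gs @ GRAD_G - 2.0 * R * ht @ GRAD_H
        dsigma = 2.0 * R * gs - 2.0 * R * sigma
        dtau = 2.0 * R * h(x)  # assumed equality-multiplier dynamics
        x, sigma, tau = x + dt * dx, sigma + dt * dsigma, tau + dt * dtau
    return x, sigma, tau

x, sigma, tau = relax(np.zeros(2), np.zeros(1), np.zeros(1))
print(x, sigma, tau)
```

For this toy problem the inequality is active at the optimum, so the network should settle near x = (0.6, 0.4) with sigma near -0.2 and tau near -0.4, which makes both constraint residuals and the gradient of P vanish.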
\n\n5.2 GENERALIZED REDUCED GRADIENT \nAccording to the GRG method, K variables (basics, $\\hat{x}$) are determined by solving the K nonlinear constraint equations, as functions of the remaining (N - K) variables (non-basics, $\\bar{x}$). Subsequently the problem is solved as a reduced-dimension unconstrained optimization problem. Equations (1) and (2) are transformed to: \n\n$$\\nabla \\tilde{f} = \\nabla \\bar{f} - \\nabla \\hat{f} \\, (\\nabla \\hat{h})^{-1} \\nabla \\bar{h} = 0$$ \n$$h(x) = 0$$ \n\nThe constraint equations are solved using Newton's method. Note that the Lagrange multipliers are explicitly eliminated. The design of a GRG GHN requires N neurons, each one representing an independent variable. The neuron connectivity using Cauchy dynamics for the unconstrained optimization is given by: \n\n$$\\frac{d\\bar{x}}{dt} = -\\nabla \\tilde{f} = -\\nabla \\bar{f} + \\nabla \\hat{f} \\, (\\nabla \\hat{h})^{-1} \\nabla \\bar{h} \\quad (3)$$ \n$$h(x) = 0 \\quad \\left( \\text{relaxed through } \\frac{d\\hat{x}}{dt} = -h \\, (\\nabla \\hat{h})^{-1} \\right) \\quad (4)$$ \n$$x(0) = x_0$$ \n\nSystem (3)-(4) is a differential-algebraic system with an inherent sequential character: for each small step towards lower objective values, produced by (3), the system of nonlinear constraints should be solved by relaxing equations (4) to a steady state. The procedure is repeated until both equations (3) and (4) reach a steady state. \n\n5.3 SUCCESSIVE QUADRATIC PROGRAMMING \nIn the SQP algorithm, equations (1) and (2) are solved simultaneously as a nonlinear system of equations with both the independent variables, $x$, and the Lagrange multipliers, $v$, as unknowns. The solution is determined using Newton's method. \nThe design of an SQP GHN requires (N + K) neurons representing the independent variables and the Lagrange multipliers. 
The connectivity of the network is determined by the following state equations: \n\n$$\\frac{dz}{dt} = \\pm [\\nabla^2 L]^{-1} (\\nabla L), \\quad z(0) = z_0$$ \n\nwhere $z$ is the augmented set of independent variables: \n\n$$z = [x; v]$$ \n\n5.4 COMPARISON OF THE NETWORKS \nThe Augmented Lagrangian network is very easily programmed. Newton dynamics should be used very carefully because the operator $\\langle a \\rangle$ is not smooth at $a = 0$. \nThe GRG network requires K fewer neurons compared to the other networks. It requires more programming effort because of the inversion of the constraint Jacobian. \nThe SQP network is algorithmically the most effective, because second-order information is used in the determination of both the variables and the multipliers. It is the most tedious to program because of the inversion of the Lagrangian Hessian. All the GHNs have been proved to be stable (Tsirukis, 1989). The following example was solved by all three networks. \n\n$$\\text{minimize } f(x) = -x_1 x_2^2 x_3^3 / 81$$ \nsubject to \n$$h_1(x) = x_1^3 + x_2^2 + x_3 - 13 = 0$$ \n$$h_2(x) = x_2^2 x_3^{-1/2} - 1 = 0$$ \n\nConvergence was achieved by all the networks starting from both feasible and infeasible initial points. Figures 2 and 3 depict the algorithmic superiority of the SQP network. \n\n[Figures 2 and 3: curves labeled AL, GRG and SQP plotted against TIME; panel title: AUGMENTED LAGRANGIAN & SQP NETWORKS] \n\nFigure 2. Feasible Initial State. \n\nFigure 3. Infeasible Initial State. 
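The Newton dynamics of the SQP network in Section 5.3 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the toy problem (minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0), the choice of the minus sign in the state equation, the explicit Euler integrator, the starting point, and the names grad_L, hess_L and relax are all hypothetical.

```python
import numpy as np

# Hypothetical toy problem: minimize x1 + x2
# subject to h(x) = x1^2 + x2^2 - 2 = 0, with L = f - v * h.
def grad_L(z):
    x1, x2, v = z
    return np.array([1.0 - 2.0 * v * x1,
                     1.0 - 2.0 * v * x2,
                     -(x1 ** 2 + x2 ** 2 - 2.0)])

def hess_L(z):
    x1, x2, v = z
    return np.array([[-2.0 * v, 0.0, -2.0 * x1],
                     [0.0, -2.0 * v, -2.0 * x2],
                     [-2.0 * x1, -2.0 * x2, 0.0]])

def relax(z0, dt=0.1, steps=300):
    # Explicit Euler integration of dz/dt = -[hess L]^{-1} grad L.
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - dt * np.linalg.solve(hess_L(z), grad_L(z))
    return z

z = relax([-1.2, -0.8, -0.4])
print(z)
```

Started near the constrained minimum, the augmented state z = [x; v] should relax to approximately (-1, -1, -0.5). Newton dynamics only seek a stationary point of the Lagrangian, so a different starting point may lead to the constrained maximum at (1, 1) instead.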
\n\n6 OPTIMIZATION & PARALLEL COMPUTATION \nThe presented model can be directly translated into a parallel nonlinear optimizer (a nonlinear equation solver) which efficiently distributes the computational burden to a large number of digital processors (at most N + K). Each one of them corresponds to an optimization variable, continuously updated by numerically integrating the state equations: \n\n$$x_i^{(r+1)} = \\phi(x^{(r)}, x^{(r+1)})$$ \n\nwhere $\\phi$ depends on the optimization algorithm and the integration method. After each update the new value is communicated to the network. \n\nThe presented algorithm has some unique features: the state equations are differentials of the same function, the Lagrangian. Therefore, a simple integration method (e.g., explicit) can be used for the steady-state computation. Also, the integration in each processor can be done asynchronously, independent of the state of the other processors. Thus, the algorithm is robust to intercommunication and execution delays. \n\nAcknowledgements \n\nAn extended version of this work has appeared in (Tsirukis, 1990). The authors wish to thank M.I.T. Press Journals for their permission to publish it in the present form. \n\nReferences \n\nBruck, J. and J. Goodman (1988), On the Power of Neural Networks for Solving Hard Problems, Neural Information Processing Systems, D.Z. Anderson (ed.), American Institute of Physics, New York, NY, 137-143. \n\nHopfield, J.J. (1984), Neurons with Graded Response have Collective Computational Properties like those of Two-state Neurons, Proc. Natl. Acad. Sci. USA, vol. 81, 3088-3092. \n\nJeffrey, W. and R. Rosner (1986), Neural Network Processing as a Tool for Function Optimization, Neural Networks for Computing, J.S. Denker (ed.), American Institute of Physics, New York, NY, 241-246. \n\nKennedy, M.P. and L.O. 
Chua (1988), Neural Networks for Nonlinear Programming, IEEE Transactions on Circuits and Systems, vol. 35, no. 5, pp. 554-562. \n\nReklaitis, G.V., A. Ravindran and K.M. Ragsdell (1983), Engineering Optimization: Methods and Applications, Wiley-Interscience. \n\nTank, D.W. and J.J. Hopfield (1986), Simple \"Neural\" Optimization Networks: An A/D Converter, Signal Decision Circuit, and a Linear Programming Circuit, IEEE Transactions on Circuits and Systems, CAS-33, no. 5. \n\nTsirukis, A.G., G.V. Reklaitis and M.F. Tenorio (1989), Computational Properties of Generalized Hopfield Networks Applied to Nonlinear Optimization, Tech. Rep. TR-EE 89-69, School of Electrical Engineering, Purdue University. \n\nTsirukis, A.G., G.V. Reklaitis and M.F. Tenorio (1990), Nonlinear Optimization using Generalized Hopfield Networks, Neural Computation, vol. 1, no. 4. \n", "award": [], "sourceid": 280, "authors": [{"given_name": "Gintaras", "family_name": "Reklaitis", "institution": null}, {"given_name": "Athanasios", "family_name": "Tsirukis", "institution": null}, {"given_name": "Manoel", "family_name": "Tenorio", "institution": null}]}