{"title": "On the Distribution of the Number of Local Minima of a Random Function on a Graph", "book": "Advances in Neural Information Processing Systems", "page_first": 727, "page_last": 732, "abstract": null, "full_text": "On the Distribution of the Number of Local Minima of a Random Function on a Graph \n\nPierre Baldi \nJPL, Caltech \nPasadena, CA 91109 \n\nYosef Rinott \nUCSD \nLa Jolla, CA 92093 \n\nCharles Stein \nStanford University \nStanford, CA 94305 \n\n1 INTRODUCTION \n\nMinimization of energy or error functions has proved to be a useful principle in the design and analysis of neural networks and neural algorithms. A brief list of examples includes: the back-propagation algorithm, the use of optimization methods in computational vision, the application of analog networks to the approximate solution of NP-complete problems, and the Hopfield model of associative memory. \n\nIn the Hopfield model of associative memory, for instance, a quadratic Hamiltonian of the form \n\n$$x_i = \\pm 1 \\tag{1}$$ \n\nis constructed to tailor a particular \"landscape\" on the n-dimensional hypercube $H^n = \\{-1, 1\\}^n$ and to store memories at a particular subset of the local minima of F on $H^n$. The synaptic weights $w_{ij}$ are usually constructed incrementally, using a form of Hebb's rule applied to the patterns to be stored. These patterns are often chosen at random. As the number of stored memories grows to and beyond saturation, the energy function F becomes essentially random. In addition, in a general context of combinatorial optimization, every problem in NP can be (polynomially) reduced to the problem of minimizing a certain quadratic form over $H^n$. 
\n\nThese two types of considerations, associative memory and combinatorial optimization, motivate the study of the number and distribution of local minima of a random function F defined over the hypercube or, more generally, over any graph G. Of course, different notions of randomness can be introduced. In the case where F is a quadratic form as in (1), we could take the coefficients $w_{ij}$ to be independent identically distributed Gaussian random variables, which yields, in fact, the Sherrington-Kirkpatrick long-range spin glass model of statistical physics. For this model, the expectation of the number of local minima is well known, but no rigorous results have been obtained for its distribution (even the variance is not known precisely). A simpler model of randomness can then be introduced, where the values F(x) of the random function at each vertex are assigned randomly and independently from a common distribution. This is in fact the random energy model of Derrida (1981). \n\n2 THE MAIN RESULT \n\nIn Baldi, Rinott and Stein (1989) the following general result on random energy models is proven. \n\nLet G = (V, E) be a regular d-graph, i.e., a graph where every vertex has the same number d of neighbors. Let F be a random function on V whose values are independently distributed with a common continuous distribution. Let W be the number of local minima of F, i.e., the number of vertices x satisfying F(x) < F(y) for every neighbor y of x (i.e., $(x, y) \\in E$). Let $EW = \\lambda$ and $\\mathrm{Var}\\, W = \\sigma^2$. Then \n\n$$EW = \\frac{|V|}{d+1} \\tag{2}$$ \n\nand for any positive real w: \n\n$$\\left| P(W \\le w) - \\Phi\\left(\\frac{w - \\lambda}{\\sigma}\\right) \\right| < \\frac{C}{\\sqrt{\\sigma}} \\tag{3}$$ \n\nwhere $\\Phi$ is the standard normal distribution function and C is an absolute constant. \n\nRemarks: \n\n(a) Equation (2) is immediate: by symmetry, each vertex carries the smallest value among itself and its d neighbors with probability 1/(d+1). The proof of (3) is based on a method developed in Stein (1986). \n\n(b) The bound given in the theorem is not asymptotic but also holds for small graphs. 
\n(c) If $|V| \\to \\infty$, the theorem states that if $\\sigma \\to \\infty$ then the distribution of the number of local minima approaches a normal distribution, and (3) also gives a bound of $O(\\sigma^{-1/2})$ on the rate of convergence. \n\n(d) The function F simply induces a ranking (or a random permutation) of the vertices of G. \n\n(e) The bound in (3) may not be optimal. We suspect that the optimal rate should scale like $\\sigma^{-1}$ rather than $\\sigma^{-1/2}$. \n\n3 EXAMPLES OF APPLICATIONS \n\n(1) Consider an $n \\times n$ square lattice (see fig. 1) with periodic boundary conditions. Here $|V_n| = n^2$ and d = 4. The expected number of local minima is \n\n$$EW_n = \\frac{n^2}{5} \\tag{4}$$ \n\nand a simple calculation shows that \n\n$$\\mathrm{Var}\\, W_n = \\frac{13 n^2}{225}. \\tag{5}$$ \n\nTherefore $W_n$ is asymptotically normal and the rate of convergence is bounded by $O(n^{-1/2})$. \n\n(2) Consider an $n \\times n$ square lattice, where this time the neighbors of a vertex v are all the points in the same row or column as v (see fig. 2). This example arises in game theory, where the rows (resp. columns) correspond to the different possible strategies of one of two players. The energy value can be interpreted as the cost of the combined choice of two strategies. Here $|V_n| = n^2$ and d = 2n - 2. The expected number of local minima $W_n$ (the Nash equilibrium points of game theory) is \n\n$$EW_n = \\frac{n^2}{2n-1} \\approx \\frac{n}{2} \\tag{6}$$ \n\nand \n\n$$\\mathrm{Var}\\, W_n = \\frac{n^2 (n-1)}{2(2n-1)^2} \\approx \\frac{n}{8}. \\tag{7}$$ \n\nTherefore $W_n$ is asymptotically normal and the rate of convergence is bounded by $O(n^{-1/4})$. \n\n(3) Consider the n-dimensional hypercube $H^n = (V_n, E_n)$ (see fig. 3). Then $|V_n| = 2^n$ and d = n. The expected number of local minima $W_n$ is \n\n$$EW_n = \\frac{2^n}{n+1} = \\lambda_n \\tag{8}$$ \n\nand \n\n$$\\mathrm{Var}\\, W_n = \\frac{2^{n-1}(n-1)}{(n+1)^2} = \\sigma_n^2. \\tag{9}$$ \n\nTherefore $W_n$ is asymptotically normal and in fact: \n\n$$\\left| P(W_n \\le w) - \\Phi\\left(\\frac{w - \\lambda_n}{\\sigma_n}\\right) \\right| < \\frac{C \\sqrt{n+1}}{(n-1)^{1/4}\\, 2^{(n-1)/4}} = O\\left(\\sqrt[4]{n/2^n}\\right). \\tag{10}$$ \n\nIn contrast, if the edges of $H^n$ are randomly and independently oriented with probability 0.5, then the distribution of the number of vertices having all their adjacent edges oriented inward is asymptotically Poisson with mean 1. \n\nReferences \n\nP. Baldi and Y. Rinott (1989), \"Asymptotic Normality of Some Graph-Related Statistics,\" Journal of Applied Probability, 26, 171-175. \n\nP. Baldi and Y. Rinott (1989), \"On Normal Approximation of Distribution in Terms of Dependency Graphs,\" Annals of Probability, in press. \n\nP. Baldi, Y. Rinott and C. Stein (1989), \"A Normal Approximation for the Number of Local Maxima of a Random Function on a Graph,\" in: Probability, Statistics and Mathematics: Papers in Honor of Samuel Karlin, T.W. Anderson, K.B. Athreya and D.L. Iglehart, Editors, Academic Press. \n\nB. Derrida (1981), \"Random Energy Model: An Exactly Solvable Model of Disordered Systems,\" Physical Review B, 24, 2613-2626. \n\nC. M. Macken and A. S. Perelson (1989), \"Protein Evolution on Rugged Landscapes,\" PNAS, 86, 6191-6195. \n\nC. Stein (1986), \"Approximate Computation of Expectations,\" Institute of Mathematical Statistics Lecture Notes, S.S. Gupta Series Editor, Volume 7. \n\nFigure 1: A ranking of a 4 x 4 square lattice with periodic boundary conditions and four local minima (d = 4). \n\nFigure 2: A ranking of a 4 x 4 square lattice. The neighbors of a vertex are all the points on the same row and column. There are three local minima (d = 6). 
\n\nFigure 3: A ranking of $H^3$ with two local minima (d = 3).", "award": [], "sourceid": 274, "authors": [{"given_name": "Pierre", "family_name": "Baldi", "institution": null}, {"given_name": "Yosef", "family_name": "Rinott", "institution": null}, {"given_name": "Charles", "family_name": "Stein", "institution": null}]}