{"title": "Stochastic Hillclimbing as a Baseline Method for Evaluating Genetic Algorithms", "book": "Advances in Neural Information Processing Systems", "page_first": 430, "page_last": 436, "abstract": "", "full_text": "Stochastic Hillclimbing as a Baseline \n\nMethod for Evaluating Genetic \n\nAlgorithms \n\nAri Juels \n\nDepartment of Computer Science \nUniversity of California at Berkeley\u00b7 \n\nMartin Wattenberg \n\nDepartment of Mathematics \n\nUniversity of California at Berkeleyt \n\nAbstract \n\nWe investigate the effectiveness of stochastic hillclimbing as a baseline for \nevaluating the performance of genetic algorithms (GAs) as combinato(cid:173)\nrial function optimizers. In particular, we address two problems to which \nGAs have been applied in the literature: Koza's ll-multiplexer problem \nand the jobshop problem. We demonstrate that simple stochastic hill(cid:173)\nclimbing methods are able to achieve results comparable or superior to \nthose obtained by the GAs designed to address these two problems. We \nfurther illustrate, in the case of the jobshop problem, how insights ob(cid:173)\ntained in the formulation of a stochastic hillclimbing algorithm can lead \nto improvements in the encoding used by a GA. \n\n1 \n\nIntroduction \n\nGenetic algorithms (GAs) are a class of randomized optimization heuristics based \nloosely on the biological paradigm of natural selection. Among other proposed ap(cid:173)\nplications, they have been widely advocated in recent years as a general method \nfor obtaining approximate solutions to hard combinatorial optimization problems \nusing a minimum of information about the mathematical structure of these prob(cid:173)\nlems. By means of a general \"evolutionary\" strategy, GAs aim to maximize an \nobjective or fitness function 1 : 5 --t R over a combinatorial space 5, i.e., to find \nsome state s E 5 for which 1(s) is as large as possible. (The case in which 1 is to \nbe minimized is clearly symmetrical.) 
For a detailed description of the algorithm see, for example, [7], which constitutes a standard text on the subject.

In this paper, we investigate the effectiveness of the GA in comparison with that of stochastic hillclimbing (SH), a probabilistic variant of hillclimbing. As the term "hillclimbing" suggests, if we view an optimization problem as a "landscape" in which each point corresponds to a solution s and the "height" of the point corresponds to the fitness of the solution, f(s), then hillclimbing aims to ascend to a peak by repeatedly moving to an adjacent state with a higher fitness.

*Supported in part by NSF Grant CCR-9505448. E-mail: juels@cs.berkeley.edu
†E-mail: wattenbe@math.berkeley.edu

A number of researchers in the GA community have already addressed the issue of how various versions of hillclimbing on the space of bitstrings, {0, 1}^n, compare with GAs [1] [4] [9] [18] [15]. Our investigations in this paper differ in two important respects from these previous ones. First, we address more sophisticated problems than the majority of these studies, which make use of test functions developed for the purpose of exploring certain landscape characteristics. Second, we consider hillclimbing algorithms based on operators in some way "natural" to the combinatorial structures of the problems to which we are seeking solutions, very much as GA designers attempt to do. In one of the two problems in this paper, our SH algorithm employs an encoding identical to that in the proposed GA. Consequently, the hillclimbing algorithms we consider operate on structures other than bitstrings.

Constraints in space have required the omission of a great deal of material found in the full version of this paper.
This material includes the treatment of two additional problems: the NP-complete Maximum Cut Problem [11] and an NP-complete problem known as the multiprocessor document allocation problem (MDAP). Also in the full version of this paper is a substantially more thorough exposition of the material presented here. The reader is encouraged to refer to [10], available on the World Wide Web at http://www.cs.berkeley.edu/~juels/.

2 Stochastic Hillclimbing

The SH algorithm employed in this paper searches a discrete space S with the aim of finding a state whose fitness is as high (or as low) as possible. The algorithm does this by making successive improvements to some current state σ ∈ S. As is the case with genetic algorithms, the form of the states in S depends upon how the designer of the SH algorithm chooses to encode the solutions to the problems to be solved: as bitstrings, permutations, or in some other form. The local improvements effected by the SH algorithm are determined by the neighborhood structure and the fitness function f imposed on S in the design of the algorithm. We can consider the neighborhood structure as an undirected graph G on vertex set S. The algorithm attempts to improve its current state σ by making a transition to one of the neighbors of σ in G. In particular, the algorithm chooses a state τ according to some suitable probability distribution on the neighbors of σ. If the fitness of τ is at least as good as that of σ, then τ becomes the new current state; otherwise σ is retained. This process is then repeated.

3 GP and Jobshop

3.1 The Experiments

In this section, we compare the performance of SH algorithms with that of GAs proposed for two problems: the jobshop problem and Koza's 11-multiplexer problem.
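In pseudocode terms, the generic SH loop of Section 2 amounts to the following. This is a minimal sketch: the state encoding, neighbor distribution, and fitness function shown in the toy usage are illustrative placeholders, not taken from the paper.

```python
import random

def stochastic_hillclimb(initial, neighbors, fitness, iterations):
    """Generic stochastic hillclimbing: repeatedly draw a random
    neighbor tau of the current state sigma and accept it whenever
    its fitness is at least as good as sigma's."""
    current = initial
    best = fitness(current)
    for _ in range(iterations):
        candidate = random.choice(neighbors(current))
        f = fitness(candidate)
        if f >= best:  # ties are accepted, as described above
            current, best = candidate, f
    return current, best

# Toy usage: maximize the number of 1-bits in a bitstring,
# with single-bit flips as the neighborhood structure.
def flip_neighbors(s):
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]

random.seed(0)
state, value = stochastic_hillclimb((0,) * 8, flip_neighbors, sum, 200)
```

Fitter-or-equal acceptance (rather than strictly fitter) lets the search drift across plateaus of equal fitness, which matters for the fitness landscapes considered below.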
We gauge the performance of the GA and SH algorithms according to the fitness of the best solution achieved after a fixed number of function evaluations, rather than the running time of the algorithms. This is because evaluation of the fitness function generally constitutes the most substantial portion of the execution time of the optimization algorithm, and accords with standard practice in the GA community.

3.2 Genetic Programming

"Genetic programming" (GP) is a method of enabling a genetic algorithm to search a potentially infinite space of computer programs, rather than a space of fixed-length solutions to a combinatorial optimization problem. These programs take the form of Lisp symbolic expressions, called S-expressions. The S-expressions in GP correspond to programs which a user seeks to adapt to perform some pre-specified task. Details on GP, an increasingly common GA application, and on the 11-multiplexer problem which we address in this section, may be found, for example, in [13] [12] [14].

The boolean 11-multiplexer problem entails the generation of a program to perform the following task. A set of 11 distinct inputs is provided, with labels a0, a1, a2, d0, d1, ..., d7, where a stands for "address" and d for "data". Each input takes the value 0 or 1. The task is to output the value dm, where m = a0 + 2a1 + 4a2. In other words, for any 11-bit string, the input to the "address" variables is to be interpreted as an index to a specific "data" variable, which the program then yields as output. For example, on input a1 = 1, a0 = a2 = 0, and d2 = 1, d0 = d1 = d3 = ... = d7 = 0, a correct program will output a '1', since the input to the 'a' variables specifies address 2, and variable d2 is given input 1.
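The target behavior itself, independent of any search over programs, can be written down directly (the function name here is illustrative):

```python
def multiplexer11(a0, a1, a2, d):
    """The boolean 11-multiplexer: the three address bits select
    which of the eight data bits d[0..7] is passed to the output."""
    m = a0 + 2 * a1 + 4 * a2
    return d[m]

# The example from the text: a1 = 1, a0 = a2 = 0 addresses d2 = 1.
d = [0] * 8
d[2] = 1
assert multiplexer11(0, 1, 0, d) == 1
```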
\n\nThe GA Koza's GP involves the use of a GA to generate an S-expression corre(cid:173)\nsponding to a correct ll-multiplexer program. An S-expression comprises a tree of \nLISP operators and operands, operands being the set of data to be processed -\nthe \nleaves of the tree -\nand operators being the functions applied to these data and \ninternally in the tree. The nature of the operators and operands will depend on the \nproblem at hand, since different problems will involve different sets of inputs and \nwill require different functions to be applied to these inputs. For the ll-multiplexer \nproblem in particular, where the goal is to create a specific boolean function, the \noperands are the input bits ao, al, a2, do, dl , ... , d7, and the operators are AND, \nOR, NOT, and IF. These operators behave as expected: the subtree (AND al a2), \nfor instance, yields the value al A a2. The subtree (IF al d4 d3) yields the value d4 \nif al = 0 and d3 if al = 1 (and thus can be regarded as a \"3-multiplexer\"). NOT \nand OR work similarly. An S-expression constitutes a tree of such operators, with \noperands at the leaves. Given an assignment to the operands, this tree is evaluated \nfrom bottom to top in the obvious way, yielding a 0 or 1 output at the root. \nKoza makes use of a \"mating\" operation in his GA which swaps subexpressions \nbetween two such S-expressions. The sub expressions to be swapped are chosen \nuniformly at random from the set of all subexpressions in the tree. For details \non selection in this GA, see [13J. The fitness of an S-expression is computed by \nevaluating it on all 2048 possible inputs, and counting the number of correct outputs. \nKoza does not employ a mutation operator in his GA. \n\nThe SH Algorithm For this problem, the initial state in the SH algorithm is \nan S-expression consisting of a single operand chosen uniformly at random from \n{ ao, al, a2, do, ... , d7}. 
A transition in the search space involves the random replacement of an arbitrary node in the S-expression. In particular, to select a neighboring state, we choose a node uniformly at random from the current tree and replace it with a node selected randomly from the set of all possible operands and operators. With probability 1/2 the replacement node is drawn uniformly at random from the set of operands {a0, a1, a2, d0, ..., d7}; otherwise it is drawn uniformly at random from the set of operators, {AND, OR, NOT, IF}. In modifying the nodes of the S-expression in this way, we may change the number of inputs they require. By changing an AND node to a NOT node, for instance, we reduce the number of inputs taken by the node from 2 to 1. In order to accommodate such changes, we do the following. Where a replacement reduces the number of inputs taken by a node, we remove the required number of children from that node uniformly at random. Where, on the other hand, a replacement increases the number of inputs taken by a node, we add the required number of children chosen uniformly at random from the set of operands {a0, a1, a2, d0, ..., d7}. A similar, though somewhat more involved approach, with additional experimentation using simulated annealing, may be found in [17].

Experimental Results. In the implementation described in [14], Koza performs experiments with a GA on a pool of 4000 expressions. He records the results of 54 runs. These results are listed in the table below. The average number of function evaluations required to obtain a correct program is not given in [14]. In [12], however, where Koza performs a series of 21 runs with a slightly different selection scheme, he finds that the average number of function evaluations required to find a correct S-expression is 46,667.
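The node-replacement transition described above can be sketched as follows. This is a sketch only: it represents S-expressions as nested Python lists, assumes the replacement node is an operand with probability 1/2, and all helper names are illustrative.

```python
import random

OPERANDS = [f"a{i}" for i in range(3)] + [f"d{i}" for i in range(8)]
OPERATORS = {"AND": 2, "OR": 2, "NOT": 1, "IF": 3}  # operator -> arity

def all_nodes(tree, path=()):
    """Enumerate (path, node) pairs; a node is an operand string
    or a list [operator, child, ...]."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_nodes(child, path + (i,))

def replace_node(tree, path, new):
    """Return a copy of the tree with the node at `path` replaced."""
    if not path:
        return new
    tree = list(tree)
    tree[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tree

def mutate(tree, p_operand=0.5):
    """One SH transition: overwrite a uniformly random node, trimming
    or padding its children to match the new node's arity."""
    path, node = random.choice(list(all_nodes(tree)))
    children = list(node[1:]) if isinstance(node, list) else []
    if random.random() < p_operand:
        new = random.choice(OPERANDS)  # arity 0: any children are dropped
    else:
        op = random.choice(list(OPERATORS))
        while len(children) > OPERATORS[op]:  # remove surplus children at random
            children.pop(random.randrange(len(children)))
        while len(children) < OPERATORS[op]:  # pad with random operands
            children.append(random.choice(OPERANDS))
        new = [op] + children
    return replace_node(tree, path, new)
```

Every tree produced this way has correct arities throughout, so the result is always a well-formed S-expression.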
In 100 runs of the SH algorithm, we found that the average time required to obtain a correct S-expression was 19,234.90 function evaluations, with a standard deviation of 5,179.45. The minimum time to find a correct expression in these runs was 3,733, and the maximum 73,651. The average number of nodes in the correct S-expression found by the SH algorithm was 88.14; the low was 42, the high 242, and the standard deviation 29.16.

The following table compares the results presented in [14], indicated by the heading "GP", with those obtained using stochastic hillclimbing, indicated by "SH". We give the fraction of runs in which a correct program was found after a given number of function evaluations. (As this fraction was not provided for the 20,000 iteration mark in [14], we omit the corresponding entry.)

Function evaluations | GP   | SH
20000                |      | 61%
40000                | 28%  | 98%
60000                | 78%  | 99%
80000                | 90%  | 100%

We observe that the performance of the SH is substantially better than that of the GA. It is interesting to note, perhaps partly in explanation of the SH algorithm's success on this problem, that the SH algorithm formulated here defines a neighborhood structure in which there are no strict local optima. Remarkably, this is true for any boolean formula. For details, as well as an elementary proof, see the full version of this paper [10].

3.3 Jobshop

Jobshop is a notoriously difficult NP-complete problem [6] that is hard to solve even for small instances. In this problem, a collection of J jobs is to be scheduled on M machines (or processors), each of which can process only one task at a time. Each job is a list of M tasks which must be performed in order. Each task must be performed on a specific machine, and no two tasks in a given job are assigned to the same machine. Every task has a fixed (integer) processing time.
The problem is to schedule the jobs on the machines so that all jobs are completed in the shortest overall time. This time is referred to as the makespan.

Three instances formulated in [16] constitute a standard benchmark for this problem: a 6-job, 6-machine instance, a 10-job, 10-machine instance, and a 20-job, 5-machine instance. The 6x6 instance is now known to have an optimal makespan of 55. This is very easy to achieve. While the optimum value for the 10x10 problem is known to be 930, this is a difficult problem which remained unsolved for over 20 years [2]. A great deal of research has also been invested in the similarly challenging 20x5 problem, for which an optimal value of 1165 has been achieved, and a lower bound of 1164 [3].

A number of papers have considered the application of GAs to scheduling problems. We compare our results with those obtained in Fang et al. [5], one of the more recent of these articles.

The GA. Fang et al. encode a jobshop schedule in the form of a string of integers, to which their GA applies a conventional crossover operator. This string contains JM integers a1, a2, ..., aJM in the range 1..J. A circular list C of jobs, initialized to (1, 2, ..., J), is maintained. For i = 1, 2, ..., JM, the first uncompleted task in the (ai mod |C|)th job in C is scheduled in the earliest plausible timeslot. A plausible timeslot is one which comes after the last scheduled task in the current job, and which is at least as long as the processing time of the task to be scheduled. When a job is complete, it is removed from C. Fang et al. also develop a highly specialized GA for this problem in which they use a scheme of increasing mutation rates and a technique known as GVOT (Gene-Variance based Operator Targeting). For the details see [5].
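The chromosome decoding of Fang et al. can be sketched as follows. The sketch recovers only the order in which tasks are dispatched; placement into timeslots, which requires the machine assignments and processing times of a concrete instance, is elided, and the function name is illustrative.

```python
def decode(chromosome, J, M):
    """Decode a string of J*M integers in 1..J into a dispatch order:
    entry a selects the (a mod |C|)-th job in a circular list C of
    uncompleted jobs, and that job's next task is dispatched.
    Returns a list of (job, task_index) pairs."""
    C = list(range(1, J + 1))
    next_task = {j: 0 for j in C}
    order = []
    for a in chromosome:
        job = C[a % len(C)]
        order.append((job, next_task[job]))
        next_task[job] += 1
        if next_task[job] == M:  # job complete: remove it from C
            C.remove(job)
    return order

# A 2-job, 2-machine toy: the all-ones chromosome dispatches job 2 first,
# since 1 mod |C| = 1 indexes the second entry of C = [1, 2].
assert decode([1, 1, 1, 1], J=2, M=2) == [(2, 0), (2, 1), (1, 0), (1, 1)]
```

Note that because C shrinks as jobs complete, the same gene value can select different jobs at different points in the decoding; every chromosome nevertheless decodes to a complete, feasible task order.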
The SH Algorithm. In our SH algorithm for this problem, a schedule is encoded in the form of an ordering σ1, σ2, ..., σJM of JM markers. These markers have colors associated with them: there are exactly M markers of each of the colors 1, ..., J. To construct a schedule, σ is read from left to right. Whenever a marker with color k is encountered, the next uncompleted task in job k is scheduled in the earliest plausible timeslot. Since there are exactly M markers of each color, and since every job contains exactly M tasks, this decoding of σ yields a complete schedule. Observe that since markers of the same color are interchangeable, many different orderings σ will correspond to the same scheduling of tasks.

To generate a neighbor of σ in this algorithm, a marker σi is selected uniformly at random and moved to a new position j chosen uniformly at random. To achieve this, it is necessary to shift the subsequence of markers between σi and σj (including σj) one position in the appropriate direction. If i < j, then σi+1, σi+2, ..., σj are shifted one position to the left in σ. If i > j, then σj, σj+1, ..., σi-1 are shifted one position to the right. (If i = j, then the generated neighbor is of course identical to σ.) For an example, see the full version of this paper [10].

Fang et al. consider the makespan achieved after 300 iterations of their GVOT-based GA on a population of size 500. We compare this with an SH for which each experiment involves 150,000 iterations. In both cases, therefore, a single execution of the algorithm involves a total of 150,000 function evaluations. Fang et al. present their average results over 10 trials, but do not indicate how they obtain their "best". We present the statistics resulting from 100 executions of the SH algorithm.
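The marker encoding and the neighbor operation just described can be sketched as follows (again only the dispatch order is computed; timeslot placement is elided, and the names are illustrative):

```python
import random

def decode_markers(u):
    """Read the ordering left to right: a marker of color k dispatches
    the next uncompleted task of job k. Returns (job, task_index) pairs."""
    next_task = {}
    order = []
    for k in u:
        t = next_task.get(k, 0)
        order.append((k, t))
        next_task[k] = t + 1
    return order

def move_marker(u, i, j):
    """Move marker u[i] to position j, shifting the intervening markers
    one position toward i -- the SH neighbor operation on orderings."""
    u = list(u)
    u.insert(j, u.pop(i))
    return u

def random_neighbor(u):
    n = len(u)
    return move_marker(u, random.randrange(n), random.randrange(n))

# Two jobs (colors 1 and 2), two markers each:
assert decode_markers([2, 1, 2, 1]) == [(2, 0), (1, 0), (2, 1), (1, 1)]
assert move_marker([1, 2, 3, 4], 0, 2) == [2, 3, 1, 4]  # i < j: shift left
assert move_marker([1, 2, 3, 4], 3, 1) == [1, 4, 2, 3]  # i > j: shift right
```

Because a move only permutes the markers, every neighbor is again a valid encoding of a complete schedule; no repair step is needed.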
             10x10 Jobshop |  20x5 Jobshop
             GA    | SH    |  GA    | SH
Mean         977   | 966.96|  1215  | 1202.40
SD                 | 13.15 |        | 12.92
High               | 997   |        | 1288
Low          949   | 938   |  1189  | 1173
Best Known       930       |      1165

As can be seen from the above table, the performance of the SH algorithm appears to be as good as or superior to that of the GA.

3.4 A New Jobshop GA

In this section, we reconsider the jobshop problem in an attempt to formulate a new GA encoding. We use the same encoding as in the SH algorithm described above: σ is an ordering σ1, σ2, ..., σJM of the JM markers, which can be used to construct a schedule as before. We treated markers of the same color as effectively equivalent in the SH algorithm. Now, however, the label of a marker (a unique integer in {1, ..., JM}) will play a role.

The basic step in the crossover operator for this GA as applied to a pair (σ, τ) of orderings is as follows. A label i is chosen uniformly at random from the set {1, 2, ..., JM}. In σ, the marker with label i is moved to the position occupied by that marker in τ; conversely, the marker with label i in τ is moved to the position occupied by that marker in σ. In both cases, the necessary shifting is performed as before. Hence the idea is to move a single marker in σ (and in τ) to a new position as in the SH algorithm; instead of moving the marker to a random position, though, we move it to the position occupied by that marker in τ (and σ, respectively). The full crossover operator picks two labels j ≤ k uniformly at random from {1, 2, ..., JM}, and performs this basic operation first for label j, then j + 1, and so forth, through k. The mutation operator in our GA performs exactly the same operation as that used to generate a neighbor in the SH algorithm.
A marker σi is chosen uniformly at random and moved to a new position j, chosen uniformly at random. The usual shifting operation is then performed. Observe how closely the crossover and mutation operators in this GA for the jobshop problem are based on those in the corresponding SH algorithm.

Our GA includes, in order, the following phases: evaluation, elitist replacement, selection, crossover, and mutation. In the evaluation phase, the fitnesses of all members of the population are computed. Elitist replacement substitutes the fittest permutation from the evaluation phase of the previous iteration for the least fit permutation in the current population (except, of course, in the first iteration, in which there is no replacement). Because of its simplicity and its effectiveness in practice, we chose to use binary stochastic tournament selection (see [8] for details). The crossover step in our GA selects pairs uniformly at random without replacement from the population and applies the mating operator to each of these pairs independently with probability 0.6. The number of mutations performed on a given permutation in a single iteration is binomially distributed. The population in our GA is initialized by selecting every individual uniformly at random from the set of orderings of the JM markers.

We execute this GA for 300 iterations on a population of size 500. Results of 100 experiments performed with this GA are indicated in the following table by "new GA". For comparison, we again give the results obtained by the GA of Fang et al. and the SH algorithm described in this paper.

             10x10 Jobshop             |  20x5 Jobshop
             new GA | GA   | SH        |  new GA  | GA   | SH
Mean         956.22 | 977  | 965.64    |  1193.21 | 1215 | 1204.89
SD           8.69   |      | 10.56     |  7.38    |      | 12.92
High         976    |      | 996       |  1211    |      | 1241
Low          937    | 949  | 949       |  1174    | 1189 | 1183
Best Known         930                 |        1165

4 Conclusion
WAllENBERG \n\nAs black-box algorithms, GAs are principally of interest in solving problems whose \ncombinatorial structure is not understood well enough for more direct, problem(cid:173)\nspecific techniques to be applied. As we have seen in regard to the two problems \npresented in this paper, stochastic hill climbing can offer a useful gauge of the per(cid:173)\nformance of the GA. In some cases it shows that a GA-based approach may not \nbe competitive with simpler methods; at others it offers insight into possible design \ndecisions for the G A such as the choice of encoding and the formulation of mating \nand mutation operators. In light of the results presented in this paper, we hope that \ndesigners of black-box algorithms will be encouraged to experiment with stochastic \nhillclimbing in the initial stages of the development of their algorithms. \n\nReferences \n[1] D. Ackley. A Connectionist Machine for Genetic Hillclimbing. Kluwer Academic \n\nPublishers, 1987. \n\n[2] D. Applegate and W. Cook. A computational study of the job-shop problem. ORSA \n\nJournal of Computing, 3(2), 1991. \n\n[3] J. Carlier and E. Pinson. An algorithm for solving the jobshop problem. Mngmnt. \n\nSci., 35:(2):164-176, 1989. \n\n[4] L. Davis. Bit-climbing, representational bias, and test suite design. In Belew and \n\nBooker, editors, ICGA-4, pages 18-23, 1991. \n\n[5] H. Fang, P. Ross, and D. Corne. A promising GA approach to job-shop scheduling, \nrescheduling, and open-shop scheduling problems. In Forrest, editor, ICGA-5, 1993. \n[6] M. Garey and D. Johnson. Computers and Intractability. W .H. Freeman and Co., \n\n1979. \n\n[7] D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. \n\nAddison Wesley, 1989. \n\n[8] D. Goldberg and K. Deb. A comparative analysis of selection schemes used in GAs. \n\nIn FOGA-2, pages 69-93, 1991. \n\n[9] K. De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. 
PhD thesis, University of Michigan, 1975.
[10] A. Juels and M. Wattenberg. Stochastic hillclimbing as a baseline method for evaluating genetic algorithms. Technical Report CSD-94-834, UC Berkeley, CS Division, 1994.
[11] S. Khuri, T. Bäck, and J. Heitkötter. An evolutionary approach to combinatorial optimization problems. In Procs. of CSC 1994, 1994.
[12] J. Koza. FOGA, chapter A Hierarchical Approach to Learning the Boolean Multiplexer Function, pages 171-192. 1991.
[13] J. Koza. Genetic Programming. MIT Press, Cambridge, MA, 1991.
[14] J. Koza. The GP paradigm: Breeding computer programs. In Branko Soucek and the IRIS Group, editors, Dynamic, Genetic, and Chaotic Programming, pages 203-221. John Wiley and Sons, Inc., 1992.
[15] M. Mitchell, J. Holland, and S. Forrest. When will a GA outperform hill-climbing? In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, 1994.
[16] J. Muth and G. Thompson. Industrial Scheduling. Prentice Hall, 1963.
[17] U. O'Reilly and F. Oppacher. Program search with a hierarchical variable length representation: Genetic programming, simulated annealing and hill climbing. In PPSN-3, 1994.
[18] S. Wilson. GA-easy does not imply steepest-ascent optimizable. In Belew and Booker, editors, ICGA-4, pages 85-89, 1991.
", "award": [], "sourceid": 1172, "authors": [{"given_name": "Ari", "family_name": "Juels", "institution": null}, {"given_name": "Martin", "family_name": "Wattenberg", "institution": null}]}