{"title": "When will a Genetic Algorithm Outperform Hill Climbing", "book": "Advances in Neural Information Processing Systems", "page_first": 51, "page_last": 58, "abstract": null, "full_text": "When Will a Genetic Algorithm Outperform Hill Climbing? \n\nMelanie Mitchell \nSanta Fe Institute \n1660 Old Pecos Trail, Suite A \nSanta Fe, NM 87501 \n\nJohn H. Holland \nDept. of Psychology \nUniversity of Michigan \nAnn Arbor, MI 48109 \n\nStephanie Forrest \nDept. of Computer Science \nUniversity of New Mexico \nAlbuquerque, NM 87131 \n\nAbstract \n\nWe analyze a simple hill-climbing algorithm (RMHC) that was previously shown to outperform a genetic algorithm (GA) on a simple \"Royal Road\" function. We then analyze an \"idealized\" genetic algorithm (IGA) that is significantly faster than RMHC and that gives a lower bound for GA speed. We identify the features of the IGA that give rise to this speedup, and discuss how these features can be incorporated into a real GA. \n\n1 INTRODUCTION \n\nOur goal is to understand the class of problems for which genetic algorithms (GAs) are most suited, and in particular, for which they will outperform other search algorithms. Several studies have empirically compared GAs with other search and optimization methods such as simple hill climbing (e.g., Davis, 1991), simulated annealing (e.g., Ingber & Rosen, 1992), linear, nonlinear, and integer programming techniques, and other traditional optimization techniques (e.g., De Jong, 1975). However, such comparisons typically pit one version of the GA against a second algorithm on a single problem or set of problems, often using performance criteria that may not be appropriate. These comparisons typically do not identify the features that led to the better performance of one algorithm or the other, making it hard to distill general principles from these isolated results. 
In this paper we look in depth at one simple hill-climbing method and an idealized form of the GA, in order to identify some general principles about when and why a GA will outperform hill climbing. \n\ns1 = 11111111********************************************************; c1 = 8 \ns2 = ********11111111************************************************; c2 = 8 \ns3 = ****************11111111****************************************; c3 = 8 \ns4 = ************************11111111********************************; c4 = 8 \ns5 = ********************************11111111************************; c5 = 8 \ns6 = ****************************************11111111****************; c6 = 8 \ns7 = ************************************************11111111********; c7 = 8 \ns8 = ********************************************************11111111; c8 = 8 \ns_opt = 1111111111111111111111111111111111111111111111111111111111111111 \n\nFigure 1: Royal Road function R1. \n\nIn previous work we have developed a class of fitness landscapes (the \"Royal Road\" functions; Mitchell, Forrest, & Holland, 1992; Forrest & Mitchell, 1993) designed to be the simplest class containing the features that are most relevant to the performance of the GA. One of our purposes in developing these landscapes is to carry out systematic comparisons with other search methods. \nA simple Royal Road function, R1, is shown in Figure 1. R1 consists of a list of partially specified bit strings (schemas) s_i, in which '*' denotes a wild card (either 0 or 1). Each schema s_i is given with a coefficient c_i. The order of a schema is the number of defined (non-'*') bits. A bit string x is said to be an instance of a schema s, written x \\in s, if x matches s in the defined positions. The fitness R1(x) of a bit string x is defined as follows: \n\nR1(x) = \\sum_i c_i \\delta_i(x), where \\delta_i(x) = 1 if x \\in s_i and 0 otherwise. \n\nFor example, if x is an instance of exactly two of the order-8 schemas, R1(x) = 16. Likewise, R1(111...1) = 64. 
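R1 is simple to state in code. Below is a minimal Python sketch of the definition above (the function and variable names are our own choices, not from the original implementation):

```python
K = 8  # order of each schema (length of each block of 1s)
N = 8  # number of blocks
C = 8  # coefficient c_i, the same for every block in R1

def r1(x):
    """Royal Road fitness R1(x): the sum of c_i over every order-8
    schema s_i that the 64-bit string x is an instance of."""
    assert len(x) == K * N
    return sum(C for i in range(N) if x[i*K:(i+1)*K] == '1' * K)
```

A string whose first two blocks are all 1s scores r1 = 16 regardless of its remaining bits, and r1('1' * 64) = 64, matching the examples in the text.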
\nThe Building Block Hypothesis (Holland, 1975/1992) states that the GA works well when instances of low-order, short schemas (\"building blocks\") that confer high fitness can be recombined to form instances of larger schemas that confer even higher fitness. Given this hypothesis, we initially expected that the building-block structure of R1 would lay out a \"royal road\" for the GA to follow to the optimal string. We also expected that simple hill-climbing schemes would perform poorly, since a large number of bit positions must be optimized simultaneously in order to move from an instance of a lower-order schema (e.g., 11111111**...*) to an instance of a higher-order intermediate schema (e.g., 11111111********11111111**...*). However, both these expectations were overturned (Forrest & Mitchell, 1993). In our experiments, a simple GA (using fitness-proportionate selection with sigma scaling, single-point crossover, and point mutation) optimized R1 quite slowly, at least in part because of \"hitchhiking\": once an instance of a higher-order schema is discovered, its high fitness allows the schema to spread quickly in the population, with 0s in other positions in the string hitchhiking along with the 1s in the schema's defined positions. This slows down the discovery of schemas in the other positions, especially those that are close to the highly fit schema's defined positions. Hitchhiking can in general be a serious bottleneck for the GA, and we observed similar effects in several variations of our original GA. \n\nTable 1: Mean and median number of function evaluations to find the optimum string over 200 runs of the GA and of various hill-climbing algorithms on R1. The standard error is given in parentheses. 
\nOur other expectation, that the GA would outperform simple hill climbing on these functions, was also proved wrong. Forrest and Mitchell (1993) compared the GA's performance on a variation of R1 with three different hill-climbing methods: steepest-ascent hill climbing (SAHC), next-ascent hill climbing (NAHC), and a zero-temperature Monte Carlo method, which Forrest and Mitchell called \"random mutation hill climbing\" (RMHC). In RMHC, a string is chosen at random and its fitness is evaluated. The string is then mutated at a randomly chosen single locus, and the new fitness is evaluated. If the mutation leads to an equal or higher fitness, the new string replaces the old string. This procedure is iterated until the optimum has been found or a maximum number of function evaluations has been performed. Here we have repeated these experiments for R1. The results (similar to those given for R2 in Forrest & Mitchell, 1993) are given in Table 1. We compare the mean and median number of function evaluations to find the optimum string rather than mean and median absolute run time, because in almost all GA applications (e.g., evolving neural-network architectures), the time to perform a function evaluation vastly dominates the time required to execute other parts of the algorithm. For this reason, we consider all parts of the algorithm excluding the function evaluations to take negligible time. \nThe results on SAHC and NAHC were as expected: while the GA found the optimum on R1 in an average of 61,334 function evaluations, neither SAHC nor NAHC ever found the optimum within the maximum of 256,000 function evaluations. However, RMHC found the optimum on R1 in an average of 6179 function evaluations, nearly a factor of ten faster than the GA. 
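The RMHC procedure described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the `fitness` argument can be any function on bit strings, such as R1:

```python
import random

def rmhc(fitness, length, max_evals, optimum):
    """Random mutation hill climbing: repeatedly flip one randomly
    chosen bit, keeping the mutant whenever its fitness is equal or
    higher. Returns the number of function evaluations used to reach
    the optimum fitness, or None if max_evals is exhausted first."""
    x = [random.randrange(2) for _ in range(length)]
    best = fitness(x)
    evals = 1
    while best < optimum and evals < max_evals:
        locus = random.randrange(length)  # mutate one random locus
        x[locus] ^= 1
        f = fitness(x)
        evals += 1
        if f >= best:   # equal or higher fitness: keep the new string
            best = f
        else:           # lower fitness: revert the mutation
            x[locus] ^= 1
    return evals if best >= optimum else None
```

Averaged over many runs with the R1 fitness (length 64, optimum 64), an implementation along these lines should reproduce numbers of the order reported in Table 1.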
This striking difference on landscapes originally designed to be \"royal roads\" for the GA underscores the need for a rigorous answer to the question posed earlier: \"Under what conditions will a GA outperform other search algorithms, such as hill climbing?\" \n\n2 ANALYSIS OF RMHC AND AN IDEALIZED GA \n\nTo begin to answer this question, we analyzed the RMHC algorithm with respect to R1. Suppose the fitness function consists of N adjacent blocks of K 1s each (in R1, N = 8 and K = 8). What is the expected time (number of function evaluations) E(K, N) to find the optimum string of all 1s? We can first ask a simpler question: what is the expected time E(K, 1) to find a single block of K 1s? A Markov-chain analysis (not given here) yields E(K, 1) slightly larger than 2^K, converging slowly to 2^K from above as K -> infinity (Richard Palmer, personal communication). For example, for K = 8, E(K, 1) = 301.2. \nNow suppose we want RMHC to discover a string with N blocks of K 1s. The time to discover a first block of K 1s is E(K, 1), but, once it has been found, the time to discover a second block is longer, since many of the function evaluations are \"wasted\" on testing mutations inside the first block. The proportion of non-wasted mutations is (KN - K)/KN; this is the proportion of mutations that occur in the KN - K positions outside the first block. The expected time E(K, 2) to find a second block is E(K, 1) + E(K, 1)[KN/(KN - K)]. Similarly, the total expected time is: \n\nE(K, N) = E(K, 1) + E(K, 1) N/(N - 1) + ... + E(K, 1) N/(N - (N - 1)) \n        = E(K, 1) N [1 + 1/2 + 1/3 + ... + 1/N].   (1) \n\n(The actual value may be a bit larger, since E(K, 1) is the expected time to the first block, whereas E(K, N) depends on the worst time for the N blocks.) Expression (1) is approximately E(K, 1) N (log N + gamma), where gamma is Euler's constant. 
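Expression (1) is straightforward to evaluate numerically. A quick check, with names of our own choosing, using the Markov-chain value E(8, 1) = 301.2 quoted above:

```python
def rmhc_expected_time(e_k_1, n):
    """Expression (1): E(K, N) = E(K, 1) * N * (1 + 1/2 + ... + 1/N)."""
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    return e_k_1 * n * harmonic

# With E(8, 1) = 301.2 and N = 8 blocks:
print(round(rmhc_expected_time(301.2, 8)))  # prints 6549
```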
For K = 8, N = 8, the value of expression (1) is 6549. When we ran RMHC on the R1 function 200 times, the average number of function evaluations to the optimum was 6179, which agrees reasonably well with the expected value. \nCould a GA ever do better than this? There are three reasons why we might expect a GA to perform well on R1. First, at least theoretically the GA is fast because of implicit parallelism (Holland, 1975/1992): each string in the population is an instance of many different schemas, and if the population is large enough and is initially chosen at random, a large number of different schemas (many more than the number of strings in the population) are being sampled in parallel. This should result in a quick search for short, low-order schemas that confer high fitness. Second, fitness-proportionate reproduction under the GA should conserve instances of such schemas. Third, a high crossover rate should quickly combine instances of low-order schemas on different strings to create instances of longer schemas that confer even higher fitness. Our previous experiments (Forrest & Mitchell, 1993) showed that the simple GA departed from this \"in principle\" behavior. One major impediment was hitchhiking, which limited implicit parallelism by fixing certain schema regions suboptimally. But if the GA worked exactly as described above, how quickly could it find the optimal string of R1? \nTo answer this question we consider an \"idealized genetic algorithm\" (IGA) that explicitly has the features described above. The IGA knows ahead of time what the desired schemas are, and a \"function evaluation\" is the determination of whether a given string contains one or more of them. In the IGA, at each time step a single string is chosen at random, with uniform probability for each bit. The string is \"evaluated\" by determining whether it is an instance of one or more of the desired schemas. 
The first time such a string is found, it is sequestered. At each subsequent discovery of an instance of one or more not-yet-discovered schemas, the new string is instantaneously crossed over with the sequestered string so that the sequestered string contains all the desired schemas that have been discovered so far. \nThis procedure is unusable in practice, since it requires knowing a priori which schemas are relevant, whereas in general an algorithm such as the GA or RMHC directly measures the fitness of a string, and does not know ahead of time which schemas contribute to high fitness. However, the idea behind the GA is to do implicitly what the IGA is able to do explicitly. This idea will be elaborated below. \nSuppose again that our desired schemas consist of N blocks of K 1s each. What is the expected time (number of function evaluations) until the saved string contains all the desired schemas? Solutions have been suggested by G. Huber (personal communication) and A. Shevoroskin (personal communication), and a detailed solution is given in Holland (1993). The main idea is to note that the probability of finding a single desired block s on a random string is p = 1/2^K, and the probability of finding s by time t is 1 - (1 - p)^t. Then the probability P_N(t) that all N blocks have been found by time t is: \n\nP_N(t) = (1 - (1 - p)^t)^N, \n\nand the probability p_N(t) that all N blocks are found at exactly time t is: \n\np_N(t) = [1 - (1 - p)^t]^N - [1 - (1 - p)^(t-1)]^N. \n\nThe expected time is then \n\nE_N = \\sum_{t=1}^{infinity} t ([1 - (1 - p)^t]^N - [1 - (1 - p)^(t-1)]^N). 
\nThis sum can be expanded and simplified, and with some work, along with the approximation (1 - p)^n ~ 1 - np for small p, we obtain the following approximation: \n\nE_N ~ (1/p) \\sum_{n=1}^{N} 1/n ~ 2^K (log N + gamma). \n\nThe major point is that the IGA gives an expected time that is on the order of 2^K log N, whereas RMHC gives an expected time that is on the order of 2^K N log N, a factor of N slower. This kind of analysis can help us predict how and when the GA will outperform hill climbing. \nWhat makes the IGA faster than RMHC? A primary reason is that the IGA perfectly implements implicit parallelism: each new string is completely independent of the previous one, so new samples are given independently to each schema region. In contrast, RMHC moves in the space of strings by single-bit mutations from an original string, so each new sample has all but one of the same bits as the previous sample. Thus each new string gives a new sample to only one schema region. The IGA spends more time than RMHC constructing new samples, but since we are counting only function evaluations, we ignore the construction time. The IGA \"cheats\" on each function evaluation, since it knows exactly the desired schemas, but in this way it gives a lower bound on the number of function evaluations that the GA will need on this problem. \nIndependent sampling allows for a speed-up in the IGA in two ways: it allows for the possibility of more than one desirable schema appearing simultaneously on a given sample, and it also means that there are no wasted samples, as there are in RMHC. 
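Both the exact IGA expectation and its approximation are easy to compute. The sketch below (names ours) sums the tail probabilities P(T > t) = 1 - P_N(t) directly and compares the result with (1/p)(1 + 1/2 + ... + 1/N):

```python
def iga_expected_time(k, n, tol=1e-12):
    """Exact expectation E_N = sum over t >= 0 of P(T > t), where
    P(T <= t) = (1 - (1 - p)^t)^N and p = 1/2^K."""
    p = 1.0 / 2 ** k
    total, t = 0.0, 0
    while True:
        tail = 1.0 - (1.0 - (1.0 - p) ** t) ** n
        if tail < tol:
            return total
        total += tail
        t += 1

def iga_approx(k, n):
    """Approximation from the text: (1/p) * (1 + 1/2 + ... + 1/N)."""
    return 2 ** k * sum(1.0 / i for i in range(1, n + 1))
```

For K = N = 8, both give roughly 700 function evaluations, an order of magnitude below the roughly 6500 expected for RMHC, consistent with the factor-of-N argument.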
Although the comparison we have made is with RMHC, the IGA will also be significantly faster on R1 (and similar landscapes) than any hill-climbing method that works by mutating single bits (or a small number of bits) to obtain new samples. \n\nLevel 1: s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16 \nLevel 2: (s1 s2) (s3 s4) (s5 s6) (s7 s8) (s9 s10) (s11 s12) (s13 s14) (s15 s16) \nLevel 3: (s1 s2 s3 s4) (s5 s6 s7 s8) (s9 s10 s11 s12) (s13 s14 s15 s16) \nLevel 4: (s1 s2 s3 s4 s5 s6 s7 s8) (s9 s10 s11 s12 s13 s14 s15 s16) \n\nFigure 2: Royal Road function R4. \n\nThe hitchhiking effects described earlier also result in a loss of independent samples for the real GA. The goal is to have the real GA, as much as possible, approximate the IGA. Of course, the IGA works because it explicitly knows what the desired schemas are; the real GA does not have this information and can only estimate what the desired schemas are by an implicit sampling procedure. But it is possible for the real GA to approximate a number of the features of the IGA. Independent samples: The population size has to be large enough, the selection process has to be slow enough, and the mutation rate has to be sufficient to make sure that no single locus is fixed at a single value in every string (or even a large majority of strings) in the population. Sequestering desired schemas: Selection has to be strong enough to preserve desired schemas that have been discovered, but it also has to be slow enough (or, equivalently, the relative fitness of the non-overlapping desirable schemas has to be small enough) to prevent significant hitchhiking on some highly fit schemas, which can crowd out desired schemas in other parts of the string. 
Instantaneous crossover: The crossover rate has to be such that the time for a crossover to occur that combines two desired schemas is small with respect to the discovery time for the desired schemas. Speed-up over RMHC: The string length (a function of N) has to be large enough to make the N speed-up factor significant. \nThese mechanisms are not all mutually compatible (e.g., high mutation works against sequestering schemas), and thus must be carefully balanced against one another. A discussion of how such a balance might be achieved is given in Holland (1993). \n\n3 RESULTS OF EXPERIMENTS \n\nAs a first step in exploring these balances, we designed R3, a variant of our previous function R2 (Forrest & Mitchell, 1993), based on some of the features described above. In R3 the desired schemas are s1-s8 (shown in Figure 1) and combinations of them, just as in R2. However, in R3 the lowest-level order-8 schemas are each separated by \"introns\" (bit positions that do not contribute to fitness; see Forrest & Mitchell, 1993; Levenick, 1991) of length 24. \nIn R3, a string that is not an instance of any desired schema receives fitness 1.0. Every time a new level is reached (i.e., a string is found that is an instance of one or more schemas at that level), a small increment u is added to the fitness. Thus strings at level 1 (that are instances of at least one level-1 schema) have fitness 1 + u, strings at level 2 have fitness 1 + 2u, and so on. For our experiments we set u = 0.2. \n\nTable 2: R4: Mean function evaluations (over 37 runs) to attain each level for the GA and for RMHC. In the GA runs, the number of function evaluations is sampled every 500 evaluations, so each value is actually an upper bound for an interval of length 500. The standard errors are in parentheses. 
The percentage of runs that reached each level is shown next to the heading \"% runs.\" Only runs that successfully reached a given level were included in the function-evaluation calculations for that level. \n\nThe purpose of the introns was to help maintain independent samples in each schema position by preventing linkage between schema positions. The independence of samples was also helped by using a larger population (2000) and the much slower selection scheme given by the function. In preliminary experiments on R3 (not shown) hitchhiking in the GA was reduced significantly, and the population was able to maintain instances of all the lowest-level schemas throughout each run. \nNext, we studied R4 (illustrated in Figure 2). R4 is identical to R3, except that it does not have introns. Further, R4 is defined over 128-bit strings, thus doubling the size of the problem. In preliminary runs on R4, we used a population size of 500, a mutation rate of 0.005 (mutation always flips a bit), and multipoint crossover, where the number of crossover points for each pair of parents was selected from a Poisson distribution with mean 2.816. \nTable 2 gives the mean number of evaluations to reach levels 1, 2, and 3 (neither algorithm reached level 4 within the maximum of 10^6 function evaluations). As can be seen, the time to reach level 1 is comparable for the two algorithms, but the GA is much faster at reaching levels 2 and 3. Further, the GA discovers level 3 approximately twice as often as RMHC. As was said above, it is necessary to balance the maintenance of independent samples with the sequestering of desired schemas. These preliminary results suggest that R4 does a better job of maintaining this balance than the earlier Royal Road functions. Working out these balances in greater detail is a topic of future work. 
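The leveled fitness scheme shared by R3 and R4 can be sketched for R4 as follows. This is our illustrative reading of Figure 2 (sixteen order-8 blocks on 128-bit strings, levels formed by aligned pairs, quadruples, and octuples of blocks); the names and the exact level bookkeeping are assumptions, not the authors' code:

```python
K = 8    # block length
N = 16   # number of lowest-level blocks in R4 (128-bit strings)
U = 0.2  # fitness increment u per level attained

def level(x):
    """Highest level attained by x, reading Figure 2 as four levels:
    level L requires some aligned group of 2**(L-1) adjacent blocks
    to consist entirely of 1s."""
    found = {i for i in range(N) if x[i*K:(i+1)*K] == '1' * K}
    lvl, size = 0, 1
    while size < N:  # sizes 1, 2, 4, 8 give the four levels of Figure 2
        groups = [set(range(j, j + size)) for j in range(0, N, size)]
        if any(g <= found for g in groups):
            lvl += 1
        size *= 2
    return lvl

def r4(x):
    """R4-style fitness: 1.0 for no desired schema, plus U per level."""
    return 1.0 + U * level(x)
```

Under this reading, a string containing one complete block scores 1.2, and the optimum string attains level 4 and scores 1.8.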
\n\n4 CONCLUSION \n\nWe have presented analyses of two algorithms, RMHC and the IGA, and have used the analyses to identify some general principles of when and how a genetic algorithm will outperform hill climbing. We then presented some preliminary experimental results comparing the GA and RMHC on a modified Royal Road landscape. These analyses and results are a further step toward achieving our original goals: to design the simplest class of fitness landscapes that will distinguish the GA from other search methods, and to characterize rigorously the general features of a fitness landscape that make it suitable for a GA. \nOur modified Royal Road landscape R4, like R1, is not meant to be a realistic example of a problem to which one might apply a GA. Rather, it is meant to be an idealized problem in which certain features most relevant to GAs are explicit, so that the GA's performance can be studied in detail. Our claim is that in order to understand how the GA works in general and where it will be most useful, we must first understand how it works and where it will be most useful on simple yet carefully designed landscapes such as these. The work reported here is a further step in this direction. \n\nAcknowledgments \n\nWe thank R. Palmer for suggesting the RMHC algorithm and for sharing his careful analysis with us, and G. Huber for his assistance on the analysis of the IGA. We also thank E. Baum, L. Booker, T. Jones, and R. Riolo for helpful comments and discussions regarding this work. We gratefully acknowledge the support of the Santa Fe Institute's Adaptive Computation Program, the Alfred P. Sloan Foundation (grant B1992-46), and the National Science Foundation (grants IRI-9157644 and IRI-9224912). \n\nReferences \n\nL. D. Davis (1991). Bit-climbing, representational bias, and test suite design. In R. K. Belew and L. B. 
Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, 18-23. San Mateo, CA: Morgan Kaufmann. \nK. A. De Jong (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Unpublished doctoral dissertation. University of Michigan, Ann Arbor, MI. \nS. Forrest and M. Mitchell (1993). Relative building-block fitness and the building-block hypothesis. In D. Whitley (ed.), Foundations of Genetic Algorithms 2, 109-126. San Mateo, CA: Morgan Kaufmann. \nJ. H. Holland (1975/1992). Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press. (First edition 1975, Ann Arbor: University of Michigan Press.) \nJ. H. Holland (1993). Innovation in complex adaptive systems: Some mathematical sketches. Working Paper 93-10-062, Santa Fe Institute, Santa Fe, NM. \nL. Ingber and B. Rosen (1992). Genetic algorithms and very fast simulated reannealing: A comparison. Mathematical Computer Modelling, 16(11), 87-100. \nJ. R. Levenick (1991). Inserting introns improves genetic algorithm success rate: Taking a cue from biology. In R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, 123-127. San Mateo, CA: Morgan Kaufmann. \nM. Mitchell, S. Forrest, and J. H. Holland (1992). The royal road for genetic algorithms: Fitness landscapes and GA performance. In F. J. Varela and P. Bourgine (eds.), Proceedings of the First European Conference on Artificial Life, 245-254. Cambridge, MA: MIT Press. \n", "award": [], "sourceid": 836, "authors": [{"given_name": "Melanie", "family_name": "Mitchell", "institution": null}, {"given_name": "John", "family_name": "Holland", "institution": null}, {"given_name": "Stephanie", "family_name": "Forrest", "institution": null}]}