{"title": "When will a Genetic Algorithm Outperform Hill Climbing", "book": "Advances in Neural Information Processing Systems", "page_first": 51, "page_last": 58, "abstract": null, "full_text": "When Will a  Genetic Algorithm \n\nOutperform Hill Climbing? \n\nMelanie Mitchell \nSanta Fe  Institute \n\n1660  Old Pecos Trail, Suite A \n\nSanta Fe,  NM  87501 \n\nJohn H.  HoUand \nDept.  of Psychology \nUniversity of Michigan \nAnn Arbor,  MI 48109 \n\nStephanie Forrest \n\nDept.  of Computer Science \nUniversity of New  Mexico \nAlbuquerque,  NM  87131 \n\nAbstract \n\nWe analyze a simple hill-climbing algorithm (RMHC) that was pre(cid:173)\nviously shown to outperform a genetic algorithm (GA) on a simple \n\"Royal  Road\"  function.  We  then  analyze  an  \"idealized\"  genetic \nalgorithm (IGA)  that is  significantly faster  than RMHC  and that \ngives  a lower  bound for  GA speed.  We  identify the features  of the \nIGA that give rise  to this speedup,  and discuss  how  these features \ncan be incorporated into a real GA. \n\n1 \n\nINTRODUCTION \n\nOur goal  is  to understand the class of problems for which genetic  algorithms (GA) \nare  most  suited,  and  in  particular,  for  which  they  will  outperform  other  search \nalgorithms.  Several studies have empirically compared GAs  with other search  and \noptimization methods  such  as  simple  hill-climbing  (e.g.,  Davis,  1991),  simulated \nannealing (e.g., Ingber & Rosen,  1992), linear, nonlinear, and integer  programming \ntechniques,  and  other  traditional  optimization  techniques  (e.g.,  De  Jong,  1975). \nHowever,  such comparisons typically compare one version of the GA with a second \nalgorithm on a  single  problem or set of problems, often using performance criteria \nwhich  may  not  be  appropriate.  These  comparisons  typically  do  not  identify  the \nfeatures  that led  to better  performance  by  one  or the  other  algorithm,  making it \nhard to distill general principles from these isolated results.  In this paper we look in \ndepth at one simple hill-climbing method and an idealized form of the GA, in order \nto identify some general principles about when  and why  a GA will outperform hill \nclimbing. \n\n51 \n\n\f52 \n\nMitchell, Holland, and Forrest \n\n81  = 11111111\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7 .......... j  C1  =8 \n82  = \u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b711111111\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7 ...... j  C2  = 8 \n83  = \u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b711111111\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7 .......... j  C3  =8 \n84  = \u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b711111111\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7\u00b7 .......... 
s5 = ********************************11111111************************; c5 = 8
s6 = ****************************************11111111****************; c6 = 8
s7 = ************************************************11111111********; c7 = 8
s8 = ********************************************************11111111; c8 = 8
s_opt = 1111111111111111111111111111111111111111111111111111111111111111

Figure 1: Royal Road function R1.

In previous work we have developed a class of fitness landscapes (the "Royal Road" functions; Mitchell, Forrest, & Holland, 1992; Forrest & Mitchell, 1993) designed to be the simplest class containing the features that are most relevant to the performance of the GA. One of our purposes in developing these landscapes is to carry out systematic comparisons with other search methods.

A simple Royal Road function, R1, is shown in Figure 1. R1 consists of a list of partially specified bit strings (schemas) s_i in which '*' denotes a wild card (either 0 or 1). Each schema s_i is given with a coefficient c_i. The order of a schema is the number of defined (non-'*') bits. A bit string x is said to be an instance of a schema s, written x ∈ s, if x matches s in the defined positions. The fitness R1(x) of a bit string x is defined as follows:

    R1(x) = Σ_i c_i δ_i(x),   where   δ_i(x) = 1 if x ∈ s_i, and 0 otherwise.

For example, if x is an instance of exactly two of the order-8 schemas, R1(x) = 16. Likewise, R1(111...1) = 64.
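As an illustration, the following is a minimal Python sketch of R1 under these definitions (the function and variable names are ours, not part of the original formulation):

    def royal_road_r1(x, block_size=8, num_blocks=8):
        """Fitness of a 64-bit string under R1: the sum of c_i over the
        schemas s_i (contiguous blocks of block_size 1s) that x matches."""
        fitness = 0
        for i in range(num_blocks):
            block = x[i * block_size:(i + 1) * block_size]
            if block == '1' * block_size:   # x is an instance of schema s_i
                fitness += block_size       # coefficient c_i = 8
        return fitness

    assert royal_road_r1('1' * 64) == 64             # R1(111...1) = 64
    assert royal_road_r1('1' * 16 + '0' * 48) == 16  # instance of exactly two schemas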
The Building Block Hypothesis (Holland, 1975/1992) states that the GA works well when instances of low-order, short schemas ("building blocks") that confer high fitness can be recombined to form instances of larger schemas that confer even higher fitness. Given this hypothesis, we initially expected that the building-block structure of R1 would lay out a "royal road" for the GA to follow to the optimal string. We also expected that simple hill-climbing schemes would perform poorly, since a large number of bit positions must be optimized simultaneously in order to move from an instance of a lower-order schema (e.g., 11111111**...*) to an instance of a higher-order intermediate schema (e.g., 11111111********11111111**...*). However, both these expectations were overturned (Forrest & Mitchell, 1993).

In our experiments, a simple GA (using fitness-proportionate selection with sigma scaling, single-point crossover, and point mutation) optimized R1 quite slowly, at least in part because of "hitchhiking": once an instance of a higher-order schema is discovered, its high fitness allows the schema to spread quickly in the population, with 0s in other positions in the string hitchhiking along with the 1s in the schema's defined positions. This slows down the discovery of schemas in the other positions, especially those that are close to the highly fit schema's defined positions. Hitchhiking can in general be a serious bottleneck for the GA, and we observed similar effects in several variations of our original GA.

Table 1: Mean and median number of function evaluations to find the optimum string over 200 runs of the GA and of various hill-climbing algorithms on R1. The standard error is given in parentheses.

Our other expectation, that the GA would outperform simple hill-climbing on these functions, was also proved wrong. Forrest and Mitchell (1993) compared the GA's performance on a variation of R1 with three different hill-climbing methods: steepest-ascent hill climbing (SAHC), next-ascent hill climbing (NAHC), and a zero-temperature Monte Carlo method, which Forrest and Mitchell called "random-mutation hill climbing" (RMHC). In RMHC, a string is chosen at random and its fitness is evaluated. The string is then mutated at a randomly chosen single locus, and the new fitness is evaluated. If the mutation leads to an equal or higher fitness, the new string replaces the old string. This procedure is iterated until the optimum has been found or a maximum number of function evaluations has been performed.
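A Python sketch following that description (the stopping parameters and names are ours):

    import random

    def rmhc(fitness, length=64, optimum=64, max_evals=256000):
        """Random-mutation hill climbing: accept single-bit mutations that
        do not decrease fitness, until the optimum is found or the
        evaluation budget is exhausted."""
        current = ''.join(random.choice('01') for _ in range(length))
        f_current = fitness(current)
        evals = 1
        while f_current < optimum and evals < max_evals:
            locus = random.randrange(length)
            flipped = '1' if current[locus] == '0' else '0'
            candidate = current[:locus] + flipped + current[locus + 1:]
            f_candidate = fitness(candidate)
            evals += 1
            if f_candidate >= f_current:   # equal or higher fitness: keep the mutant
                current, f_current = candidate, f_candidate
        return current, f_current, evals

With fitness=royal_road_r1 above, the mean of evals over many runs should land near the roughly 6,000-evaluation figure reported for RMHC in Table 1.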
Here we have repeated these experiments for R1. The results (similar to those given for R2 in Forrest & Mitchell, 1993) are given in Table 1. We compare the mean and median number of function evaluations to find the optimum string rather than mean and median absolute run time, because in almost all GA applications (e.g., evolving neural-network architectures), the time to perform a function evaluation vastly dominates the time required to execute other parts of the algorithm. For this reason, we consider all parts of the algorithm excluding the function evaluations to take negligible time.

The results on SAHC and NAHC were as expected: while the GA found the optimum on R1 in an average of 61,334 function evaluations, neither SAHC nor NAHC ever found the optimum within the maximum of 256,000 function evaluations. However, RMHC found the optimum on R1 in an average of 6179 function evaluations, nearly a factor of ten faster than the GA. This striking difference on landscapes originally designed to be "royal roads" for the GA underscores the need for a rigorous answer to the question posed earlier: "Under what conditions will a GA outperform other search algorithms, such as hill climbing?"

2 ANALYSIS OF RMHC AND AN IDEALIZED GA

To begin to answer this question, we analyzed the RMHC algorithm with respect to R1. Suppose the fitness function consists of N adjacent blocks of K 1s each (in R1, N = 8 and K = 8). What is the expected time (number of function evaluations) E(K, N) to find the optimum string of all 1s? We can first ask a simpler question: what is the expected time E(K, 1) to find a single block of K 1s? A Markov-chain analysis (not given here) yields E(K, 1) slightly larger than 2^K, converging slowly to 2^K from above as K → ∞ (Richard Palmer, personal communication). For example, for K = 8, E(K, 1) = 301.2.

Now suppose we want RMHC to discover a string with N blocks of K 1s. The time to discover a first block of K 1s is E(K, 1), but, once it has been found, the time to discover a second block is longer, since many of the function evaluations are "wasted" on testing mutations inside the first block. The proportion of non-wasted mutations is (KN - K)/KN; this is the proportion of mutations that occur in the KN - K positions outside the first block. The expected time E(K, 2) to find a second block is E(K, 1) + E(K, 1)[KN/(KN - K)]. Similarly, the total expected time is:

    E(K, N) = E(K, 1) + E(K, 1) N/(N-1) + ... + E(K, 1) N/(N-(N-1))
            = E(K, 1) N [1 + 1/2 + 1/3 + ... + 1/N].                    (1)

(The actual value may be a bit larger, since E(K, 1) is the expected time to the first block, whereas E(K, N) depends on the worst time for the N blocks.) Expression (1) is approximately E(K, 1) N (log N + γ), where γ is Euler's constant. For K = 8, N = 8, the value of expression (1) is 6549. When we ran RMHC on the R1 function 200 times, the average number of function evaluations to the optimum was 6179, which agrees reasonably well with the expected value.
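As a quick numerical check (ours, not part of the original analysis), expression (1) and its approximation can be evaluated directly:

    from math import log

    E_K1 = 301.2   # E(K, 1) for K = 8, from the Markov-chain analysis above
    N = 8
    exact = E_K1 * N * sum(1.0 / n for n in range(1, N + 1))
    approx = E_K1 * N * (log(N) + 0.5772156649)   # E(K,1) N (log N + gamma)
    print(round(exact), round(approx))            # about 6549 and 6401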
Could a GA ever do better than this? There are three reasons why we might expect a GA to perform well on R1. First, at least theoretically the GA is fast because of implicit parallelism (Holland, 1975/1992): each string in the population is an instance of many different schemas, and if the population is large enough and is initially chosen at random, a large number of different schemas (many more than the number of strings in the population) are being sampled in parallel. This should result in a quick search for short, low-order schemas that confer high fitness. Second, fitness-proportionate reproduction under the GA should conserve instances of such schemas. Third, a high crossover rate should quickly combine instances of low-order schemas on different strings to create instances of longer schemas that confer even higher fitness. Our previous experiments (Forrest & Mitchell, 1993) showed that the simple GA departed from this "in principle" behavior. One major impediment was hitchhiking, which limited implicit parallelism by fixing certain schema regions suboptimally. But if the GA worked exactly as described above, how quickly could it find the optimal string of R1?

To answer this question we consider an "idealized genetic algorithm" (IGA) that explicitly has the features described above. The IGA knows ahead of time what the desired schemas are, and a "function evaluation" is the determination of whether a given string contains one or more of them. In the IGA, at each time step a single string is chosen at random, with uniform probability for each bit. The string is "evaluated" by determining whether it is an instance of one or more of the desired schemas. The first time such a string is found, it is sequestered. At each subsequent discovery of an instance of one or more not-yet-discovered schemas, the new string is instantaneously crossed over with the sequestered string so that the sequestered string contains all the desired schemas that have been discovered so far.

This procedure is unusable in practice, since it requires knowing a priori which schemas are relevant, whereas in general an algorithm such as the GA or RMHC directly measures the fitness of a string, and does not know ahead of time which schemas contribute to high fitness. However, the idea behind the GA is to do implicitly what the IGA is able to do explicitly. This idea will be elaborated below.

Suppose again that our desired schemas consist of N blocks of K 1s each. What is the expected time (number of function evaluations) until the saved string contains all the desired schemas? Solutions have been suggested by G. Huber (personal communication) and A. Shevoroskin (personal communication), and a detailed solution is given in Holland (1993). The main idea is to note that the probability of finding a single desired block s on a random string is p = 1/2^K, and the probability of finding s by time t is 1 - (1 - p)^t. Then the probability P_N(t) that all N blocks have been found by time t is:

    P_N(t) = [1 - (1 - p)^t]^N,

and the probability p̄_N(t) that all N blocks are found at exactly time t is:

    p̄_N(t) = [1 - (1 - p)^t]^N - [1 - (1 - p)^(t-1)]^N.

The expected time is then

    E_N = Σ_{t=1}^∞ t ([1 - (1 - p)^t]^N - [1 - (1 - p)^(t-1)]^N).

This sum can be expanded and simplified, and with some work, along with the approximation (1 - p)^n ≈ 1 - np for small p, we obtain the following approximation:

    E_N ≈ (1/p) Σ_{n=1}^N 1/n ≈ 2^K (log N + γ).

The major point is that the IGA gives an expected time that is on the order of 2^K log N, where RMHC gives an expected time that is on the order of 2^K N log N, a factor of N slower. This kind of analysis can help us predict how and when the GA will outperform hill climbing.

What makes the IGA faster than RMHC? A primary reason is that the IGA perfectly implements implicit parallelism: each new string is completely independent of the previous one, so new samples are given independently to each schema region. In contrast, RMHC moves in the space of strings by single-bit mutations from an original string, so each new sample has all but one of the same bits as the previous sample. Thus each new string gives a new sample to only one schema region. The IGA spends more time than RMHC constructing new samples, but since we are counting only function evaluations, we ignore the construction time. The IGA "cheats" on each function evaluation, since it knows exactly the desired schemas, but in this way it gives a lower bound on the number of function evaluations that the GA will need on this problem.
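A small Monte Carlo sketch of the IGA on the N-blocks-of-K-1s schemas (our illustration; the analysis above is exact) confirms this scaling:

    import random
    from statistics import mean

    def iga_evals(K=8, N=8):
        """Count function evaluations until every one of the N desired
        blocks has appeared on some independently drawn random string."""
        found = [False] * N
        evals = 0
        while not all(found):
            s = [random.randint(0, 1) for _ in range(K * N)]
            evals += 1
            for i in range(N):
                if all(s[i * K:(i + 1) * K]):   # block i is all 1s on this sample
                    found[i] = True             # cross it into the sequestered string
        return evals

    # Should come out close to the predicted 2^8 (log 8 + gamma), roughly 680.
    print(mean(iga_evals() for _ in range(200)))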
Independent sampling allows for a speed-up in the IGA in two ways: it allows for the possibility of more than one desirable schema appearing simultaneously on a given sample, and it also means that there are no wasted samples as there are in RMHC. Although the comparison we have made is with RMHC, the IGA will also be significantly faster on R1 (and similar landscapes) than any hill-climbing method that works by mutating single bits (or a small number of bits) to obtain new samples.

Level 1: s1 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 s12 s13 s14 s15 s16
Level 2: (s1 s2) (s3 s4) (s5 s6) (s7 s8) (s9 s10) (s11 s12) (s13 s14) (s15 s16)
Level 3: (s1 s2 s3 s4) (s5 s6 s7 s8) (s9 s10 s11 s12) (s13 s14 s15 s16)
Level 4: (s1 s2 s3 s4 s5 s6 s7 s8) (s9 s10 s11 s12 s13 s14 s15 s16)

Figure 2: Royal Road function R4.

The hitchhiking effects described earlier also result in a loss of independent samples for the real GA. The goal is to have the real GA, as much as possible, approximate the IGA. Of course, the IGA works because it explicitly knows what the desired schemas are; the real GA does not have this information and can only estimate what the desired schemas are by an implicit sampling procedure. But it is possible for the real GA to approximate a number of the features of the IGA. Independent samples: the population size has to be large enough, the selection process has to be slow enough, and the mutation rate has to be sufficient to make sure that no single locus is fixed at a single value in every string (or even a large majority of strings) in the population. Sequestering desired schemas: selection has to be strong enough to preserve desired schemas that have been discovered, but it also has to be slow enough (or, equivalently, the relative fitness of the non-overlapping desirable schemas has to be small enough) to prevent significant hitchhiking on some highly fit schemas, which can crowd out desired schemas in other parts of the string. Instantaneous crossover: the crossover rate has to be such that the time for a crossover to occur that combines two desired schemas is small with respect to the discovery time for the desired schemas. Speed-up over RMHC: the string length (a function of N) has to be large enough to make the N speed-up factor significant.

These mechanisms are not all mutually compatible (e.g., high mutation works against sequestering schemas), and thus must be carefully balanced against one another. A discussion of how such a balance might be achieved is given in Holland (1993).
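For concreteness, here is a generic sketch of a simple GA with these knobs exposed (this is not the GA used in the experiments reported here; the sigma-scaling weights are a crude stand-in, and all parameter values are illustrative):

    import random

    def simple_ga(fitness, length=64, pop_size=128, p_mut=0.005,
                  p_cross=0.7, generations=500):
        """Generational GA: sigma-scaled fitness-proportionate selection,
        single-point crossover with probability p_cross, per-bit mutation."""
        pop = [''.join(random.choice('01') for _ in range(length))
               for _ in range(pop_size)]
        for _ in range(generations):
            fits = [fitness(x) for x in pop]
            mean_f = sum(fits) / pop_size
            sd = (sum((f - mean_f) ** 2 for f in fits) / pop_size) ** 0.5 or 1.0
            weights = [max(0.1, 1.0 + (f - mean_f) / (2 * sd)) for f in fits]
            new_pop = []
            while len(new_pop) < pop_size:
                a, b = random.choices(pop, weights=weights, k=2)
                if random.random() < p_cross:          # single-point crossover
                    pt = random.randrange(1, length)
                    a, b = a[:pt] + b[pt:], b[:pt] + a[pt:]
                for child in (a, b):
                    child = ''.join(c if random.random() > p_mut
                                    else ('1' if c == '0' else '0')
                                    for c in child)
                    new_pop.append(child)
            pop = new_pop[:pop_size]
        return max(pop, key=fitness)

Raising pop_size, weakening selection, or raising p_mut trades sequestering for independent sampling, which is exactly the balance at issue.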
3 RESULTS OF EXPERIMENTS

As a first step in exploring these balances, we designed R3, a variant of our previous function R2 (Forrest & Mitchell, 1993), based on some of the features described above. In R3 the desired schemas are s1-s8 (shown in Figure 1) and combinations of them, just as in R2. However, in R3 the lowest-level order-8 schemas are each separated by "introns" (bit positions that do not contribute to fitness; see Forrest & Mitchell, 1993; Levenick, 1991) of length 24.

In R3, a string that is not an instance of any desired schema receives fitness 1.0. Every time a new level is reached (i.e., a string is found that is an instance of one or more schemas at that level) a small increment u is added to the fitness. Thus strings at level 1 (that are instances of at least one level-1 schema) have fitness 1 + u, strings at level 2 have fitness 1 + 2u, etc. For our experiments we set u = 0.2.

Table 2: R4: Mean function evaluations (over 37 runs) to attain each level for the GA and for RMHC. In the GA runs, the number of function evaluations is sampled every 500 evaluations, so each value is actually an upper bound for an interval of length 500. The standard errors are in parentheses. The percentage of runs which reached each level is shown next to the heading "% runs." Only runs which successfully reached a given level were included in the function-evaluation calculations for that level.

The purpose of the introns was to help maintain independent samples in each schema position by preventing linkage between schema positions. The independence of samples was also helped by using a larger population (2000) and the much slower selection scheme given by the level-based fitness function. In preliminary experiments on R3 (not shown), hitchhiking in the GA was reduced significantly, and the population was able to maintain instances of all the lowest-level schemas throughout each run.

Next, we studied R4 (illustrated in Figure 2). R4 is identical to R3, except that it does not have introns. Further, R4 is defined over 128-bit strings, thus doubling the size of the problem. In preliminary runs on R4, we used a population size of 500, a mutation rate of 0.005 (mutation always flips a bit), and multipoint crossover, where the number of crossover points for each pair of parents was selected from a Poisson distribution with mean 2.816.
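To make the level structure concrete, the following sketch gives one plausible reading of the R4 fitness (block layout as in Figure 2, u = 0.2; the helper name and the aligned-group reading of the levels are our assumptions, not a specification from the paper):

    def r4_fitness(x, K=8, N=16, u=0.2):
        """Level-based fitness: base 1.0, plus u per level attained, where
        level L requires a complete aligned group of 2^(L-1) adjacent blocks."""
        blocks = [x[i * K:(i + 1) * K] == '1' * K for i in range(N)]
        level, width = 0, 1
        while width <= N:
            if any(all(blocks[j:j + width]) for j in range(0, N, width)):
                level += 1
                width *= 2
            else:
                break
        return 1.0 + u * level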
Table 2 gives the mean number of evaluations to reach levels 1, 2, and 3 (neither algorithm reached level 4 within the maximum of 10^6 function evaluations). As can be seen, the time to reach level 1 is comparable for the two algorithms, but the GA is much faster at reaching levels 2 and 3. Further, the GA discovers level 3 approximately twice as often as RMHC. As was said above, it is necessary to balance the maintenance of independent samples with the sequestering of desired schemas. These preliminary results suggest that R4 does a better job of maintaining this balance than the earlier Royal Road functions. Working out these balances in greater detail is a topic of future work.

4 CONCLUSION

We have presented analyses of two algorithms, RMHC and the IGA, and have used the analyses to identify some general principles of when and how a genetic algorithm will outperform hill climbing. We then presented some preliminary experimental results comparing the GA and RMHC on a modified Royal Road landscape. These analyses and results are a further step toward achieving our original goals: to design the simplest class of fitness landscapes that will distinguish the GA from other search methods, and to characterize rigorously the general features of a fitness landscape that make it suitable for a GA.

Our modified Royal Road landscape R4, like R1, is not meant to be a realistic example of a problem to which one might apply a GA. Rather, it is meant to be an idealized problem in which certain features most relevant to GAs are explicit, so that the GA's performance can be studied in detail. Our claim is that in order to understand how the GA works in general and where it will be most useful, we must first understand how it works and where it will be most useful on simple yet carefully designed landscapes such as these. The work reported here is a further step in this direction.

Acknowledgments

We thank R. Palmer for suggesting the RMHC algorithm and for sharing his careful analysis with us, and G. Huber for his assistance on the analysis of the IGA. We also thank E. Baum, L. Booker, T. Jones, and R. Riolo for helpful comments and discussions regarding this work. We gratefully acknowledge the support of the Santa Fe Institute's Adaptive Computation Program, the Alfred P. Sloan Foundation (grant B1992-46), and the National Science Foundation (grants IRI-9157644 and IRI-9224912).

References

L. D. Davis (1991). Bit-climbing, representational bias, and test suite design. In R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, 18-23. San Mateo, CA: Morgan Kaufmann.

K. A. De Jong (1975). An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Unpublished doctoral dissertation. University of Michigan, Ann Arbor, MI.

S. Forrest and M. Mitchell (1993). Relative building-block fitness and the building-block hypothesis. In D. Whitley (ed.), Foundations of Genetic Algorithms 2, 109-126. San Mateo, CA: Morgan Kaufmann.

J. H. Holland (1975/1992). Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press. (First edition 1975, Ann Arbor: University of Michigan Press.)

J. H. Holland (1993). Innovation in complex adaptive systems: Some mathematical sketches. Working Paper 93-10-062, Santa Fe Institute, Santa Fe, NM.

L. Ingber and B. Rosen (1992). Genetic algorithms and very fast simulated reannealing: A comparison. Mathematical and Computer Modelling, 16(11), 87-100.

J. R. Levenick (1991). Inserting introns improves genetic algorithm success rate: Taking a cue from biology. In R. K. Belew and L. B. Booker (eds.), Proceedings of the Fourth International Conference on Genetic Algorithms, 123-127. San Mateo, CA: Morgan Kaufmann.

M. Mitchell, S. Forrest, and J. H. Holland (1992). The royal road for genetic algorithms: Fitness landscapes and GA performance. In F. J. Varela and P. Bourgine (eds.), Proceedings of the First European Conference on Artificial Life, 245-254. Cambridge, MA: MIT Press.
\n\n\f", "award": [], "sourceid": 836, "authors": [{"given_name": "Melanie", "family_name": "Mitchell", "institution": null}, {"given_name": "John", "family_name": "Holland", "institution": null}, {"given_name": "Stephanie", "family_name": "Forrest", "institution": null}]}