{"title": "Genetic Algorithms and Explicit Search Statistics", "book": "Advances in Neural Information Processing Systems", "page_first": 319, "page_last": 325, "abstract": null, "full_text": "Genetic Algorithms and Explicit Search Statistics \n\nShumeet 8a1uja \nbaluja@cs.cmu.edu \n\nJustsystem Pittsburgh Research Center & \n\nSchool of Computer Science, Carnegie Mellon University \n\nAbstract \n\nThe  genetic  algorithm  (GA) is  a  heuristic  search  procedure  based  on  mechanisms \nabstracted from population genetics. In a previous paper [Baluja & Caruana,  1995], \nwe  showed  that  much  simpler  algorithms,  such  as  hillcIimbing  and  Population(cid:173)\nBased Incremental  Learning (PBIL), perform comparably  to  GAs on  an  optimiza(cid:173)\ntion  problem  custom  designed  to  benefit  from  the  GA's  operators.  This  paper \nextends these results in two directions. First, in a large-scale empirical comparison \nof problems that have been reported  in GA literature, we show that on many prob(cid:173)\nlems,  simpler  algorithms  can  perform  significantly  better than  GAs.  Second,  we \ndescribe when crossover is useful, and show how it can be incorporated into PBIL. \n\n1  IMPLICIT VS. EXPLICIT SEARCH STATISTICS \n\nAlthough there has recently been controversy in the genetic algorithm (GA) community as \nto whether GAs should be used for static function optimization, a large amount of research \nhas been, and continues to be, conducted in this direction [De Jong,  1992]. Since much of \nGA research focuses  on optimization (most often in static environments), this study exam(cid:173)\nines the performance of GAs in these domains. \nIn the standard GA, candidate solutions are encoded as fixed length binary vectors. The ini(cid:173)\ntial group of potential solutions is chosen randomly. At each generation, the fitness of each \nsolution  is  calculated;  this  is  a  measure  of how  well  the  solution  optimizes  the  objective \nfunction.  
The subsequent generation is created through a process of selection, recombination, and mutation. Recombination operators merge the information contained within pairs of selected \"parents\" by placing random subsets of the information from both parents into their respective positions in a member of the subsequent generation. Fitness-proportional selection provides the selective pressure; higher-fitness solution strings have a higher probability of being selected for recombination. Mutations are used to help preserve diversity in the population by introducing random changes into the solution strings. The GA uses the population to implicitly maintain statistics about the search space. The selection, crossover, and mutation operators can be viewed as mechanisms for extracting these implicit statistics from the population to choose the next set of points to sample. Details of GAs can be found in [Goldberg, 1989] [Holland, 1975]. \n\nPopulation-based incremental learning (PBIL) is a combination of genetic algorithms and competitive learning [Baluja, 1994]. The PBIL algorithm attempts to explicitly maintain statistics about the search space to decide where to sample next. The object of the algorithm is to create a real-valued probability vector which, when sampled, reveals high-quality solution vectors with high probability. For example, if a good solution can be encoded as a string of alternating 0's and 1's, a suitable final probability vector would be 0.01, 0.99, 0.01, 0.99, etc. The PBIL algorithm and parameters are shown in Figure 1. \nInitially, each entry of the probability vector is set to 0.5. Sampling from this vector yields random solution vectors because the probability of generating a 1 or 0 is equal. 
\nAs search progresses, the values in the probability vector gradually shift to represent high-evaluation solution vectors through the following process. \n\n\u2022\u2022\u2022\u2022\u2022 Initialize Probability Vector \u2022\u2022\u2022\u2022\u2022 \nfor i := 1 to LENGTH do P[i] := 0.5; \n\nwhile (NOT termination condition) \n\n\u2022\u2022\u2022\u2022\u2022 Generate Samples \u2022\u2022\u2022\u2022\u2022 \nfor i := 1 to SAMPLES do \n\nsample_vectors[i] := generate_sample_vector_according_to_probabilities (P); \nevaluations[i] := evaluate(sample_vectors[i]); \n\nbest_vector := find_vector_with_best_evaluation (sample_vectors, evaluations); \nworst_vector := find_vector_with_worst_evaluation (sample_vectors, evaluations); \n\n\u2022\u2022\u2022\u2022\u2022 Update Probability Vector Towards Best Solution \u2022\u2022\u2022\u2022\u2022 \nfor i := 1 to LENGTH do \n\nP[i] := P[i] * (1.0 - LR) + best_vector[i] * (LR); \n\nPBIL: USER-DEFINED CONSTANTS (values used in this study): \nSAMPLES: the number of vectors generated before update of the probability vector (100). \nLR: the learning rate, how fast to exploit the search performed (0.1). \nNEGATIVE_LR: the negative learning rate, how much to learn from negative examples (PBIL1 = 0.0, PBIL2 = 0.075). \nLENGTH: the number of bits in a generated vector (problem specific). \nFigure 1: PBIL1/PBIL2 algorithm for a binary alphabet. PBIL2 includes the shaded region. Mutations not shown. \n\nA number of solution vectors are generated based upon the probabilities specified in the probability vector. The probability vector is pushed towards the generated solution vector with the highest evaluation. After the probability vector is updated, a new set of solution vectors is produced by sampling from the updated probability vector, and the cycle is continued. As the search progresses, entries in the probability vector move away from their initial settings of 0.5 towards either 0.0 or 1.0. 
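The update loop of Figure 1 can be sketched in Python as follows. This is an illustrative re-implementation of the basic PBIL1 update only (no negative update, no mutation), and the names pbil_maximize, evaluate, samples, and lr are this sketch's own, not the paper's:

```python
import random

def pbil_maximize(evaluate, length, samples=100, lr=0.1, generations=200):
    # Probability vector initialized to 0.5: maximum uncertainty about each bit.
    p = [0.5] * length
    best_overall, best_score = None, float('-inf')
    for _ in range(generations):
        # Generate SAMPLES solution vectors according to the probability vector.
        population = [[1 if random.random() < p[i] else 0 for i in range(length)]
                      for _ in range(samples)]
        # Find the sample with the highest evaluation in this generation.
        score, best_vector = max((evaluate(v), v) for v in population)
        if score > best_score:
            best_score, best_overall = score, best_vector
        # Push the probability vector towards the best sample (the PBIL1 rule).
        for i in range(length):
            p[i] = p[i] * (1.0 - lr) + best_vector[i] * lr
    return best_overall, best_score

# Toy objective: number of 1 bits ('OneMax').
random.seed(0)
best, score = pbil_maximize(sum, length=20)
```

On such a simple objective the probability vector converges quickly towards all entries near 1.0; the sketch returns the best vector seen over all generations, matching the evaluation-counting protocol described later in the paper.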
\nOne key feature of the early generations of genetic optimization is the parallelism in the search; many diverse points are represented in the population during the early generations. When the population is diverse, crossover can be an effective means of search, since it provides a method to explore novel solutions by combining different members of the population. Because PBIL uses a single probability vector, it may seem to have less expressive power than a GA using a full population, since a GA can represent a large number of points simultaneously. A traditional single-population GA, however, is not able to maintain a large number of points: because of sampling errors, the population will converge around a single point. This phenomenon is summarized below: \n\n\"... the theorem [Fundamental Theorem of Genetic Algorithms [Goldberg, 1989]] assumes an infinitely large population size. In a finite-size population, even when there is no selective advantage for either of two competing alternatives ... the population will converge to one alternative or the other in finite time [De Jong, 1975; Goldberg & Segrest, 1987]. This problem of finite populations is so important that geneticists have given it a special name, genetic drift. Stochastic errors tend to accumulate, ultimately causing the population to converge to one alternative or another\" [Goldberg & Richardson, 1987]. \n\nDiversity in the population is crucial for GAs. By maintaining a population of solutions, the GA is able (in theory at least) to maintain samples in many different regions. Crossover is used to merge these different solutions. A necessary (although not sufficient) condition for crossover to work well is diversity in the population. 
When diversity is lost, crossover begins to behave like a mutation operator that is sensitive to the convergence of the value of each bit [Eshelman, 1991]. If all individuals in the population converge at some bit position, crossover leaves those bits unaltered. At bit positions where individuals have not converged, crossover will effectively mutate values in those positions. Therefore, crossover creates new individuals that differ from the individuals it combines only at the bit positions where the mated individuals disagree. This is analogous to PBIL, which creates new trials that differ mainly in positions where prior good performers have disagreed. \nAs an example of how the PBIL algorithm works, we can examine the values in the probability vector through multiple generations. Consider the following maximization problem: 1.0 / |366503875925.0 - X|, 0 <= X < 2^40. Note that 366503875925 is represented in binary as a string of 20 pairs of alternating '01'. The evolution of the probability vector is shown in Figure 2. Note that the most significant bits are pinned to either 0 or 1 very quickly, while the least significant bits are pinned last. This is because during the early portions of the search, the most significant bits yield more information about high-evaluation regions of the search space than the least significant bits. \n\nFigure 2: Evolution of the probability vector over successive generations. White represents a high probability of generating a 1, black represents a high probability of generating a 0. Intermediate greys represent probabilities close to 0.5 (equal chances of generating a 0 or 1). Bit 0 is the most significant, bit 40 the least. 
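The encoding behind this example can be checked directly. The helper names decode and evaluate below are illustrative, and treating the singular optimum X = 366503875925 as +infinity is this sketch's assumption, since the objective is undefined at that point:

```python
TARGET = 366503875925  # '01' repeated 20 times in binary, i.e. (2**40 - 1) // 3

def decode(bits):
    # Interpret a bit vector as an unsigned integer, bit 0 most significant
    # (the convention used in Figure 2).
    x = 0
    for b in bits:
        x = (x << 1) | b
    return x

def evaluate(bits):
    # 1.0 / |366503875925.0 - X|, with the singular optimum mapped to +inf
    # (an assumption; the paper leaves that case unspecified).
    x = decode(bits)
    return float('inf') if x == TARGET else 1.0 / abs(TARGET - x)

alternating = [0, 1] * 20  # the 40-bit string of alternating '01' pairs
```

Flipping a high-order bit of this string changes X by a large power of two, so the evaluation drops sharply; flipping a low-order bit changes it only slightly, which is why the most significant probability-vector entries are pinned first.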
\n\n2 AN EMPIRICAL COMPARISON \n\nThis section provides a summary of the results obtained from a large-scale empirical comparison of seven iterative and evolution-based optimization heuristics. Thirty-four static optimization problems, spanning six sets of problem classes which are commonly explored in the genetic algorithm literature, are examined. The search spaces in these problems range from 2^128 to 2^2040. The results indicate that, on many problems, using standard GAs for optimizing static functions does not yield a benefit, in terms of the final answer obtained, over simple hillclimbing or PBIL. Recently, there have been other studies which have examined the performance of GAs in comparison to hillclimbing on a few problems; they have shown similar results [Davis, 1991] [Juels & Wattenberg, 1996]. \nThree variants of Multiple-Restart Stochastic Hillclimbing (MRSH) are explored in this paper. The first version, MRSH-1, maintains a list of the positions of the bit flips which were attempted without improvement. These bit flips are not attempted again until a better solution is found. When a better solution is found, the list is emptied. If the list becomes as large as the solution encoding, MRSH-1 is restarted at a random solution with an empty list. MRSH-2 and MRSH-3 allow moves to regions of higher and equal evaluation. In MRSH-2, the number of evaluations before restart depends upon the length of the encoded solution: MRSH-2 allows 10 * (length of solution) evaluations without improvement before search is restarted. When a solution with a higher evaluation is found, the count is reset. In MRSH-3, after the total number of iterations is specified, restart is forced 5 times during search, at equally spaced intervals. \nTwo variants of the standard GA are tested in this study. 
The first, termed SGA, has the following parameters: two-point crossover, with a crossover rate of 100% (the percentage of times crossover occurs; otherwise the individuals are copied without crossover), a mutation probability of 0.001 per bit, a population size of 100, and elitist selection (the best solution in generation N replaces the worst solution in generation N+1). The second GA used, termed GA-Scale, uses the same parameters except: uniform crossover with a crossover rate of 80%, and the fitness of the worst member in a generation is subtracted from the fitness of each member of the generation before the probabilities of selection are determined. \n\nTwo variants of PBIL are tested. Both move the probability vector towards the best example in each generated population. PBIL2 also moves the probability vector away from the worst example in each generation. Both variants are shown in Figure 1. A small mutation, analogous to the mutation used in genetic algorithms, is also used in both PBILs. The mutation is applied directly to the probability vector. \nThe results obtained in this study should not be considered to be state-of-the-art. The problem encodings were chosen to be easily reproducible and to allow easy comparison with other studies. Alternate encodings may yield superior results. In addition, no problem-specific information was used for any of the algorithms. Problem-specific information, when available, could help all of the algorithms examined. \n\nAll of the variables in the problems were encoded in binary, either with standard Gray code or base-2 representation. The variables were represented in non-overlapping, contiguous regions within the solution encoding. 
The results reported are the best evaluations found through the search of each algorithm, averaged over at least 20 independent runs per algorithm per problem; the results for the GA-SCALE and PBIL2 algorithms are the average of at least 50 runs. All algorithms were given 200,000 evaluations per run. In each run, the GA and PBIL algorithms were given 2000 generations, with 100 function evaluations per generation. In each run, the MRSH algorithms were restarted in random locations as many times as needed until 200,000 evaluations were performed. The best answer found in the 200,000 evaluations was returned as the answer found in the run. \n\nBrief notes about the encodings are given below. Since the numerical results are not useful without the exact problems, relative results are provided in Table I. For most of the problems, exact results and encodings are in [Baluja, 1995]. To measure the significance of the difference between the results obtained by PBIL2 and GA-SCALE, the Mann-Whitney test is used. This is a non-parametric equivalent to the standard two-sample pooled t-test. \n\n\u2022 TSP: 128, 200 & 255 city problems were tried. The \"sort\" encoding [Syswerda, 1989] was used. The last problem was tried with the encoding in binary and Gray code. \n\n\u2022 Jobshop: Two standard JS problems were tried with two encodings. The first encoding is described in [Fang et al., 1993]. The second encoding is described in [Baluja, 1995]. An additional, randomly generated, problem was also tried with the second encoding. \n\n\u2022 Knapsack: Problems 1 & 2: a unique element is represented by each bit. Problems 3 & 4: there are 8 and 32 copies of each element respectively. The encoding specifies the number of copies of each element to include. Each element is assigned a \"value\" and \"weight\". 
Objective: maximize value while staying under a pre-specified weight. \n\n\u2022 Bin-Packing/Equal Piles: The solution is encoded in a bit vector of length M * log2(N) (N bins, M elements). Each element is assigned a substring of length log2(N), which specifies a bin. Objective: pack the given bins as tightly as possible. Because of the large variation in results which is found by varying the number of bins and elements, the results from 8 problems are reported. \n\n\u2022 Neural-Network Weight Optimization: Problems 1 & 2: identify the parity of 7 inputs. Problems 3 & 4: determine whether a point falls within the middle of 3 concentric squares. For problems 3 & 4, 5 extra inputs, which contained noise, were used. The networks had 8 inputs (including bias), 5 hidden units, and 1 output. The network was fully connected between sequential layers. \n\n\u2022 Numerical Function Optimization (F1-F3): Problems 1 & 2: the variables in the first portions of the solution string have a large influence on the quality of the rest of the solution. In the third problem, each variable can be set independently. See [Baluja, 1995] for details. \n\n\u2022 Graph Coloring: Select 1 of 4 colors for the nodes of a partially connected graph such that connected nodes are not the same color. The graphs used were not necessarily planar. \n\nTable I: Summary of Empirical Results - Relative Ranks (1=best, 7=worst). \n\n3 EXPLICITLY PRESERVING DIVERSITY \n\nAlthough the results in the previous section showed that PBIL often outperformed GAs and hillclimbing, PBIL may not surpass GAs at all population sizes. As the population size increases, the observed behavior of a GA more closely approximates the ideal behavior predicted by theory [Holland, 1975]. 
The population may contain sufficient samples from distinct regions for crossover to effectively combine \"building blocks\" from multiple solutions. However, the desire to minimize the total number of function evaluations often prohibits the use of large enough populations to make crossover behave ideally. \nOne method of avoiding the cost of using a very large population is to use a parallel GA (pGA). Many studies have found pGAs to be very effective for preserving diversity for function optimization [Cohoon et al., 1988] [Whitley et al., 1990]. In the pGA, a collection of independent GAs, each maintaining separate populations, communicate with each other via infrequent inter-population (as opposed to intra-population) matings. pGAs suffer less from premature convergence than single-population GAs. Although the individual populations typically converge, different populations converge to different solutions, thus preserving diversity across the populations. Inter-population mating permits crossover to combine solutions found in different regions of the search space. \nWe would expect that employing multiple PBIL evolutions, parallel PBIL (pPBIL), has the potential to yield performance improvements similar to those achieved in pGAs. Multiple PBIL evolutions are simulated by using multiple probability vectors to generate solutions. To keep the evolutions independent, each probability vector is only updated with solutions which are generated by sampling it. \nThe benefit of parallel populations (beyond just multiple runs) is in using crossover to combine dissimilar solutions. There are many ways of introducing crossover into PBIL. The method which is used here is to sample two probability vectors for the creation of each solution vector; see Figure 3. 
The figure shows the algorithm with uniform crossover; nonetheless, many other crossover operators can be used. \nThe randomized nature of crossover often yields unproductive results. If crossover is to be used, it is important to simulate the crossover operation many times. Therefore, crossover is used to create each member of the population (this is in contrast to crossing over the probability vectors once, and generating the entire population from the newly created probability vector). More details on integrating crossover and PBIL, and its use in combinatorial problems in robotic surgery, can be found in [Baluja & Simon, 1996]. \nResults with using pPBIL in comparison to PBIL, GA, and pGA are shown in Table II. For many of the problems explored here, parallel versions of GAs and PBIL work better than the sequential versions, and the parallel PBIL models work better than the parallel GA models. In each of these experiments, the parameters were hand-tuned for each algorithm. In every case, the GA was given at least twice as many function evaluations as PBIL. The crossover operator was chosen by trying several operators on the GA, and selecting the best one. The same crossover operator was then used for PBIL. For the pGA and pPBIL experiments, 10 subpopulations were always used. \n\n\u2022\u2022\u2022\u2022\u2022 Generate Samples With Two Probability Vectors \u2022\u2022\u2022\u2022\u2022 \nfor i := 1 to SAMPLES do \n\nvector1 := generate_sample_vector_with_probabilities (P1); \nvector2 := generate_sample_vector_with_probabilities (P2); \nfor j := 1 to LENGTH do \n\nif (random (2) = 0) sample_vector[i][j] := vector1[j] \nelse sample_vector[i][j] := vector2[j] \n\nevaluations[i] := Evaluate_Solution (sample_vector[i]); \n\nbest_vector := find_vector_with_best_evaluation (sample_vectors, evaluations); \n\n\u2022\u2022\u2022\u2022\u2022 Update Both Probability Vectors Towards Best Solution \u2022\u2022\u2022\u2022\u2022 
\nfor i := 1 to LENGTH do \n\nP1[i] := P1[i] * (1.0 - LR) + best_vector[i] * (LR); \nP2[i] := P2[i] * (1.0 - LR) + best_vector[i] * (LR); \n\nFigure 3: Generating samples based on two probability vectors. Shown with uniform crossover [Syswerda, 1989] (50% chance of using probability vector 1 or vector 2 for each bit position). Every 100 generations, each population makes a local copy of another population's probability vector (to replace vector2). In these experiments, there are a total of 10 subpopulations. \n\nTable II: Sequential & Parallel, GA & PBIL, Avg. 25 runs. [Row labels: TSP - 200 city (minimize tour length); Optim. Highly Correlated Parameters - Base-2 Code (max); Optim. Highly Correlated Parameters - Gray Code (max); Optim. Independent Parameters - Base-2 Code (max); Optim. Independent Parameters - Gray Code (max); Problem with many maxima, see [Baluja, 1994] (max). Numeric entries not recoverable from the scan.] \n\n4 SUMMARY & CONCLUSIONS \n\nPBIL was examined on a very large set of problems drawn from the GA literature. First, the effectiveness of PBIL for finding good solutions to static optimization functions was compared with a variety of GA and hillclimbing techniques. Second, Parallel PBIL was introduced. pPBIL is designed to explicitly preserve diversity by using multiple parallel evolutions. Methods for reintroducing crossover into pPBIL were given. \nWith regard to the empirical results, it should be noted that it is incorrect to say that one procedure will always perform better than another. The results do not indicate that PBIL will always outperform a GA. For example, we have presented problems on which GAs work better. Further, on problems such as bin-packing, the relative results can change drastically depending upon the number of bins and elements. 
The conclusion which should be reached from these results is that algorithms like PBIL and MRSH, which are much simpler than GAs, can outperform standard GAs on many problems of interest. \nThe PBIL algorithm presented here is very simple and should serve as a prototype for future study. Three directions for future study are presented here. First, the most obvious extension to PBIL is to track more detailed statistics, such as pair-wise covariances of bit positions in high-evaluation vectors. Preliminary work in this area has been conducted, and the results are very promising. Second, another extension is to quickly determine which probability vectors, in the pPBIL model, are unlikely to yield promising answers; methods such as Hoeffding Races may be adapted here [Maron & Moore, 1994]. Third, the manner in which the updates to the probability vector occur is similar to the weight-update rules used in Learning Vector Quantization (LVQ). Many of the heuristics used in LVQ can be incorporated into the PBIL algorithm. \nPerhaps the most important contribution of the PBIL algorithm is a novel way of examining GAs. In many previous studies, the GA was examined at a micro-level, analyzing the preservation of building blocks and the frequency of sampling hyperplanes. In this study, the statistics at the population level were examined. In the standard GA, the population serves to implicitly maintain statistics about the search space. The selection and crossover mechanisms are ways of extracting these statistics from the population. In PBIL, the population does not maintain the information that is carried from one generation to the next; the statistics of the search are explicitly kept in the probability vector. \n\nReferences \nBaluja, S. 
(1995) \"An Empirical Comparison of Seven Iterative and Evolutionary Function Optimization Heuristics,\" CMU-CS-\n\n95-193. Available via. http://www.cs.cmu.edul-baluja. \n\nBaluja, S. (1994) \"Population-Based Incremental Learning\". Carnegie MeUon  University. Technical Repon. CMU-CS-94-163. \nBaluja, S.  & Caruana, R.  (1995) \"Removing the Genetics from the Standard Genetic Algorithm\", Imer.Con! Mach.  uarning-12. \nBaluja,  S.  &  Simon,  D.  (1996)  \"Evolution-Based  Methods  for  Selecting  Point  Data  for  Object  Localization:  Applications  to \n\nComputer Assisted Surgery\". CMU\u00b7CS\u00b796 -183. \n\nCohoon, J., Hedge, S., Martin, W. , Richards,  D., (1988)  \"Distributed Genetic  Algorithms for  the Floor Plan  Design  Problem,\" \n\nSchool of Engineering and Applied Science, Computer Science Dept., University of Virginia, TR-88-12. \n\nDavis, L.1.  (1991) \"Bit-Climbing, Representational Bias and Test Suite Design\".lntemational Con! on Genetic Algorilhms 4. \nDe Jong, K.  (1975) An Analysis of the Behavior of a Class of Genetic Adaptive Systems.  Ph.D. Dissenation. \nDe Jong, K. (1993) \"Genetic Algorithms are NOT Function Optimizers\". In Whitley (ed.) Foundations of GAs-2. 5-17. \nEshelman, L.J. (1991) \"The CHC Adaptive Search Algorithm,\" in Rawlings (ed.) Foundations of GAs-I. 265-283. \nFang,  H.L,  Ross,  P.,  Come, D.  (1993) \"A Promising Genetic Algorithm Approach  to Job-Shop Scheduling,  Rescheduling, and \n\nOpen- Shop Scheduling Problems\". In Forrest, S. Imernational Conference on Genetic Algorithms 5. \n\nGOldberg, D.E. (1989) Genetic Algorithms in Search,  Optimization,  and Machine uarning. Addison-Wesley. \nGoldberg &  Richardson (1987) \"Genetic Algorithms with Sharing for Multimodal Function Optimization\" - Proceedings of the \n\nSecond International Conference on Genetic Algorithms. \n\nHoUand, J. H. (1975) Adaptation in  Natural and Ani/icial Systems. Ann Arbor: The University of Michigan Press. \nJuels,  A.  
& Wattenberg, M. (1996) \"Stochastic Hillclimbing as a Baseline Method for Evaluating Genetic Algorithms\". NIPS 8. \nMaron, O. & Moore, A. (1994) \"Hoeffding Races: Accelerating Model Selection for Classification and Function Approximation\". NIPS 6. \nMitchell, M., Holland, J. & Forrest, S. (1994) \"When Will a Genetic Algorithm Outperform Hill Climbing\". NIPS 6. \nSyswerda, G. (1989) \"Uniform Crossover in Genetic Algorithms,\" International Conference on Genetic Algorithms 3, 2-9. \nWhitley, D. & Starkweather, T. \"Genitor II: A Distributed Genetic Algorithm\". Journal of Experimental and Theoretical Artificial Intelligence 2: 189-214. \n", "award": [], "sourceid": 1247, "authors": [{"given_name": "Shumeet", "family_name": "Baluja", "institution": null}]}