{"title": "Designing Application-Specific Neural Networks Using the Genetic Algorithm", "book": "Advances in Neural Information Processing Systems", "page_first": 447, "page_last": 454, "abstract": null, "full_text": "Designing Application-Specific Neural Networks\n\n447\n\nDesigning Application-Specific Neural Networks Using the Genetic Algorithm\n\nSteven A. Harp, Tariq Samad, Aloke Guha\n\nHoneywell SSDC\n1000 Boone Avenue North\nGolden Valley, MN 55427\n\nABSTRACT\n\nWe present a general and systematic method for neural network design based on the genetic algorithm. The technique works in conjunction with network learning rules, addressing aspects of the network's gross architecture, connectivity, and learning rule parameters. Networks can be optimized for various application-specific criteria, such as learning speed, generalization, robustness and connectivity. The approach is model-independent. We describe a prototype system, NeuroGENESYS, that employs the backpropagation learning rule. Experiments on several small problems have been conducted. In each case, NeuroGENESYS has produced networks that perform significantly better than the randomly generated networks of its initial population. The computational feasibility of our approach is discussed.\n\n1 INTRODUCTION\n\nWith the growing interest in the practical use of neural networks, addressing the problem of customizing networks for specific applications is becoming increasingly critical. It has repeatedly been observed that different network structures and learning parameters can substantially affect performance. Such important aspects of neural network applications as generalization, learning speed, connectivity and tolerance to network damage are strongly related to the choice of network architecture. 
Yet there are few analytic results, and few heuristics, that can help the application developer design an appropriate network.\n\nWe have been investigating the use of the genetic algorithm (Goldberg, 1989; Holland, 1975) for designing application-specific neural networks (Harp, Samad and Guha, 1989ab). In our approach, the genetic algorithm is used to evolve appropriate network structures and values of learning parameters. In contrast, other recent applications of the genetic algorithm to neural networks (e.g., Davis [1988], Whitley [1988]) have largely restricted the role of the genetic algorithm to updating weights on a predetermined network structure, another logical approach.\n\nSeveral first-generation neural network application development tools already exist. However, they are only partly effective: the complexity of the problem, our limited understanding of the interdependencies between various network design choices, and the extensive human effort involved permit only limited exploration of the design space. An objective of our research is the development of a next-generation neural network application development tool that can synthesize optimized custom networks. The genetic algorithm has been distinguished by its relative immunity to high dimensionality, local minima and noise, and it is therefore a logical candidate for solving the network optimization problem.\n\n2 GENETIC SYNTHESIS OF NEURAL NETWORKS\n\nFig. 1 outlines our approach. A network is represented by a blueprint, a bitstring that encodes a number of characteristics of the network, including structural properties and learning parameter values. Each blueprint directs the creation of an actual network with random initial weights. 
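As an illustrative aside (not part of the original system), the blueprint/evaluate/select/crossover cycle can be sketched in Python. The population size, blueprint length, mutation rate, and the `fitness_fn` callback, which in NeuroGENESYS would instantiate, train, test and evaluate a network, are all assumptions made for the example:

```python
import random

def evolve(fitness_fn, pop_size=20, blueprint_len=32, generations=15, p_mut=0.01):
    """Minimal generational GA over bitstring 'blueprints' (illustrative sketch).

    fitness_fn maps a bitstring (list of 0/1 ints) to a non-negative score.
    """
    pop = [[random.randint(0, 1) for _ in range(blueprint_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness_fn(b) for b in pop]
        total = sum(scores) or 1.0

        def select():
            # Fitness-proportionate (roulette-wheel) selection: the higher a
            # blueprint's fitness, the more likely it is chosen as a parent.
            r, acc = random.uniform(0, total), 0.0
            for b, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return b
            return pop[-1]

        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, blueprint_len)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]  # mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness_fn)
```

With `fitness_fn=sum` (the OneMax toy problem) the loop concentrates 1-bits in the population over generations, mirroring how useful blueprint characteristics are emphasized while harmful ones are suppressed.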
An instantiated network is trained using some predetermined training algorithm and training data, and the trained network can then be tested in various ways, e.g., on non-training inputs, after disabling some units, and after perturbing learned weight values. After testing, a network is evaluated: a fitness estimate is computed for it based on appropriate criteria. This process of instantiation, training, testing and evaluation is performed for each of a population of blueprints.\n\nAfter the entire population is evaluated, the next generation of blueprints is produced. A number of genetic operators are employed, the most prominent of these being crossover, in which two parent blueprints are spliced together to produce a child blueprint (Goldberg, 1989). The higher the fitness of a blueprint, the greater the probability of it being selected as a parent for the subsequent generation. Characteristics that are found useful will thereby tend to be emphasized in the next generation, whereas harmful ones will tend to be suppressed.\n\nThe definition of network performance depends on the application. If the application requires good generalization capabilities, the results of testing on (appropriately chosen) non-training data are important. If a network capable of real-time learning is required, the learning rate must be optimized. For fast response, the size of the network must be minimized. If hardware (especially VLSI) implementation is a consideration, low connectivity is essential. In most applications several such criteria must be considered. This important aspect of application-specific network design is covered by the fitness function. In our 
approach, the fitness of a network can be an arbitrary function of several distinct performance and cost criteria, some or all of which can thereby be simultaneously optimized.\n\n[Figure 1: block diagram of the cycle. The genetic algorithm samples and synthesizes network \"blueprints\"; each blueprint is instantiated as a network, which is tested on test stimuli, and the resulting fitness estimate is returned to the algorithm.]\n\nFigure 1. A population of network \"blueprints\" is cyclically updated by the genetic algorithm based on their fitness.\n\n3 NEUROGENESYS\n\nOur approach is model-independent: it can be applied to any existing or future neural network model (including models without a training component). As a first prototype implementation we have developed a working system called NeuroGENESYS. The current implementation uses a variant (Samad, 1988) of the backpropagation learning algorithm (Werbos, 1974; Rumelhart, Hinton, and Williams, 1985) as the training component and is restricted to feedforward networks.\n\nWithin these constraints, NeuroGENESYS is a reasonably general system. Networks can have arbitrary directed acyclic graph structures, where each vertex of the graph corresponds to an area or layer of units and each edge to a projection from one area to another. Units in an area have a spatial organization; the current system arrays units in 2 dimensions. Each projection specifies independent radii of connectivity, one for each dimension. The radii of connectivity allow localized receptive field structures. Within the receptive fields connection densities can be specified. Two learning parameters are associated with both projections and areas. Each projection has a learning rate parameter (\"\u03b7\" in backpropagation) and a decay rate for \u03b7. Each area has \u03b7 and \u03b7-decay parameters for threshold weights. 
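The localized receptive fields are determined by per-projection radii of connectivity and a connection density. As a hedged illustration of how such a projection could be materialized (the proportional coordinate mapping and the Bernoulli sampling rule are our assumptions, not a documented part of NeuroGENESYS):

```python
import random

def connect(src_shape, dst_shape, rx, ry, density, rng=random):
    """Illustrative sketch: build one localized receptive-field projection.

    Each destination unit (di, dj) connects to source units within a
    rectangle of radii rx, ry around its proportionally mapped source
    position, keeping each candidate connection with probability `density`.
    Returns a list of ((si, sj), (di, dj)) connection pairs.
    """
    conns = []
    for di in range(dst_shape[0]):
        for dj in range(dst_shape[1]):
            # Map destination coordinates onto the source grid.
            ci = di * src_shape[0] // dst_shape[0]
            cj = dj * src_shape[1] // dst_shape[1]
            for si in range(max(0, ci - rx), min(src_shape[0], ci + rx + 1)):
                for sj in range(max(0, cj - ry), min(src_shape[1], cj + ry + 1)):
                    if rng.random() < density:
                        conns.append(((si, sj), (di, dj)))
    return conns
```

With radii large enough to cover the whole source area and density 1.0 this degenerates to full connectivity; small radii and sparse densities give the localized, low-fanout structures the fitness criteria can reward.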
\nThese network characteristics are encoded in the genetic blueprint. This bitstring is composed of several segments, one for each area. An area segment consists of an area parameter specification (APS) and a variable number of projection specification fields (PSFs), each of which describes a projection from the area to some other area. Both the APS and the PSF contain values for several parameters for areas and projections respectively. Fig. 2 shows a simple area segment. Note that the target of a projection can be specified through either Absolute or Relative addressing. More than one projection is possible between two given areas; this allows the generation of receptive field structures at different scales and with different connection densities, and it also allows the system to model the effect of larger initial weights. In our current implementation, all initial weights are randomly generated small values from a fixed uniform distribution. In the near future, we intend to incorporate some aspects of the distribution in the genetic blueprint.\n\n[Figure 2: layout of an area segment. The area parameters include X-Share, Y-Share, Initial Threshold \u03b7 and Threshold \u03b7 Decay; each projection field carries a start-of-projection marker, Connection Density, Initial \u03b7, \u03b7 Decay, X-Radius, Y-Radius, Target Address and Address Mode.]\n\nFigure 2. Network blueprint representation.\n\nIn NeuroGENESYS, the score of a blueprint is computed as a linear weighted sum of several performance and cost criteria, including learning speed, the results of testing on a \"test set\", the numbers of units and weights in the network, the results of testing (on the training set) after disabling some of the units, the results of testing (on the training set) after perturbing the learned weight values, the average fanout of the network, and the maximum fanout for any unit in the network. 
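The scoring rule just described is a plain linear combination. A minimal sketch in Python, with illustrative criterion names and weights (the actual criterion set and weighting factors are supplied by the NeuroGENESYS user, and cost terms are assumed here to enter with negative weights):

```python
def blueprint_score(metrics, weights):
    """Linear weighted-sum score in the spirit of NeuroGENESYS's fitness.

    `metrics` and `weights` are dicts keyed by criterion name. Criteria to
    be maximized (e.g. test-set accuracy) get positive weights; cost terms
    (numbers of units and weights, fanout, learning time) get negative ones.
    The specific keys below are illustrative, not the paper's exact set.
    """
    return sum(weights.get(k, 0.0) * v for k, v in metrics.items())

# Example: favor generalization while penalizing size and slow learning.
metrics = {"test_accuracy": 0.92, "n_weights": 340, "epochs_to_learn": 48}
weights = {"test_accuracy": 100.0, "n_weights": -0.05, "epochs_to_learn": -0.25}
score = blueprint_score(metrics, weights)  # 92.0 - 17.0 - 12.0 = 63.0
```

Because all criteria collapse into one scalar, several objectives can be traded off simultaneously simply by adjusting the weights.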
Other criteria can be incorporated as needed. The user of NeuroGENESYS supplies the weighting factors at the start of the experiment, thereby controlling which aspects of the network are to be optimized.\n\n4 EXPERIMENTS\n\nNeuroGENESYS can be used for both classification and function approximation problems. We have conducted experiments on three classification problems (digit recognition from 4x8 pixel images, exclusive-OR (XOR), and simple convexity detection) and one function approximation problem (modeling one cycle of a sine function). Various combinations of the above criteria have been used. In most experiments NeuroGENESYS has produced appropriate network designs in a relatively small number of generations (< 50).\n\nOur first experiment was with digit recognition, and NeuroGENESYS produced a solution that surprised us: the optimized networks had no hidden layers yet learned perfectly. It had not been obvious to us that this digit recognition problem is linearly separable. Even in the simple case of no-hidden-layer networks, our earlier remarks on application-specific design can be appreciated. When NeuroGENESYS was asked to optimize for average fanout for the digit recognition task as well as for perfect learning, the best network produced learned perfectly (although comparatively slowly) and had an average fanout of three connections per unit; with learning speed as the sole optimization criterion, the best network produced learned substantially faster (48 iterations) but it had an average fanout almost an order of magnitude higher.\n\nThe XOR problem, of course, is prototypically non-linearly-separable. In this 
case, NeuroGENESYS produced many fast-learning networks that had a \"bypass\" connection from the input layer directly to the output layer (in addition to connections to and from hidden layers); it is an as yet unverified hypothesis that these bypass connections accelerate learning.\n\nIn one of our experiments on the sine function problem, NeuroGENESYS was asked to design networks for moderate accuracy: the error cutoff during training was relatively high. The networks produced typically had one hidden layer of two units, which is the minimum possible configuration for a sufficiently crude approximation. When the experiment was repeated with a low error cutoff, intricate multilayer structures were produced that were capable of modeling the training data very accurately (Fig. 3). Fig. 4 shows the learning curve for one sine function experiment. The \"Average\" and \"Best\" scores are over all individuals in the generation, while \"Online\" and \"Offline\" are running averages of Average and Best, respectively. Performance on this problem is quite sensitive to initial weight values, hence the non-monotonicity of the Best curve. Steady progress overall was still being observed when the experiment was terminated.\n\nWe have conducted control studies using random search (with best retention) instead of the genetic algorithm. The genetic algorithm has consistently proved superior. Random search is the weakest possible optimization procedure, but on the other hand there are few sophisticated alternatives for this problem: the search space is discontinuous, largely unknown, and highly nonlinear.\n\n5 COMPUTATIONAL EFFICIENCY\n\nOur approach requires the evaluation of a large number of networks. Even on some of our small-scale problems, experiments have taken a week or longer, the bottleneck being the neural network training algorithm. 
While computational feasibility is a real concern, for several reasons we are optimistic that this approach will be practical for realistic applications:\n\n\u2022 The hardware platform for our experiments to date has been a Symbolics computer without any floating-point support. This choice has been ideal 
for program development, and NeuroGENESYS' user interface features would not have been possible without it, but the performance penalty has been severe (relative to machines with floating-point hardware).\n\n[Figure 3: screenshot of the NeuroGENESYS interface, showing run parameters, per-network statistics, and the graph of areas and projections of an evolved network.]\n\nFigure 3. The NeuroGENESYS interface, showing a network structure optimized for the sine function problem.\n\n\u2022 The genetic algorithm is an inherently parallel optimization procedure, a feature we soon hope to take advantage of. We have recently implemented a networked version of NeuroGENESYS that will allow us to retain the desirable aspects of the Symbolics version and yet achieve substantial speedup in execution (we expect two to three orders of magnitude): up to 30 Apollo workstations, a VAX, and 10 Symbolics computers can now be evaluating different networks in parallel (Harp, Samad and Guha, 1990).\n\n\u2022 The current version of NeuroGENESYS employs the backpropagation learning rule, which is notoriously slow for many applications. However, faster-learning extensions of backpropagation are continually being developed. We have incorporated one recent extension (Samad, 1988), but others, especially common ones such as including a \"momentum\" term in the weight update rule (Rumelhart, Hinton and Williams, 1985), could also be considered. More generally, learning in neural networks is a topic of intensive research and it is likely that more efficient learning algorithms will become popular in the near future.\n
\n[Figure 4: learning curve, accuracy on the sine function plotted against generation (0-30), with best, average, offline and online curves.]\n\nFigure 4. A learning curve for the sine function problem.\n\n\u2022 The genetic algorithm is an active field of research itself. Improvements, many of which are concerned with convergence properties, are frequently being reported and could reduce the computational requirements of its application significantly.\n\n\u2022 The genetic algorithm is an iterative optimization procedure that, on the average, produces better solutions with each passing generation. Unlike some other optimization techniques, useful results can be obtained during a run. The genetic algorithm can thus take advantage of whatever time and computational resources are available for an application.\n\n\u2022 Just as there is no strict termination requirement for the genetic algorithm, there is no constraint on its initialization. In our experiments, the zeroth generation consisted of randomly generated networks. Not surprisingly, almost all of these are poor performers. However, better ways of selecting the initial population are possible. In particular, the initial population can consist of manually optimized networks. Manual optimization of neural networks is currently the norm, but it leaves much of the design space unexplored. Our approach would allow a human application developer to design one or more networks that could be the starting point for further, more systematic optimization by the genetic algorithm. 
Other initialization approaches are also possible, such as using optimized networks from similar applications, or using heuristic guidelines to generate networks.\n\nIt should be emphasized that computational efficiency is not the only factor that must be considered in evaluating this (or any) approach. Others, such as the potential for improved performance of neural network applications and the costs and benefits associated with alternative approaches for designing network applications, are also critically important.\n\n6 FUTURE RESEARCH\n\nIn addition to running further experiments, we hope in the future to develop versions of NeuroGENESYS for other network models, including hybrid models that incorporate supervised and unsupervised learning components.\n\nSpace restrictions have precluded a detailed description of NeuroGENESYS and our experiments. The interested reader is referred to (Harp, Samad, and Guha, 1989ab, 1990).\n\nReferences\n\nDavis, L. (1988). Properties of a hybrid neural network-classifier system. In Advances in Neural Information Processing Systems 1, D.S. Touretzky (Ed.). San Mateo: Morgan Kaufmann.\n\nGoldberg, D.E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.\n\nHarp, S.A., T. Samad, and A. Guha (1989a). Towards the genetic synthesis of neural networks. Proceedings of the Third International Conference on Genetic Algorithms, J.D. Schaffer (Ed.). San Mateo: Morgan Kaufmann.\n\nHarp, S.A., T. Samad, and A. Guha (1989b). Genetic Synthesis of Neural Networks. Technical Report 14852-CC-1989-2. Honeywell SSDC, 1000 Boone Avenue North, Golden Valley, MN 55427.\n\nHarp, S.A., T. Samad, and A. Guha (1990). Genetic synthesis of neural network architecture. In The Genetic Algorithms Handbook, L.D. Davis (Ed.). New York: Van Nostrand Reinhold. (To appear.)\n\nHolland, J. (1975). 
Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.\n\nRumelhart, D.E., G.E. Hinton, and R.J. Williams (1985). Learning Internal Representations by Error Propagation. ICS Report 8506, Institute for Cognitive Science, UCSD, La Jolla, CA.\n\nSamad, T. (1988). Back-propagation is significantly faster if the expected value of the source unit is used for update. Neural Networks, 1, Sup. 1.\n\nWerbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University Committee on Applied Mathematics, Cambridge, MA.\n\nWhitley, D. (1988). Applying Genetic Algorithms to Neural Net Learning. Technical Report CS-88-128, Department of Computer Science, Colorado State University.\n", "award": [], "sourceid": 263, "authors": [{"given_name": "Steven", "family_name": "Harp", "institution": null}, {"given_name": "Tariq", "family_name": "Samad", "institution": null}, {"given_name": "Aloke", "family_name": "Guha", "institution": null}]}