{"title": "Adaptive Range Coding", "book": "Advances in Neural Information Processing Systems", "page_first": 486, "page_last": 492, "abstract": null, "full_text": "Adaptive Range Coding \n\nBruce E. Rosen, James M. Goodwin, and Jacques J. Vidal \n\nDistributed Machine \n\nIntelligence Laboratory \n\nComputer Science Department \n\nUniversity of California, Los Angeles \n\nLos Angeles, CA 90024 \n\nAbstract \n\nthese \n\nto neuron-like processing elements. \n\"neurons\" \n\nThis paper examines a class of neuron based \nthat rely on \nlearning systems for dynamic control \nadaptive range coding of sensor inputs. \nSensors are \nassumed \nto provide binary coded range vectors that \ncoarsely describe the system state. These vectors are \nOutput \ninput \ndecisions generated by \nturn \nthe system state, subsequently producing new \naffect \ninputs. \nthe \nintervals and \nenvironment are \nevaluated. The neural weights as well as the ran g e \nb 0 u n dar i e s determining \nthe output decisions are \nthen altered with \nfuture \nPreliminary \nreinforcement \nfrom \nthe promise of adapting \"neural \nexperiments show \nreceptive \nlearning dynamical control. \nThe observed performance with this method exceeds \nthat of earlier approaches. \n\nthe goal of maximizing \nthe environment. \n\nReinforcement \n\nsignals \nreceived at various \n\nfields\" when \n\nin \n\nfrom \n\n486 \n\n\f1 INTRODUCTION \n\nAdaptive Range Coding \n\n487 \n\nRange coding \n\nin \n\ntask \n\naltering \n\nthe \n\nlearning \n\nand \n\nthat \n\ncontract \n\nfrequently active \n\ninitial partitioning \n\nto \n\nand \n\nexpand \n\nregions/ranges. \n\nregions/ranges contractt \n\n(Barto 1982) for \n\nthe same control \n\nleast adequatet regions \nrequire \n\nthat \n\ncriticism of unsupervised \n\nA major \ncontrol \ntechniques such as those used by Barto et al. 
(Barto, 1983) and by Albus (Albus, 1981) is the need for a priori selection of region sizes for range coding. Range coding in principle generalizes inputs and reduces computational and storage overhead, but the boundary partitioning, determined a priori, is often non-optimal (for example, the ranges described in (Barto, 1983) differ from those used in (Barto, 1982) for the same control task). Determination of nearly optimal, or at least adequate, regions is left as an additional task that would require that the system dynamics be analyzed, which is not always possible. \n\nTo address this problem, we move region boundaries adaptively, progressively altering the initial partitioning to a more appropriate representation with no need for a priori knowledge. Unlike previous work (Michie, 1968), (Barto, 1983), (Anderson, 1982), which used fixed coders, this approach produces adaptive coders instead. During adaptation, frequently active regions/ranges contract, reducing the number of situations in which they will be activated and increasing the chances that neighboring regions will receive input, while rarely active regions/ranges expand. This class of self-organization is discussed in Kohonen (Kohonen, 1984), (Ritter, 1986, 1988). The resulting self-organizing mapping will track the environmental input probability density function. Range coding creates a focusing mechanism: resources are distributed according to regional activity level, so that resources can be allocated to critical areas of the state space. Concentrated activity is more finely discriminated, and the corresponding control decisions are more finely tuned. Dynamic shaping of region boundaries can be achieved without sacrificing memory or learning speed. Also, since the region boundaries are finally determined solely by the environmental dynamics, optimal a priori ranges and specifications are not necessary. \n\nAs an example, consider a one dimensional state space, as shown in figures 1a and 1b. It is partitioned into three regions by the vertical lines shown. The heavy curve indicates a theoretical optimal control surface (unknown a priori) of a state space, which the weight in each region should approximate. 
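The one-dimensional example can be sketched numerically. The sketch below is illustrative only: the control surface f, the boundary positions, and all identifiers are hypothetical choices, not taken from the paper. It shows that with each region's weight set to the mean of the true surface over that region, moving the boundaries toward the rapidly varying part of the surface reduces the mean squared error, as in figures 1a and 1b.

```python
import numpy as np

def piecewise_mse(f, boundaries, n_samples=10_000):
    """Mean squared error of approximating f on [0, 1) by the mean
    of f over each region (the 'learned weight' of that region)."""
    xs = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    ys = f(xs)
    edges = np.concatenate(([0.0], np.sort(boundaries), [1.0]))
    sq_err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        region = ys[(xs >= lo) & (xs < hi)]
        if region.size:
            sq_err += np.sum((region - region.mean()) ** 2)
    return sq_err / n_samples

# Hypothetical control surface: nearly flat except near x = 0.8.
f = lambda x: np.tanh(10.0 * (x - 0.8))

even_mse = piecewise_mse(f, [1 / 3, 2 / 3])    # cf. figure 1a: even partition
adapted_mse = piecewise_mse(f, [0.75, 0.85])   # cf. figure 1b: boundaries moved
# The same number of regions fits the surface better once the
# boundaries concentrate where the surface actually varies.
assert adapted_mse < even_mse
```

Both partitions use three regions; only the boundary placement differs, which is the point of the figure.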
The dashed horizontal lines show the learned weight values for the respective partitionings. Weight values approximate the mean value of the true control surface in each of the regions. \n\nFigure 1a: Even Region Partition. Figure 1b: Adapted Region Partition. (Both plot weight against state space.) \n\nAn evenly partitioned space produces the weights shown in figure 1a. Figure 1b shows the regions after the boundaries have been adjusted, and the final weight values. Although the weights in both 1a and 1b reflect the mean of the true control surface (in their respective regions), adaptive partitioning is able to represent the ideal surface with a smaller mean squared error. \n\n2 ADAPTIVE RANGE CODING RULE \n\nFor the more general n dimensional control problem using adaptive range boundaries, the shape of each region can change from an initial n dimensional prism to an n dimensional polytope. The polytope shape is determined by its average activity. The heuristic for our adaptive range coding is to move each region vertex towards or away from the current activation state, depending on the reinforcement, with a region boundary alteration analogous to the weight alteration formula used by Kohonen's self-organizing maps. Each region (i) consists of 2^n vertices (V_ij ... 0.95). \n\nFigure 4 shows a comparison of the average performance values of the 100 ASE/ACE and Adaptive Range Coding (ARC) runs. 
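Before turning to the results, the vertex-moving heuristic of section 2 can be sketched in outline. The update form, the learning rate, and all names below are assumptions for illustration; the paper states only that each vertex of the active region is moved toward or away from the current state, Kohonen-style, according to the reinforcement.

```python
import numpy as np

def update_region_vertices(vertices, state, reinforcement, lr=0.05):
    """Kohonen-style update for the 2**n vertices of one active region.

    Positive reinforcement pulls every vertex toward the current state
    (contracting a frequently rewarded region); negative reinforcement
    pushes the vertices away (expanding the region). The exact formula
    is an illustrative assumption, not the paper's.
    """
    vertices = np.asarray(vertices, dtype=float)
    return vertices + lr * reinforcement * (state - vertices)

# A 2-D region that starts as the unit square (an n-dimensional prism).
square = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
state = np.array([0.8, 0.2])

contracted = update_region_vertices(square, state, reinforcement=+1.0)
expanded = update_region_vertices(square, state, reinforcement=-1.0)

# Every vertex moves toward the state under reward, away under penalty.
d0 = np.linalg.norm(square - state, axis=1)
assert np.all(np.linalg.norm(contracted - state, axis=1) < d0)
assert np.all(np.linalg.norm(expanded - state, axis=1) > d0)
```

Because each vertex is shared between neighboring regions, such an update reshapes the initial prisms into general polytopes, as the text describes.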
Pole balancing time is shown as a function of the number of learning trials experienced for the two performance sets. \n\nFigure 4: Comparison of the ASE/ACE and Adaptive Range Coding (ARC) learning rates on the cart pole task. Pole balancing run time is shown as a function of learning trials. Results are averaged over 100 runs. \n\nThe disparity between the two different algorithms is due to the comparatively large number of failures of the ASE/ACE runs. Statistical analysis indicates no significant difference in the learning rates or performance levels of the successful runs between the two categories, leading us to believe that adaptive range coding may lead to an "all or none" behavior, and that there is a minimum area of the state space that the system must explore to succeed. \n\n4 CONCLUSION \n\nThe research has shown that neuron-like elements with adjustable regions can dynamically create topological cause and effect maps reflecting the control laws of dynamic systems. It is anticipated, from the results of the examples presented above, that adaptive range coding will be more effective than earlier static region approaches in the control of complex systems with unknown dynamics. \n\nReferences \n\nJ. S. Albus. 
(1981) Brains, Behavior, and Robotics. Peterborough, NH: McGraw-Hill Byte Books. \nC. W. Anderson. (1982) Feature Generation and Selection by a Layered Network of Reinforcement Learning Elements: Some Initial Experiments. Technical Report COINS 82-12. Amherst, MA: University of Massachusetts, Department of Computer and Information Science. \nA. Barto, R. Sutton, and C. Anderson. (1982) Neuron-like elements that can solve difficult learning control problems. COINS Tech. Rept. No. 82-20. Amherst, MA: University of Massachusetts, Department of Computer and Information Science. \nA. G. Barto, R. S. Sutton, and C. W. Anderson. (1983) Neuron-like elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13(5): 834-846. \nT. Kohonen. (1984) Self-Organization and Associative Memory. New York: Springer-Verlag. \nD. Michie and R. Chambers. (1968) Machine Intelligence. Edinburgh: Oliver and Boyd. \nH. Ritter and K. Schulten. (1986) Topology Conserving Mappings for Learning Motor Tasks. In J. S. Denker (ed.), Neural Networks for Computing. Snowbird, Utah: AIP. \nH. Ritter and K. Schulten. (1988) Extending Kohonen's Self-Organizing Mapping Algorithm to Learn Ballistic Movements. In R. Eckmiller (ed.), Neural Computers. Springer-Verlag. \n", "award": [], "sourceid": 321, "authors": [{"given_name": "Bruce", "family_name": "Rosen", "institution": null}, {"given_name": "James", "family_name": "Goodwin", "institution": null}, {"given_name": "Jacques", "family_name": "Vidal", "institution": null}]}