{"title": "Learning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance", "book": "Advances in Neural Information Processing Systems", "page_first": 342, "page_last": 349, "abstract": null, "full_text": "Learning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance \n\nGerald Fahner* \n\nDepartment of Neuroinformatics \n\nUniversity of Bonn \n\nRomerstr. 164 \n\nW-5300 Bonn 1, Germany \n\nRolf Eckmiller \n\nDepartment of Neuroinformatics \n\nUniversity of Bonn \n\nRomerstr. 164 \n\nW-5300 Bonn 1, Germany \n\nAbstract \n\nWithin a simple test-bed, the application of feed-forward neurocontrol to short-term planning of robot trajectories in a dynamic environment is studied. The action network is embedded in a sensory-motoric system architecture that contains a separate world model. It is continuously fed with short-term predicted spatio-temporal obstacle trajectories, and receives robot state feedback. The action net allows for external switching between alternative planning tasks. It generates goal-directed motor actions - subject to the robot's kinematic and dynamic constraints - such that collisions with moving obstacles are avoided. Using supervised learning, we distribute examples of the optimal planner mapping over a structure-level adapted parsimonious higher order network. The training database is generated by a Dynamic Programming algorithm. Extensive simulations reveal that the local planner mapping is highly nonlinear, but can be effectively and sparsely represented by the chosen powerful net model. Excellent generalization occurs for unseen obstacle configurations. We also discuss the limitations of feed-forward neurocontrol for growing planning horizons. 
\n\n*Tel.: (228)-550-364 \n\nFAX: (228)-550-425 \n\ne-mail: gerald@nero.uni-bonn.de \n\n1 INTRODUCTION \n\nGlobal planning of goal-directed trajectories subject to cluttered spatio-temporal, state-dependent constraints - as in the kinodynamic path planning problem (Donald, 1989) considered here - is a difficult task, probably best suited for systems with embedded sequential behavior; theoretical insights indicate that the related problem of connectedness is of unbounded order (Minsky, 1969). In practical situations, however, many constraints are not globally available at planning time, due to partially unmodelled environments. The question then arises to what extent feed-forward neurocontrol may be effective for local planning horizons. \n\nIn this paper, we put aside problems of credit assignment and world model identification. We focus on the complexity of representing a local version of the generic kinodynamic path planning problem by a feed-forward net, and investigate the capacity of sparse distributed planner representations to generalize from example plans. \n\n2 ENVIRONMENT AND ROBOT MODELS \n\n2.1 ENVIRONMENT \n\nThe world around the robot is a two-dimensional scene, occupied by obstacles all moving parallel to the y-axis, with randomly chosen discretized x-positions and a continuous velocity spectrum. The environment's state is given by a list reporting the position (xi, yi) ∈ X × Y, X = {0, ..., 8}, Y = [y-, y+], and the velocity (0, vi), vi ∈ [v-, v+], of each obstacle i. The environment dynamics is given by \n\nyi(t + 1) = yi(t) + vi . \n\n(1) \n\nObstacles are inserted at random positions, and with random velocities, into some region distant from the robot's workspace. 
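The constant-velocity obstacle dynamics of eqn.(1) can be sketched in a few lines of Python. This is our own minimal illustration, not code from the paper; the class name, the workspace bounds, and the velocity range are assumed values.

```python
import random

X_POSITIONS = range(9)          # discretized x-positions, X = {0, ..., 8}
Y_MIN, Y_MAX = -10.0, 10.0      # workspace extent along y (assumed values)
V_MIN, V_MAX = 0.2, 1.0         # continuous velocity spectrum (assumed values)

class Obstacle:
    """An obstacle moving parallel to the y-axis at constant velocity."""

    def __init__(self, x, y, v):
        self.x, self.y, self.v = x, y, v

    def step(self):
        # eqn (1): y_i(t+1) = y_i(t) + v_i, with x_i fixed
        self.y += self.v

def spawn_obstacle():
    """Insert an obstacle at a random discretized x, in a region below
    the robot's workspace, with a random velocity toward it."""
    return Obstacle(x=random.choice(list(X_POSITIONS)),
                    y=Y_MIN,
                    v=random.uniform(V_MIN, V_MAX))
```

With positive velocities and insertion below the workspace, every spawned obstacle will eventually cross the robot's baseline, as described in the text.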
At each time step, the obstacle positions are updated according to eqn.(1), so that each obstacle will eventually cross the robot's workspace. \n\n2.2 ROBOT \n\nWe consider a point-like robot of unit mass, which is confined to move within some interval along the x-axis. Its state is denoted by (xr, ẋr) ∈ X × Ẋ; Ẋ = {-1, 0, 1}. At each time step, a motor command u ∈ {-1, 0, 1} is applied to the robot. The robot dynamics is given by \n\nẋr(t + 1) = ẋr(t) + u(t) \nxr(t + 1) = xr(t) + ẋr(t + 1) . \n\n(2) \n\nNotice that the set of admissible motor commands depends on the present robot state. With these settings, the robot faces a fluctuating number of obstacles crossing its baseline, similar to the situation of a pedestrian who wants to cross a busy street (Figure 1). \n\n[Figure 1 shows moving obstacles crossing the robot's one-dimensional workspace, with the robot's long-term goal at one end.] \n\nFigure 1: Obstacles Crossing the Robot's Workspace \n\n3 SYSTEM ARCHITECTURE AND FUNCTIONALITY \n\nAdequate modeling of the perception-action cycle is of decisive importance for the design of intelligent reactive systems. We partition the overall system into two modules: an active Perception Module (PM) with built-in capabilities for short-term environment forecasts, and a subsequent Action Module (AM) for motor command generation (Figure 2). Either module may be represented by a 'classical' algorithm, or by a neural net. PM is fed with a sensory data stream reporting the observed dynamic scene of time-varying obstacle positions. From this, it assembles a spatio-temporal internal representation of near-future obstacle trajectories. \n\n[Figure 2 shows the Perception Module receiving sensory information and the robot state, and passing an internal representation, together with the long-term goal, to the Action Module, which emits the motor command.] \n\nFigure 2: Sensory-Motoric System Architecture \n\n
At each time step t, it actualizes the incidence function \n\noccupancy(x, k) = 1 if (x = xi and -s < yi(t + k) < s) for any obstacle i, and -1 otherwise, \n\nwhere s is some safety margin accounting for the y-extension of obstacles. The incidence function is defined on a spatio-temporal cone-shaped cell array, based at the actual robot position: \n\n|x - xr(t)| ≤ k ; k = 1, ..., HORIZON . \n\n(3) \n\nThe opening angle of this cone-shaped region is given by the robot's speed limit (here: one cell per time step). Only those cells that can potentially be reached by the robot within the local prediction/planning horizon are thus represented by PM (see Figure 3). \n\n[Figure 3 shows the cone-shaped space-time cell array in front of the robot, with occupied cells and a collision-free solution path marked.] \n\nFigure 3: Space-Time Representation with Solution Path Indicated \n\nThe functionality of AM is to map the current PM representation to an appropriate robot motor command, taking into account the present robot state, and paying regard to the currently specified long-term goal. Firstly, we realize the optimal AM by the Dynamic Programming (DP) algorithm (Bellman, 1957). Secondly, we use supervised learning to distribute optimal planning examples over a neural network. \n\n4 DYNAMIC PROGRAMMING SOLUTION \n\nGiven PM's internal representation at time t, the present robot state, and some specification of the desired long-term goal, DP determines a sequence of motor commands minimizing some cost functional. Here we use \n\ncost{u(t), ..., u(t + HORIZON)} = Σ_{k=0}^{HORIZON} (xr(t + k) - x0)² + c u(t + k)² , \n\n(4) \n\nwith xr(t + k) given by the dynamics eqns.(2) (see the solution path in Figure 3). By x0, we denote the desired robot position or long-term goal. Deviations from this position are penalized by higher costs, just as are costly accelerations. 
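The search over the cone-shaped phase-space-time region can be sketched as a recursive minimization of cost (4) under the robot dynamics of eqn (2). This is our own illustrative sketch, not the paper's implementation: the cost weight, the goal position, and the convention of charging accelerations only for the steps actually applied within the horizon are assumptions, and `occupied(x, k)` stands in for cells where the incidence function reports an obstacle.

```python
HORIZON = 3           # local planning horizon used in the paper's experiments
C = 0.1               # acceleration cost weight c (assumed value)
X_GOAL = 0            # long-term goal position x0 (assumed value)
POSITIONS = range(9)  # X = {0, ..., 8}
SPEEDS = (-1, 0, 1)   # admissible velocities

def plan(x, xdot, k, occupied):
    """Recursive DP over phase-space-time cells (x, xdot, t + k).

    `occupied(x, k)` flags blocked cells; state transitions follow the
    robot dynamics of eqn (2). Returns (minimal cost in the spirit of
    eqn (4), optimal first action), or (inf, None) if every path within
    the horizon collides."""
    if occupied(x, k):
        return float("inf"), None
    if k == HORIZON:
        return (x - X_GOAL) ** 2, None
    best, best_u = float("inf"), None
    for u in (-1, 0, 1):
        nxdot = xdot + u
        if nxdot not in SPEEDS:        # kinodynamic speed constraint
            continue
        nx = x + nxdot
        if nx not in POSITIONS:        # stay inside the workspace
            continue
        tail, _ = plan(nx, nxdot, k + 1, occupied)
        total = (x - X_GOAL) ** 2 + C * u * u + tail
        if total < best:
            best, best_u = total, u
    return best, best_u
```

The first component of the returned pair is the minimal cost; the second is the training target uopt(t) extracted for supervised learning. A deterministic tie-breaking rule, as the paper describes, would replace the strict `<` comparison with a fixed preference order over actions.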
Obstacle collisions are excluded by restricting search to admissible cells (x, ẋ, t + k) in phase-space-time (obeying occupancy(x, k) = -1). Training targets for time t are constituted by the optimal present motor actions uopt(t), for which the minimum is attained in eqn.(4). For cases with degenerate optimal solutions, we consistently break symmetry, in order to obtain a deterministic target mapping. \n\n5 NEURAL ACTION MODEL \n\nFor neural motor command generation, we use a single layer of structure-adapted parsimonious Higher Order Neurons (parsiHONs) (Fahner, 1992a, b), computing outputs yi ∈ [0, 1]; i = 1, 2, 3. Target values for each single neuron are given by yi^des = 1 if motor action i is the optimal one, and yi^des = 0 otherwise. As input, each neuron receives a bit-vector x = (x1, ..., xN) ∈ {-1, 1}^N, whose components specify the values of PM's incidence function, the binary encoded robot state, and some task bits encoding the long-term goal. Using batch training, we maximize the log-likelihood criterion for each neuron independently. For recall, the motor command is obtained by a winner-takes-all decision: the index of the most active neuron yields the motor action applied. \n\nGenerally, atoms for nonlinear interactions within a bipolar-input HON are modelled by input monomials of the form \n\nηα = Π_{i=1}^{N} xi^αi ; α = α1 ... αN ∈ Ω = {0, 1}^N . \n\n(5) \n\nHere, the i-th bit of α is understood as the exponent of xi. It is well known that the complete set of monomials forms a basis for Boolean function expansions (Karpovski, 1976). Combinatorial growth of the number of terms with increasing input dimension renders allocation of the complete basis impractical in our case. Moreover, an action model employing excessive numbers of basis functions would overfit training data, thus preventing generalization. 
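For bipolar inputs, a monomial of eqn (5) reduces to the product of the components selected by the bit-mask α, and the size of the complete basis makes its combinatorial growth concrete. The following short sketch is our own illustration (function and variable names are not from the paper):

```python
from itertools import product

def monomial(x, alpha):
    """eta_alpha(x) = prod_i x_i**alpha_i for bipolar x_i in {-1, +1}.

    Since x_i**0 == 1 and x_i**1 == x_i, the monomial is simply the
    product of the components selected by the bit-mask alpha."""
    out = 1
    for xi, ai in zip(x, alpha):
        if ai:
            out *= xi
    return out

# The complete basis {eta_alpha : alpha in {0,1}^N} has 2**N members --
# the combinatorial growth that makes full allocation impractical here:
N = 21
print(2 ** N)   # prints 2097152 candidate monomials for N = 21

# Enumerating the full basis is feasible only for tiny N:
x = (-1, 1, -1, 1)
alphas = list(product((0, 1), repeat=4))
values = [monomial(x, a) for a in alphas]
```

The parsiHON approach described next keeps only a small subset of such terms, selected by structural adaptation, rather than allocating all 2**N of them.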
\nWe therefore use a structural adaptation algorithm, as discussed in detail in (Fahner, 1992a, b), for automatic identification and inclusion of a sparse set of relevant nonlinearities present in the problem. In effect, this algorithm performs a guided stochastic search exploring the space of nonlinear interactions by means of an intertwined process of weight adaptation and competition between nonlinear terms. The parsiHON model restricts the number of terms used, not their orders: instead of the exponential-size set {ηα : α ∈ Ω}, just a small subset {ηβ : β ∈ S ⊂ Ω} of terms is used within a parsimonious higher order function expansion \n\ny^est(x) = f [ Σ_{β∈S} wβ ηβ(x) ] ; wβ ∈ ℝ . \n\n(6) \n\nHere, f denotes the usual sigmoid transfer function. parsiHONs with high degrees of sparsity were effectively trained and exhibited robust generalization on difficult nonlinear classification benchmarks (Fahner, 1992a, b). \n\n6 SIMULATION RESULTS \n\nWe performed extensive simulations to evaluate the neural action network's capabilities to generalize from learned optimal planning examples. The planner was trained with respect to two alternative long-term goals: x0 = 0, or x0 = 8. Firstly, optimal DP planner actions were assembled over about 6,000 time steps of the simulated environment (fairly crowded with moving obstacles), for both long-term goals. At each time step, optimal motor commands were computed for all 9 × 3 = 27 available robot states. From this bunch of situations we excluded those where no collision-free path existed within the planning horizon considered (HORIZON = 3). A total of 115,000 admissible training situations were left, out of the 6,000 × 27 = 162,000 generated. 
Thus, out of the 27 robot states checked at every time step, only about 19 on average were not doomed to collide. These findings corroborate the difficulty of the chosen task. \n\nMany repetitions are present in these accumulated patterns, reflecting the statistics of the simulated environment. We collapsed the original training set by removing repeated patterns, providing the learner with more information per pattern: a working database containing about 20,000 different patterns was left. \n\nInput to the neural action net consisted of a bit-vector of length N = 21, where 3 + 5 + 7 bits encode PM's internal representation (the cone in Figure 3), 6 bits encode the robot's state, and a single task bit reports the desired goal. For training, we delimited single-neuron learning to a maximum of 1000 epochs. In most cases, this was sufficient for successful training set classification by each of the three neurons (yi < .8 for yi^des = 0, and yi > .8 for yi^des = 1; i = 1, 2, 3). But even if some training patterns were misclassified by individual motor neurons, the additional robustness stemming from the winner-takes-all decision rescued fault-free recall of the voting community. \n\nTo test generalization of the neural action model, we partitioned the database into two parts, one containing training patterns, the other containing new test patterns not present in the training set. Several runs were performed with parsiHONs of sizes between 83 and 110 terms. Results for varying training set sizes are depicted in Figure 4. \n\n[Figure 4 plots the test error in percent against the training set size (in thousands of patterns) for parsiHONs with 83 to 110 terms.] \n\nFigure 4: Generalization Behavior \n\nTest error decreases with increasing training set size, and falls as low as about one percent for about 12,000 training patterns. It continues to decrease for larger training sets. These findings corroborate that the trained architectures achieve sensible, robust generalization. \n\nTo get some insight into the complexity of the mapping, we counted the number of terms carrying a given order. The resulting distribution has its maximum at order 3, exhibits many terms of orders 4 and higher, and finally decreases to zero for orders exceeding 10 (Figure 5). This indicates that the planner mapping considered is highly nonlinear. \n\n[Figure 5 plots the relative frequency of terms against their order, averaged over several networks.] \n\nFigure 5: Distribution of Orders \n\n7 DISCUSSION AND CONCLUSIONS \n\nSparse representation of planner mappings is desirable when representation of complete policy look-up tables becomes impracticable (Bellman's \"curse of dimensionality\"), or when computation of plans becomes expensive or conflicts with real-time requirements. For these reasons, it is urgent to investigate the capacity of neurocontrol for effective distributed representation and for robust generalization of planner mappings. 
\nHere, we focused on a new type of shallow feed-forward action network for the local kinodynamic trajectory planning problem. An advantage of feed-forward nets is their low-latency recall, an important requirement for systems acting in rapidly changing environments. However, from theoretical considerations concerning the related problem of connectedness, with its inherent serial character (Minsky, 1969), the planning problem under focus is expected to be hard for feed-forward nets. Even for rather local planning horizons, complex and nonlinear planner mappings must be expected. Using a powerful new neuron model that identifies the relevant nonlinearities inherent in the problem, we determined extremely parsimonious architectures for representation of the planner mapping. This indicates that some compact set of important features determines the optimal plan. The adapted networks exhibited excellent generalization. \n\nWe encourage the use of feed-forward nets for difficult local planning tasks, if care is taken that the models support effective representation of high-order nonlinearities. For growing planning horizons, it is expected that feed-forward neurocontrol will run into limitations (Werbos, 1992). The simple test-bed presented here would also allow for insertion and testing of other net models and system designs, including recurrent networks. \n\nAcknowledgements \n\nThis work was supported by the Federal Ministry of Research and Technology (BMFT project SENROB, grant 01 IN 105 AID). \n\nReferences \n\nE. B. Baum, F. Wilczek (1987). Supervised Learning of Probability Distributions by Neural Networks. In D. Anderson (Ed.), Neural Information Processing Systems, 52-61. Denver, CO: American Institute of Physics. \n\nR. E. Bellman (1957). Dynamic Programming. Princeton University Press. \n\nB. Donald (1989). 
Near-Optimal Kinodynamic Planning for Robots With Coupled Dynamic Bounds. Proc. IEEE Int. Conf. on Robotics and Automation. \n\nG. Fahner, N. Goerke, R. Eckmiller (1992). Structural Adaptation of Boolean Higher Order Neurons: Superior Classification with Parsimonious Topologies. Proc. ICANN, Brighton, UK. \n\nG. Fahner, R. Eckmiller. Structural Adaptation of Parsimonious Higher Order Classifiers. Submitted to Neural Networks. \n\nM. G. Karpovski (1976). Finite Orthogonal Series in the Design of Digital Devices. New York: John Wiley & Sons. \n\nM. Minsky, S. A. Papert (1969). Perceptrons. Cambridge: The MIT Press. \n\nP. Werbos (1992). Approximate Dynamic Programming for Real-Time Control and Neural Modeling. In D. White, D. Sofge (Eds.), Handbook of Intelligent Control, 493-525. New York: Van Nostrand. ", "award": [], "sourceid": 595, "authors": [{"given_name": "Gerald", "family_name": "Fahner", "institution": null}, {"given_name": "Rolf", "family_name": "Eckmiller", "institution": null}]}