{"title": "Neural Network On-Line Learning Control of Spacecraft Smart Structures", "book": "Advances in Neural Information Processing Systems", "page_first": 303, "page_last": 310, "abstract": null, "full_text": "Neural Network On-Line Learning Control of Spacecraft Smart Structures

Dr. Christopher Bowman
Ball Aerospace Systems Group
P.O. Box 1062
Boulder, CO 80306

Abstract

The overall goal is to reduce spacecraft weight, volume, and cost by on-line adaptive non-linear control of flexible structural components. The objective of this effort is to develop an adaptive Neural Network (NN) controller for the Ball C-SIDE 1m x 3m antenna with embedded actuators and the RAMS sensor system. A traditional optimal controller for the major modes is provided perturbations by the NN to compensate for unknown residual modes. On-line training of recurrent and feed-forward NN architectures has achieved adaptive vibration control under unknown modal variations and noisy measurements. On-line training feedback to each actuator NN output is computed via Newton's method to reduce the difference between desired and achieved antenna positions.

1 ADAPTIVE CONTROL BACKGROUND

The two traditional approaches to adaptive control are 1) direct control (such as performed by direct model-reference adaptive controllers) and 2) indirect control (such as performed by explicit self-tuning regulators). Direct control techniques (e.g. model-reference adaptive control) provide good stability but are susceptible to noise. Indirect control techniques (e.g. explicit self-tuning regulators) have low noise susceptibility and good convergence rates; however, they require more control effort, have worse stability, and are less robust to mismodeling.
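The direct approach can be made concrete with a minimal sketch of model-reference adaptive control using the classical MIT rule. The scalar plant, reference model, and all gains below are illustrative assumptions, not values from this paper.

```python
# Direct adaptive control sketch: a scalar plant tracks a reference model
# while the classical MIT rule adapts the feedforward gain theta.
#
#   plant:      dy/dt  = -a*y   + b*theta*r   (u = theta*r is the control)
#   reference:  dym/dt = -am*ym + bm*r
#   MIT rule:   dtheta/dt = -gamma * e * ym,  with  e = y - ym
#
# With a = am, the ideal gain is theta* = bm/b = 2 (hypothetical numbers).

def mrac_mit_rule(a=2.0, b=1.0, am=2.0, bm=2.0, gamma=0.5,
                  dt=0.001, steps=20000):
    y = ym = theta = 0.0
    for k in range(steps):
        r = 1.0 if (k * dt) % 4.0 < 2.0 else -1.0   # square-wave reference
        e = y - ym
        theta += dt * (-gamma * e * ym)             # MIT-rule gain adaptation
        y += dt * (-a * y + b * theta * r)          # plant under u = theta*r
        ym += dt * (-am * ym + bm * r)              # reference model
    return theta, abs(y - ym)

theta, final_err = mrac_mit_rule()   # theta adapts toward bm/b = 2
```

Note the direct/noise trade-off mentioned above: the adaptation law multiplies the raw tracking error `e`, so measurement noise feeds straight into the gain update.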
NNs synergistically augment traditional adaptive control techniques by providing improved mismodeling robustness, both adaptively on-line for time-varying dynamics and in a learned control mode at a slower rate.

The NN control approaches which correspond to direct and indirect adaptive control are commonly known as inverse and forward modeling, respectively. More specifically, a NN which maps the plant state and its desired performance to the control command is called an inverse model; a NN mapping both the current plant state and control to the next state and its performance is called the forward model.

When given a desired performance and the current state, the inverse model generates the control, see Figure 1. The actual performance is observed and is used to train/update the inverse model. A significant problem occurs when the desired and achieved performance differ greatly, since the model near the desired state is not changed. This condition is corrected by adding random noise to the control outputs so as to extend the state space being explored. However, this correction has the effect of slowing the learning and reducing broadband stability.

Figure 1: Direct Adaptive Control Using Inverse Modeling Neural Network Controller

Figure 2: Dual (Indirect and Direct) Adaptive Control Using Forward Modeling Neural Network State Predictor To Aid Inverse Model Convergence

For forward modeling the map from the current control and state to the resulting state and performance is learned, see Figure 2. For cases where the performance is evaluated at a future time (i.e. distal in time), a predictive critic [Barto and Sutton, 1989] NN model is learned. In both cases the Jacobian of this performance can be computed to iteratively generate the next control action. However, this differentiation of the critic NN for back-propagation training of the controller network is very slow and in some cases steers the search in the wrong direction due to initial erroneous forward model estimates. As the NN adapts itself the performance flattens, which results in the slow halting of learning at an unacceptable solution. Adding noise to the controller's output [Jordan and Jacobs, 1990] breaks the redundancy but forces the critic to predict the effects of future noise.
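The Jacobian-based control generation described above can be sketched as follows. The quadratic stand-in for a trained forward NN, the plant matrices, and the gradient-descent settings are hypothetical illustrations, not the paper's model.

```python
import numpy as np

# Generating the next control action by differentiating the performance of
# a (learned) forward model. A simple linear map stands in for the NN.

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # assumed state Jacobian
B = np.array([[0.0], [0.5]])             # assumed control Jacobian

def forward_model(x, u):
    # Predicted next state (stand-in for a trained forward NN).
    return A @ x + B @ u

def performance(x, u, x_desired):
    # Squared distance of the predicted next state from the target.
    err = forward_model(x, u) - x_desired
    return float(err @ err)

def control_by_gradient(x, x_desired, steps=200, lr=0.5):
    # Iteratively improve u by descending the performance gradient,
    # estimated here by central finite differences through the model.
    u = np.zeros(1)
    h = 1e-4
    for _ in range(steps):
        grad = (performance(x, u + h, x_desired)
                - performance(x, u - h, x_desired)) / (2 * h)
        u = u - lr * grad
    return u

x0 = np.array([1.0, -0.5])
u_star = control_by_gradient(x0, np.zeros(2))
```

If the forward model's estimates are wrong early in training, this same gradient points the search in the wrong direction, which is exactly the failure mode noted above.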
This noise-prediction problem has been solved by using a separately trained intermediate plant model to predict the next state from the prior state and control, while having an independent predictor model generate the performance evaluation from the plant-model-predicted state [Werbos, 1990] and [Brody, 1991]. The result is a 50-100 fold learning speed improvement over reinforcement training of the forward model controller NN.

However, this method still relies on a \"good\" forward model to incrementally train the inverse model. These incremental changes can still lead to undesirable solutions. For control systems which follow the stage 1, 2, or 3 models given in [Narendra, 1991], the control can be analytically computed from a forward-only model. For the most general, non-linear (stage 4) systems, an alternative is the memory-based forward model [Moore, 1992]. Using only a forward NN model, a direct hill-climbing or Newton's method search of candidate actions can be applied until a control decision is reached. The resulting state and its performance are used for on-line training of the forward model. Judicious random control actions are applied to improve behavior only where the forward model error is predicted to be large (e.g. via cross-validation). Also, using robust regression, experiences can be deweighted according to their quality and their age. The high computational burden of these cross-validation techniques can be reduced by parallel on-line processing providing the \"policy\" parameters for fast on-line NN control.

For control problems which are distal in time and space, a hybrid of these two forward-modeling approaches can be used. Namely, a NN plant model is added which is trained off-line in real-time and updated as necessary at a slower rate than the on-line forward model which predicts performance based upon the current plant model. This slower-rate trained forward-model NN supports learned control (e.g.
via numerical inversion), whereas the on-line forward model provides the faster-response adaptive control. Other NN control techniques, such as using a Hopfield net to solve the optimal-control quadratic-programming problem or the supervised off-line training of ART II with adaptive vigilance for on-line pole placement, have been proposed. However, their on-line robustness appears limited due to their sensitivity to a priori parameter assumptions.

A forward model NN which augments a traditional controller for unmodeled modes and unforeseen situations is presented in the following section. Performance results for both feed-forward and recurrent learning versions are compared in Section 3.

2 RESIDUAL FORWARD MODEL NEURAL NETWORK (RFM-NN) CONTROLLER

A type of forward model NN which acts as a residual mode filter to support a reduced-order model (ROM) traditional optimal state controller has been evaluated, see Figure 3. The ROM determines the control based upon its modal coordinate approximate representation of the structure. Modal coordinates are obtained by a transformation using known primary vibration modes [Young, 1990]. The transformation operator is a set of eigenvectors (mode shapes) generated by finite element modeling. The ROM controller is traditionally augmented by a residual-mode filter (RMF). Ball's RFM-NN replaces the RMF in order to better capture the mismodeled, unmodeled, and changing modes.

The objective of the RFM-NN is to provide the ROM controller with ROM derivative state perturbations, so that the ROM controls the structure as desired by the user. The RFM-NN is trained on-line using scored supervised feedback to generate these desired ROM state perturbations. The scored supervised training provides a score for each perturbation output based upon the measured position of the structure.
The measured deviations, Y*(t), from the desired structure position are converted to errors in the estimated ROM state using the ROM transformation. Specifically, the training score, S(t), for each ROM derivative state \hat{x}_N(t) is expressed in the following discrete equation:

S(t) = B_N Y^*(t) - \hat{x}_N(t)

where \hat{x}_N(t) = [A_N + B_N G_N - K_N C_N] \hat{x}_N(t-1) + K_N Y(t-1)

Figure 3: Residual Forward Model Neural Network Adaptive Controller Replaces Traditional Residual Mode Filter

Newton's method is then applied to find the \delta\hat{x}_N^*(t) ROM state perturbations which zero the score. First, the score is smoothed, \bar{S}(t) = \sigma \bar{S}(t-1) + (1-\sigma) S(t), and the neural network output is smoothed similarly. Second, Newton's method computes the adjustments needed to zero the scores,

\Delta(\delta\hat{x}_N^*(t)) = -\bar{S}(t) [\delta\hat{x}_N(t) - \delta\hat{x}_N(t-1)] / [\bar{S}(t) - \bar{S}(t-1)]
                             = -\epsilon \hat{x}_N(t)  (if either difference = 0)

Third, the NN is trained toward the target \delta\hat{x}_N^*(t+1) = \alpha \Delta(\delta\hat{x}_N^*(t)) + \delta\hat{x}_N(t), with an appropriate learning rate \alpha (e.g. an approximation to the inverse of the largest eigenvalue of the Hessian weight matrix).

3 RFM-NN ON-LINE LEARNING RESULTS

Both feed-forward and recurrent RFM-NNs have been incorporated into an interactive simulation of Ball's Control-Structure Interaction Demonstration Experiment (C-SIDE), see Figure 4. This 1m x 3m lightweight antenna facesheet has 8 embedded actuators plus three auxiliary input actuators, and uses 8 remote angular measurement sensors (RAMS) plus 4 displacement and 3 velocity auxiliary sensors. In order to evaluate the on-line performance of the RFM-NNs, the ROM controller was given insufficient and partially incorrect modes. The ROM without the RFM-NN grew unstable (i.e. greater than 10 millimeter C-SIDE displacements) in 13 seconds.
The initial feed-forward RFM-NN used 8 sensor and 6 ROM state feedback estimate inputs, as well as 5 hidden units and 3 ROM velocity state perturbation outputs. This RFM-NN had random initial weights, logistic activation functions, and back-propagation training using one sixth the learning rate for the output layer (e.g. 0.06 and 0.01). The Newton RFM-NN training search used a step size of one with a smoothing factor of one tenth.

Figure 4: 1m x 3m C-SIDE Antenna Facesheet With Embedded Actuators

This RFM-NN learned on-line to stabilize and reduce vibration to less than ±1 mm within 20 seconds, see Figure 5. A five-Newton force applied a few seconds later is compensated for within nine seconds, see Figure 6. This is accomplished with learning off as well as on. To test the necessity of the RFM-NN, the ROM was given the scored supervised training (i.e. the Newton's search estimates) directly instead of the RFM-NN outputs. This caused immediate unstable behavior. To test the RFM-NN sensitivity to measurement accuracy, a uniform error of ±5% was added. Starting from the same random weight start, the RFM-NN required 25 seconds to learn to stabilize the antenna, see Figure 7. The best stability was achieved when the product of the Newton and BPN steps was approximately 0.01. This feed-forward NN was compared to an Elman-type recurrent NN (i.e. hidden layer feedback to itself with one-step BP training). The recurrent RFM-NN on-line learning stability was much less sensitive to initial weights. The recurrent RFM-NN stabilized C-SIDE with up to 10%-20% measurement noise, versus a 5%-10% limit for the feed-forward RFM-NN.

4 SUMMARY AND RECOMMENDATIONS

Adaptive smart structures promise to reduce spacecraft weight and dependence on extensive ground monitoring.
A recurrent forward model NN is used as a residual mode filter to augment a traditional reduced-order model (ROM) controller. It was more robust than the feed-forward NN and the traditional-only controller in the presence of unmodeled modes and noisy measurements. Further analyses and hardware implementations will be performed to better quantify this robustness, including the sensitivity to the ROM controller mode fidelity, number of output modes, learning rates, measurement-to-state errors, and time quantization effects.

To improve robustness to ROM mode changes, a comparison to the dual forward/inverse NN control approach is recommended. The forward model will adjust the search used to train an inverse model which provides control augmentations to the ROM controller. This will enable control searches to occur both off-line faster than real-time using the forward model (i.e. imagination) and on-line using direct search trials with varying noise levels. The forward model will adapt using quality experiences (e.g. via cross-validation), which improves inverse model searches. The inverse model's reliance on the forward model will reduce until forward model prediction errors increase. Future challenges include solving the temporal credit assignment problem, partitioning to restricted chip sizes, combining with incomplete a priori knowledge, and balancing adaptivity of response with long-term learning.

Figure 5: RFM-NN On-Line Learning To Achieve Stable Control (panels: ROM state estimates; ROM state estimate adjustments)

Figure 5: RFM-NN On-Line Learning To Achieve Stable Control (concluded; panel: displacement measurements, +/-10 mm)
The goal is to extend stability-dominated, fixed-goal traditional control with adaptive robotic-type neural control to enable better autonomous control where fully-justified fixed models and complete system knowledge are not required. The resultant robust autonomous control will capitalize on the speed of massively parallel analog neural-like computations (e.g. with NN pulse stream chips).

Figure 6: 5 Newton Force Vibration Removed Using RFM-NN Learned Forward Model

Figure 7: RFM-NN Learning to Remove Vibrations in C-SIDE With ±15% Noisy Displacement Measurements

5 REFERENCES

Barto, A.G., Sutton, R.S., and Watkins, C.J.C.H., Learning and Sequential Decision Making, Univ. of Mass. at Amherst COINS Technical Report 89-95, September 1989.

Bowman, C.L., Adaptive Neural Networks Applied to Signal Recognition, 3rd Tri-Service Data Fusion Symposium, May 1989.

Brody, Carlos, Fast Learning With Predictive Forward Models, Neural Information Processing Systems 4 (NIPS 4), 1992.

Jordan, M.I., and Jacobs, R.A., Learning to Control an Unstable System with Forward Modeling, in D.S. Touretzky, ed., Advances in NIPS 2, Morgan Kaufmann, 1990.
Moore, A.W., Fast, Robust Adaptive Control by Learning Only Forward Models, NIPS 4, 1992.

Mukhopadhyay, S. and Narendra, K.S., Disturbance Rejection in Nonlinear Systems Using Neural Networks, Yale University Report No. 9114, December 1991.

Werbos, P., Architectures For Reinforcement Learning, in Miller, Sutton and Werbos, eds., Neural Networks for Control, MIT Press, 1990.

Young, D.O., Distributed Finite-Element Modeling and Control Approach for Large Flexible Structures, J. of Guidance, Control and Dynamics, Vol. 13 (4), 703-713, 1990.
", "award": [], "sourceid": 615, "authors": [{"given_name": "Christopher", "family_name": "Bowman", "institution": null}]}