{"title": "A Comparison of Discrete-Time Operator Models for Nonlinear System Identification", "book": "Advances in Neural Information Processing Systems", "page_first": 883, "page_last": 890, "abstract": "", "full_text": "A Comparison of Discrete-Time Operator Models \n\nfor Nonlinear System Identification \n\nAndrew D. Back, Ah Chung Tsoi \n\nDepartment of Electrical and Computer Engineering, \n\nUniversity of Queensland \n\nSt. Lucia, Qld 4072. Australia. \n\ne-mail: {back.act}@elec.uq.oz.au \n\nAbstract \n\nWe present a unifying view of discrete-time operator models used in the \ncontext of finite word length linear signal processing. Comparisons are \nmade between the recently presented gamma operator model, and the delta \nand rho operator models for performing nonlinear system identification \nand prediction using neural networks. A new model based on an adaptive \nbilinear transformation which generalizes all of the above models is \npresented. \n\n1 \n\nINTRODUCTION \n\nThe shift operator, defined as qx(t) ~ x(t + 1), is frequently used to provide time-domain \nsignals to neural network models. Using the shift operator, a discrete-time model for system \nidentification or time series prediction problems may be constructed. A common method of \ndeveloping nonlinear system identification models is to use a neural network architecture as \nan estimator F(Y(t), X(t); 0) of F(Y(t), X(t\u00bb, where 0 represents the parameter vector \nof the network. Shift operators at the input of the network provide the regression vectors \nY(t-l) = [yet-I), ... , y(t-N)]', andX(t) = [x(t), ... , x(t-M)]' in a manner analogous \nto linear filters, where [.], represents the vector transpose. \nIt is known that linear models based on the shift operator q suffer problems when used to \nmodellightly-damped-Iow-frequency (LDLF) systems, with poles near (1,0) on the unit \ncircle in the complex plane [5]. 
As the sampling rate increases, coefficient sensitivity and round-off noise become a problem as the difference between successive sampled inputs becomes smaller and smaller. \n\nA method of overcoming this problem is to use an alternative discrete-time operator. Agarwal and Burrus first proposed the use of the delta operator in digital filters to replace the shift operator in an attempt to overcome the problems described above [1]. The delta operator is defined as \n\nδ = (q - 1) / Δ    (1) \n\nwhere Δ is the discrete-time sampling interval. Williamson showed that the delta operator allows better performance in terms of coefficient sensitivity for digital filters derived from the direct form structure [19], and a number of authors have considered using it in linear filtering, estimation and control [5, 7, 8]. \nMore recently, de Vries, Principe et al. proposed the gamma operator [2, 3] as a means of studying neural network models for processing time-varying patterns. This operator is defined by \n\nγ = (q - (1 - c)) / c    (2) \n\nIt may be observed that it is a generalization of the delta operator with an adjustable parameter c. An extension to the basic gamma operator introducing complex poles using a second order operator was given in [18]. \nThis raises the question: is the gamma operator capable of providing better neural network modelling capabilities for LDLF systems? Further, are there any other operators which may be better than these for nonlinear modelling and prediction using neural networks? \nIn the context of robust adaptive control, Palaniswami has introduced the rho operator, which has shown useful improvements over the performance of the delta operator [9, 10]. The rho operator is defined as \n\nρ = (q - (1 - c1 Δ)) / (c2 Δ)    (3) \n\nwhere c1, c2 are adjustable parameters. The rho operator generalizes the delta and gamma operators. 
For the case where c1 Δ = c2 Δ = 1, the rho operator reduces to the usual shift operator. When c1 = 0 and c2 = 1, the rho operator reduces to the delta operator [10]. For c1 Δ = c2 Δ = c, the rho operator is equivalent to the gamma operator. \nOne advantage of the rho operator over the delta operator is that it is stably invertible, allowing the derivation of simpler algorithms [9]. The ρ operator can be considered as a stable low pass filter, and parameter estimation using the ρ operator is low frequency biased. For adaptive control systems, this gives robustness advantages for systems with unmodelled high frequency characteristics [9]. \nBy defining the bilinear transformation (BLT) as an operator, it is possible to introduce an operator which generalizes all of the above operators. We can therefore define the pi operator as \n\nπ = 2 (c1 q - c2) / (Δ (c3 q + c4))    (4) \n\nwith the restriction that c1 c4 ≠ c2 c3 (to ensure π is not a constant function [14]). The bilinear mapping produced has a pole at q = -c4/c3. By appropriate settings of the parameters c1, c2, c3, c4, the pi operator can be reduced to each of the previous operators. \nIn the work reported here, we consider these alternative discrete-time operators in feedforward neural network models for system identification tasks. We compare the popular gamma model [4] with other models based on the shift, delta, rho and pi operators. A framework of models and Gauss-Newton training algorithms is provided, and the models are compared by simulation experiments. 
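To make the relationships between the operators concrete, the sketch below (our own illustrative Python, not the authors' code; the sampling interval Δ and the test values are assumptions) treats each operator as a rational function of the shift q and checks the reductions stated above numerically.

```python
# Each discrete-time operator, viewed as a rational function of the shift q.
DT = 0.1  # sampling interval Delta (assumed value)

def shift(q):        return q
def delta(q):        return (q - 1.0) / DT
def gamma(q, c):     return (q - (1.0 - c)) / c
def rho(q, c1, c2):  return (q - (1.0 - c1 * DT)) / (c2 * DT)
def pi_op(q, c1, c2, c3, c4):
    assert c1 * c4 != c2 * c3          # otherwise pi is a constant function
    return 2.0 * (c1 * q - c2) / (DT * (c3 * q + c4))

q, c = 1.37, 0.6                       # arbitrary test point and gamma parameter
# rho -> shift when c1*DT = c2*DT = 1
assert abs(rho(q, 1 / DT, 1 / DT) - shift(q)) < 1e-12
# rho -> delta when c1 = 0, c2 = 1
assert abs(rho(q, 0.0, 1.0) - delta(q)) < 1e-12
# rho -> gamma when c1*DT = c2*DT = c
assert abs(rho(q, c / DT, c / DT) - gamma(q, c)) < 1e-12
# pi -> delta for one choice of parameters: c1 = c2 = 1, c3 = 0, c4 = 2
assert abs(pi_op(q, 1.0, 1.0, 0.0, 2.0) - delta(q)) < 1e-12
print("all operator reductions hold")
```

The nesting shift ⊃ rho ⊃ {gamma, delta} follows directly from these substitutions; the pi operator needs its extra denominator parameters only when a pole away from q = 0 is wanted.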
\n\n2 OPERATOR MODELS FOR NONLINEAR SIGNAL PROCESSING \n\nA model which generalizes the usual discrete-time linear moving average model, i.e., a single layer network, is given by \n\ny(t) = G(ν, θ) x(t)    (5) \n\nG(ν, θ) = Σ_{i=0}^{M} b_i ν^{-i},  ν ∈ {q (shift), δ (delta), γ (gamma), ρ (rho), π (pi)}    (6) \n\nThis general class of moving average model can be termed MA(ν). We define u_0(t) = x(t) and u_i(t) = ν^{-1} u_{i-1}(t), and hence obtain \n\nu_i(t) = \n  x(t - i)                                                              shift operator \n  Δ u_{i-1}(t-1) + u_i(t-1)                                             delta operator \n  c u_{i-1}(t-1) + (1 - c) u_i(t-1)                                     gamma operator \n  c2 Δ u_{i-1}(t-1) + (1 - c1 Δ) u_i(t-1)                               rho operator \n  (Δ / 2c1)(c3 u_{i-1}(t) + c4 u_{i-1}(t-1)) + (c2/c1) u_i(t-1)         pi operator    (7) \n\nA nonlinear model may be defined using a multilayer perceptron (MLP) with the ν-operator elements at the input stage. The input vector Z_i^0(t) to the network is \n\nZ_i^0(t) = [x_i(t), ν^{-1} x_i(t), ..., ν^{-M} x_i(t)]'    (8) \n\nwhere x_i(t) is the ith input to the system. This model is termed the ν-operator multilayer perceptron or MLP(ν) model. \nAn MLP(ν) model having L layers with N_0, N_1, ..., N_L nodes per layer is defined in the same manner as a usual MLP, with \n\nv_k^l(t) = Σ_{i=1}^{N_{l-1}} w_{ki}^l z_i^{l-1}(t)    (9) \n\nz_k^l(t) = f(v_k^l(t))    (10) \n\nwhere each neuron i in layer l has an output of z_i^l(t); a layer consists of N_l neurons (l = 0 denotes the input layer, and l = L denotes the output layer; z_{N_l}^l = 1.0 may be used for a bias); f(·) is a sigmoid function, typically evaluated as tanh(·); and a synaptic connection between unit i in the previous layer and unit k in the current layer is represented by w_{ki}^l. The notation t may be used to represent a discrete time or pattern instance. While the case we consider employs the ν-operator at the input layer only, it would be feasible to use the operators throughout the network as required. 
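The tap-line recursions above lend themselves to a direct implementation. The following sketch (our own Python, not the authors' code; `ma_gamma` is a hypothetical name) implements the MA(γ) case: u_0(t) = x(t), u_i(t) = c u_{i-1}(t-1) + (1-c) u_i(t-1), with output y(t) = Σ b_i u_i(t).

```python
def ma_gamma(x, b, c):
    """Filter sequence x with an order-M MA(gamma) model with taps b_0..b_M."""
    M = len(b) - 1
    u = [0.0] * (M + 1)              # tap states u_0..u_M at time t-1
    y = []
    for x_t in x:
        u_new = [x_t]                # u_0(t) = x(t)
        for i in range(1, M + 1):    # u_i(t) = c*u_{i-1}(t-1) + (1-c)*u_i(t-1)
            u_new.append(c * u[i - 1] + (1.0 - c) * u[i])
        u = u_new
        y.append(sum(bi * ui for bi, ui in zip(b, u)))
    return y

# With c = 1 the gamma tap line reduces to a plain FIR delay line, so an
# impulse reaches the i = 2 tap after exactly two steps:
y = ma_gamma([1.0, 0.0, 0.0, 0.0], b=[0.0, 0.0, 1.0], c=1.0)
# y == [0.0, 0.0, 1.0, 0.0]
```

For c < 1 each tap becomes a leaky (low-pass) memory stage rather than a pure delay, which is what gives the gamma model its adjustable memory depth.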
\nOn-line algorithms to update the operator parameters in the MA(ν) model can be found readily. In the case of the MLP(ν) model, we approach the problem by backpropagating the error information to the input layer and using this to update the operator coefficients. de Vries, Principe et al. proposed stochastic gradient descent type algorithms for adjusting the c operator coefficient using a least-squares error criterion [2, 12]. For brevity we omit the updating procedures for the MLP network weights; a variety of methods may be applied (see for example [13, 15]). \nWe define an instantaneous output error criterion J(t) = (1/2) e^2(t), where e(t) = y(t) - ŷ(t). Defining θ̂ as the estimate at time t of the operator parameter vector θ, we have \n\nθ = \n  c                    gamma operator \n  [c1, c2]'            rho operator \n  [c1, c2, c3, c4]'    pi operator    (11) \n\nA first order algorithm to update the coefficients is \n\nθ_i(t+1) = θ_i(t) + Δθ_i(t)    (12) \n\nΔθ_i(t) = -η ∇_θ J(θ; t)    (13) \n\nwhere the adjustment in weights is found as \n\nΔθ_i(t) = -η ∂J(t)/∂θ_i    (14) \n\n        = η Σ_{j=1}^{M} ψ_j(t) δ_j(t)    (15) \n\nwhere δ_j(t) is the backpropagated error at the jth node of the input layer, and ψ_j(t) is the first order sensitivity vector of the model operator parameters, defined by \n\nψ_j(t) = \n  ∂u_j(t)/∂c                                                    gamma operator \n  [∂u_j(t)/∂c1, ∂u_j(t)/∂c2]'                                   rho operator \n  [∂u_j(t)/∂c1, ∂u_j(t)/∂c2, ∂u_j(t)/∂c3, ∂u_j(t)/∂c4]'        pi operator \n\nSubstituting u_i(t) in from (7), the recursive equations for ψ_i(t) (noting that ψ_{0,j}(t) = 0) are \n\nψ_i(t) = u_{i-1}(t-1) - u_i(t-1) + c ψ_{i-1}(t-1) + (1 - c) ψ_i(t-1)    gamma operator \n\nψ_{i,1}(t) = c2 Δ ψ_{i-1,1}(t-1) + (1 - c1 Δ) ψ_{i,1}(t-1) - Δ u_i(t-1) \nψ_{i,2}(t) = Δ u_{i-1}(t-1) + c2 Δ ψ_{i-1,2}(t-1) + (1 - c1 Δ) ψ_{i,2}(t-1)    rho operator \n\nψ_{i,1}(t) = (Δ / 2c1)(c3 ψ_{i-1,1}(t) + c4 ψ_{i-1,1}(t-1)) + (c2/c1) ψ_{i,1}(t-1) - (Δ / 2c1^2)(c3 u_{i-1}(t) + c4 u_{i-1}(t-1)) - (c2/c1^2) u_i(t-1) \nψ_{i,2}(t) = (Δ / 2c1)(c3 ψ_{i-1,2}(t) + c4 ψ_{i-1,2}(t-1)) + (c2/c1) ψ_{i,2}(t-1) + (1/c1) u_i(t-1) \nψ_{i,3}(t) = (Δ / 2c1)(u_{i-1}(t) + c3 ψ_{i-1,3}(t) + c4 ψ_{i-1,3}(t-1)) + (c2/c1) ψ_{i,3}(t-1) \nψ_{i,4}(t) = (Δ / 2c1)(c3 ψ_{i-1,4}(t) + u_{i-1}(t-1) + c4 ψ_{i-1,4}(t-1)) + (c2/c1) ψ_{i,4}(t-1)    pi operator \n\nfor the gamma, rho, and pi operators respectively, and where ψ_{i,j}(t) refers to the jth element of the ith ψ vector. \nA more powerful updating procedure can be obtained by using the Gauss-Newton method [6]. In this case, we replace (14) with (omitting i subscripts for clarity) \n\nθ(t+1) = θ(t) + γ(t) R^{-1}(t) ψ(t) Λ^{-1} δ(t)    (16) \n\nwhere γ(t) is the gain sequence (see [6] for details), and Λ^{-1} is a weighting matrix which may be replaced by the identity matrix [16], or estimated as [6] \n\nΛ(t) = Λ(t-1) + γ(t) (δ^2(t) - Λ(t-1))    (17) \n\nR(t) is an approximate Hessian matrix, defined by \n\nR(t+1) = λ(t) R(t) + κ(t) ψ(t) ψ'(t)    (18) \n\nwhere λ(t) = 1 - κ(t). Efficient computation of R^{-1} may be performed using the matrix inversion lemma [17], factorization methods such as Cholesky decomposition, or other fast algorithms. Using the well known matrix inversion lemma [6], we substitute P(t) = R^{-1}(t), where \n\nP(t+1) = (1/λ(t)) P(t) - (κ(t)/λ(t)) P(t) ψ(t) ψ'(t) P(t) / (λ(t) + κ(t) ψ'(t) P(t) ψ(t))    (19) \n\nThe initial values of the coefficients are important in determining convergence. Principe et al. [12] note that setting the coefficients for the gamma operator to unity provided the best approach for certain problems. 
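One Gauss-Newton step of this kind can be sketched as follows (our own Python, not the authors' code; we assume the identity weighting Λ = I, treat the gain and κ as fixed constants, and `gn_step` is a hypothetical name). The P update is the matrix-inversion-lemma form of the approximate-Hessian recursion.

```python
def gn_step(theta, P, psi, err, gain=0.1, kappa=0.05):
    """One Gauss-Newton update of the operator parameters theta (dimension n),
    carrying P = R^{-1} forward via the matrix inversion lemma."""
    n = len(theta)
    lam = 1.0 - kappa                                # lambda(t) = 1 - kappa(t)
    # P*psi and the scalar denominator lambda + kappa * psi'*P*psi
    Ppsi = [sum(P[i][j] * psi[j] for j in range(n)) for i in range(n)]
    s = lam + kappa * sum(p * v for p, v in zip(psi, Ppsi))
    # P <- (P - kappa * (P psi psi' P) / s) / lambda
    P = [[(P[i][j] - kappa * Ppsi[i] * Ppsi[j] / s) / lam
          for j in range(n)] for i in range(n)]
    # theta <- theta + gain * P * psi * err
    Pp = [sum(P[i][j] * psi[j] for j in range(n)) for i in range(n)]
    theta = [th + gain * v * err for th, v in zip(theta, Pp)]
    return theta, P

# e.g. a rho-operator parameter pair, identity initial P:
theta, P = gn_step([0.5, 0.75], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], err=0.2)
```

With psi = [1, 0] only the first parameter moves, since the sensitivity of the output to the second parameter is zero at this step.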
\n\n3 SIMULATION EXAMPLES \n\nWe are primarily interested in the differences between the operators themselves for modelling and prediction, and not the associated difficulties of training multilayer perceptrons (recall that our models will only differ at the input layer). For the purposes of a more direct comparison, in this paper we test the models using a single layer network. Hence these linear system examples are used to provide an indication of the operators' performance. \n\n3.1 EXPERIMENT 1 \n\nThe first problem considered is a system identification task arising in the context of high bit rate echo cancellation [5]. In this case, the system is described by \n\nH(z) = (0.0254 - 0.0296 z^{-1} + 0.00425 z^{-2}) / (1 - 1.957 z^{-1} + 0.957 z^{-2})    (20) \n\nThis system has poles on the real axis at 0.9994 and 0.9577; thus it is an LDLF system. The input signal to the system in each case consisted of uniform white noise with unit variance. A Gauss-Newton algorithm was used to determine all unknown weights. We conducted Monte-Carlo tests using 20 runs of differently seeded training samples, each of 2000 points, to obtain the results reported. We assessed the performance of the models by using the Signal-to-Noise Ratio (SNR) defined as 10 log(E[d(t)^2] / E[e(t)^2]), where E[·] is the expectation operator and d(t) is the desired signal. For each run, we used the last 500 samples to compute a SNR figure. \n\nFigure 1: Comparison of typical model output results for Experiment 1 with models based on the following operators: (a) shift, (b) delta, (c) gamma, (d) rho, and (e) pi. \n\nTable 1: System Identification Experiment 1 Results \n\nFor the purposes of this experiment, we conducted several trials and selected θ(0) values which provided stable convergence. The values chosen for this experiment were: θ(0) = {0.75, [0.5, 0.75]', [0.75, 0.7, 0.35, -0.25]'} for the gamma, rho and pi operator models respectively. In each case we used model order M = 8. \nResults for this experiment are shown in Table 1 and Figure 1. We observe that the pi operator gives the best performance overall. Some difficulties with instability were encountered, thereby requiring a stability correction mechanism to be used on the operator updates. The next best performance was observed in the rho and then gamma models, with fewer instability problems occurring. \n\n3.2 EXPERIMENT 2 \n\nThe second experiment used a model described by \n\nH(z) = (1 - 0.8731 z^{-1} - 0.8731 z^{-2} + z^{-3}) / (1 - 2.8653 z^{-1} + 2.7505 z^{-2} - 0.8843 z^{-3})    (21) \n\nThis system is a 3rd order lowpass filter tested in [11]. The same experimental procedures as used in Experiment 1 were followed in this case. 
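The shared experimental machinery can be sketched as follows (our own Python, not the authors' code): Experiment 1's plant H(z) written out as a difference equation, plus the SNR figure of merit computed over the last 500 samples. The seed, names and unit-variance uniform noise construction are our assumptions.

```python
import math
import random

def ldlf_system(x):
    """Simulate H(z) of Experiment 1 as the difference equation
    y(t) = 1.957 y(t-1) - 0.957 y(t-2)
         + 0.0254 x(t) - 0.0296 x(t-1) + 0.00425 x(t-2)."""
    y = []
    for t in range(len(x)):
        yt = (0.0254 * x[t]
              - 0.0296 * (x[t - 1] if t >= 1 else 0.0)
              + 0.00425 * (x[t - 2] if t >= 2 else 0.0)
              + 1.957 * (y[t - 1] if t >= 1 else 0.0)
              - 0.957 * (y[t - 2] if t >= 2 else 0.0))
        y.append(yt)
    return y

def snr_db(d, e, last=500):
    """SNR = 10 log10(E[d^2] / E[e^2]) over the last `last` samples."""
    d, e = d[-last:], e[-last:]
    return 10.0 * math.log10(sum(v * v for v in d) / sum(v * v for v in e))

# Uniform white noise with unit variance is uniform on [-sqrt(3), sqrt(3)].
random.seed(0)
x = [random.uniform(-3 ** 0.5, 3 ** 0.5) for _ in range(2000)]
d = ldlf_system(x)  # desired signal for a model trained on (x, d)
```

A model output ŷ would then give e(t) = d(t) - ŷ(t) and a per-run SNR via `snr_db(d, e)`; a residual ten times smaller than the target corresponds to exactly 20 dB.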
\nFor the second experiment (see Table 2), it was found that the pi operator gave the best results recorded over all the tests. On average, however, the improvement for this identification problem is less. It is observed that the pi model is only slightly better than the gamma and rho models. Interestingly, the gamma and rho models had no problems with stability, while the pi model still suffered from convergence problems due to instability. As before, the delta model gave a wide variation in results and performed poorly. \n\nTable 2: System Identification Experiment 2 Results \n\nFrom these and other experiments performed, it appears that performance advantages can be obtained through the use of the more complex operators. As observed from the best recorded runs, the extra degrees of freedom in the rho and pi operators appear to provide the means to give better performance than the gamma model. The improvements of the more complex operators come at the expense of potential convergence problems due to instabilities occurring in the operators and a potentially multimodal mean square output error surface in the operator parameter space. \nClearly, there is a need for further investigation into the performance of these models on a wider range of tasks. We present these preliminary examples as an indication of how these alternative operators perform on some system identification problems. \n\n4 CONCLUSIONS \n\nModels based on the delta operator, rho operator, and pi operator have been presented and new algorithms derived. Comparisons have been made to the previously presented gamma model introduced by de Vries, Principe et al. [4] for nonlinear signal processing applications. 
\nWhile the simulation examples considered are only linear, it is important to realize that the derivations are applicable to multilayer perceptrons, and that the input stage of these networks is identical to what we have considered here. We treat only the linear case in the examples in order not to complicate our understanding of the results, knowing that what happens in the input layer is important to higher layers in network structures. \nThe results obtained indicate that the more complex operators provide a potentially more powerful modelling structure, though there is a need for further work into mechanisms of maintaining stability while retaining good convergence properties. \nThe rho model was able to perform better than the gamma model on the problems tested, and gave similar results in terms of susceptibility to convergence and instability problems. The pi model appears capable of giving the best performance overall, but requires more attention to ensure the stability of the coefficients. \nFor future work it would be of value to analyse the convergence of the algorithms, in order to design methods which ensure stability can be maintained while not disrupting the convergence of the model. \n\nAcknowledgements \n\nThe first author acknowledges financial support from the Australian Research Council. The second author acknowledges partial support from the Australian Research Council. \n\nReferences \n\n[1] R.C. Agarwal and C.S. Burrus, \"New recursive digital filter structures having very low sensitivity and roundoff noise\", IEEE Trans. Circuits and Systems, vol. CAS-22, pp. 921-927, Dec. 1975. \n[2] B. de Vries and J.C. Principe, \"A theory for neural networks with time delays\", Advances in Neural Information Processing Systems 3, R.P. Lippmann (Ed.), pp. 162-168, 1991. \n[3] B. de Vries, J.C. Principe and P.G. de Oliveira, \"Adaline with adaptive recursive memory\", Neural Networks for Signal Processing I, B.H. Juang, S.Y. Kung and C.A. Kamm (Eds.), IEEE Press, pp. 101-110, 1991. \n[4] B. de Vries and J.C. Principe, \"The Gamma Model - a new neural model for temporal processing\", Neural Networks, vol. 5, no. 4, pp. 565-576, 1992. \n[5] H. Fan and Q. Li, \"A δ operator recursive gradient algorithm for adaptive signal processing\", Proc. IEEE Int. Conf. Acoust., Speech and Signal Proc., vol. III, pp. 492-495, 1993. \n[6] L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identification, Cambridge, Massachusetts: The MIT Press, 1983. \n[7] R.H. Middleton and G.C. Goodwin, Digital Control and Estimation, Englewood Cliffs: Prentice Hall, 1990. \n[8] V. Peterka, \"Control of Uncertain Processes: Applied Theory and Algorithms\", Kybernetika, vol. 22, pp. 1-102, 1986. \n[9] M. Palaniswami, \"A new discrete-time operator for digital estimation and control\", The University of Melbourne, Department of Electrical Engineering, Technical Report No. 1, 1989. \n[10] M. Palaniswami, \"Digital Estimation and Control with a New Discrete Time Operator\", Proc. 30th IEEE Conf. Decision and Control, pp. 1631-1632, 1991. \n[11] J.C. Principe, B. de Vries, J-M. Kuo and P. Guedes de Oliveira, \"Modeling Applications with the Focused Gamma Net\", Advances in Neural Information Processing Systems, vol. 4, pp. 143-150, 1991. \n[12] J.C. Principe, B. de Vries and P. Guedes de Oliveira, \"The Gamma Filter - a new class of adaptive IIR filters with restricted feedback\", IEEE Trans. Signal Processing, vol. 41, pp. 649-656, 1993. \n[13] G.V. Puskorius and L.A. Feldkamp, \"Decoupled Extended Kalman Filter Training of Feedforward Layered Networks\", Proc. Int. Joint Conf. Neural Networks, Seattle, vol. I, pp. 771-777, 1991. \n[14] E.B. Saff and A.D. Snider, Fundamentals of Complex Analysis for Mathematics, Science and Engineering, Englewood Cliffs, NJ: Prentice-Hall, 1976. \n[15] S. Shah and F. Palmieri, \"MEKA - A Fast Local Algorithm for Training Feedforward Neural Networks\", Proc. Int. Joint Conf. on Neural Networks, vol. III, pp. 41-46, 1990. \n[16] J.J. Shynk, \"Adaptive IIR filtering using parallel-form realizations\", IEEE Trans. Acoust. Speech Signal Proc., vol. 37, pp. 519-533, 1989. \n[17] T. Soderstrom and P. Stoica, System Identification, London: Prentice Hall, 1989. \n[18] T.O. de Silva, P.G. de Oliveira, J.C. Principe and B. de Vries, \"Generalized feedforward filters with complex poles\", Neural Networks for Signal Processing II, S.Y. Kung et al. (Eds.), Piscataway, NJ: IEEE Press, 1992. \n[19] D. Williamson, \"Delay replacement in direct form structures\", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 453-460, Apr. 1988. \n", "award": [], "sourceid": 885, "authors": [{"given_name": "Andrew", "family_name": "Back", "institution": null}, {"given_name": "Ah", "family_name": "Tsoi", "institution": null}]}