{"title": "A Theory for Neural Networks with Time Delays", "book": "Advances in Neural Information Processing Systems", "page_first": 162, "page_last": 168, "abstract": null, "full_text": "A Theory for Neural Networks with Time Delays \n\nBert de Vries \nDepartment of Electrical Engineering \nUniversity of Horida, CSE 447 \nGainesville, FL 32611 \n\nJose C. Principe \nDepartment of Electrical Engineering \nUniversity of Horida, CSE 444 \nGainesville, FL 32611 \n\nAbstract \n\nWe present a new neural network model for processing of temporal \npatterns. This model, the gamma neural model, is as general as a \nconvolution delay model with arbitrary weight kernels w(t). We \nshow that the gamma model can be formulated as a (partially \nprewired) additive model. A temporal hebbian learning rule is \nderived and we establish links to related existing models for \ntemporal processing. \n\nINTRODUCTION \n\n1 \nIn this paper, we are concerned with developing neural nets with short term memory for \nprocessing of temporal patterns. In the literature, basically two ways have been \nreported to incorporate short-term memory in the neural system equations. The first \napproach utilizes reverberating (self-recurrent) units of type : = - aa (x) + e, that \nhold a trace of the past neural net states x(t) or the input e(t). Elman (1988) and \nJordan (1986) have successfully used this approach. The disadvantage of this method \nis the lack of weighting flexibility in the temporal domain, since the system equations \nare described by first order dynamics, implementing a recency gradient (exponential \nfor linear units). \n\nThe second approach involves explicit inclusion of delays in the neural system \nequations. A general formulation for this type requires a time-dependent weight \nmatrix W(t). 
In such a system, multiplicative interactions are substituted by temporal convolution operations, leading to the following system equations for an additive convolution model: \n\n$\frac{dx}{dt} = \int_0^t W(t-s)\, a(x(s))\, ds + e$.   (1) \n\nDue to the complexity of general convolution models, only strong simplifications of the weight kernel have been proposed. Lang et al. (1990) use a delta function kernel, $W(t) = \sum_{k=0}^{K} W_k \delta(t - t_k)$, which is the core of the Time-Delay-Neural-Network (TDNN). Tank and Hopfield (1987) prewire W(t) as a weighted sum of dispersive delay kernels, $W(t) = \sum_{k=0}^{K} W_k \left(\frac{t}{t_k}\right)^k e^{k(1 - t/t_k)} = \sum_{k=0}^{K} W_k h_k(t, t_k)$. The kernels $h_k(t, t_k)$ are the integrands of the gamma function. Tank and Hopfield described a one-layer system for the classification of isolated words. We will refer to their model as a Concentration-In-Time-Network (CITN). The system parameters were non-adaptive, although a Hebbian rule equivalent in functional differential equation form was suggested. \n\nIn this paper we develop a theory for neural convolution models that are expressed through a sum of gamma kernels. We will show that such a gamma neural network can be reformulated as a (Grossberg) additive model. As a consequence, the substantial learning and stability theory for additive models is directly applicable to gamma models. \n\n2 THE GAMMA NEURAL MODEL - FORMAL DERIVATION \n\nConsider the N-dimensional convolution model \n\n$\frac{dx}{dt} = -ax + W_0 y + \int_0^t ds\, W(t-s)\, y(s) + e$,   (2) \n\nwhere x(t), y(t) = a(x) and e(t) are N-dimensional signals; $W_0$ is $N \times N$ and W(t) is $N \times N \times [0, \infty]$. The weight matrix $W_0$ communicates the direct neural interactions, whereas W(t) holds the weights for the delayed neural interactions. 
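The dispersive delay kernels of Tank and Hopfield can be checked numerically. The short sketch below (plain Python; the kernel form $h_k(t, t_k) = (t/t_k)^k e^{k(1 - t/t_k)}$ and all grid parameters are illustrative assumptions, not the authors' code) confirms that each kernel attains its peak value of 1 at $t = t_k$, i.e. each term concentrates its weight around its own delay:

```python
import math

def h(k, t, tau):
    # Dispersive delay kernel h_k(t, tau_k) = (t/tau_k)^k * exp(k * (1 - t/tau_k)),
    # scaled so that its peak value is 1 at t = tau_k.
    return (t / tau) ** k * math.exp(k * (1.0 - t / tau))

# Sample one kernel on a grid and locate its maximum.
k, tau, dt = 4, 2.0, 0.001
grid = [i * dt for i in range(1, 10000)]
values = [h(k, t, tau) for t in grid]
t_peak = grid[max(range(len(values)), key=values.__getitem__)]
print(round(t_peak, 3))  # peak lands at t = tau_k
```

With several such kernels at different $t_k$, the sum $\sum_k W_k h_k$ can weight different regions of the past, which is exactly the flexibility the first-order recency-gradient approach lacks.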
We will now assume that W(t) can be written as a linear combination of normalized gamma kernels, that is, \n\n$W(t) = \sum_{k=1}^{K} W_k g_k(t)$,   (3) \n\nwhere \n\n$g_k(t) = \frac{\mu^k t^{k-1}}{(k-1)!} e^{-\mu t}$,   (4) \n\nwhere $\mu$ is a decay parameter and k a (lag) order parameter. If W(t) decays exponentially to zero for $t \to \infty$, then it follows from the completeness of the Laguerre polynomials that this approximation can be made arbitrarily close (Cohen et al., 1979). In other words, for all physically plausible weight kernels there is a K such that W(t) can be expressed as (3), (4). The following properties hold for the gamma kernels $g_k(t)$: \n\n[1] The gamma kernels are related by a set of linear homogeneous ODEs: \n\n$\frac{dg_1}{dt} = -\mu g_1$, $\quad \frac{dg_k}{dt} = -\mu g_k + \mu g_{k-1}$, $k = 2, .., K$.   (5) \n\n[2] The peak value ($\frac{dg_k}{dt} = 0$) occurs at $t_p = \frac{k-1}{\mu}$. \n\n[3] The area of the gamma kernels is normalized, that is, $\int_0^\infty ds\, g_k(s) = 1$. \n\nSubstitution of (3) into (2) yields \n\n$\frac{dx}{dt} = -ax + \sum_{k=0}^{K} W_k y_k + e$,   (6) \n\nwhere we defined $y_0(t) = y(t)$ and the gamma state variables \n\n$y_k(t) = \int_0^t ds\, g_k(t-s)\, y_0(s)$, $k = 1, .., K$.   (7) \n\nThe gamma state variables hold memory traces of the neural states $y_0(t)$. The important question is how to compute $y_k(t)$. Differentiating (7) using Leibniz' rule yields \n\n$\frac{dy_k}{dt} = \int_0^t \frac{dg_k}{dt}(t-s)\, y(s)\, ds + g_k(0)\, y(t)$.   (8) \n\nWe now utilize gamma kernel property [1] (eq. (5)) to obtain \n\n$\frac{dy_1}{dt} = -\mu y_1 + g_1(0)\, y(t)$, $\quad \frac{dy_k}{dt} = -\mu y_k + \mu y_{k-1} + g_k(0)\, y(t)$, $k = 2, .., K$.   (9) \n\nNote that since $g_k(0) = 0$ for $k \ge 2$ and $g_1(0) = \mu$, (9) evaluates to \n\n$\frac{dy_k}{dt} = -\mu y_k + \mu y_{k-1}$, $k = 1, .., K$.   (10) \n\nThe gamma model is described by (6) and (10). This extended set of ordinary differential equations (ODEs) is equivalent to the convolution model, described by the set of functional differential equations (2), (3) and (4). 
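A short numerical sketch can make the derivation concrete. The plain-Python check below assumes the standard normalized gamma density $g_k(t) = \mu^k t^{k-1} e^{-\mu t} / (k-1)!$ (the form consistent with properties [1]-[3]; all parameter values are illustrative). It verifies the unit area and the peak location $t_p = (k-1)/\mu$, and Euler-integrates the cascade form of (10) from an impulse, whose response should trace $g_K(t)$:

```python
import math

mu, K, dt, T = 2.0, 4, 0.001, 10.0
N = int(T / dt)

def g(k, t):
    # Normalized gamma kernel: g_k(t) = mu^k * t^(k-1) * exp(-mu*t) / (k-1)!
    return mu ** k * t ** (k - 1) * math.exp(-mu * t) / math.factorial(k - 1)

# Property [3]: unit area (midpoint rule).
area = sum(g(K, (i + 0.5) * dt) for i in range(N)) * dt
# Property [2]: peak at t_p = (k-1)/mu.
t_peak = max((g(K, i * dt), i * dt) for i in range(1, N))[1]

# Cascade form of (10): dy_k/dt = -mu*y_k + mu*y_{k-1}, Euler-integrated,
# driven by a unit-area discrete impulse as y_0.
y = [0.0] * (K + 1)
y[0] = 1.0 / dt
out = []
for n in range(N):
    for k in range(K, 0, -1):      # update top-down so y[k-1] is still the old value
        y[k] += dt * (-mu * y[k] + mu * y[k - 1])
    y[0] = 0.0                     # the impulse lasts one step
    out.append(y[K])
# max(out) should approximate the peak value g_K((K-1)/mu).
```

That the cascade's impulse response reproduces $g_K$ is the computational content of (7)-(10): the memory traces never require storing the past signal, only K coupled first-order updates.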
\n\nIt is a valid question to ask whether the system of ODEs that describes the gamma \nmodel can still be expressed as a neural network model. The answer is affirmative, \nsince the gamma model can be formulated as a regular (Grossberg) additive model. \n\nTo see this, define the N(K+l)-dimensional augmented state vector X = \n\nx \nYt \n\n, the \n\nneural output signal Y = \n\na (x) \n\nYt \n\n, an external input E = \n\n, a diagonal matrix \n\ne \no \n\no \n\nof decay parameters M = \n\nand \n\nthe weight \n\n(super)matrix \n\na \n\n~ \n\no \n0\\ \n\n~ \n\n... WK \n\nWo Wt \n~ ~ 0 \no ,,~O \n\n0= \n\nform -\n\n. Then the gamma model can be rewritten in the following \n\ndX \ndt = -MX+QY+E, \n\n( 11) \n\nthe familiar Grossberg additive model. \n\n3 \n\nHEBBIAN LEARNING IN THE GAMMA MODEL \n\nThe additive model formulation of the gamma model allows a direct generalization \n\n\f166 \n\nde 'ties and Principe \n\nof learning techniques to the gamma model. Note however that the augmented state \nvector X contains the gamma state variables Y1, ... ,YK, basically (dispersively) \ndelayed neural states. As a result, although associative learning rules for \nconventional additive models only encode the simultaneous correlation of neural \nstates, the gamma learning rules are able to encode temporal associations as well. \nHere we present Hebbian learning for the gamma model. \n\nThe Hebbian postulate is often mathematically translated to a learning rule of the \nform dd~ = 11x (1) yT (t) , where 11 is a learning rate constant, x the neural activation \n\nvector and yT the neuron output signal vector. This procedure is not likely to encode \ntemporal order, since information about past states is not incorporated in the learning \nequations. \n\nTank and Hopfield (1987) proposed a generalized Hebbian learning rule with delays \nthat can be written as -\n\n( 12) \n\nwhere g (s) is a normalized delay kernel. 
Notice that (12) is a functional differential equation, for which explicit solutions and convergence criteria are not known (for most implementations of g(s)). In the gamma model, the signals $\int_0^t ds\, g_k(s)\, y(t-s)$ are computed by the system and locally available as $y_k(t)$ at the synaptic junctions $W_k$. Thus, in the gamma model, (12) reduces to \n\n$\frac{dW_k}{dt} = \eta x(t) y_k^T(t)$.   (13) \n\nThis learning rule encodes simultaneous correlations (for k = 0) as well as temporal associations (for $k \ge 1$). Since the gamma Hebb rule is structurally similar to the conventional Hebb rule, it is also local both in time and space. \n\n4 RELATION TO OTHER MODELS \n\nThe gamma model is related to Tank and Hopfield's CITN model in that both models decompose W(t) into a linear combination of gamma kernels. The weights in the CITN system are preset and fixed. The gamma model, expressed as a regular additive system, allows conventional adaptation procedures to train the system parameters; $\mu$ and K adapt the depth and shape of the memory, while $W_0, .., W_K$ encode spatiotemporal correlations between neural states. \n\nTime-Delay-Neural-Nets (TDNN) are characterized by a tapped delay line memory structure. The relation is best illustrated by an example. Consider a linear one-layer feedforward convolution model, described by \n\n$x(t) = e(t)$, $\quad y(t) = \int_0^t W(t-s)\, x(s)\, ds$,   (14) \n\nwhere x(t), e(t) and y(t) are N-dimensional signals and W(t) an $N \times N \times [0, \infty]$-dimensional weight matrix. This system can be approximated in discrete time by \n\n$x(n) = e(n)$, $\quad y(n) = \sum_{m=0}^{n} W(n-m)\, x(m)$,   (15) \n\nwhich is the TDNN formulation. An alternative approximation of the convolution model by means of a (discrete-time) gamma model is described by (figure 1) \n\n$x_0(n) = e(n)$, $\quad x_k(n) = (1-\mu)\, x_k(n-1) + \mu\, x_{k-1}(n-1)$, $k = 1, .., K$, $\quad y(n) = \sum_{k=0}^{K} W_k x_k(n)$.   (16) \n\nThe recursive memory structure in the gamma model is stable for $0 \le \mu \le 2$, but an interesting memory structure is obtained only for $0 < \mu \le 1$. For $\mu = 0$, this system collapses to a static additive net; in this case, no information from past signal values is stored in the net. For $0 < \mu < 1$, the system works as a discrete-time CITN. The gamma memory structure consists of a cascade of first-order leaky integrators. Since the total memory structure is of order K, the shape of the memory is not restricted to a recency gradient. The effective memory depth approximates $\frac{K}{\mu}$ for small $\mu$. For $\mu = 1$, the gamma model becomes a TDNN; in this case, memory is implemented by a tapped delay line. The strength of the gamma model is that the parameters $\mu$ and K can be adapted by conventional additive learning procedures. Thus, the optimal temporal structure of the neural system, whether of CITN or TDNN type, is part of the training phase in a gamma neural net. Finally, the application of the gamma memory structure is of course not limited to one-layer feedforward systems. The topologies suggested by Jordan (1986) and Elman (1988) can easily be extended to include gamma memory. \n\nFigure 1: a one-layer feedforward gamma net. \n\n5 CONCLUSIONS \n\nWe have introduced the gamma neural model, a neural net model for temporal processing that generalizes most existing approaches, such as the CITN and TDNN models. The model can be described as a conventional dynamic additive model, enabling direct application of existing learning procedures for additive models. In the gamma model, dynamic objects are encoded by the same learning equations as static objects. \n\nAcknowledgments \n\nThis work has been partially supported by NSF grants ECS-8915218 and DDM-8914084. \n\nReferences \n\nCohen et al., 1979. 
Stable oscillations in single species growth models with hereditary effects. Mathematical Biosciences 44:255-268, 1979. \n\nde Vries and Principe, 1990. The gamma neural net - a new model for temporal processing. Submitted to Neural Networks, Nov. 1990. \n\nElman, 1988. Finding structure in time. CRL Technical Report 8801, 1988. \n\nJordan, 1986. Attractor dynamics and parallelism in a connectionist sequential machine. Proc. Cognitive Science, 1986. \n\nLang et al., 1990. A time-delay neural network architecture for isolated word recognition. Neural Networks, vol. 3 (1), 1990. \n\nTank and Hopfield, 1987. Concentrating information in time: analog neural networks with applications to speech recognition problems. 1st Int. Conf. on Neural Networks, IEEE, 1987. \n", "award": [], "sourceid": 356, "authors": [{"given_name": "Bert", "family_name": "de Vries", "institution": null}, {"given_name": "Jos\u00e9", "family_name": "Pr\u00edncipe", "institution": null}]}