{"title": "Closed-Form Inversion of Backpropagation Networks: Theory and Optimization Issues", "book": "Advances in Neural Information Processing Systems", "page_first": 868, "page_last": 872, "abstract": null, "full_text": "Closed-Form Inversion of Backpropagation \nNetworks: Theory and Optimization Issues \n\nMichael L. Rossen \nHNC, Inc. \n5.501 Oberlin Drive \nSan Diego, CA 92121 \nrossen@amos.ucsd.edu \n\nAbstract \n\nWe describe a closed-form technique for mapping the output of a trained \nbackpropagation network int.o input activity space. The mapping is an in(cid:173)\nverse mapping in the sense that, when the image of the mapping in input \nactivity space is propagat.ed forward through the normal network dynam(cid:173)\nics, it reproduces the output used to generate that image. When more \nthan one such inverse mappings exist, our inverse ma.pping is special in \nthat it has no projection onto the nullspace of the activation flow opera(cid:173)\ntor for the entire network. An important by-product of our calculation, \nwhen more than one invel'se mappings exist, is an orthogonal basis set of \na significant portion of the activation flow operator nullspace. This basis \nset can be used to obtain an alternate inverse mapping that is optimized \nfor a particular rea.l-world application. \n\n1 Overview \n\nThis paper describes a closed-form technique for mapping a particular output of a \ntrained backpropagation net.work int.o input activity space. The mapping produced \nby our technique is an inverse mapping in the sense that, when the image in input \nspace of the mapping of an output a.ctivity is propa.gated forward through the \nnorma.l network dynamics, it reproduces the output used to generate it.! \n\\Vhen \nmult.iple inverse mappings exist, our inverse mapping is unique in that it has no \n\n1 It is possible that no such inverse mappings exist. This point is addressed in sect.ion 4. 
projection onto the nullspace of the activation flow operator for the entire network. An important by-product of our calculation is an orthogonal basis set for a significant portion of this nullspace. Any vector within this nullspace can be added to the image from the inverse mapping, producing a new point in input space that is still an inverse mapping image in the above sense. Using this nullspace, the inverse mapping can be optimized for a particular application by minimizing a cost function, relevant to that application, over the input elements to obtain the vector from the nullspace to add to the original inverse mapping image. For this reason, and because of the closed form we obtain for calculation of the network inverse mapping, our method compares favorably to previously proposed iterative methods of network inversion [Widrow & Stearns, 1985; Linden & Kinderman, 1989]. We now briefly summarize our method of closed-form inversion of a backpropagation network. \n\n2 The Inverse Mapping Operator \n\nTo outline the calculation of our inverse mapping operator, we start by considering a trained feed-forward backpropagation network with one hidden layer and bipolar sigmoidal activation functions. We calculate this inverse as a sequence of the inverses of the sub-operations constituting the dynamics of activation flow. If we use 'I, H, O' as subscripts indicating the input, hidden and output modules of model neurons, respectively, the activation flow from input through hidden module to output module is: \n\nf(O) = σ ∘ W(O,H) ∘ f(H) \n     = σ ∘ W(O,H) ∘ σ ∘ W(H,I) ∘ f(I),    (1) \n\nwhere \n\nσ : bipolar sigmoid function; \nW(dest,source) : matrix operator of connection weights, indexed by 'source' and 'dest' (destination) modules; \nf(k) : vector of activities for module 'k'. 
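As a concrete illustration (ours, not from the paper), the activation flow of eq. (1) can be sketched in a few lines of numpy, taking the bipolar sigmoid to be tanh and choosing small illustrative layer sizes:

```python
import numpy as np

def sigma(x):
    # Bipolar sigmoid, here taken to be tanh: maps R onto (-1, +1)
    # and sends zero to zero.
    return np.tanh(x)

def forward(f_I, W_HI, W_OH):
    # Eq. (1): f(O) = sigma(W(O,H) sigma(W(H,I) f(I)))
    f_H = sigma(W_HI @ f_I)
    return sigma(W_OH @ f_H)

# Illustrative dimensions: 6 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W_HI = 0.5 * rng.standard_normal((4, 6))
W_OH = 0.5 * rng.standard_normal((2, 4))
f_O = forward(np.full(6, 0.2), W_HI, W_OH)
```

All output activities land inside (-1, +1), the range of the bipolar sigmoid; this fact constrains the domain of the inverse mapping discussed in section 4.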
A is defined here as the activation flow operator for the entire network. The symbol ∘ separates operators sequentially applied to the argument. \n\nSince the sub-operators constituting A are applied sequentially, the inverse that we calculate, A+, is equal to a composition of inverses of the individual sub-operators, with the order of the composition reversed from the order in activation flow. The closed-form mapping of a specified output f(O) to input space is then: \n\nf(I) = A+ ∘ f(O) \n     = W+(H,I) ∘ σ^-1 ∘ W+(O,H) ∘ σ^-1 ∘ f(O),    (2) \n\nwhere \n\nσ^-1 : inverse of the bipolar sigmoid; \nW+(dest,source) : pseudo-inverse of W(dest,source). \n\nSubject to the existence conditions discussed in section 4, f(I) is an inverse mapping of f(O) in that it reproduces f(O) when it is propagated forward through the network: \n\nf(O) = A ∘ f(I).    (3) \n\nWe use singular value decomposition (SVD), a well-known matrix analysis method (e.g., [Lancaster, 1985]), to calculate a particular matrix inverse, the pseudo-inverse W+(j,i) (also known as the Moore-Penrose inverse) of each connection weight matrix block. In the case of W(H,I), for example, SVD yields two unitary matrices, S(H,I) and V(H,I), and a rectangular matrix D(H,I), all zero except for the singular values on its diagonal, such that \n\nW(H,I) = S(H,I) D(H,I) V'(H,I),    (4) \nW+(H,I) = V(H,I) D+(H,I) S'(H,I),    (5) \n\nwhere \n\nS'(H,I), V'(H,I) : transposes of S(H,I), V(H,I), respectively; \nD+(H,I) : pseudo-inverse of D(H,I), which is simply its transpose with each non-zero singular value replaced by its inverse. \n\n3 Uniqueness and Optimization Considerations \n\nThe pseudo-inverse (calculated by SVD or other methods) is one of a class of solutions to the inverse of a matrix operator that may exist, called generalized inverses. 
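Eqs. (2)-(5) can be demonstrated end-to-end with numpy's SVD-based pseudo-inverse. This is a sketch under illustrative assumptions (tanh as the bipolar sigmoid, small random weights, and a convergent architecture in the sense of section 4); the nullspace basis at the end anticipates the construction used below:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 6, 4, 2            # convergent: dim(input) >= dim(output)
W_HI = 0.5 * rng.standard_normal((n_hid, n_in))
W_OH = 0.5 * rng.standard_normal((n_out, n_hid))

sigma, sigma_inv = np.tanh, np.arctanh  # bipolar sigmoid and its inverse

def forward(f_I):
    # Eq. (1): the activation flow operator A.
    return sigma(W_OH @ sigma(W_HI @ f_I))

def inverse(f_O):
    # Eq. (2): A+ = W+(H,I) o sigma^-1 o W+(O,H) o sigma^-1, with each
    # pseudo-inverse computed from the SVD as in eqs. (4)-(5)
    # (np.linalg.pinv is the SVD-based Moore-Penrose pseudo-inverse).
    f_H = np.linalg.pinv(W_OH) @ sigma_inv(f_O)
    return np.linalg.pinv(W_HI) @ sigma_inv(f_H)

# An output actually generated by the network lies in the domain of A+,
# and eq. (3) holds: forward(inverse(f_O)) reproduces f_O.
f_O = forward(np.full(n_in, 0.2))
f_I = inverse(f_O)

# Columns of V beyond the rank r of W(H,I) span its nullspace; adding any
# vector from that span to f_I leaves the reproduced output unchanged.
_, s, Vt = np.linalg.svd(W_HI)
r = int(np.sum(s > 1e-10))
null_basis = Vt[r:].T                   # shape (n_in, n_in - r)
```

Here `f_I + null_basis @ c`, for any coefficient vector `c`, is an alternate inverse mapping image of `f_O`; choosing `c` to minimize an application-specific cost over the inputs is the optimization described in section 3.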
For our purposes, each of these generalized inverses, if it exists, is an inverse in the useful sense that, when substituted for W+(j,i) in eq. (2), the resultant f(I) will be an inverse mapping image as defined by eq. (3). \n\nWhen a matrix operator W does not have a nullspace, the pseudo-inverse is the only generalized inverse that exists. If W does have a nullspace, the pseudo-inverse is special in that its range contains no projection onto the nullspace of W. It follows that if either of the matrix operators W(H,I) or W(O,H) in eq. (1) has a nullspace, then multiple inverse mapping operators will exist. However, the inverse mapping operator A+ calculated using pseudo-inverses will be the only inverse mapping operator that has no projection in the nullspace of A. The derivation of these properties follows in a straightforward manner from the discussion of generalized inverses in [Lancaster, 1985]. An interesting result of using SVD to obtain the pseudo-inverse is that: \n\nSVD provides a direct method for varying f(I) within the space of inverse mapping images in input space of f(O). \n\nThis becomes clear when we note that if r = rank(W(H,I)), only the first r singular values in D(H,I) are non-zero. Thus, only the first r columns of S(H,I) and V(H,I) participate in the activity flow of the network from input module to hidden module. \n\nThe columns {v(H,I)(i)}, i > r, of V(H,I) span the nullspace of W(H,I). This nullspace is also the nullspace of A, or at least a significant portion thereof.[2] If f(I) is an inverse mapping image of f(O), then the addition of any vector from the nullspace to f(I) would still yield an inverse mapping image of f(O), satisfying eq. (3). If an inverse mapping image f(I) obtained from eq. 
(2) is unphysical, or somehow inappropriate for a particular application, it could possibly be optimized by combining it with a vector from the nullspace of A. \n\n4 Existence and Stability Considerations \n\nThere are still implementational issues of importance to address: \n\n1. For a given f(O), can eq. (2) produce some mapping image f(I)? \n\n2. For a given f(O), will the image f(I) produced by eq. (2) be a true inverse mapping image, i.e., will it satisfy eq. (3)? If not, is it a best approximation in some sense? \n\n3. How stable is an inverse mapping from an f(O) that produces the answer 'yes' to questions 1 and 2? I.e., if f(O) is perturbed to produce a new output point, will this new output point satisfy questions 1 and 2? \n\nIn general, eq. (2) will produce an image for any output point generated by the forward dynamics of the network, eq. (1). If f(O) is chosen arbitrarily, however, then whether it is in the domain of A+ is purely a function of the network weights. The domain is restricted because the domain of the inverse sigmoid sub-operator is restricted to (-1, +1). \n\nWhether an image produced by eq. (2) will be an inverse mapping image, i.e., one satisfying eq. (3), depends on both the network weights and the network architecture. A strong sufficient condition for guaranteeing this is that the network have a convergent architecture; that is: \n\n• The dimension of input space is greater than or equal to the dimension of output space. \n\n• The rank of D(H,I) is greater than or equal to the rank of D(O,H). \n\nThe stability of inverse mappings of a desired output away from such an actual output depends wholly on the weights of the network. The range of singular values of weight matrix block W(O,H) can be used to address this issue. 
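The singular-value range of a weight matrix block is easy to inspect directly. A minimal sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def singular_value_spread(W, tol=1e-12):
    # Ratio of the largest to the smallest non-zero singular value of W.
    # np.linalg.svd returns the singular values in descending order.
    s = np.linalg.svd(W, compute_uv=False)
    s = s[s > tol]
    return s[0] / s[-1]

# A well-conditioned block versus a poorly conditioned one:
spread_ok = singular_value_spread(np.eye(3))              # 1.0
spread_bad = singular_value_spread(np.diag([50.0, 0.5]))  # 100.0
```

A spread much greater than about 10 (one order of magnitude) is the warning sign discussed next: perturbed outputs may then be driven out of the domain of the inverse sigmoid.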
If the range is much more than one order of magnitude, then random perturbations about a given point in output space will often fall outside the domain of A+. This is because the columns of S(O,H) and V(O,H) associated with small singular values during forward activity flow are associated with proportionately large inverse singular values in the inverse mapping. Thus, if singular value d(O,H)(i) is small, a random perturbation with a projection on column s(O,H)(i) of S(O,H) will cause a large magnitude swing in the inverse sub-operator W+(O,H), with the result possibly outside the domain of σ^-1. \n\n[2] Since its first sub-operation is linear, and the sigmoid non-linearity we employ maps zero to zero, the non-linear operator A can still have a nullspace. Subsequent layers of the network might add to this nullspace, however, and the added region may not be a linear subspace. \n\n5 Summary \n\n• We have shown that a closed-form inverse mapping operator of a backpropagation network can be obtained using a composition of pseudo-inverses and inverse sigmoid operators. \n\n• This inverse mapping operator, specified in eq. (2), operating on any point in the network's output space, will obtain an inverse image of that point that satisfies eq. (3), if such an inverse image exists. \n\n• When many inverse images of an output point exist, an extension of the SVD analyses used to obtain the original inverse image can be used to obtain an alternate inverse image optimized to satisfy the problem constraints of a particular application. \n\n• The existence of an inverse image of a particular output point depends on that output point and the network weights. The dependence on the network can be expressed conveniently in terms of the singular values and the singular value vectors of the network weight matrices. 
\n\n• Applications for these techniques include explanation of network operation and process control. \n\nReferences \n\n[Lancaster, 1985] Lancaster, P., & Tismenetsky, M. (1985). The Theory of Matrices. Orlando: Academic. \n\n[Linden & Kinderman, 1989] Linden, A., & Kinderman, J. (1989). Inversion of multilayer nets. Proceedings of the Third Annual International Joint Conference on Neural Networks, Vol. II, 425-430. \n\n[Widrow & Stearns, 1985] Widrow, B., & Stearns, S.D. (1985). Adaptive Signal Processing. Englewood Cliffs: Prentice-Hall. \n", "award": [], "sourceid": 310, "authors": [{"given_name": "Michael", "family_name": "Rossen", "institution": null}]}