{"title": "Rational Parametrizations of Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 623, "page_last": 630, "abstract": "", "full_text": "Rational Parametrizations of Neural Networks\n\nUwe Helmke\nDepartment of Mathematics\nUniversity of Regensburg\nRegensburg 8400 Germany\n\nRobert C. Williamson\nDepartment of Systems Engineering\nAustralian National University\nCanberra 2601 Australia\n\nAbstract\n\nA connection is drawn between rational functions, the realization theory of dynamical systems, and feedforward neural networks. This allows us to parametrize single hidden layer scalar neural networks with (almost) arbitrary analytic activation functions in terms of strictly proper rational functions. Hence, we can solve the uniqueness of parametrization problem for such networks.\n\n1 INTRODUCTION\n\nNonlinearly parametrized representations of functions $\\phi: \\mathbb{R} \\to \\mathbb{R}$ of the form\n\n(1.1) $\\phi(x) = \\sum_{i=1}^{n} c_i \\sigma(x - a_i), \\quad x \\in \\mathbb{R},$\n\nhave attracted considerable attention recently in the neural network literature. Here $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ is typically a sigmoidal function such as\n\n(1.2) $\\sigma(x) = (1 + e^{-x})^{-1},$\n\nbut other choices than (1.2) are possible and of interest. Sometimes more complex representations such as\n\n(1.3) $\\phi(x) = \\sum_{i=1}^{n} c_i \\sigma(b_i x - a_i)$\n\nor even compositions of these are considered.\n\nThe purpose of this paper is to explore some parametrization issues regarding (1.1) and in particular to show the close connection these representations have with the standard system-theoretic realization theory for rational functions. We show how to define a generalization of (1.1) parametrized by $(A, b, c)$, where $A$ is a matrix over a field, and $b$ and $c$ are vectors. (This is made more precise below.) The parametrization involves the $(A, b, c)$ being used to define a rational function. 
The generalized $\\sigma$-representation is then defined in terms of the rational function. This connection allows us to use results available for rational functions in the study of neural-network representations such as (1.1). It will also lead to an understanding of the geometry of the space of functions.\n\nOne of the main contributions of the paper is to show how in general neural network representations are related to rational functions. In this summary all proofs have been omitted. A complete version of the paper is available from the second author.\n\n2 REALIZATIONS RELATIVE TO A FUNCTION\n\nIn this section we explore the relationship between sigmoidal representations of real analytic functions $\\phi: \\mathbb{I} \\to \\mathbb{R}$ defined on an interval $\\mathbb{I} \\subset \\mathbb{R}$, real rational functions defined on the complex plane $\\mathbb{C}$, and the well established realization theory for linear dynamical systems\n\n$\\dot{x}(t) = A x(t) + b u(t)$\n$y(t) = c x(t) + d u(t).$\n\nFor standard textbooks on systems theory and realization theory we refer to [5, 7].\n\nLet $\\mathbb{K}$ denote either the field $\\mathbb{R}$ of real numbers or the field $\\mathbb{C}$ of complex numbers. Let $\\mathbb{D} \\subset \\mathbb{C}$ be an open and simply connected subset of the complex plane and let $\\sigma: \\mathbb{D} \\to \\mathbb{C}$ be an analytic function defined on $\\mathbb{D}$. For example, $\\sigma$ may be obtained by an analytic continuation of some sigmoidal function $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ into the domain of holomorphy of the complex plane.\n\nLet $T: V \\to V$ be a linear operator on a finite-dimensional $\\mathbb{K}$-vector space $V$ such that $T$ has all its eigenvalues in $\\mathbb{D}$. Let $\\Gamma \\subset \\mathbb{D}$ be a simple closed curve, oriented in the counter-clockwise direction, enclosing all the eigenvalues of $T$ in its interior. More generally, $\\Gamma$ may consist of a finite number of simple closed curves $\\Gamma_k$ with interiors $\\mathbb{D}_k'$ such that the union of the domains $\\mathbb{D}_k'$ contains all the eigenvalues of $T$. Then the matrix valued function $\\sigma(T)$ is defined as the contour integral [8, p.44]\n\n(2.1) $\\sigma(T) := \\frac{1}{2\\pi i} \\int_{\\Gamma} \\sigma(z) (zI - T)^{-1} \\, dz.$\n\nNote that for each linear operator $T: V \\to V$, $\\sigma(T): V \\to V$ is again a linear operator on $V$.\n\nIf we now make the substitution $T := xI + A$ for $x \\in \\mathbb{C}$ and $A: V \\to V$ $\\mathbb{K}$-linear, then\n\n$\\sigma(xI + A) = \\frac{1}{2\\pi i} \\int_{\\Gamma} \\sigma(z) ((z - x)I - A)^{-1} \\, dz$\n\nbecomes a function of the complex variable $x$, at least as long as $\\Gamma$ contains all the eigenvalues of $xI + A$. Using the change of variables $\\xi := z - x$ we obtain\n\n(2.2) $\\sigma(xI + A) = \\frac{1}{2\\pi i} \\int_{\\Gamma'} \\sigma(x + \\xi) (\\xi I - A)^{-1} \\, d\\xi$\n\nwhere $\\Gamma' = \\Gamma - x \\subset \\mathbb{D}$ encircles all the eigenvalues of $A$.\n\nGiven an arbitrary vector $b \\in V$ and a linear functional $c: V \\to \\mathbb{K}$ we achieve the representation\n\n(2.3) $c \\sigma(xI + A) b = \\frac{1}{2\\pi i} \\int_{\\Gamma} \\sigma(x + \\xi) \\, c (\\xi I - A)^{-1} b \\, d\\xi.$\n\nNote that in (2.3) the simple closed curve $\\Gamma \\subset \\mathbb{C}$ is arbitrary, as long as it satisfies the two conditions\n\n(2.4) $\\Gamma$ encircles all the eigenvalues of $A$,\n(2.5) $x + \\Gamma = \\{x + \\xi \\mid \\xi \\in \\Gamma\\} \\subset \\mathbb{D}$.\n\nLet $\\phi: \\mathbb{I} \\to \\mathbb{R}$ be a real analytic function in a single variable $x \\in \\mathbb{I}$, defined on an interval $\\mathbb{I} \\subset \\mathbb{R}$.\n\nDefinition 2.1 A quadruple $(A, b, c, d)$ is called a finite-dimensional $\\sigma$-realization of $\\phi: \\mathbb{I} \\to \\mathbb{R}$ over a field of constants $\\mathbb{K}$ if for all $x \\in \\mathbb{I}$\n\n(2.6) $\\phi(x) = c \\sigma(xI + A) b + d$\n\nholds, where the right hand side is given by (2.3) and $\\Gamma$ is assumed to satisfy the conditions (2.4)-(2.5). Here $d \\in \\mathbb{K}$, $b \\in V$, and $A: V \\to V$, $c: V \\to \\mathbb{K}$ are $\\mathbb{K}$-linear maps and $V$ is a finite dimensional $\\mathbb{K}$-vector space.\n\nDefinition 2.2 The dimension (or degree) of a $\\sigma$-realization is $\\dim_{\\mathbb{K}} V$. The $\\sigma$-degree of $\\phi$ is denoted $\\delta_\\sigma(\\phi)$. A minimal $\\sigma$-realization is a $\\sigma$-realization of minimal dimension $\\delta_\\sigma(\\phi)$.\n\n...\n\n(3.1) $\\phi(x) = \\sum_{i=0}^{N} \\frac{\\phi_i}{i!} x^i, \\quad N \\le \\infty,$\n\nand that $(A, b, c)$ is a $\\sigma$-realization in the sense of definition 2.1. The Taylor expansion of $c(xI + A)^{-1} b$ at $0$ is (for $A$ nonsingular)\n\n(3.2) $c(xI + A)^{-1} b = \\sum_{i=0}^{\\infty} (-1)^i c A^{-(i+1)} b \\, x^i.$\n\nThus\n\n(3.3) $\\phi_i = (-1)^i \\, i! \\, c A^{-(i+1)} b, \\quad i = 0, \\dots, N,$ 
\n\nif and only if the expansions of (3.1) and (3.2) coincide up to order $N$. Observe [7] that\n\n$\\phi(x) = c(xI + A)^{-1} b$ and $\\dim V < \\infty$ $\\iff$ $\\phi(x)$ is rational with $\\phi(\\infty) = 0$.\n\nThe possibility of solving (3.3) is now easily seen as follows. Let $V = \\mathbb{R}^{N+1} = \\mathrm{Map}(\\{0, \\dots, N\\}, \\mathbb{R})$ be the finite or infinite $(N + 1)$-fold product space of $\\mathbb{R}$. (Here $\\mathrm{Map}(X, Y)$ denotes the set of all maps from $X$ to $Y$.) If $N$ is finite let\n\n(3.4) $A^{-1} = [\\, \\cdots \\,] \\in \\mathbb{R}^{(N+1) \\times (N+1)}, \\quad b = (1, 0, \\dots, 0)^T \\in V, \\quad c = (\\, \\cdots \\,).$\n\nFor $N = \\infty$ we take $A^{-1}: \\mathbb{R}^{\\mathbb{N}} \\to \\mathbb{R}^{\\mathbb{N}}$ as a shift operator\n\n(3.5) $A^{-1}: \\mathbb{R}^{\\mathbb{N}} \\to \\mathbb{R}^{\\mathbb{N}}, \\quad A^{-1}: (x_0, x_1, \\dots) \\mapsto -(0, x_0, x_1, \\dots)$\n\nand\n\n$b = (1, 0, \\dots), \\quad c = (0, \\phi_0, \\phi_1, \\phi_2/2!, \\dots).$\n\nWe then have\n\nLemma 3.1 Let $\\sigma(x) = \\sum_i \\frac{\\sigma_i}{i!} x^i$ be analytic at $x = 0$ and let $(A, b, c)$ be a $\\sigma$-realization of the formal power series $\\phi(x) = \\sum_{i=0}^{N} \\frac{\\phi_i}{i!} x^i$, $N \\le \\infty$ (i.e. matching of the first $N + 1$ derivatives of $\\phi(x)$ and $c \\sigma(xI + A) b$ at $x = 0$). Then\n\n(3.6) $\\phi_i = c \\sigma^{(i)}(A) b$ for $i = 0, \\dots, N$.\n\nObserve that for $\\sigma(x) = x^{-1}$ we have $\\sigma^{(i)}(-A) = i! (A^{-1})^{i+1}$ as before. The existence part of the realization question Q1 can now be restated as\n\nQ4 Given $\\sigma(x) := \\sum_{i=0}^{\\infty} \\frac{\\sigma_i}{i!} x^i$ and a sequence of real numbers $(\\phi_0, \\dots, \\phi_N)$, does there exist an $(A, b, c)$ with\n\n(3.7) $\\phi_i = c \\sigma^{(i)}(A) b, \\quad i = 0, \\dots, N$?\n\nThus question Q1 is essentially a Loewner interpolation question [1, 3].\n\nLet $\\gamma_\\ell = c A^\\ell b$, $\\ell \\in \\mathbb{N}_0$, and let\n\n(3.8) $F = \\begin{bmatrix} \\sigma_0 & \\sigma_1 & \\sigma_2 & \\cdots \\\\ \\sigma_1 & \\sigma_2 & \\sigma_3 & \\cdots \\\\ \\sigma_2 & \\sigma_3 & \\sigma_4 & \\cdots \\\\ \\vdots & \\vdots & \\vdots & \\ddots \\end{bmatrix} = (\\sigma_{i+j})_{i,j=0}^{\\infty}.$\n\nWrite\n\n(3.9) $[\\gamma] = \\begin{bmatrix} \\gamma_0 \\\\ \\gamma_1 \\\\ \\gamma_2/2! \\\\ \\gamma_3/3! \\\\ \\vdots \\end{bmatrix}$ and $[\\phi] = \\begin{bmatrix} \\phi_0 \\\\ \\phi_1 \\\\ \\vdots \\end{bmatrix}.$\n\nThen (3.6) (for $N = \\infty$) can formally be written as\n\n(3.10) $[\\phi] = F \\cdot [\\gamma].$\n\nOf course, any meaningful interpretation of (3.10) requires that the infinite sums $\\sum_{j=0}^{\\infty} \\frac{\\sigma_{i+j}}{j!} \\gamma_j$, $i \\in \\mathbb{N}_0$, exist. This happens, for example, if $\\sum_{j=0}^{\\infty} \\sigma_{i+j}^2 < \\infty$, $i \\in \\mathbb{N}_0$, and $\\sum_{j=0}^{\\infty} (\\gamma_j / j!)^2 < \\infty$. We have already seen that every finite or infinite sequence $[\\gamma]$ has a realization $(A, b, c)$. Thus we obtain\n\nCorollary 3.2 A function $\\phi(x)$ admits a $\\sigma$-realization if and only if $[\\phi] \\in \\mathrm{image}(F)$.\n\nCorollary 3.3 Let $H = (\\gamma_{i+j})_{i,j=0}^{\\infty}$. There exists a finite dimensional $\\sigma$-realization of $\\phi(x)$ if and only if $[\\phi] = F [\\gamma]$ with $\\mathrm{rank}\\, H < \\infty$. In this case $\\delta_\\sigma(\\phi) = \\mathrm{rank}\\, H$.\n\n4 UNIQUENESS OF $\\sigma$-REALIZATIONS\n\nIn this section we consider the uniqueness of the representation (2.3).\n\nDefinition 4.1 (cf. [2]) A system $\\{g_1, \\dots, g_n\\}$ of continuous functions $g_i: \\mathbb{I} \\to \\mathbb{R}$, defined on an interval $\\mathbb{I} \\subset \\mathbb{R}$, is said to satisfy a Haar* condition of order $n$ on $\\mathbb{I}$ if $g_1, \\dots, g_n$ are linearly independent, i.e. if $c_1, \\dots, c_n \\in \\mathbb{R}$ satisfy $\\sum_{i=1}^{n} c_i g_i(x) = 0$ for all $x \\in \\mathbb{I}$, then $c_1 = \\dots = c_n = 0$.\n\nRemark The Haar* condition is implied by the stronger classical Haar condition that\n\n$\\det \\begin{bmatrix} g_1(x_1) & \\cdots & g_1(x_n) \\\\ \\vdots & & \\vdots \\\\ g_n(x_1) & \\cdots & g_n(x_n) \\end{bmatrix} \\neq 0$\n\nfor all distinct $(x_i)_{i=1}^{n}$ in $\\mathbb{I}$. Equivalently, if $\\sum_{i=1}^{n} c_i g_i(x)$ has $n$ distinct roots in $\\mathbb{I}$, then $c_1 = \\dots = c_n = 0$.\n\nDefinition 4.2 A subset $A$ of $\\mathbb{C}$ is called self-conjugate if $a \\in A$ implies $\\bar{a} \\in A$.\n\nLet $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ be a continuous function and define $\\sigma_{z_j}^{(i)}(x) := \\sigma^{(i)}(x + z_j)$. Let\n\n$\\kappa := (\\kappa_1, \\dots, \\kappa_m)$ where $\\sum_{j=1}^{m} \\kappa_j = n$, $\\kappa_j \\in \\mathbb{N}$, $\\kappa_j \\ge 1$, $j = 1, \\dots, m$\n\ndenote a combination of $n$ of size $m$. For a given combination $\\kappa = (\\kappa_1, \\dots, \\kappa_m)$ of $n$, let $I := \\{1, \\dots, m\\}$ and let $J_i := \\{1, \\dots, \\kappa_i\\}$. Let $Z_m := \\{z_1, \\dots, z_m\\}$ and let\n\n(4.1) $\\sigma(\\kappa, Z_m) := \\{ \\sigma_{z_i}^{(j-1)} : i \\in I, \\; j \\in J_i \\}.$\n\nDefinition 4.3 If for all $m \\le n$, for all combinations $\\kappa = (\\kappa_1, \\dots, \\kappa_m)$ of $n$ of size $m$, and for any self-conjugate set $Z_m$ of distinct points, $\\sigma(\\kappa, Z_m)$ satisfies a Haar* condition of order $n$, then $\\sigma$ is said to be Haar generating of order $n$.\n\nTheorem 4.4 (Uniqueness) Let $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ be Haar generating of order at least $2n$ on $\\mathbb{I}$ and let $(A, b, c)$ and $(\\tilde{A}, \\tilde{b}, \\tilde{c})$ be minimal $\\sigma$-realizations of order $n$ of functions $\\phi$ and $\\tilde{\\phi}$ respectively. Then the following equivalence holds\n\n(4.2) $c \\sigma(xI + A) b = \\tilde{c} \\sigma(xI + \\tilde{A}) \\tilde{b} \\;\\; \\forall x \\in \\mathbb{I} \\iff c (\\xi I - A)^{-1} b = \\tilde{c} (\\xi I - \\tilde{A})^{-1} \\tilde{b} \\;\\; \\forall \\xi \\in \\mathbb{D}.$\n\nConversely, if (4.2) holds for almost all order $n$ triples $(A, b, c)$, $(\\tilde{A}, \\tilde{b}, \\tilde{c})$, then $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ is Haar generating on $\\mathbb{I}$ of order $\\ge n$.\n\nThe following result gives examples of activation functions $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ which are Haar generating.\n\nLemma 4.5 Let $d \\in \\mathbb{N}_0$. Then 1) The function $\\sigma(x) = x^{-d}$ is Haar generating of arbitrary order. 2) The monomial $\\sigma(x) = x^d$ is Haar generating of order $d + 1$. 3) The function $e^{-x^2}$ is Haar generating of arbitrary order.\n\nRemark A simple example of a $\\sigma$ which is not Haar generating of order $\\ge 2$ is $\\sigma(x) = e^x$. In fact, in this case $\\sigma(x + z_j) = c_j \\sigma(x + z_1)$ for $c_j = e^{z_j - z_1}$, $j = 2, \\dots, n$.\n\nRemark The function $\\sigma(x) = (1 + e^{-x})^{-1}$ is not Haar generating of any order $> 2$. By the periodicity of the complex exponential function, $\\sigma(x + 2\\pi i) = \\sigma(x - 2\\pi i)$, $i = \\sqrt{-1}$, for all $x$. Thus the Haar* condition fails for $Z_2 = \\{2\\pi i, -2\\pi i\\}$. In particular, the above uniqueness result fails for the standard sigmoid case. In order to cover this case we need a further definition.\n\nDefinition 4.6 Let $\\Omega = \\bar{\\Omega} \\subset \\mathbb{C}$ be a self-conjugate subset of $\\mathbb{C}$. A function $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ is said to be Haar generating of order $n$ on $\\Omega$, if for all $m \\le n$, for all combinations $\\kappa = (\\kappa_1, \\dots, \\kappa_m)$ of $n$ of size $m$, and for any self-conjugate subset $Z_m \\subset \\Omega$ of distinct points of $\\Omega$, $\\sigma(\\kappa, Z_m)$ satisfies a Haar* condition of order $n$.\n\nOf course for $\\Omega = \\mathbb{C}$, this definition coincides with definition 4.3.\n\nTheorem 4.7 (Local Uniqueness) Let $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ be analytic and let $\\Omega \\subset \\mathbb{C}$ be a self-conjugate subset contained in the domain of holomorphy of $\\sigma$. Let $\\mathbb{I}$ be a nontrivial subinterval of $\\Omega \\cap \\mathbb{R}$. Suppose $\\sigma: \\mathbb{R} \\to \\mathbb{R}$ is Haar generating on $\\Omega$ of order at least $2n$, $n \\in \\mathbb{N}$. Then for any two minimal $\\sigma$-realizations $(A, b, c)$ and $(\\tilde{A}, \\tilde{b}, \\tilde{c})$ of orders at most $n$ with $\\mathrm{spect}\\, A, \\mathrm{spect}\\, \\tilde{A} \\subset \\Omega$ the following equivalence holds:\n\n(4.3) $c \\sigma(xI + A) b = \\tilde{c} \\sigma(xI + \\tilde{A}) \\tilde{b} \\;\\; \\forall x \\in \\mathbb{I} \\iff c (\\xi I - A)^{-1} b = \\tilde{c} (\\xi I - \\tilde{A})^{-1} \\tilde{b} \\;\\; \\forall \\xi \\in \\mathbb{D}.$\n\nLemma 4.8 Let $\\Omega := \\{z \\in \\mathbb{C} : |\\Im z| < \\pi\\}$. Then the standard sigmoid function $\\sigma(x) = (1 + e^{-x})^{-1}$ is Haar generating on $\\Omega$ of arbitrary order.\n\n5 MAIN RESULT\n\nAs a consequence of the uniqueness theorems 4.4 and 4.7 we can now state our main result on the existence of minimal $\\sigma$-realizations of a function $\\phi(x)$. It extends a parallel result for standard transfer function realizations, where $\\sigma(x) = x^{-1}$.\n\nTheorem 5.1 (Realization) Let $\\Omega \\subset \\mathbb{C}$ be a self-conjugate subset, contained in the domain of holomorphy of a real meromorphic function $\\sigma: \\mathbb{R} \\to \\mathbb{R}$. Suppose $\\sigma$ is Haar generating on $\\Omega$ of order at least $2n$ and assume $\\phi(x)$ has a finite dimensional realization $(A, b, c)$ of dimension at most $n$ such that $A$ has all its eigenvalues in $\\Omega$.\n\n1. There exists a minimal $\\sigma$-realization $(A_1, b_1, c_1)$ of $\\phi(x)$ of degree $\\delta_\\sigma(\\phi) \\le \\dim(A, b, c)$. Furthermore, there exists an invertible matrix $S$ such that\n\n(5.1) $S A S^{-1} = \\begin{bmatrix} A_1 & * \\\\ 0 & * \\end{bmatrix}, \\quad S b = \\begin{bmatrix} b_1 \\\\ 0 \\end{bmatrix}, \\quad c S^{-1} = [c_1, c_2].$\n\n2. If $(A_1, b_1, c_1)$ and $(A_1', b_1', c_1')$ are minimal $\\sigma$-realizations of $\\phi(x)$ such that the eigenvalues of $A_1$ and $A_1'$ are contained in $\\Omega$, then there exists a unique invertible matrix $S$ such that\n\n(5.2) $(A_1', b_1', c_1') = (S A_1 S^{-1}, S b_1, c_1 S^{-1}).$\n\n3. 
A $\\sigma$-realization $(A, b, c)$ is minimal if and only if $(A, b, c)$ is controllable and observable; i.e. if and only if $(A, b, c)$ satisfies the generic rank conditions\n\n$\\mathrm{rank}(b, Ab, \\dots, A^{n-1} b) = n, \\quad \\mathrm{rank} \\begin{bmatrix} c \\\\ cA \\\\ \\vdots \\\\ cA^{n-1} \\end{bmatrix} = n$\n\nfor $A \\in \\mathbb{K}^{n \\times n}$, $b \\in \\mathbb{K}^n$, $c^T \\in \\mathbb{K}^n$.\n\nRemark The use of the terms \"observable\" and \"controllable\" is solely for formal correspondence with standard systems theory. There are no dynamical systems actually under consideration here.\n\nRemark Note that for any $\\sigma$-realization $(A, b, c)$ of the form\n\n$A = \\begin{bmatrix} A_{11} & A_{12} \\\\ 0 & A_{22} \\end{bmatrix}, \\quad b = \\begin{bmatrix} b_1 \\\\ 0 \\end{bmatrix}, \\quad c = [c_1, c_2],$\n\nwe have\n\n$\\sigma(A) = \\begin{bmatrix} \\sigma(A_{11}) & * \\\\ 0 & \\sigma(A_{22}) \\end{bmatrix}$\n\nand thus $c \\sigma(xI + A) b = c_1 \\sigma(xI + A_{11}) b_1$. Thus transformations of the above kind always reduce the dimension of a $\\sigma$-realization.\n\nCorollary 5.2 ([9]) Let $\\sigma(x) = (1 + e^{-x})^{-1}$ and let $\\phi(x) = \\sum_{i=1}^{n} c_i \\sigma(x - a_i) = \\sum_{i=1}^{n} c_i' \\sigma(x - a_i')$ be two minimal length $\\sigma$-representations with $|\\Im a_i| < \\pi$, $|\\Im a_i'| < \\pi$, $i = 1, \\dots, n$. Then $(a_i', c_i') = (a_{p(i)}, c_{p(i)})$ for a unique permutation $p: \\{1, \\dots, n\\} \\to \\{1, \\dots, n\\}$. In particular, minimal length representations (1.1) with real coefficients $a_i$ and $c_i$ are unique up to a permutation of the summands.\n\n6 CONCLUSIONS\n\nWe have drawn a connection between the realization theory for linear dynamical systems and neural network representations. There are further connections (not discussed in this summary) between representations of the form (1.3) and rational functions of two variables. There are other questions concerning diagonalizable realizations and Jordan forms. Details are given in the full length version of this paper. Open questions include the problem of partial realizations [4, 6].$^1$\n\nREFERENCES\n\n[1] A. C. Antoulas and B. D. O. Anderson, On the Scalar Rational Interpolation Problem, IMA Journal of Mathematical Control and Information, 3 (1986), pp. 61-88.
\n[2] E. W. Cheney, Introduction to Approximation Theory, Chelsea Publishing Company, New York, 1982.\n[3] W. F. Donoghue, Jr, Monotone Matrix Functions and Analytic Continuation, Springer-Verlag, Berlin, 1974.\n[4] W. B. Gragg and A. Lindquist, On the Partial Realization Problem, Linear Algebra and its Applications, 50 (1983), pp. 277-319.\n[5] T. Kailath, Linear Systems, Prentice-Hall, Englewood Cliffs, 1980.\n[6] R. E. Kalman, On Partial Realizations, Transfer Functions, and Canonical Forms, Acta Polytechnica Scandinavica, 31 (1979), pp. 9-32.\n[7] R. E. Kalman, P. L. Falb and M. A. Arbib, Topics in Mathematical System Theory, McGraw-Hill, New York, 1969.\n[8] T. Kato, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin, 1966.\n[9] R. C. Williamson and U. Helmke, Existence and Uniqueness Results for Neural Network Approximations, To appear, IEEE Transactions on Neural Networks, 1993.\n\n$^1$This work was supported by the Australian Research Council, the Australian Telecommunications and Electronics Research Board, and the Boeing Commercial Aircraft Company (thanks to John Moore). Thanks to Eduardo Sontag for helpful comments also.", "award": [], "sourceid": 696, "authors": [{"given_name": "Uwe", "family_name": "Helmke", "institution": null}, {"given_name": "Robert", "family_name": "Williamson", "institution": null}]}