{"title": "On the Circuit Complexity of Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 953, "page_last": 959, "abstract": null, "full_text": "On The Circuit Complexity of Neural Networks \n\nV. P. Roychowdhury \nInformation Systems Laboratory \nStanford University \nStanford, CA, 94305 \n\nA. Orlitsky \nAT&T Bell Laboratories \n600 Mountain Avenue \nMurray Hill, NJ, 07974 \n\nK. Y. Siu \nInformation Systems Laboratory \nStanford University \nStanford, CA, 94305 \n\nT. Kailath \nInformation Systems Laboratory \nStanford University \nStanford, CA, 94305 \n\nAbstract \n\nWe introduce a geometric approach for investigating the power of threshold \ncircuits. Viewing n-variable boolean functions as vectors in $\\mathbb{R}^{2^n}$, we invoke \ntools from linear algebra and linear programming to derive new results on \nthe realizability of boolean functions using threshold gates. \nUsing this approach, one can obtain: (1) upper bounds on the number of \nspurious memories in Hopfield networks, and on the number of functions \nimplementable by a depth-d threshold circuit; (2) a lower bound on the \nnumber of orthogonal input functions required to implement a threshold \nfunction; (3) a necessary condition for an arbitrary set of input functions to \nimplement a threshold function; (4) a lower bound on the error introduced \nin approximating boolean functions using sparse polynomials; (5) a limit \non the effectiveness of the only known lower-bound technique (based on \ncomputing correlations among boolean functions) for the depth of threshold \ncircuits implementing boolean functions; and (6) a constructive proof \nthat every boolean function f of n input variables is a threshold function \nof polynomially many input functions, none of which is significantly \ncorrelated with f. 
Some of these results lead to generalizations of key results \nconcerning threshold circuit complexity, particularly those that are based \non the so-called spectral or harmonic-analysis approach. Moreover, our \ngeometric approach yields simple proofs, based on elementary results from \nlinear algebra, for many of these earlier results. \n\n1 Introduction \n\nAn S-input threshold gate is characterized by S real weights $w_1, \\ldots, w_S$. It takes S \ninputs $x_1, \\ldots, x_S$, each either $+1$ or $-1$, and outputs $+1$ if the linear combination \n$\\sum_{i=1}^{S} w_i x_i$ is positive and $-1$ if the linear combination is negative. Threshold gates \nwere recently used to implement several functions of practical interest (including \nParity, Addition, Multiplication, Division, and Comparison) with fewer gates and \nreduced depth than conventional circuits using AND, OR, and NOT gates [12, 4, 11]. \n\nThis success has led to a considerable amount of research on the power of threshold \ncircuits [1, 10, 9, 11, 3, 13]. However, even simple questions remain unanswered. It \nis not known, for example, whether there is a function that can be computed by a \ndepth-3 threshold circuit with polynomially many gates but cannot be computed \nby any depth-2 circuit with polynomially many threshold gates. \n\nGeometric approaches have proven useful for analyzing threshold gates. An S-input \nthreshold gate corresponds to a hyperplane in $\\mathbb{R}^S$. This has been used, for example, \nto count the number of boolean functions computable by a single threshold gate [6], \nand also to determine functions that cannot be implemented by a single threshold \ngate. However, threshold circuits of depth two or more do not carry a simple \ngeometric interpretation in $\\mathbb{R}^S$. 
The inputs to gates in the second level are themselves \nthreshold functions, hence the linear combination computed at the second level is \na non-linear function of the inputs. Lacking a geometric view, researchers [5, 3] \nhave used indirect approaches, applying harmonic-analysis techniques to analyze \nthreshold gates. These techniques, apart from their complexity, restricted the input \nfunctions of the gates to be of very special types: input variables or parities of the \ninput variables, thus not applying even to depth-two circuits. \n\nIn this paper, we describe a simple geometric relation between the output function \nof a threshold gate and its set of input functions. This applies to arbitrary sets of \ninput functions. Using this relation, we can prove the following results: (1) upper \nbounds on (a) the number of threshold functions of any set of input functions, (b) \nthe number of spurious memories in a Hopfield network, and (c) the number of \nfunctions implementable by threshold circuits of depth d; (2) a lower bound on the \nnumber of orthogonal input functions required to implement a threshold function; \n(3) a quantifiable necessary condition for a set of functions to implement a threshold \nfunction; (4) a lower bound on the error in approximating boolean functions using \nsparse polynomials; (5) a limit on the effectiveness of the correlation method used \nin [7] to prove that a certain function cannot be implemented by depth-two circuits \nwith polynomially many gates and polynomially bounded weights; (6) a proof that \nevery function f is a threshold function of polynomially many input functions, none \nof which is significantly correlated with f. \n\nSpecial cases of some of these results, where the input functions to a threshold gate \nare restricted to the input variables, or parities of the input variables, were proven \nin [5, 3] using harmonic-analysis tools. 
Our technique shows that these tools are \nnot needed, providing simpler proofs for more general results. \n\nDue to space limitations, we cannot present the full details of our results. Instead, \nwe shall introduce the basic definitions followed by a technical summary of the \nresults; the emphasis will be on pointing out the motivation and relating our results \nwith those in the literature. The proofs and other technical details will appear in a \ncomplete journal paper. \n\n2 Definitions and Background \n\nAn n-variable boolean function is a mapping $f : \\{-1, 1\\}^n \\to \\{-1, 1\\}$. We view f \nas a (column) vector in $\\mathbb{R}^{2^n}$. Each of f's $2^n$ components is either $-1$ or $+1$ and \nrepresents f(x) for a distinct value assignment x of the n boolean variables. We view \nthe S weights of an S-input threshold gate as a weight vector $w = (w_1, \\ldots, w_S)^T$ \nin $\\mathbb{R}^S$. \nLet the functions $f_1, \\ldots, f_S$ be the inputs of a threshold gate w. The gate computes \na function f (or f is the output of the gate) if the following vector equation holds: \n\n$$f = \\mathrm{sgn}\\left(\\sum_{i=1}^{S} f_i w_i\\right) \\qquad (1)$$ \n\nwhere sgn is applied componentwise and \n\n$$\\mathrm{sgn}(x) = \\begin{cases} +1 & \\text{if } x > 0, \\\\ -1 & \\text{if } x < 0, \\\\ \\text{undefined} & \\text{if } x = 0. \\end{cases}$$ \n\nNote that this definition requires that all components of $\\sum_{i=1}^{S} f_i w_i$ be nonzero. It \nis convenient to write Equation (1) in matrix form: \n\n$$f = \\mathrm{sgn}(Yw)$$ \n\nwhere the input matrix \n\n$$Y = [f_1 \\cdots f_S]$$ \n\nis a $2^n$ by S matrix whose columns are the input functions. The function f is a \nthreshold function of $f_1, \\ldots, f_S$ if there exists a threshold gate (i.e., w) with inputs \n$f_1, \\ldots, f_S$ that computes f. \nThese definitions form the basis of our approach. Each function, being a \u00b11 vector \nin $\\mathbb{R}^{2^n}$, determines an orthant in $\\mathbb{R}^{2^n}$. A function f is the output of a threshold gate \nwhose input functions are $f_1, \\ldots, f_S$ if and only if the linear combination $\\sum_{i=1}^{S} f_i w_i$ \ndefined by the gate lies inside the orthant determined by f. 
\nDefinition 1 The correlation of two n-variable boolean functions $f_1$ and $f_2$ is: \n\n$$C_{f_1 f_2} = (f_1^T f_2)/2^n;$$ \n\nthe two functions are uncorrelated or orthogonal if $C_{f_1 f_2} = 0$. \n\nNote that $C_{f_1 f_2} = 1 - 2 d_H(f_1, f_2)/2^n$, where $d_H(f_1, f_2)$ is the Hamming distance \nbetween $f_1$ and $f_2$; thus, the correlation can be interpreted as a measure of how \n'close' the two functions are. \nFix the input functions $f_1, \\ldots, f_S$ to a threshold gate. The correlation vector of a \nfunction f with the input functions is \n\n$$C_{fY} = (Y^T f)/2^n = (C_{f f_1} \\; C_{f f_2} \\; \\cdots \\; C_{f f_S})^T.$$ \n\nNext, we define $\\hat{C}$ as the maximum in magnitude among the correlation coefficients, \ni.e., $\\hat{C} = \\max\\{|C_{f f_i}| : 1 \\leq i \\leq S\\}$. \n\n3 Summary of Results \n\nThe correlation between two n-variable functions is a multiple of $2^{-(n-1)}$, bounded \nbetween $-1$ and $1$, hence can assume $2^n + 1$ values. The correlation vector \n$C_{fY} = (C_{f f_1}, \\ldots, C_{f f_S})^T$ can therefore assume at most $(2^n + 1)^S$ different values. There are \n$2^{2^n}$ boolean functions of n boolean variables, hence many share the same correlation \nvector. However, the next theorem says that a threshold function of $f_1, \\ldots, f_S$ does \nnot share its correlation vector with any other function. \n\nUniqueness Theorem Let f be a threshold function of $f_1, \\ldots, f_S$. Then, for all \n$g \\neq f$, \n\n$$C_{gY} \\neq C_{fY}.$$ \n\nCorollary 1 There are at most $(2^n + 1)^S$ threshold functions of any set of S input \nfunctions. \n\nThe special case of the Uniqueness Theorem where the functions $f_1, \\ldots, f_S$ are \nthe input variables had been proven in [5, 9]. The proof used harmonic-analysis \ntools such as Parseval's theorem. It relied on the mutual orthogonality of the input \nfunctions (namely, $C_{x_i x_j} = 0$ for all $i \\neq j$). Another special case, where the input \nfunctions are parities of the input variables, was proven in [3]. The same proof \nwas used; see, e.g., pages 419-422 of [9]. 
Our proof shows that the harmonic-analysis \ntools and assumptions are not needed, thereby (1) significantly simplifying \nthe proof, and (2) showing that the functions $f_1, \\ldots, f_S$ need not be orthogonal: \nthe Uniqueness Theorem holds for all collections of functions. The more general \nresult of the Uniqueness Theorem can be applied to obtain the following two new \ncounting results. \n\nCorollary 2 The number of stable states in a Hopfield network with n elements \nwhich is programmed by the outer product rule to store s given vectors is \n$\\leq 2^{s \\log(n+1)}$. \n\nCorollary 3 Let $F_n(S(n), d)$ be the number of n-variable boolean functions computed \nby depth-d threshold circuits with fan-in bounded by S(n) (we assume $S(n) \\geq n$). \nThen, for all $d, n \\geq 1$, \n\nIt follows easily from our geometric framework that if $C_{fY} = 0$ then f is not a \nthreshold function of $f_1, \\ldots, f_S$: every linear combination of $f_1, \\ldots, f_S$ is orthogonal \nto f, hence cannot intersect the orthant determined by f. \nNext, we consider the case where $C_{fY} \\neq 0$. Define the generalized spectrum to be \nthe S-dimensional vector: \n\n$$\\beta = (\\beta_1, \\ldots, \\beta_S)^T = (Y^T Y)^{-1} Y^T f$$ \n\n(the reason for the definition and the name will be clarified soon). \n\nSpectral-Bound Theorem If f is a linear threshold function of $f_1, \\ldots, f_S$, then \n\n$$\\sum_{i=1}^{S} |\\beta_i| \\geq 1,$$ \n\nhence, \n\n$$S \\geq 1/\\hat{\\beta}, \\quad \\text{where } \\hat{\\beta} = \\max\\{|\\beta_i| : 1 \\leq i \\leq S\\}.$$ \n\nThe Spectral-Bound Theorem provides a way of lower bounding the number S of \ninput functions. Specifically, if $\\beta_i$ is exponentially small (in n) for all $i \\in \\{1, \\ldots, S\\}$, \nthen S must be exponentially large. \n\nIn the special case where the input functions are parities of the input variables, all \ninput functions are orthogonal; hence $Y^T Y = 2^n I_S$ and \n\n$$\\beta = \\frac{1}{2^n} Y^T f = C_{fY}.$$ 
\n\nNote that every parity function p is a basis function of the Hadamard transform, \nhence $C_{fp}$ is the spectral coefficient corresponding to p in the transform (see \n[8, 2] for more details on spectral representation of boolean functions). Therefore, \nthe generalized spectrum in this case is the real spectrum of f. In that case, the \nSpectral-Bound Theorem implies that $S \\geq 1/\\max\\{|C_{f f_i}| : 1 \\leq i \\leq S\\}$. Therefore, the number \nof input functions needed is at least the reciprocal of the maximum magnitude \namong the spectral coefficients (i.e., $\\hat{C}$). This special case was proved in [3]. Again, \ntheir proofs used harmonic-analysis tools and assumptions that we prove are \nunnecessary, thereby generalizing them to arbitrary input functions. Moreover, \nour geometric approach considerably simplifies the exposition by presenting simple \nproofs based on elementary results from linear algebra. \nIn general, we can show that if the input functions $f_i$ are orthogonal (i.e., $C_{f_i f_j} = 0$ \nfor $i \\neq j$) or asymptotically orthogonal (i.e., $\\lim_{n \\to \\infty} C_{f_i f_j} = 0$) then the number of \ninput functions $S \\geq 1/\\hat{C}$, where $\\hat{C}$ is the largest (in magnitude) correlation of the \noutput function with any of its input functions. \n\nWe can also use the generalized spectrum to derive a lower bound on the error \nincurred in approximating a boolean function, f, using a set of basis functions. \nThe lower bound can then be applied to show that the Majority function cannot be \nclosely approximated by a sparse polynomial. In particular, it can be shown that if a \npolynomial of the input variables with only polynomially many (in n) monomials is \nused to approximate an n-variable Majority function then the approximation error \nis $\\Omega(1/(\\log \\log n)^{3/2})$. This provides a direct spectral approach for proving lower \nbounds on the approximation error. 
\n\nThe method of proving lower bounds on S in terms of the correlation coefficients \n$C_{f f_i}$ of f with the possible input functions can be termed the method of correlations. \nHajnal et al. [7] used a different aspect of this method^1 to prove a lower \nbound on the depth of a threshold circuit that computes the Inner-product-mod-2 \nfunction. \n\n^1 They did not exactly use the correlation approach introduced in this paper, rather an \nequivalent framework. \n\nOur techniques can be applied to investigate the method of correlations in more \ndetail and prove some limits to its effectiveness. We can show that the number, \nS, of input functions need not be inversely proportional to the largest correlation \ncoefficient $\\hat{C}$. In particular, we give two constructive procedures showing that any \nfunction f is a threshold function of O(n) input functions each having an exponentially \nsmall correlation with f: $|C_{f f_i}| \\leq 2^{-(n-1)}$. \nConstruction 1 Every boolean function f of n variables (for n even) can be \nexpressed as a threshold function of 3n boolean functions $f_1, f_2, \\ldots, f_{3n}$ such that \n(1) $C_{f f_i} = 0$ for all $1 \\leq i \\leq 3n - 1$, and (2) $C_{f f_{3n}} = 2^{-(n-1)}$. \nConstruction 2 Every boolean function f of n variables can be expressed as a \nthreshold function of 2n boolean functions $f_1, f_2, \\ldots, f_{2n}$ such that (1) $C_{f f_i} = 0$ \nfor all $1 \\leq i \\leq 2n - 2$, and (2) $C_{f f_{2n-1}} = C_{f f_{2n}} = 2^{-(n-1)}$. \nThe results of the above constructions are surprising. For example, in Construction \n1, the output function of the threshold gate is uncorrelated with all but one of \nthe input functions, and the only non-zero correlation is the smallest possible \n($= 2^{-(n-1)}$). Note that f is not a threshold function of a set of input functions, each \nof which is orthogonal to f. \nThe above results thus provide a comprehensive understanding of the so-called \nmethod of correlations. 
In particular: (1) If the input functions are mutually orthogonal \n(or asymptotically orthogonal), then the method of correlations is effective \neven if exponential weights are allowed, i.e., if a function has exponentially small \ncorrelation with every function from a pool of possible input functions, then one would \nrequire exponentially many inputs to implement the given function using a threshold \ngate; (2) If the input functions are not mutually orthogonal, then the method of \ncorrelations need not be effective, i.e., one can construct examples where the output \nfunction has exponentially small correlation with every input function, and yet it \ncan be implemented as a threshold function of polynomially many input functions. \n\nFurthermore, the constructive procedures can also be considered as constituting a \npreliminary answer to the following question: Given an n-variable boolean function \nf, are there efficient procedures for expressing it as a threshold function of polynomially \nmany (in n) input functions? A procedure for so decomposing a given \nfunction f will be referred to as a threshold-decomposition procedure; moreover, a \ndecomposition procedure can be considered efficient if the input functions have \nsimpler threshold implementations than f (i.e., are easier to implement or require less \ndepth/size). Constructions 1 and 2 present two such threshold-decomposition procedures. \nAt present, the efficiency of these constructions is not clear and further \nwork is necessary. We hope, however, that the general methodology introduced here \nmay lead to subsequent work resulting in more efficient threshold-decomposition \nprocedures. \n\n4 Concluding Remarks \n\nWe have outlined a new geometric approach for investigating the properties of \nthreshold circuits. 
In the process, we have developed a unified framework where \nmany of the previous results can be derived simply as special cases, and without \nintroducing too many seemingly difficult concepts. Moreover, we have derived several \nnew results that quantify the input/output relationships of threshold gates, derive \nlower bounds on the number of input functions required to implement a given function \nusing a threshold gate, and also analyze the limitations of a well-known lower-bound \ntechnique for threshold circuits. \n\nAcknowledgements \n\nThis work was supported in part by the Joint Services Program at Stanford University \n(US Army, US Navy, US Air Force) under Contract DAAL03-88-C-0011, the \nSDIO/IST, managed by the Army Research Office under Contract DAAL03-90-G-0108, \nand the Department of the Navy, NASA Headquarters, Center for Aeronautics \nand Space Information Sciences under Grant NAGW-419-S6. \n\nReferences \n\n[1] E. Allender. A note on the power of threshold circuits. IEEE Symp. Found. \nComp. Sci., 30, 1989. \n\n[2] Y. Brandman, A. Orlitsky, and J. Hennessy. A Spectral Lower Bound Technique \nfor the Size of Decision Trees and Two-Level AND/OR Circuits. IEEE Trans. \non Computers, 39, No. 2:282-287, February 1990. \n\n[3] J. Bruck. Harmonic Analysis of Polynomial Threshold Functions. SIAM Journal \non Discrete Mathematics, May 1990. \n\n[4] A. K. Chandra, L. Stockmeyer, and U. Vishkin. Constant depth reducibility. \nSIAM J. Comput., 13:423-439, 1984. \n\n[5] C. K. Chow. On The Characterization of Threshold Functions. Proc. Symp. \non Switching Circuit Theory and Logical Design, pages 34-38, 1961. \n\n[6] T. M. Cover. Geometrical and Statistical Properties of Systems of Linear \nInequalities with Applications in Pattern Recognition. IEEE Trans. on Electronic \nComputers, EC-14:326-34, 1965. \n\n[7] A. Hajnal, W. Maass, P. 
Pudlak, M. Szegedy, and G. Turan. Threshold circuits \nof bounded depth. IEEE Symp. Found. Comp. Sci., 28:99-110, 1987. \n\n[8] R. J. Lechner. Harmonic analysis of switching functions. In A. Mukhopadhyay, \neditor, Recent Developments in Switching Theory. Academic Press, 1971. \n\n[9] P. M. Lewis and C. L. Coates. Threshold Logic. John Wiley & Sons, Inc., 1967. \n\n[10] I. Parberry and G. Schnitger. Parallel Computation with Threshold Functions. \nJournal of Computer and System Sciences, 36(3):278-302, 1988. \n\n[11] J. Reif. On Threshold Circuits and Polynomial Computation. In Structure in \nComplexity Theory Symp., pages 118-123, 1987. \n\n[12] K. Y. Siu and J. Bruck. On the Power of Threshold Circuits with Small \nWeights. To appear in SIAM J. Discrete Math. \n\n[13] K. Y. Siu, V. P. Roychowdhury, and T. Kailath. Computing with Almost \nOptimal Size Threshold Circuits. Submitted to JCSS, 1990. \n", "award": [], "sourceid": 354, "authors": [{"given_name": "V. P.", "family_name": "Roychowdhury", "institution": null}, {"given_name": "K. Y.", "family_name": "Siu", "institution": null}, {"given_name": "A.", "family_name": "Orlitsky", "institution": null}, {"given_name": "T.", "family_name": "Kailath", "institution": null}]}