{"title": "The Devil and the Network: What Sparsity Implies to Robustness and Memory", "book": "Advances in Neural Information Processing Systems", "page_first": 883, "page_last": 889, "abstract": null, "full_text": "The Devil and the Network: What Sparsity Implies to Robustness and Memory \n\nSanjay Biswas and Santosh S. Venkatesh \nDepartment of Electrical Engineering \nUniversity of Pennsylvania \nPhiladelphia, PA 19104 \n\nAbstract \n\nRobustness is a commonly bruited property of neural networks; in particular, a folk theorem in neural computation asserts that neural networks, in contexts with large interconnectivity, continue to function efficiently, albeit with some degradation, in the presence of component damage or loss. A second folk theorem in such contexts asserts that dense interconnectivity between neural elements is a sine qua non for the efficient usage of resources. These premises are formally examined in this communication in a setting that invokes the notion of the \"devil\"1 in the network as an agent that produces sparsity by snipping connections. \n\n1 ON REMOVING THE FOLK FROM THE THEOREM \n\nRobustness in the presence of component damage is a property that is commonly attributed to neural networks. The content of the following statement embodies this sentiment. \n\nFolk Theorem 1: Computation in neural networks is not substantially affected by damage to network components. \n\nWhile such a statement is manifestly not true in general (witness networks with \"grandmother cells\", where damage to the critical cells fatally impairs the computational ability of the network), there is anecdotal evidence in support of it in situations where the network has a more \"distributed\" flavour, with relatively dense interconnectivity of elements and a distributed format for the storage of information. \n\n1 Well, maybe an imp. 
\nQualitatively, the phenomenon is akin to holographic modes of storing information, where the distributed, non-localised format of information storage carries with it a measure of security against component damage. \n\nThe flip side to the robust folk theorem is the following observation, robustness notwithstanding: \n\nFolk Theorem 2: Dense interconnectivity is a sine qua non for efficient usage of resources; in particular, sparser structures exhibit a degradation in computational capability. \n\nAgain, disclaimers have to be thrown in on the applicability of such a statement. In recurrent network architectures, however, it might seem to have some merit. In particular, in associative memory applications, while structural robustness might guarantee that the loss in memory storage capacity with increased interconnection sparsity is not catastrophic, a drop in capacity with increased sparsity may nonetheless intuitively be expected. \n\nThis communication represents an effort to mathematically codify these tenets. In the setting we examine we formally introduce sparse network interconnectivity by invoking the notion of a (puckish) devil in the network which severs interconnection links between neurons. Our results here involve some surprising consequences, viewed in the light of the two folk theorems, of sparse interconnectivity to robustness and to memory storage capability. Only the main results are stated here; for extensions and details of proofs we refer the interested reader to Venkatesh (1990) and Biswas and Venkatesh (1990). \n\nNotation. We denote by IB the set {-1, 1}. For every integer k we denote the set of integers {1, 2, ..., k} by [k]. By ordered multiset we mean an ordered collection of elements with repetition of elements allowed, and by k-set we mean an ordered multiset of k elements. All logarithms in the exposition are to base e. 
\n\n2 RECURRENT NETWORKS \n\n2.1 INTERCONNECTION GRAPHS \n\nWe consider a recurrent network of n formal neurons. The allowed pattern of neural interconnectivity is specified by the edges of a (bipartite) interconnectivity graph, G_n, on vertices [n] x [n]. In particular, the existence of an edge {i,j} in G_n indicates that the state of neuron j is input to neuron i.2 The network is characterised by an n x n matrix of weights, W = [w_ij], where w_ij denotes the (real) weight modulating the state of neuron j at the input of neuron i. If u ∈ IB^n is the current state of the system, an update, u_i → u_i', of the state of neuron i is specified by the linear threshold rule \n\nu_i' = sgn ( Σ_{j : {i,j} ∈ G_n} w_ij u_j ). \n\nThe network dynamics describe trajectories in a state space comprised of the vertices of the n-cube.3 We are interested in an associative memory application where we wish to store a desired set of states, the memories, as fixed points of the network, and with the property that errors in an input representation of a memory are corrected and the memory retrieved. \n\n2 Equivalently, imagine a devil loose with a pair of scissors snipping those interconnections for which {i,j} ∉ G_n. For a complementary discussion of sparse interconnectivity see Komlós and Paturi (1988). \n\n2.2 DOMINATORS \n\nLet u ∈ IB^n be a memory and 0 ≤ p < 1 a parameter. Corresponding to the memory u we generate a probe û ∈ IB^n by independently specifying the components, û_j, of the probe as follows: \n\nû_j = u_j with probability 1 - p, and û_j = -u_j with probability p. (1) \n\nWe call û a random probe with parameter p. \n\nDefinition 2.1 We say that a memory, u, dominates over a radius pn if, with probability approaching one as n → ∞, the network corrects all errors in a random probe with parameter p in one synchronous step. 
We call p the (fractional) dominance radius. We also say that u is stable if it is a 0-dominator. \n\nREMARKS: Note that stable memories are just fixed points of the network. Also, the expected number of errors in a probe is pn. \n\n2.3 CODES \n\nFor given integers m ≥ 1, n ≥ 1, a code, K_m^n, is a collection of ordered multisets of size m from IB^n. We say that an m-set of memories is admissible iff it is in K_m^n.4 Thus, a code just specifies which m-sets are allowable as memories. Examples of codes include: the set of all multisets of size m from IB^n; a single multiset of size m from IB^n; all collections of m mutually orthogonal vectors in IB^n; all m-sets of vectors in IB^n in general position. \n\nDefine two ordered multisets of memories to be equivalent if they are permutations of one another. We define the size of a code, K_m^n, to be the number of distinct equivalence classes of m-sets of memories. We will be interested in codes of relatively large size: log |K_m^n| / n → ∞ as n → ∞. In particular, we require at least an exponential number of choices of (equivalence classes of) admissible m-sets of memories. \n\n3 As usual, there are Liapunov functions for the system under suitable conditions on the interconnectivity graph and the corresponding weights. \n\n4 We define admissible m-sets of memories in terms of ordered multisets rather than sets so as to obviate certain technical nuisances. \n\n2.4 CAPACITY \n\nFor each fixed n and interconnectivity graph, G_n, an algorithm, X, is a prescription which, given an m-set of memories, produces a corresponding set of interconnection weights, w_ij, i ∈ [n], {i,j} ∈ G_n. For m ≥ 1 let A(u^1, ..., u^m) be some attribute of m-sets of memories. 
(The following, for instance, are examples of attributes of admissible sets of memories: all the memories are stable in the network generated by X; almost all the memories dominate over a radius pn.) For given n and m, we choose a random m-set of memories, u^1, ..., u^m, from the uniform distribution on K_m^n. \n\nDefinition 2.2 Given interconnectivity graphs G_n, codes K_m^n, and algorithm X, a sequence, {C_n}_{n=1}^∞, is a capacity function for the attribute A (or A-capacity for short) if for λ > 0 arbitrarily small: \n\na) P{A(u^1, ..., u^m)} → 1 as n → ∞ whenever m ≤ (1 - λ)C_n; \nb) P{A(u^1, ..., u^m)} → 0 as n → ∞ whenever m ≥ (1 + λ)C_n. \n\nWe also say that C_n is a lower A-capacity if property (a) holds, and that C_n is an upper A-capacity if property (b) holds. \n\nFor m ≥ 1 let u^1, ..., u^m ∈ IB^n be an m-set of memories chosen from a code K_m^n. The outer-product algorithm specifies the interconnection weights, w_ij, according to the following rule: for i ∈ [n], {i,j} ∈ G_n, \n\nw_ij = Σ_{β=1}^{m} u_i^β u_j^β. (2) \n\nIn general, if the interconnectivity graph, G_n, is symmetric then, under a suitable mode of operation, there is a Liapunov function for the network specified by the outer-product algorithm. Given graphs G_n, codes K_m^n, and the outer-product algorithm, for fixed 0 ≤ p < 1/2 we are interested in the attribute D_p that each of the m memories dominates over a radius pn. \n\n3 RANDOM GRAPHS \n\nWe investigate the effect of a random loss of neural interconnections in a recurrent network of n neurons by considering a random bipartite interconnectivity graph RG_n on vertices [n] x [n] with \n\nP{{i,j} ∈ RG_n} = p \n\nfor all i ∈ [n], j ∈ [n], and with these probabilities being mutually independent. The interconnection probability p is called the sparsity parameter and may depend on n. 
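The devil's random pruning and the outer-product rule (2) are easy to simulate. The following is a minimal NumPy sketch, not from the paper: the parameter values (n, m, the sparsity p, and the dominance radius) and all variable names are our own illustrative choices, with m kept well below the capacity of Theorem 3.1 so that one synchronous step should correct essentially all probe errors.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 500, 5        # neurons and stored memories (m well under capacity)
p_sparse = 0.5       # sparsity parameter: each link retained with probability p
rho = 0.1            # dominance radius: expected fraction of flipped probe bits

# Memories drawn uniformly from {-1, +1}^n (the complete code).
U = rng.choice([-1, 1], size=(m, n))

# Outer-product (Hebbian) weights, Eq. (2): w_ij = sum_beta u_i^beta u_j^beta.
W = U.T @ U

# The devil: a random interconnectivity graph RG_n; severed links become zero weights.
mask = rng.random((n, n)) < p_sparse
W = W * mask

def update(state):
    """One synchronous linear-threshold step: u_i' = sgn(sum_j w_ij u_j)."""
    s = np.sign(W @ state)
    return np.where(s == 0, 1, s)  # resolve sgn(0) arbitrarily as +1

# Random probe with parameter rho, Eq. (1): flip each bit independently.
u = U[0]
flips = rng.random(n) < rho
probe = np.where(flips, -u, u)

recovered = update(probe)
print("probe errors:", int(flips.sum()), "-> errors after one step:", int(np.sum(recovered != u)))
```

Rerunning with p_sparse pushed down toward log(n)/n exhibits the graceful degradation, and then the catastrophic failure, described after Theorem 3.1.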
The system described above is formally equivalent to beginning with a fully-interconnected network of neurons with specified interconnection weights w_ij, and then invoking a devil which randomly severs interconnection links, independently retaining each interconnection weight w_ij with probability p, and severing it (replacing it with a zero weight) with probability q = 1 - p. \n\nLet CK_m^n denote the complete code of all choices of ordered multisets of size m from IB^n. \n\nTheorem 3.1 Let 0 ≤ ρ < 1/2 be a fixed dominance radius, and let the sparsity parameter p satisfy pn² → ∞ as n → ∞. Then (1 - 2ρ)² pn / (2 log pn²) is a D_ρ-capacity for random interconnectivity graphs RG_n, complete codes CK_m^n, and the outer-product algorithm. \n\nREMARKS: The above result graphically validates Folk Theorem 1 on the fault-tolerant nature of the network; specifically, the network exhibits a graceful degradation in storage capacity as the loss in interconnections increases. Catastrophic failure occurs only when p is smaller than log n / n: each neuron need retain only on the order of log n links of a total of n possible links with other neurons for useful associative properties to emerge. \n\n4 BLOCK GRAPHS \n\nOne of the simplest (and most regular) forms of sparsity that a favourably disposed devil might enjoin is block sparsity, where the neurons are partitioned into disjoint subsets of neurons with full interconnectivity within each subset and no neural interconnections between subsets. The weight matrix in this case takes on a block diagonal form, and the interconnectivity graph is composed of a set of disjoint, complete bipartite sub-graphs. \n\nMore formally, let 1 ≤ b ≤ n be a positive integer, and let {I_1, ..., I_{n/b}} partition [n] such that each subset of indices, I_k, k = 1, ..., n/b, has size |I_k| = b.5 We call each I_k a block and b the block size. 
We specify the edges of the (bipartite) block interconnectivity graph BG_n by {i,j} ∈ BG_n iff i and j lie in a common block. \n\nTheorem 4.1 Let the block size b be such that b = O(n) as n → ∞, and let 0 ≤ p < 1/2 be a fixed dominance radius. Then (1 - 2p)² b / (2 log bn) is a D_p-capacity for block interconnectivity graphs BG_n, complete codes CK_m^n, and the outer-product algorithm. \n\nCorollary 4.2 Under the conditions of Theorem 4.1 the fixed point memory capacity is b / (2 log bn). \n\nCorollary 4.3 For a fully-interconnected graph, complete codes CK_m^n, and the outer-product algorithm, the fixed point memory capacity is n / (4 log n). \n\nCorollary 4.3 is the main result shown by McEliece, Posner, Rodemich, and Venkatesh (1987). Theorem 4.1 extends the result and shows (formally validating the intuition espoused in Folk Theorem 2) that increased sparsity causes a loss in capacity if the code is complete, i.e., if all choices of memories are considered admissible. It is possible, however, to design codes to take advantage of the sparse interconnectivity structure, rather at odds with the Folk Theorem. \n\n5 Here, as in the rest of the paper, we ignore details with regard to integer rounding. \n\nWithout loss of generality let us assume that block I_1 consists of the first b indices, [b], block I_2 the next b indices, [2b] - [b], and so on, with the last block I_{n/b} consisting of the last b indices, [n] - [n-b]. We can then partition any vector u ∈ IB^n as \n\nu = (u_1, ..., u_{n/b}), (3) \n\nwhere for k = 1, ..., n/b, u_k is the vector of components corresponding to block I_k. For M ≥ 1 we form the block code BK_{M^{n/b}}^n as follows: to each ordered multiset of M vectors, u^1, ...
, u^M from IB^n, we associate a unique ordered multiset in BK_{M^{n/b}}^n by lexicographically ordering all M^{n/b} vectors of the form \n\n(u_1^{α_1}, u_2^{α_2}, ..., u_{n/b}^{α_{n/b}}), α_1, ..., α_{n/b} ∈ [M]. \n\nThus, we obtain an admissible set of M^{n/b} memories from any ordered multiset of M vectors in IB^n by \"mixing\" the blocks of the vectors. We call each M-set of vectors, u^1, ..., u^M ∈ IB^n, the generating vectors for the corresponding admissible set of memories in BK_{M^{n/b}}^n. \n\nEXAMPLE: Consider a case with n = 4, block size b = 2, and M = 2 generating vectors. To any 2-set of generating vectors there corresponds a unique 4 (= M^{n/b})-set in the block code as follows: \n\n(u_1^1, u_2^1, u_3^1, u_4^1), (u_1^2, u_2^2, u_3^2, u_4^2) → (u_1^1, u_2^1, u_3^1, u_4^1), (u_1^1, u_2^1, u_3^2, u_4^2), (u_1^2, u_2^2, u_3^1, u_4^1), (u_1^2, u_2^2, u_3^2, u_4^2). \n\nTheorem 4.4 Let 0 ≤ p < 1/2 be a fixed dominance radius. Then we have the following capacity estimates for block interconnectivity graphs BG_n, block codes BK_{M^{n/b}}^n, and the outer-product algorithm: \n\na) If the block size b satisfies n log log bn / (b log bn) → 0 as n → ∞ then the D_p-capacity is \n\n[(1 - 2p)² b / (2 log bn)]^{n/b}. \n\nb) Define, for any ν, the capacity function C_n(ν). If the block size b satisfies b / log n → ∞ and b log bn / log log bn = O(n) as n → ∞, then C_n(ν) is a lower D_p-capacity for any choice of ν < 3/2 and C_n(ν) is an upper D_p-capacity for any ν > 3/2. \n\nCorollary 4.5 If, for fixed t ≥ 1, we have b = n/t, then, under the conditions of Theorem 4.4, the D_p-capacity is \n\n(1 - 2p)^{2t} t^{-t} 4^{-t} (n / log n)^t. \n\nCorollary 4.6 For any fixed dominance radius 0 ≤ p < 1/2, and for any τ < 1, a constant c > 0 and a code of size Ω(2^{c n^{2-τ}}) can be found such that it is possible to achieve lower D_p-capacities which are Ω(2^{n^τ}) in recurrent neural networks with interconnectivity graphs of degree Θ(n^{1-τ}). 
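The block-mixing construction behind the block code can be sketched concretely. The snippet below is our own illustration (not from the paper), using the paper's toy case n = 4, b = 2, M = 2: it builds one memory per assignment of a generating vector to each block, giving M^(n/b) memories in all.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

n, b, M = 4, 2, 2            # n = 4 neurons, block size b = 2, M = 2 generating vectors
n_blocks = n // b

# M generating vectors drawn from {-1, +1}^n.
G = rng.choice([-1, 1], size=(M, n))

# "Mix" blocks: one admissible memory per tuple (alpha_1, ..., alpha_{n/b}) in [M]^{n/b},
# taking block k from generating vector alpha_k -- M^(n/b) memories in all.
memories = [
    np.concatenate([G[alpha[k], k * b:(k + 1) * b] for k in range(n_blocks)])
    for alpha in itertools.product(range(M), repeat=n_blocks)
]

print(len(memories))  # M ** (n // b) = 4 memories from 2 generating vectors
```

Since each block of each memory is a block of some generating vector, a block-diagonal network need only store the M generating blocks per block of neurons, which is the source of the super-polynomial capacities of Theorem 4.4.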
\n\nREMARKS: If the number of blocks is kept fixed as n grows (i.e., the block size grows linearly with n) then capacities polynomial in n are attained. If the number of blocks increases with n (i.e., the block size grows sub-linearly with n) then super-polynomial capacities are attained. Furthermore, we have the surprising result, rather at odds with Folk Theorem 2, that very large storage capacities can be obtained at the expense of code size (while still retaining large code sizes) in increasingly sparse networks. \n\nAcknowledgements \n\nThe support of research grants from E. I. Dupont de Nemours, Inc. and the Air Force Office of Scientific Research (grant number AFOSR 89-0523) is gratefully acknowledged. \n\nReferences \n\nBiswas, S. and S. S. Venkatesh (1990), \"Codes, sparsity, and capacity in neural associative memory,\" submitted for publication. \n\nKomlós, J. and R. Paturi (1988), \"Effects of connectivity in associative memory models,\" Technical Report CS88-131, University of California, San Diego. \n\nMcEliece, R. J., E. C. Posner, E. R. Rodemich, and S. S. Venkatesh (1987), \"The capacity of the Hopfield associative memory,\" IEEE Trans. Inform. Theory, vol. IT-33, pp. 461-482. \n\nVenkatesh, S. S. (1990), \"Robustness in neural computation: random graphs and sparsity,\" to appear, IEEE Trans. Inform. Theory.", "award": [], "sourceid": 439, "authors": [{"given_name": "Sanjay", "family_name": "Biswas", "institution": null}, {"given_name": "Santosh", "family_name": "Venkatesh", "institution": null}]}