{"title": "Efficient Simulation of Biological Neural Networks on Massively Parallel Supercomputers with Hypercube Architecture", "book": "Advances in Neural Information Processing Systems", "page_first": 904, "page_last": 910, "abstract": null, "full_text": "Efficient Simulation of Biological Neural \n\nNetworks on Massively Parallel \nSupercomputers with Hypercube \n\nArchi tect ure \n\nErnst Niebur \n\nComputation and Neural Systems \nCalifornia Institute of Technology \n\nPasadena, CA 91125, USA \n\nDean Brettle \n\nBooz, Allen and Hamilton, Inc. \n\n8283 Greensboro Drive \n\nMcLean, VA 22102-3838, USA \n\nAbstract \n\nWe present a neural network simulation which we implemented \non the massively parallel Connection Machine 2. \nIn contrast to \nprevious work, this simulator is based on biologically realistic neu(cid:173)\nrons with nontrivial single-cell dynamics, high connectivity with a \nstructure modelled in agreement with biological data, and preser(cid:173)\nvation of the temporal dynamics of spike interactions. We simulate \nneural networks of 16,384 neurons coupled by about 1000 synapses \nper neuron, and estimate the performance for much larger systems. \nCommunication between neurons is identified as the computation(cid:173)\nally most demanding task and we present a novel method to over(cid:173)\ncome this bottleneck. The simulator has already been used to study \nthe primary visual system of the cat. \n\n1 \n\nINTRODUCTION \n\nNeural networks have been implemented previously on massively parallel supercom(cid:173)\nputers (Fujimoto et al., 1992, Zhang et al., 1990). However, these are implemen(cid:173)\ntations of artificial, highly simplified neural networks, while our aim was explicitly \nto provide a simulator for biologically realistic neural networks. 
There is also at least one implementation of biologically realistic neuronal systems on a moderately parallel but powerful machine (De Schutter and Bower, 1992), but the complexity of the neuron model used makes simulation of larger numbers of neurons impractical. Our interest here is to provide an efficient simulator of large neural networks of cortex and related subcortical structures. \n\nThe most important characteristics of the neuronal systems we want to simulate are the following: \n\n\u2022 Cells are highly interconnected (several thousand connections per cell) but far from fully interconnected. \n\n\u2022 Connections do not follow simple deterministic rules (like, e.g., nearest neighbor connections). \n\n\u2022 Cells communicate with each other via delayed spikes which are binary events (\"all-or-nothing\"). \n\n\u2022 Such communication events are short (1 ms) and infrequent (1 to 100 per second). \n\n\u2022 The temporal fine structure of the spike trains may be an important information carrier (Kreiter and Singer, 1992, Richmond and Optican, 1990, Softky and Koch, 1993). \n\n2 IMPLEMENTATION \n\nThe biological network was modelled as a set of improved integrate-and-fire neurons which communicate with each other via delayed impulses (spikes). The single-cell model and details of the connectivity have been described in refs. (Wehmeier et al., 1989, Worgotter et al., 1991). \n\nDespite the rare occurrence of action potentials, their processing accounts for the major workload of the machine. The efficient implementation of inter-neuron communication is therefore the decisive factor which determines the efficacy of the simulator implementation. By \"spike propagation\" we denote the process by which a neuron communicates the occurrence of an action potential to all its postsynaptic partners. 
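The model described above, integrate-and-fire cells that interact only through delayed, binary ("all-or-nothing") spikes over sparse random connections, can be sketched in a few lines. The following is a minimal illustrative stand-in, not the paper's actual simulator: the simple leaky dynamics, all parameter values, and the dense weight matrix are assumptions made for brevity.

```python
import numpy as np

def simulate(n=64, steps=200, p_conn=0.1, delay=5, dt=0.1,
             tau=10.0, v_thresh=1.0, i_ext=0.11, seed=0):
    """Leaky integrate-and-fire cells exchanging delayed binary spikes.

    Illustrative only; parameters are arbitrary, not biological values.
    """
    rng = np.random.default_rng(seed)
    # Sparse random connectivity: neither nearest-neighbor nor all-to-all.
    w = (rng.random((n, n)) < p_conn) * 0.02
    v = np.zeros(n)               # membrane potentials
    buf = np.zeros((delay, n))    # circular buffer of spikes in transit
    counts = []
    for t in range(steps):
        arriving = buf[t % delay]             # spikes emitted `delay` steps ago
        v += (dt / tau) * (-v) + i_ext + w @ arriving
        fired = v >= v_thresh                 # binary spike events
        v[fired] = 0.0                        # reset fired cells
        buf[t % delay] = fired                # slot is re-read after `delay` steps
        counts.append(int(fired.sum()))
    return counts

print(sum(simulate()))            # total spike count across the run
```

The circular buffer realizes the delayed-spike communication: a spike written at step t is delivered to all postsynaptic cells at step t + delay, preserving the temporal fine structure of the spike trains.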
While the most efficient computation of the neuronal equations is obtained by mapping each neuron onto one processor, this is very inefficient for spike propagation. This is due to the fact that spikes are rare events and that in the SIMD architecture used, each processor has to wait for the completion of the current tasks of all other processors. Therefore, only very few processors are active at any given time step. A more efficient data representation than provided by this \"direct\" algorithm is shown in Fig. 1. In this \"transposed\" scheme, a processor changes its role from simulating one of the neurons to simulating one synapse, which is, in general, not a synapse of the neuron simulated by the processor (see legend of Fig. 1). At any given time step, the addresses of the processors representing spiking neurons are broadcast along binary trees which are implemented efficiently (in time complexity log2 M for M processors) in a hypercube architecture such as the CM-2. We obtain further computational efficiency by dividing the processor array into \"partitions\" of size M and by implementing partially parallel I/O scheduling (both not discussed here). \n\n[Figure 1 diagram: an M x M array of synapse datasets (i, j), one column per processor, with arrows indicating propagation of a spike across all M processors.] \n\nFigure 1: Transposed storage method for connections. 
The storage space for each of the N processors is represented by a vertical column. A small part of this space is used for the time-dependent variables describing each of the N neurons (upper part of each column, \"Cell data\"). The main part of the storage is used for datasets consisting of the addresses, weights and delays of the synapses (\"Synapse data\"), represented by the indices i, j in the figure. For instance, \"1,1\" stands for the first synapse of neuron 1, \"1,2\" for the second synapse of this neuron, and so on. Note that the storage space of processor i does not hold the synapses of neuron i. If neuron i generates a spike, all M processors are used for propagating the spike (black arrows). \n\n3 PERFORMANCE ANALYSIS \n\nIn order to accurately compare the performance of the described spike propagation algorithms, we implemented both the direct algorithm and the transposed algorithm and compared their performances with analytical estimates. \n\n[Figure 2 plot: log-log axes, execution time T (s) from 0.001 to 10 versus spiking probability p from 0.0001 to 1.] \n\nFigure 2: Execution time for the direct algorithm (diamonds) and the transposed algorithm (crosses) as a function of the spiking probability p for each cell. If all cells fire at each time step, there is no advantage for the transposed algorithm; in fact, it is at a disadvantage due to the overhead discussed in the text. Therefore, the two curves cross at a value just below p = 1. As expected, the largest difference between them is found for the smallest values of p. \n\nFigure 2 compares the time required for the direct algorithm to the time required for the transposed algorithm as a function of p, the average number of spikes per neuron per time step. 
Note that while the time required rises much more rapidly for the transposed algorithm than for the direct algorithm, the transposed algorithm takes significantly less time for p < 0.5. The peak speedup was a factor of 454, which occurred at p = 0.00012 (or 1.2 impulses per second at a timestep of 0.1 ms, corresponding approximately to spontaneous spike rates). The absolute highest possible speedup, obtained if there is exactly one spike in every partition at every time step, is equal to M (M = 1024 in this simulation). The average speedup is determined by the maximal number of spiking neurons per time step in any partition, since the processors in all partitions have to wait until the last partition has propagated all of its spikes. The average maximal number of spikes in a system of N partitions, each one consisting of M neurons, is \n\nN_max(p, M, N) = sum_{k=0}^{M} k sum_{m=1}^{N} C(N, m) Pi(k)^m Pibar(k)^(N-m)     (1) \n\nwhere p is the spiking probability of one cell, Pi(k) is the probability that a given partition has k spikes, and \n\nPibar(k) = sum_{i=0}^{k-1} Pi(i)     (2) \n\n[Figure 3 plot: log-log axes, speedup from 1 to 1000 versus spiking probability p from 0.0001 to 1.] \n\nFigure 3: Speedup of the transposed algorithm over the direct algorithm as a function of p for different VP ratios; M = 1024. The ideal speedup (uppermost curve; diamonds), computed in eq. 3, essentially determines the observed speedup (lower curves; \"+\" signs: VP-ratio = 1, diamonds: VP-ratio = 2, crosses: VP-ratio = 4). The difference between the ideal and the effectively obtained speedup is due to communication and other overhead of the transposed algorithm. Note that the difference in speedup for different VP ratios (difference between lower curves) is relatively small, which shows that the penalty for using larger neuron numbers is not large. As expected, the speedup approaches unity for p -> 1 in all cases. 
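Equation 1 is the expectation of the maximum of N independent per-partition spike counts, with the inner binomial sum giving the probability that the maximum equals k. A minimal numerical sketch of this computation, assuming Poisson-distributed counts with mean pM (the partition count N = 16 and the p values below are illustrative choices, not measurements from the paper):

```python
import math

def expected_max_spikes(p, M, N, k_cap=400):
    """Average maximal spike count over N partitions of M neurons (eq. 1),
    with Pi(k) taken as Poisson with mean p*M (illustrative assumption)."""
    lam = p * M
    n_max = 0.0
    cdf_below = 0.0          # Pibar(k) = sum_{i<k} Pi(i), eq. 2
    pk = math.exp(-lam)      # Pi(0)
    for k in range(k_cap):
        # Inner sum of eq. 1 in closed form:
        # sum_{m=1}^{N} C(N,m) Pi(k)^m Pibar(k)^(N-m)
        #   = (Pibar(k) + Pi(k))^N - Pibar(k)^N  = P(max = k)
        n_max += k * ((cdf_below + pk) ** N - cdf_below ** N)
        cdf_below += pk
        pk *= lam / (k + 1)  # Pi(k+1) from Pi(k), Poisson recurrence
    return n_max

M, N = 1024, 16              # illustrative partition size and count
for p in (0.001, 0.01, 0.1):
    nmax = expected_max_spikes(p, M, N)
    print(f"p={p}: N_max={nmax:.2f}, ideal speedup M/N_max={M/nmax:.1f}")
```

For N = 1 the formula reduces to the mean pM, as it should; as N grows, N_max grows only mildly, which is the scaling behavior discussed in the text.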
\n\nIt can be shown that for independent neurons and for low spike rates, Pi(k) is the Poisson distribution and Pibar(k) the incomplete Gamma function. The average maximal number of spikes for M = 1024 and different values of p (eq. 1) can be shown to be a mildly growing function of the number of partitions, which shows that the performance will not be limited crucially by changing the number of partitions. Therefore, the algorithm scales well with increasing network size, and the performance-limiting factor is the activity level in the network, not the size of the network. This is also evident in Fig. 3, which shows the effectively obtained speedup compared to the ideal speedup, which would be obtained if the transposed algorithm were limited only by eq. 1 and did not require any additional communication or other overhead. Using N_max(p, M, N) from eq. 1, it is clear that this ideal speedup is given by \n\nM / N_max(p, M, N)     (3) \n\nThe difference between theory and experiment can be attributed to the time required for the spread operation and other additional overhead associated with the transposed algorithm. At p = 0.0010 (or 10 ips) the obtained speedup is a factor of 106. \n\n4 VERY LARGE SYSTEMS \n\nUsing the full local memory of the machine and the \"Virtual Processor\" capability of the CM-2, the maximal number of neurons that can be simulated without any change of algorithm is as high as 4,194,304 (\"4M\"). Figure 3 shows that the speedup is reduced only slightly as the number of neurons increases, when the additional neurons are simulated by virtual processors. The performance is essentially limited by the mean network activity, whose effect is expressed by eq. 3, and the additional overhead originating from the higher \"VP ratio\" is small. 
This corroborates our earlier conclusion that the algorithm scales well with the size of the simulated system. Although we did not study the scaling of execution time with the size of the simulated system for more than 16,384 real processors, we expect the total execution time to be basically independent of the number of neurons, as long as additional neurons are distributed on additional processors. \n\nAcknowledgements \n\nWe thank U. Wehmeier and F. Worgotter, who provided us with the code for generating the connections, and G. Holt for his retina simulator. Discussions with C. Koch and F. Worgotter were very helpful. We would like to thank C. Koch for his continuing support and for providing a stimulating research atmosphere. We also acknowledge the Advanced Computing Laboratory of Los Alamos National Laboratory, Los Alamos, NM 87545. Some of the numerical work was performed on computing resources located at this facility. This work was supported by the National Science Foundation, the Office of Naval Research, and the Air Force Office of Scientific Research. \n\nReferences \n\nDe Schutter E. and Bower J.M. (1992). Purkinje cell simulation on the Intel Touchstone Delta with GENESIS. In Mihaly T. and Messina P., editors, Proceedings of the Grand Challenge Computing Fair, pages 268-279. CCSF Publications, Caltech, Pasadena CA. \n\nFujimoto Y., Fukuda N., and Akabane T. (1992). Massively parallel architectures for large scale neural network simulations. IEEE Transactions on Neural Networks, 3(6):876-888. \n\nKreiter A.K. and Singer W. (1992). Oscillatory neuronal responses in the visual cortex of the awake macaque monkey. Europ. J. Neurosci., 4(4):369-375. \n\nRichmond B.J. and Optican L.M. (1990). Temporal encoding of two-dimensional patterns by single units in primate primary visual cortex. II: Information transmission. J. 
Neurophysiol., 64:370-380. \n\nSoftky W. and Koch C. (1993). The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. J. Neurosci., 13(1):334-350. \n\nWehmeier U., Dong D., Koch C., and van Essen D. (1989). Modeling the visual system. In Koch C. and Segev I., editors, Methods in Neuronal Modeling, pages 335-359. MIT Press, Cambridge, MA. \n\nWorgotter F., Niebur E., and Koch C. (1991). Isotropic connections generate functional asymmetrical behavior in visual cortical cells. J. Neurophysiol., 66(2):444-459. \n\nZhang X., McKenna M., Mesirov J., and Waltz D. (1990). An efficient implementation of the back-propagation algorithm on the Connection Machine CM-2. In Touretzky D.S., editor, Neural Information Processing Systems 2, pages 801-809. Morgan-Kaufmann, San Mateo, CA. \n", "award": [], "sourceid": 729, "authors": [{"given_name": "Ernst", "family_name": "Niebur", "institution": null}, {"given_name": "Dean", "family_name": "Brettle", "institution": null}]}