Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Wolfgang Maass, Robert Legenstein, Nils Bertschinger
Abstract

What makes a neural microcircuit computationally powerful? Or, more precisely, which measurable quantities could explain why one microcircuit C is better suited for a particular family of computational tasks than another microcircuit C′? We propose in this article quantitative measures for evaluating the computational power and generalization capability of a neural microcircuit, and apply them to generic neural microcircuit models drawn from different distributions. We validate the proposed measures by comparing their predictions with direct evaluations of the computational performance of these microcircuit models. This procedure is applied first to microcircuit models that differ with regard to the spatial range of synaptic connections and the scale of synaptic efficacies in the circuit, and then to microcircuit models that differ with regard to the level of background input currents and the level of noise on the membrane potential of neurons. In the latter case the proposed method allows us to quantify differences in the computational power and generalization capability of circuits in different dynamic regimes (UP- and DOWN-states) that have been demonstrated through intracellular recordings in vivo.
1 Introduction
Rather than constructing particular microcircuit models that carry out particular computations, we pursue in this article a different strategy, based on the assumption that the computational function of cortical microcircuits is not fully genetically encoded, but rather emerges through various forms of plasticity ("learning") in response to the actual distribution of signals that the neural microcircuit receives from its environment. From this perspective the question about the computational function of a cortical microcircuit C turns into two questions:
a) What functions (i.e., maps from circuit inputs to circuit outputs) can the circuit C learn to compute?
b) How well can the circuit C generalize a specific learned computational function to new inputs?
We propose in this article a conceptual framework and quantitative measures for the investigation of these two questions. In order to make this approach feasible, in spite of numerous unknowns regarding synaptic plasticity and the distribution of electrical and biochemical signals impinging on a cortical microcircuit, we make in this first step of the approach the following simplifying assumptions:
Particular neurons ("readout neurons") learn via synaptic plasticity to extract specific information encoded in the spiking activity of neurons in the circuit.
We assume that the cortical microcircuit itself is highly recurrent, but that the impact of feedback that a readout neuron might send back into this circuit can be neglected.1
We assume that synaptic plasticity of readout neurons enables them to learn arbitrary linear transformations. More precisely, we assume that the input to such a readout neuron can be approximated by a term Σ_{i=1}^{n-1} w_i x_i(t), where n − 1 is the number of presynaptic neurons, x_i(t) results from the output spike train of the i-th presynaptic neuron by filtering it according to the low-pass filtering property of the membrane of the readout neuron,2 and w_i is the efficacy of the synaptic connection. Thus w_i x_i(t) models the time course of the contribution of previous spikes from the i-th presynaptic neuron to the membrane potential at the soma of this readout neuron. We will refer to the vector x(t) as the circuit state at time t.
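The following sketch illustrates this readout model. It is not the authors' code: the exponential filter kernel and the 30 ms time constant are assumptions chosen to match the low-pass membrane filtering described above.

```python
import numpy as np

TAU = 0.030  # assumed 30 ms readout membrane time constant (not given above)

def filtered_state(spike_times, t, tau=TAU):
    """x_i(t): contribution of all spikes s <= t of presynaptic neuron i,
    low-pass filtered with an exponential kernel exp(-(t - s)/tau)."""
    s = np.asarray(spike_times, dtype=float)
    s = s[s <= t]
    return float(np.exp(-(t - s) / tau).sum())

def readout_input(spike_trains, w, t):
    """Approximated readout input: sum_{i=1}^{n-1} w_i * x_i(t)."""
    x = np.array([filtered_state(st, t) for st in spike_trains])
    return float(np.dot(w, x))
```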
Under these unpleasant but apparently unavoidable simplifying assumptions we propose new quantitative criteria, based on rigorous mathematical principles, for evaluating a neural microcircuit C with regard to questions a) and b). We will compare in sections 4 and 5 the predictions of these quantitative measures with the actual computational performance achieved by 132 different types of neural microcircuit models, for a fairly large number of different computational tasks. All microcircuit models that we consider are based on biological data for generic cortical microcircuits (as described in section 3), but have different settings of their parameters.
2 Measures for the kernel-quality and generalization capability of neural microcircuits
One interesting measure for probing the computational power of a neural circuit is the pairwise separation property considered in [Maass et al., 2002]. This measure tells us to what extent the current circuit state x(t) reflects details of the input stream that occurred some time back in the past (see Fig. 1). Both circuit 2 and circuit 3 could be described as chaotic, since state differences resulting from earlier input differences persist. The "edge-of-chaos" [Langton, 1990] lies somewhere between points 1 and 2 according to Fig. 1c, but the best computational performance occurs between points 2 and 3 (see Fig. 2b). Hence the "edge-of-chaos" is not a reliable predictor of computational power for circuits of spiking neurons. In addition, most real-world computational tasks require that the circuit gives a desired output not just for 2, but for a fairly large number m of significantly different inputs. One could of course test whether a circuit C can separate each of the m(m − 1)/2 pairs of
1This assumption is best justified if such a readout neuron is located, for example, in another brain area that receives massive input from many neurons in this microcircuit and has only diffuse backwards projections. But it is certainly problematic and should be addressed in future elaborations of the present approach. 2One can be even more realistic and filter it also by a model for the short-term dynamics of the synapse into the readout neuron, but this turns out to make no difference for the analysis proposed in this article.
Figure 1: Pairwise separation property for different types of neural microcircuit models as specified in section 3. Each circuit C was tested for two arrays u and v of 4 input spike trains at 20 Hz over 3 s that differed only during the first second. a) Euclidean differences between resulting circuit states x_u(t) and x_v(t) for t = 3 s, averaged over 20 circuits C and 20 pairs u, v for each indicated value of λ and Wscale (see section 3). b) Temporal evolution of ‖x_u(t) − x_v(t)‖ for 3 different circuits with values of λ, Wscale according to the 3 points marked in panel a) (λ = 1.4, 2, 3 and Wscale = 0.3, 0.7, 2 for circuits 1, 2, and 3, respectively). c) Pairwise separation along a straight line between point 1 and point 2 of panel a).
such inputs. But even if the circuit can do this, we do not know whether a neural readout from such a circuit would be able to produce given target outputs for these m inputs.
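The pairwise state-separation measure plotted in Fig. 1 reduces to a simple distance computation; a minimal sketch, assuming the circuit states x_u(t) are obtained as in section 1:

```python
import numpy as np

def pairwise_separation(x_u, x_v):
    """Euclidean distance ||x_u(t) - x_v(t)|| between the circuit states
    reached at the same time t for two different input streams u and v
    (the quantity averaged over circuits and input pairs in Fig. 1)."""
    return float(np.linalg.norm(np.asarray(x_u) - np.asarray(x_v)))
```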
Therefore we propose here the linear separation property as a more suitable quantitative measure for evaluating the computational power of a neural microcircuit (or, more precisely, the kernel-quality of a circuit; see below). To evaluate the linear separation property of a circuit C for m different inputs u_1, ..., u_m (which are in this article always functions of time, i.e., input streams such as multiple spike trains), we compute the rank of the n × m matrix M whose columns are the circuit states x_{u_i}(t_0) resulting at some fixed time t_0 for the preceding input stream u_i. If this matrix has rank m, then it is guaranteed that any given assignment of target outputs y_i ∈ ℝ at time t_0 for the inputs u_i can be implemented by this circuit C (in combination with a linear readout). In particular, each of the 2^m possible binary classifications of these m inputs can then be carried out by a linear readout from this fixed circuit C. Obviously such an insight is much more informative than a demonstration that some particular classification task can be carried out by such a circuit C. If the rank of this matrix M has a value r < m, then this value r can still be viewed as a measure for the computational power of this circuit C, since r is the number of "degrees of freedom" that a linear readout has in assigning target outputs y_i to these inputs u_i (in a way that can be made mathematically precise with concepts of linear algebra). Note that this rank-measure for the linear separation property of a circuit C may be viewed as an empirical measure for its kernel-quality, i.e., for the complexity and diversity of nonlinear operations carried out by C on its input stream in order to boost the classification power of a subsequent linear decision-hyperplane (see [Vapnik, 1998]).
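A sketch of this rank-based measure, assuming the m circuit states have already been computed:

```python
import numpy as np

def kernel_quality(states):
    """Linear separation property: rank of the n x m matrix M whose
    columns are the circuit states x_{u_i}(t0) for inputs u_1, ..., u_m.
    `states` is a list of m state vectors, each of length n."""
    M = np.column_stack(states)
    # matrix_rank uses an SVD with a default tolerance; the paper does not
    # specify the numerical rank computation, so this choice is an assumption.
    return int(np.linalg.matrix_rank(M))
```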
Obviously the preceding measure addresses only one component of the computational performance of a neural circuit C. Another component is its capability to generalize a learned computational function to new inputs. Mathematical criteria for generalization capability are derived in [Vapnik, 1998] (see ch. 4 of [Cherkassky and Mulier, 1998] for a compact account of results relevant for our arguments). According to this mathematical theory one can quantify the generalization capability of any learning device in terms of the VC-dimension of the class H of hypotheses that are potentially used by that learning device.3
3The VC-dimension (of a class H of maps H from some universe S_univ of inputs into {0, 1}) is defined as the size of the largest subset S ⊆ S_univ which can be shattered by H. One says that S ⊆ S_univ is shattered by H if for every map f : S → {0, 1} there exists a map H in H such that H(u) = f(u) for all u ∈ S (this means that every possible binary classification of the inputs u ∈ S
More precisely: if the VC-dimension of H is substantially smaller than the size of the training set S_train, one can prove that this learning device generalizes well, in the sense that the hypothesis (or input-output map) produced by this learning device is likely to have, for new examples, an error rate which is not much higher than its error rate on S_train, provided that the new examples are drawn from the same distribution as the training examples (see equ. 4.22 in [Cherkassky and Mulier, 1998]).
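For concreteness, one standard form of such a VC-based bound reads as follows (this form follows [Vapnik, 1998]; we have not verified that it is literally equ. 4.22 of [Cherkassky and Mulier, 1998]): with probability at least 1 − η over the random draw of the m = |S_train| training examples,

```latex
\mathrm{err}_{\mathrm{new}}(H) \;\le\; \mathrm{err}_{S_{\mathrm{train}}}(H)
\;+\; \sqrt{\frac{h\left(\ln\frac{2m}{h}+1\right) - \ln\frac{\eta}{4}}{m}},
\qquad h = \text{VC-dimension}(\mathcal{H}).
```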
We apply this mathematical framework to the class H_C of all maps from a set S_univ of inputs u into {0, 1} which can be implemented by a circuit C. More precisely: H_C consists of all maps from S_univ into {0, 1} that a linear readout from circuit C with fixed internal parameters (weights etc.) but arbitrary weights w ∈ ℝ^n of the readout (which classifies the circuit input u as belonging to class 1 if w · x_u(t_0) ≥ 0, and to class 0 if w · x_u(t_0) < 0) could possibly implement.
Whereas it is very difficult to achieve tight theoretical bounds for the VC-dimension of even much simpler neural circuits (see [Bartlett and Maass, 2003]), one can efficiently estimate the VC-dimension of the class H_C that arises in our context for some finite ensemble S_univ of inputs (that contains all examples used for training or testing) by using the following mathematical result (which can be proved with the help of Radon's Theorem):
Theorem 2.1 Let r be the rank of the n × s matrix consisting of the s vectors x_u(t_0) for all inputs u in S_univ (we assume that S_univ is finite and contains s inputs). Then r ≤ VC-dimension(H_C) ≤ r + 1.
We propose to use the rank r defined in Theorem 2.1 as an estimate of VC-dimension(H_C), and hence as a measure that informs us about the generalization capability of a neural microcircuit C. It is assumed here that the set S_univ contains many noisy variations of the same input signal, since otherwise learning with a randomly drawn training set S_train ⊆ S_univ has no chance to generalize to new noisy variations. Note that each family of computational tasks induces a particular notion of which aspects of the input are viewed as noise, and which input features are viewed as signals that carry information relevant for the target output of at least one of these computational tasks. For example, for computations on spike patterns some small jitter in the spike timing is viewed as noise. For computations on firing rates, even the sequence of interspike intervals and the temporal relations between spikes that arrive from different input sources are viewed as noise, as long as these input spike trains represent the same firing rates. Examples for both families of computational tasks will be discussed in this article.
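The generalization measure is thus estimated in the same way as the kernel-quality, only now over the whole input ensemble S_univ including its noisy variations; a minimal sketch, again assuming the circuit states are already computed:

```python
import numpy as np

def estimated_vc_dimension(states_univ):
    """Rank r of the n x s matrix of circuit states x_u(t0) for all s
    inputs u in S_univ (many of them noisy variations of the same
    signal); by Theorem 2.1, r <= VC-dimension(H_C) <= r + 1."""
    M = np.column_stack(states_univ)
    return int(np.linalg.matrix_rank(M))
```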
3 Models for generic cortical microcircuits
We test the validity of the proposed measures by comparing their predictions with direct evaluations of the computational performance for a large variety of models for generic cortical microcircuits consisting of 540 neurons. We used leaky integrate-and-fire neurons4 and biologically quite realistic models for dynamic synapses.5 Neurons (20% of which were randomly chosen to be inhibitory) were located on the grid points of a 3D grid of dimensions 6 × 6 × 15 with edges of unit length. The probability of a synaptic connection
can be carried out by some hypothesis H in H). 4Membrane voltage V_m modeled by τ_m · dV_m/dt = −(V_m − V_resting) + R_m · (I_syn(t) + I_background + I_noise), where τ_m = 30 ms is the membrane time constant, I_syn models synaptic inputs from other neurons in the circuit, I_background models a constant unspecific background input, and I_noise models noise in the input. 5Short-term synaptic dynamics was modeled according to [Markram et al., 1998], with distributions of synaptic parameters U (initial release probability), D (time constant for depression), F (time constant for facilitation) chosen to reflect empirical data (see [Maass et al., 2002] for details).
from neuron a to neuron b was proportional to exp(−D²(a, b)/λ²), where D(a, b) is the Euclidean distance between a and b, and λ regulates the spatial scaling of synaptic connectivity. Synaptic efficacies w were chosen randomly from distributions that reflect biological data (as in [Maass et al., 2002]), with a common scaling factor Wscale.
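A minimal sketch of this circuit-construction rule. The proportionality constant C0 is a placeholder: in the actual model the constant is scaled per connection type, as in [Maass et al., 2002].

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# 540 neurons on the grid points of a 6 x 6 x 15 grid with unit edge length
positions = np.array(list(product(range(6), range(6), range(15))), dtype=float)

def sample_connections(lam, C0=0.3):
    """Draw a random connectivity matrix: neuron a connects to neuron b
    with probability C0 * exp(-D^2(a, b) / lambda^2)."""
    D2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=-1)
    P = C0 * np.exp(-D2 / lam ** 2)
    np.fill_diagonal(P, 0.0)            # no self-connections (our assumption)
    return rng.random(P.shape) < P      # boolean adjacency matrix
```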
Figure 2: Performance of different types of neural microcircuit models for classification of spike patterns. a) Top row: two examples of the 80 spike patterns that were used (each consisting of 4 Poisson spike trains at 20 Hz over 200 ms); bottom row: examples of noisy variations (Gaussian jitter with SD 10 ms) of these spike patterns, which were used as circuit inputs. b) Fraction of examples (out of 200 test examples) that were correctly classified by a linear readout (trained by linear regression with 500 training examples). Results are shown for 90 different types of neural microcircuits C, with λ varying on the x-axis and Wscale on the y-axis (20 randomly drawn circuits and 20 target classification functions randomly drawn from the set of 2^80 possible classification functions were tested for each of the 90 different circuit types, and the resulting correctness-rates were averaged; the mean SD of the results is 0.028). Points 1, 2, 3 are defined as in Fig. 1.
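The input generation described in this caption can be sketched as follows (Poisson spike trains drawn by the standard count-then-uniform method; the random seed and helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_pattern(rate=20.0, duration=0.2, n_trains=4):
    """One template spike pattern: 4 Poisson spike trains at 20 Hz over 200 ms."""
    return [np.sort(rng.uniform(0.0, duration, rng.poisson(rate * duration)))
            for _ in range(n_trains)]

def jittered(pattern, sd=0.010):
    """A noisy variation used as circuit input: Gaussian jitter with SD 10 ms."""
    return [np.sort(st + rng.normal(0.0, sd, st.shape)) for st in pattern]
```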
Linear readouts from circuits with n − 1 neurons were assumed to compute a weighted sum Σ_{i=1}^{n-1} w_i x_i(t) + w_0 (see section 1). In order to simplify notation we assume that the vector x(t) contains an additional constant component x_0(t) = 1, so that one can write w · x(t) instead of Σ_{i=1}^{n-1} w_i x_i(t) + w_0. In the case of classification tasks we assume that the readout outputs 1 if w · x(t) ≥ 0, and 0 otherwise.
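A sketch of how such a readout can be trained and applied. The paper reports training "by linear regression" (Fig. 2b); mapping the class labels to ±1 so that the sign rule above applies is our assumption about the exact procedure.

```python
import numpy as np

def train_readout(X, y):
    """Least-squares fit of weights w (including bias w_0) for circuit
    states X of shape (num_examples, n-1) and class labels y in {0, 1}."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # constant component x_0 = 1
    targets = 2.0 * np.asarray(y, dtype=float) - 1.0  # map {0, 1} -> {-1, +1}
    w, *_ = np.linalg.lstsq(Xb, targets, rcond=None)
    return w

def classify(w, x):
    """Readout output: 1 if w . x(t) >= 0 (with x_0 = 1 prepended), else 0."""
    return int(np.dot(w, np.concatenate(([1.0], x))) >= 0.0)
```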
4 Evaluating the influence of synaptic connectivity on computational performance
Neural microcircuits were drawn from the distribution described in section 3 for 10 different values of λ (which scales the number and average distance of synaptically connected neurons) and 9 different values of Wscale (which scales the efficacy of all synaptic connections). 20 microcircuit models C were drawn for each of these 90 different assignments of values to λ and Wscale. For each circuit a linear readout was trained to perform one (randomly chosen) out of 2^80 possible classification tasks on noisy variations of 80 fixed spike patterns as circuit inputs u. The task was to output at time t = 100 ms the class (0 or 1) of the spike pattern from which the preceding circuit input had been generated (for some arbitrary partition of the 80 fixed spike patterns into two classes). Each spike pattern u consisted of 4 Poisson spike trains over 200 ms. Performance results are shown in Fig. 2b for the 90 different types of neural microcircuit models.
We now test the predictive quality of the two proposed measures for the computational power of a microcircuit on spike patterns. One should keep in mind that the proposed measures do not attempt to test the computational capability of a circuit for one particular computational task, but for any distribution on S_univ and for a very large (in general infinitely large) family of computational tasks that only have in common a particular bias regarding which aspects of the incoming spike trains may carry information relevant for the target output of computations, and which aspects should be viewed as noise.
Figure 3: Values of the proposed measures for computations on spike patterns. a) Kernel-quality for spike patterns of 90 different circuit types (average over 20 circuits, mean SD = 13; for each circuit, the average over 5 different sets of spike patterns was used).6 b) Generalization capability for spike patterns: estimated VC-dimension of H_C (for a set S_univ of inputs u consisting of 500 jittered versions of 4 spike patterns), for 90 different circuit types (average over 20 circuits, mean SD = 14; for each circuit, the average over 5 different sets of spike patterns was used). c) Difference of both measures (mean SD = 5.3). This should be compared with the actual computational performance plotted in Fig. 2b. Points 1, 2, 3 are defined as in Fig. 1.
Fig. 3a explains why the lower left part of the parameter map in Fig. 2b is less suitable for any such computation: there the kernel-quality of the circuits is too low. Fig. 3b explains why the upper right part of the parameter map in Fig. 2b is less suitable: a higher VC-dimension (for a training set of fixed size) entails poorer generalization capability. We are not aware of a theoretically founded way of combining both measures into a single value that predicts overall computational performance. But if one simply takes the difference of both measures, the resulting number (see Fig. 3c) predicts quite well which types of neural microcircuit models perform well for the particular computational tasks considered in Fig. 2b.
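A sketch of this heuristic predictor. For Fig. 3c the raw difference was used; for Fig. 4d each measure was first scaled linearly into a common range [0, 1], which the `rescale` flag below mimics.

```python
import numpy as np

def performance_predictor(kernel_rank, vc_estimate, rescale=False):
    """Difference 'kernel-quality minus VC-dimension estimate' over a
    parameter map; both arrays have one entry per circuit type."""
    k = np.asarray(kernel_rank, dtype=float)
    v = np.asarray(vc_estimate, dtype=float)
    if rescale:  # linear scaling of each measure into [0, 1] (as in Fig. 4d)
        k = (k - k.min()) / (k.max() - k.min())
        v = (v - v.min()) / (v.max() - v.min())
    return k - v
```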
5 Evaluating the computational power of neural microcircuit models in UP- and DOWN-states
Data from numerous intracellular recordings suggest that neural circuits in vivo switch between two different dynamic regimes that are commonly referred to as UP- and DOWN-states. UP-states are characterized by a bombardment with synaptic inputs from recurrent activity in the circuit, resulting in a membrane potential whose average value is significantly closer to the firing threshold, but which also has larger variance. We have simulated these different dynamic regimes by varying the background current I_background and the noise current I_noise. Fig. 4a shows that one can simulate in this way different dynamic regimes of the same circuit where the time course of the membrane potential qualitatively matches data from intracellular recordings in UP- and DOWN-states (see e.g. [Shu et al., 2003]). We have tested the computational performance of circuits in 42 different dynamic regimes (for 7 values of I_background and 6 values of I_noise) with 3 complex nonlinear computations on firing rates of circuit inputs.7 Inputs u consisted of 4 Poisson spike trains with time-varying rates (drawn independently every 30 ms from the interval 0 to 80 Hz for the first two and the second two of the 4 input spike trains; see middle row of Fig. 4a for a sample). Let f_1(t) (f_2(t)) be the actual sum of rates, normalized to the interval [0, 1], for the first
6The rank of the matrix consisting of 500 circuit states x_u(t) for t = 200 ms was computed for 500 spike patterns u over 200 ms as described in section 2; see Fig. 2a. 7Computations on firing rates were chosen as benchmark tasks both because UP-states were conjectured to enhance the performance for such tasks, and because we want to show that the proposed measures are applicable to other types of computational tasks than those considered in section 4.
Figure 4: Analysis of the computational power of simulated neural microcircuits in different dynamic regimes. a) Membrane potential (for a firing threshold of 15 mV) of two randomly selected neurons from circuits in the two parameter regimes marked in panel b), as well as spike rasters for the same two parameter regimes (with the actual circuit inputs shown between the two rows). b) Estimates of the kernel-quality for input streams u with 3^4 = 81 different combinations of firing rates from {0, 20, 40} Hz in the 4 input spike trains (mean SD = 12). c) Estimate of the VC-dimension for a set S_univ of inputs consisting of 200 different spike trains u that represent 2 different combinations of firing rates (mean SD = 4.6). d) Difference of the measures from panels b and c (after scaling each linearly into a common range [0, 1]). e), f), g) Evaluation of the computational performance (correlation coefficient, all for test data; mean SD is 0.06, 0.04, and 0.03 for panels e), f), and g), respectively) of the same circuits in different dynamic regimes for computations involving multiplication and absolute value of differences of firing rates (see text). The theoretically predicted parameter regime with good computational performance for any computation on firing rates (see panel d) agrees quite well with the intersection of the areas with good computational performance in panels e, f, g.
two (second two) input spike trains, computed from the time interval [t − 30 ms, t]. The computational tasks considered in Fig. 4 were to compute online (and in real time), every 30 ms, the product f_1(t) · f_2(t) (see panel e), to decide whether the value of the product f_1(t) · f_2(t) lies in the interval [0.1, 0.3] or outside of it (see panel f), and to decide whether the absolute value of the difference f_1(t) − f_2(t) is greater than 0.25 (see panel g).
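The target functions of these three tasks can be written down directly; a sketch, where the normalization by the maximal possible rate sum of 160 Hz is our assumption about how f_1, f_2 were scaled into [0, 1]:

```python
import numpy as np

def normalized_rate_sum(r_a, r_b):
    """f_j(t): sum of two input rates (each drawn from [0, 80] Hz) over the
    last 30 ms, normalized into [0, 1] by the maximal sum 160 Hz (assumed)."""
    return (np.asarray(r_a, dtype=float) + np.asarray(r_b, dtype=float)) / 160.0

def targets(f1, f2):
    """Target outputs of the three benchmark tasks of Fig. 4e-g,
    for arrays f1, f2 sampled every 30 ms."""
    p = f1 * f2
    return {
        "e: f1 * f2": p,
        "f: f1 * f2 in [0.1, 0.3]": (p >= 0.1) & (p <= 0.3),
        "g: |f1 - f2| > 0.25": np.abs(f1 - f2) > 0.25,
    }
```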
We wanted to test whether the proposed measures for computational power and generalization capability were able to make reasonable predictions for this completely different parameter map, and for computations on firing rates instead of spike patterns. It turns out that also in this case the kernel-quality (Fig. 4b) explains why circuits in the dynamic regime corresponding to the left-hand side of the parameter map have inferior computational power for all three computations on firing rates (see Fig. 4e,f,g). The VC-dimension (Fig. 4c) explains the decline of computational performance in the right part of the parameter map. The difference of both measures (Fig. 4d) predicts quite well the dynamic regime where high performance is achieved for all three computational tasks considered in Fig. 4e,f,g. Note that Fig. 4e shows high performance in the upper right corner, in spite of a very high VC-dimension. This could be explained by the inherent bias of linear readouts towards computing smooth functions of firing rates, which fits this particular target output especially well.
If one estimates kernel-quality and VC-dimension for the same circuits, but for computations on sparse spike patterns (for an input ensemble S_univ similar to that of section 4), one finds that circuits at the lower left corner of this parameter map (corresponding to DOWN-states) are predicted to have better computational performance for these computations on sparse input. This agrees quite well with direct evaluations of computational performance (not shown). Hence the proposed quantitative measures may provide a theoretical foundation for understanding the computational function of different states of neural activity.