Richard Zemel, Rama Natarajan, Peter Dayan, Quentin Huys
As animals interact with their environments, they must constantly update estimates about their states. Bayesian models combine prior probabilities, a dynamical model and sensory evidence to update estimates optimally. These models are consistent with the results of many diverse psychophysical studies. However, little is known about the neural representation and manipulation of such Bayesian information, particularly in populations of spiking neurons. We consider this issue, suggesting a model based on standard neural architecture and activations. We illustrate the approach on a simple random walk example, and apply it to a sensorimotor integration task that provides a particularly compelling example of dynamic probabilistic computation.
Bayesian models have been used to explain a gamut of experimental results in tasks which require estimates to be derived from multiple sensory cues. These include a wide range of psychophysical studies of perception [13], motor action [7], and decision-making [3, 5]. Central to Bayesian inference is that computations are sensitive to uncertainties about afferent and efferent quantities, arising from ignorance, noise, or inherent ambiguity (e.g., the aperture problem), and that these uncertainties change over time as information accumulates and dissipates. Understanding how neurons represent and manipulate uncertain quantities is therefore key to understanding the neural instantiation of these Bayesian inferences.
Most previous work on representing probabilistic inference in neural populations has focused on the representation of static information [1, 12, 15]. These encompass various strategies for encoding and decoding uncertain quantities, but do not readily generalize to real-world dynamic information processing tasks, particularly the most interesting cases with stimuli changing over the same timescale as spiking itself [11]. Notable exceptions are the recent, seminal, but, as we argue, representationally restricted, models proposed by Gold and Shadlen [5], Rao [10], and Deneve [4].
In this paper, we first show how probabilistic information varying over time can be repre- sented in a spiking population code. Second, we present a method for producing spiking codes that facilitate further processing of the probabilistic information. Finally, we show the utility of this method by applying it to a temporal sensorimotor integration task.
1 TRAJECTORY ENCODING AND DECODING
We assume that population spikes R(t) arise stochastically in relation to the trajectory X(t) of an underlying (but hidden) variable. We use X_T and R_T for the whole trajectory and spike trains respectively from time 0 to T. The spikes R_T constitute the observations and are assumed to be probabilistically related to the signal by a tuning function f(X, θ_i):
P(R(i, T) \mid X(T)) \propto f(X, \theta_i) \qquad (1)
for the spike train of the ith neuron, with parameters θ_i. Therefore, via standard Bayesian inference, R_T determines a distribution over the hidden variable at time T, P(X(T)|R_T).
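As an illustration, spikes with these statistics can be sampled by discretizing time and treating each small bin as an approximate Bernoulli draw from the instantaneous Poisson rate. The Gaussian tuning shape, the gain, and the sinusoidal example trajectory below are illustrative assumptions, not values fixed by the text:

```python
import numpy as np

def generate_spikes(X, times, prefs, gain=50.0, sigma=0.1, seed=0):
    """Sample spikes from inhomogeneous Poisson processes whose rates follow
    Gaussian tuning functions f(X, theta_i) = gain * exp(-(X - theta_i)^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    dt = times[1] - times[0]
    # rates[t, i]: firing rate of neuron i when the stimulus is at X(t)
    rates = gain * np.exp(-(X[:, None] - prefs[None, :])**2 / (2 * sigma**2))
    # Bernoulli approximation to Poisson spiking in small bins
    return rng.random(rates.shape) < rates * dt   # boolean array, shape (time, neuron)

times = np.linspace(0.0, 0.1, 100)
X = 0.2 * np.sin(20 * times)              # an example trajectory (assumed, for illustration)
prefs = np.linspace(-0.5, 0.5, 20)        # preferred positions theta_i
R = generate_spikes(X, times, prefs)
```

Neurons whose preferred position lies near the current trajectory value spike most often, so the raster itself traces out a noisy image of X(t).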
We first consider a version of the dynamics and input coding that permits an analytical examination of the impact of spikes. Let X(t) follow a stationary Gaussian process such that the joint distribution P(X(t_1), X(t_2), ..., X(t_m)) is Gaussian for any finite collection of times, with a covariance matrix which depends on time differences: C_{tt'} = c(|t - t'|). The function c(|t|) controls the smoothness of the resulting random walks. Then,
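Such a stationary Gaussian process can be sampled directly from its covariance matrix. The sketch below assumes an exponential covariance c(|t - t'|) = σ_x² exp(−|t − t'|/τ_c), one plausible choice consistent with the diffusion example later in the section; the text itself only requires stationarity:

```python
import numpy as np

def ou_covariance(times, sigma_x=1.0, tau_c=0.05):
    """Stationary covariance C_{tt'} = c(|t - t'|) = sigma_x^2 exp(-|t - t'| / tau_c)."""
    gaps = np.abs(times[:, None] - times[None, :])
    return sigma_x**2 * np.exp(-gaps / tau_c)

rng = np.random.default_rng(0)
times = np.linspace(0.0, 0.1, 200)
C = ou_covariance(times)
# small jitter keeps the Cholesky factorization numerically stable
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(times)))
X = L @ rng.standard_normal(len(times))   # one sample trajectory X(t)
```

Shorter τ_c gives rougher trajectories; this is the sense in which c(|t|) controls smoothness.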
P(X(T) \mid R_T) \propto p(X(T)) \int dX_T \, P(R_T \mid X_T) \, P(X_T \mid X(T)) \qquad (2)
where P(X_T | X(T)) is the distribution over the whole trajectory X_T conditional on the value of X(T) at its end point. If R_T are a set of conditionally independent inhomogeneous Poisson processes, we have
P(R_T \mid X_T) \propto \prod_i \prod_{t_i} f(X(t_i), \theta_i) \, \exp\!\Big(-\sum_i \int d\tau \, f(X(\tau), \theta_i)\Big) \qquad (3)
where t_i are the spike times of neuron i in R_T. Let ξ = [X(t_i)] be the vector of stimulus positions at the times at which we observed a spike and Θ = [θ(t_i)] be the vector of spike positions. If the tuning functions are Gaussian, f(X, θ_i) ∝ exp(−(X − θ_i)²/2σ²), and sufficiently dense that Σ_i f(X, θ_i) is independent of X (a standard assumption in population coding), then P(R_T | X_T) ∝ exp(−‖Θ − ξ‖²/2σ²) and in Equation 2, we can marginalize out X_T except at the spike times t_i:
P(X(T) \mid R_T) \propto p(X(T)) \int d\xi \, \exp\!\Big(-\tfrac{1}{2}[\xi, X(T)]^{\mathsf{T}} C^{-1} [\xi, X(T)] - \frac{\|\Theta - \xi\|^2}{2\sigma^2}\Big) \qquad (4)
where C is the block covariance matrix between X(t_i) and X(T) at the spike times {t_i} and the final time T. This Gaussian integral gives P(X(T)|R_T) ~ N(μ(T), ν(T)), with
\mu(T) = C_{Tt}(C_{tt} + I\sigma^2)^{-1}\Theta = \mathbf{k}\Theta \qquad \nu(T) = C_{TT} - \mathbf{k}C_{tT} \qquad (5)
C_{TT} is the (T, T)th element of the covariance matrix and C_{Tt} is similarly a row vector. The dependence of μ on past spike times is specified chiefly by the inverse covariance matrix, and acts as an effective kernel k. This kernel is not stationary, since it depends on factors such as the local density of spiking in the spike train R_T.
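Equation 5 can be implemented directly from the covariance function. The sketch below assumes the exponential covariance c(d) = σ_x² exp(−|d|/τ_c), consistent with the diffusion example that follows; the spike times, preferred positions Θ, and tuning width σ are illustrative:

```python
import numpy as np

def posterior_at_T(T, spike_times, spike_prefs, sigma=0.1, sigma_x=1.0, tau_c=0.05):
    """Gaussian posterior over X(T) given spikes, following Equation 5:
    mu(T) = C_Tt (C_tt + I sigma^2)^{-1} Theta = k Theta,  nu(T) = C_TT - k C_tT."""
    c = lambda d: sigma_x**2 * np.exp(-np.abs(d) / tau_c)   # assumed covariance function
    Ctt = c(spike_times[:, None] - spike_times[None, :])
    CTt = c(T - spike_times)                                # row vector
    k = CTt @ np.linalg.inv(Ctt + sigma**2 * np.eye(len(spike_times)))
    mu = k @ spike_prefs                                    # posterior mean
    nu = c(0.0) - k @ CTt                                   # posterior variance
    return mu, nu, k

spike_times = np.array([0.01, 0.03, 0.06, 0.09])
spike_prefs = np.array([0.2, 0.25, 0.1, 0.05])   # Theta: preferred positions of spiking cells
mu, nu, k = posterior_at_T(0.1, spike_times, spike_prefs)
```

The entries of k show the behavior described above: the weight on the most recent spike dominates, and the weights decay into the past.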
For example, consider the case where X(t) evolves according to a diffusion process with drift:
dX = -\alpha X \, dt + dN(t) \qquad (6)
where the decay term α prevents X from wandering too far, and N(t) is white Gaussian noise with mean zero and variance σ². Figure 1A shows sample kernels for this process.
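Equation 6 can be simulated with a simple Euler-Maruyama discretization; the parameter values below are illustrative, not taken from the text:

```python
import numpy as np

def simulate_ou(alpha=20.0, sigma=1.0, dt=1e-3, T=0.1, x0=0.0, seed=0):
    """Euler-Maruyama discretization of dX = -alpha X dt + sigma dN(t)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        # deterministic decay toward 0 plus Gaussian increment of std sigma*sqrt(dt)
        X[i + 1] = X[i] - alpha * X[i] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return X

X = simulate_ou()
```

The drift pulls the state back toward zero, so the trajectory fluctuates around a stationary distribution rather than diffusing away.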
Inspection of Figure 1A reveals some important traits. First, the kernel magnitude decreases monotonically as the time span between the spike and the current time T grows, matching the intuition that recent spikes play a more significant role in determining the posterior over X(T). Second, the kernel is nearly exponential, with a time constant that depends on the time constant of the covariance function and the density of the spikes; two settings of these parameters produced the two groupings of kernels in the figure. Finally, the fully adaptive kernel k can be locally well approximated by a metronomic kernel k^s that assumes regularly spaced spikes,
Figure 1: Exact and approximate spike decoding with the Gaussian process prior. Spikes are shown in yellow, the true stimulus in green, and P(X(T)|R_T) in gray. Blue: exact inference with the nonstationary kernel; red: approximate inference with regular spiking. A: Kernel samples for a diffusion process as defined by Equations 5 and 6. B, C: Mean and variance of the inference. D: Exact inference with the full kernel k. E: Approximation based on the metronomic kernel k^s.
the form of decoder used for the network model in the next section [6]. Figure 1D shows an example of how well Equation 5 specifies a distribution over X(t) through very few spikes.
Finally, Figure 1E shows a factorized approximation with a stationary kernel, similar to that used by Hinton and Brown [6] and in our recurrent network:
\hat{P}(X(t) \mid R_t) \propto \prod_i \prod_{j=0}^{t} f(X, \theta_i)^{k^s_{t-j} r_{ij}} = \exp(-E(X(t), R_t, t)) \qquad (7)
By design, the mean is captured very well, but not the variance, which in this example grows too rapidly for long interspike intervals (Figure 1B, C). Using a slower kernel improves performance on the variance, but at the expense of the mean. We thus turn to the network model with recurrent connections that are available to reinstate the spike-conditional characteristics of the full kernel.
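The stationary-kernel decoder can be sketched as follows, assuming an exponential form for k^s and Gaussian tuning functions; the time constant, tuning width, and spike data are illustrative assumptions:

```python
import numpy as np

def factorized_posterior(x_grid, spike_times, spike_prefs, T, tau=0.02, sigma=0.1):
    """Log-linear decoding with a stationary exponential kernel k^s:
    log Q(x) is a sum over spikes of k^s(T - t_i) * log f(x, theta_i),
    with Gaussian tuning f(x, theta) ~ exp(-(x - theta)^2 / (2 sigma^2))."""
    w = np.exp(-(T - spike_times) / tau)            # stationary kernel weights, decaying into the past
    logf = -(x_grid[:, None] - spike_prefs[None, :])**2 / (2 * sigma**2)
    logQ = logf @ w
    logQ -= logQ.max()                              # normalize in log space for stability
    Q = np.exp(logQ)
    return Q / Q.sum()

x_grid = np.linspace(-0.5, 0.5, 201)
Q = factorized_posterior(x_grid, np.array([0.01, 0.05, 0.09]),
                         np.array([0.1, 0.15, 0.2]), T=0.1)
mean = x_grid @ Q
```

Because the weights are fixed functions of elapsed time, the posterior width is determined by the recent spike count alone, which is exactly the limitation on the variance noted above.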
2 NETWORK MODEL FORMULATION
Above we considered how population spikes R_T specify a distribution over X(T). We now extend this to consider how interconnected populations of neurons can specify distributions over time-varying variables. We frame the problem and our approach in terms of a two-level network, connecting one population of neurons to another; this construction is intended to apply to any level of processing. The network maps input population spikes R(t) to output population spikes S(t), where input and output evolve over time. As with the input spikes, S_T indicates the output spike trains from time 0 to T, and these output spikes are assumed to determine a distribution over a related hidden variable.
For the recurrent and feedforward computation in the network, we start with the deceptively simple goal [9] of producing output spikes in such a way that the distribution Q(X(T)|S_T) they imply over the same hidden variable X(T) as the input faithfully matches P(X(T)|R_T). This might seem a strange goal, since one could surely just listen to the input spikes. However, in order for the output spikes to track the hidden variable, the dynamics of the interactions between the neurons must explicitly capture the dynamics
of the process X(T ). Once this `identity mapping' problem has been solved, more general, complex computations can be performed with ease. We illustrate this on a multisensory integration task, tracking a hidden variable that depends on multiple sensory cues.
The aim of the recurrent network is to take the spikes R(t) as inputs, and produce output spikes that capture the probabilistic dynamics. We proceed in two steps. We first consider the probabilistic decoding process which turns S_T into Q(X(t)|S_T). Then we discuss the recurrent and feedforward processing that produce appropriate S_T given this decoder. Note that this decoding process is not required for the network processing; it instead provides a computational objective for the spiking dynamics in the system.
We use a simple log-linear decoder based on a spatiotemporal kernel [6]:
Q(X(T) \mid S_T) \propto \exp(-E(X(T), S_T, T)), \quad \text{where} \qquad (8)

E(X, S_T, T) = -\sum_j \sum_{\tau=0}^{T} S(j, T - \tau) \, \phi_j(X, \tau) \qquad (9)
is an energy function, and the spatiotemporal kernels are assumed separable: φ_j(X, τ) = g_j(X)h(τ). The spatial kernel g_j(X) is related to the receptive field f(X, θ_j) of neuron j, and the temporal kernel h(τ) to k^s.
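This energy can be sketched concretely. The exponential temporal kernel and Gaussian spatial kernels below are assumed forms chosen for illustration, as are the spike raster and grid:

```python
import numpy as np

def energy(x_grid, S, g_centers, dt=1e-3, tau=0.02, sigma=0.1):
    """Energy of the log-linear decoder with separable kernels phi_j(X, tau) = g_j(X) h(tau):
    E(X, S_T, T) = -sum_{j, tau} S(j, T - tau) g_j(X) h(tau)."""
    n_units, n_steps = S.shape
    taus = dt * np.arange(n_steps)
    h = np.exp(-taus / tau)                  # temporal kernel (assumed exponential)
    g = np.exp(-(x_grid[:, None] - g_centers[None, :])**2 / (2 * sigma**2))  # spatial kernels
    # reversing S in time aligns column tau with spikes that occurred at time T - tau
    return -(g @ (S[:, ::-1] @ h))

x_grid = np.linspace(-0.5, 0.5, 101)
S = np.zeros((5, 100))
S[2, 90] = 1; S[3, 95] = 1                   # two recent spikes from units 2 and 3
g_centers = np.linspace(-0.4, 0.4, 5)        # spatial kernel centers (illustrative)
E = energy(x_grid, S, g_centers)
Q = np.exp(-E - (-E).max())                  # Q(X|S_T) proportional to exp(-E)
Q /= Q.sum()
```

Recent spikes receive large temporal weights, so the decoded distribution peaks between the spatial kernels of the units that fired most recently.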