{"title": "Bayesian Inference for Spiking Neuron Models with a Sparsity Prior", "book": "Advances in Neural Information Processing Systems", "page_first": 529, "page_last": 536, "abstract": null, "full_text": "Bayesian Inference for Spiking Neuron Models\n\nwith a Sparsity Prior\n\nSebastian Gerwinn\n\nJakob H Macke\n\nMatthias Seeger\n\nMatthias Bethge\n\nSpemannstrasse 41\n\nMax Planck Institute for Biological Cybernetics\n\n{firstname.surname}@tuebingen.mpg.de\n\n72076 Tuebingen, Germany\n\nAbstract\n\nGeneralized linear models are the most commonly used tools to describe the stim-\nulus selectivity of sensory neurons. Here we present a Bayesian treatment of such\nmodels. Using the expectation propagation algorithm, we are able to approximate\nthe full posterior distribution over all weights. In addition, we use a Laplacian\nprior to favor sparse solutions. Therefore, stimulus features that do not critically\nin\ufb02uence neural activity will be assigned zero weights and thus be effectively\nexcluded by the model. This feature selection mechanism facilitates both the in-\nterpretation of the neuron model as well as its predictive abilities. The posterior\ndistribution can be used to obtain con\ufb01dence intervals which makes it possible\nto assess the statistical signi\ufb01cance of the solution. In neural data analysis, the\navailable amount of experimental measurements is often limited whereas the pa-\nrameter space is large. In such a situation, both regularization by a sparsity prior\nand uncertainty estimates for the model parameters are essential. We apply our\nmethod to multi-electrode recordings of retinal ganglion cells and use our uncer-\ntainty estimate to test the statistical signi\ufb01cance of functional couplings between\nneurons. Furthermore we used the sparsity of the Laplace prior to select those\n\ufb01lters from a spike-triggered covariance analysis that are most informative about\nthe neural response.\n\n1 Introduction\n\nA central goal of systems neuroscience is to identify the functional relationship between environ-\nmental stimuli and a neural response. Given an arbitrary stimulus we would like to predict the neural\nresponse as well as possible. In order to achieve this goal with limited amount of data, it is essential\nto combine the information in the data with prior knowledge about neural function. To this end,\ngeneralized linear models (GLMs) have proven to be particularly useful as they allow for \ufb02exible\nmodel architectures while still being tractable for estimation.\nThe GLM neuron model consists of a linear \ufb01lter, a static nonlinear transfer function and a Poisson\nspike generating mechanism. To determine the neural response to a given stimulus, the stimulus\nis \ufb01rst convolved with the linear \ufb01lter (i.e. the receptive \ufb01eld of the neuron). Subsequently, the\n\ufb01lter output is converted into an instantaneous \ufb01ring rate via a static nonlinear transfer function,\nand \ufb01nally spikes are generated from an inhomogeneous Poisson-process according to this \ufb01ring\nrate. Note, however, that the GLM neuron model is not limited to describe neurons with Poisson\n\ufb01ring statistics. Rather, it is possible to incorporate in\ufb02uences of its own spiking history on the\nneural response. That is, the \ufb01ring rate is then determined by a combination of both the external\n\n1\n\n\fstimulus and the spiking-history of the neuron. 
Thus, the model can account for typical effects such as refractory periods, bursting behavior, or spike-frequency adaptation. Last but not least, the GLM neuron model can also be applied to populations of coupled neurons by making each neuron dependent not only on its own spiking activity but also on the spiking history of all the other neurons.

In previous work (Pillow et al., 2005; Chornoboy et al., 1988; Okatan et al., 2005) it has been shown how point estimates of the GLM parameters can be obtained using maximum-likelihood (or maximum a posteriori, MAP) techniques. Here, we take this approach one step further by using Bayesian inference methods to obtain an approximation to the full posterior distribution rather than point estimates. In particular, the posterior determines confidence intervals for every linear weight, which facilitates the interpretation of the model and its parameters. For example, if a weight describes the strength of coupling between two neurons, we can use these confidence intervals to test whether this weight is significantly different from zero. In this way, we can readily distinguish statistically significant interactions between neurons from spurious couplings.

Another application of the Bayesian GLM neuron model arises in the context of spike-triggered covariance analysis. Spike-triggered covariance basically employs a quadratic expansion of the external stimulus parameter space and is often used to determine the most informative subspace. By combining spike-triggered covariance analysis with the Bayesian GLM framework, we will present a new method for selecting the filters of this subspace.

Feature selection in the GLM neuron model can be achieved by assuming a Laplace prior over the linear weights, which naturally leads to sparse posterior solutions: under this prior, all weights are pushed towards zero with equal strength, regardless of their magnitude. This contrasts with the Gaussian prior, which pushes weights towards zero in proportion to their absolute value and therefore rarely drives small weights to exactly zero. In this sense, the Laplace prior can also be seen as an efficient regularizer, well suited to situations in which a large range of alternative explanations for the neural response must be compared on the basis of limited data. As we do not perform gradient descent on the posterior, differentiability of the posterior is not required.

The paper is organized as follows: in section 2, we describe the model and the "expectation propagation" algorithm (Minka, 2001; Opper & Winther, 2000) used to find the approximate posterior distribution. In section 3, we estimate the receptive fields, spike-history effects, and functional couplings of a small population of retinal ganglion cells. We demonstrate that for small training sets, the Laplace prior leads to superior performance compared to a Gaussian prior, which does not lead to sparse solutions. We use the confidence intervals to test whether the functional couplings between the neurons are significant. In section 4, we use the GLM neuron model to describe a complex-cell response recorded in macaque primary visual cortex: after computing the spike-triggered covariance (STC), we determine the relevant stimulus subspace via feature selection in our model.
In contrast to the usual approach, the selection of the subspace in our case becomes directly linked to an explicit neuron model which also takes into account the spike-history dependence of the spike generation.

2 Generalized Linear Models and Expectation Propagation

2.1 Generalized Linear Models

Let $X_t \in \mathbb{R}^d$, $t \in [0, T]$ denote a time-varying stimulus and $D_i = \{t_{i,j}\}$ the spike times of $i = 1, \dots, n$ neurons. Here $X_t$ consists of the sensory input at time $t$ and can include preceding input frames as well. We assume that the stimulus can only change at distinct time points, but can be evaluated at continuous time $t$. We would like to incorporate spike-history effects, couplings between neurons, and dependence on nonlinear features of the stimulus. Therefore, we describe the effective input to a neuron via the following feature map:

$$\psi(t) = \psi_{\mathrm{st}}(X_t) \oplus \bigoplus_i \psi_{\mathrm{sp}}(\{t_{i,j} \in D_i : t_{i,j} < t\}),$$

where $\psi_{\mathrm{sp}}$ represents the spike-time history and $\psi_{\mathrm{st}}$ the possibly nonlinear feature map for the stimulus. That is, the complete feature vector $\psi$ contains possibly nonlinear features of the stimulus and the spike history of every neuron. Any feature which is causal, in the sense that it does not depend on future events, can be used. We model the spike-history dependence by a set of small time windows $[t - \tau_l, t - \tau_l^0)$ in which occurring spikes are counted:

$$\left(\psi_{\mathrm{sp},i}(\{t_{i,j} \in D_i : t_{i,j} < t\})\right)_l = \sum_{j : t_{i,j} < t} \mathbf{1}_{[t - \tau_l,\, t - \tau_l^0)}(t_{i,j}),$$

where $\mathbf{1}_{[a,b)}(t)$ denotes the indicator function, which is one if $t \in [a, b)$ and zero otherwise. In other words, for each neuron there is a set of windows $l = 1, \dots, L$ with time lags $\tau_l$ and widths $\tau_l - \tau_l^0$ describing its spiking history. More precisely, the rate can only change if the stimulus changes or a spike leaves or enters one of these windows. Thus, we obtain a sequence of changepoints $0 = \tilde{t}_0 < \tilde{t}_1 < \dots < \tilde{t}_j < \dots < T$, where each feature $\psi_i(t)$ is constant on $[\tilde{t}_{j-1}, \tilde{t}_j)$, attaining the value $\psi_{i,j}$. In the GLM neuron model setting, the instantaneous firing rate of neuron $i$ is obtained by a linear filter of the feature map:

$$p(\text{spike} \mid X_t, \{t_{i,j} \in D : t_{i,j} < t\}) = \lambda(w_i^T \psi(t)), \qquad (1)$$

where $\lambda$ is the nonlinear transfer function. Following general point-process theory (Snyder & Miller, 1991) and using the fact that the features stay constant between two changepoints, we can write down the likelihood $P(D \mid \{w_i\}) = \prod_{i=1}^n L_i(w_i)$, where each $L_i(w_i)$ has the form

$$L_i(w_i) \propto \prod_j \phi_{i,j}(u_{i,j}), \qquad u_{i,j} = w_i^T \psi_j,$$
$$\phi_{i,j}(u_{i,j}) = \lambda_i(u_{i,j})^{\sum_{t \in D_i} \delta(t - \tilde{t}_j)} \exp\!\left(-\lambda_i(u_{i,j})\,(\tilde{t}_j - \tilde{t}_{j-1})\right).$$

The function $\delta(\cdot)$ in the second equation is defined to be one if and only if its argument equals zero. The sum in the exponent is therefore 1 iff a spike of neuron $i$ occurs at changepoint $\tilde{t}_j$. Note that the changepoints $\tilde{t}_j$ depend on the spikes; therefore, the process is not Poissonian, as might be suggested by the functional form of the likelihood.

As has been shown in (Paninski, 2004), the likelihood is log-concave in $w_i$ if $\lambda_i(\cdot)$ is both convex and log-concave. We use the transfer function $\lambda_i(u) = e^u$ which, in particular, gives rise to a log-linear point process model.
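For illustration, the following is a minimal sketch of this likelihood for the exponential transfer function (hypothetical names; it assumes the changepoint features $\psi_j$ and interval lengths have already been assembled):

```python
import numpy as np

def neg_log_likelihood(w, Psi, dt, spike_at_cp):
    """Negative log-likelihood of the piecewise-constant point process
    for lambda(u) = exp(u).

    Psi:         (J, d) array of feature vectors psi_j, one per interval
    dt:          (J,) interval lengths t~_j - t~_{j-1}
    spike_at_cp: (J,) 1 if a spike occurs at changepoint t~_j, else 0
    """
    u = Psi @ w                   # u_j = w^T psi_j
    # -log phi_j = lambda_j * dt_j - spike_j * log(lambda_j),
    # and log(lambda_j) = u_j for the exponential transfer function.
    return np.sum(np.exp(u) * dt - spike_at_cp * u)
```

For $\lambda(u) = e^u$ this objective is convex in $w$; here, however, it only serves to define the likelihood factors that enter the Bayesian treatment below.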
Alternatively, one could also use $\lambda_i(u) = e^u \mathbf{1}_{u<0} + (1 + u)\mathbf{1}_{u \geq 0}$, which grows only linearly (cf. Harris et al. (2003); Pillow et al. (2005)).

While we require all rates $\lambda_i(t)$ to be piecewise constant, it should be noted that we do not restrict ourselves to a uniform quantization of the time axis. In this way, we achieve an efficient architecture in which the density of changepoints automatically adapts to the speed with which the input signal changes.

The choice of the prior distribution can play a central role when coping with a limited amount of data. We use a Laplace prior distribution over the weights in order to favor sparse solutions over those which explain the data equally well but require more weights different from zero (cf. Tibshirani (1996)):

$$P(w_i) \propto \prod_k e^{-\rho_k |w_{k,i}|}. \qquad (2)$$

Thus, prior factors have the form $\phi_{i,k}(u_{i,k}) = \frac{\rho_k}{2} \exp(-\rho_k |u_{i,k}|)$ with $\psi_k = (\mathbf{1}_{l=k})_l$ and $u_{i,k} = w_i^T \psi_k$ as above. In our applications, we allowed the prior variance $2/\rho_k^2$ of the stimulus-dependent features to be different from the variance of the spike-history features. The posterior takes the form

$$P(w \mid D) \propto \prod_{i,j} \phi_{i,j}(u_{i,j}),$$

where each factor $\phi_{i,j}$ corresponds either to a likelihood term or to a prior term. As the posterior factorizes over neurons, we can perform our analysis for each neuron separately. Therefore, for simplicity, we drop the subscript $i$ in the following.

Our model does not assume or require any specific stimulus distribution. In particular, it is not limited to white-noise stimuli or elliptically contoured distributions, but can be used without modification for other stimulus distributions such as natural image sequences. Finally, this framework allows exact sampling of spike trains due to the piecewise constant rate.

2.2 Expectation Propagation

As exact Bayesian inference is intractable in our model, we seek a good approximation to the full posterior. In our case all likelihood and prior factors are log-concave. Therefore, the posterior is unimodal and a Gaussian approximation is well suited. A frequently used technique for this purpose is the Laplace approximation, which computes a quadratic approximation to the log-posterior based on the Hessian around the maximum. For the Laplacian prior, however, this approach falls short, since the distribution is not differentiable at zero. Instead, we employ the Expectation Propagation (EP) algorithm (Minka, 2001; Opper & Winther, 2000). In this approximation technique, each factor (also called site) $\phi_j$ of the posterior is replaced by an unnormalized Gaussian

$$N^U(u_j \mid b_j, \pi_j) = \exp\!\left(-\tfrac{1}{2}\pi_j u_j^2 + b_j u_j\right) =: \hat{\phi}_j(u_j), \qquad \pi_j \geq 0,$$

where the $b_j, \pi_j$ are called the site parameters. The approximation aims at minimizing the Kullback-Leibler divergence between the full posterior $P(w \mid D)$ and the approximation $Q(w) \approx \prod_j \hat{\phi}_j(u_j)$. The log-concavity of the model implies that all $\pi_j \geq 0$, which supports the numerical stability of the EP algorithm. Some of the $\pi_j$ may even be 0, as long as $Q(w)$ remains a (normalizable) Gaussian. An EP update at site $j$ consists of computing the Gaussian cavity distribution $Q^{\setminus j} \propto Q \hat{\phi}_j^{-1}$ and the non-Gaussian tilted distribution $\hat{P} \propto Q^{\setminus j} \phi_j$, then updating $b_j, \pi_j$ such that the new $Q'$ has the same mean and covariance as $\hat{P}$ (moment matching). This is iterated in random order over the sites until convergence.
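As a one-dimensional illustration of the moment-matching step, the following sketch (hypothetical names; generic numerical quadrature stands in for the analytic moment computations used in practice) performs a single update for a scalar site such as a Laplace prior factor:

```python
import numpy as np
from scipy.integrate import quad

def ep_update_site(m_cav, v_cav, site, width=10.0):
    """One EP update for a scalar site: match the mean and variance of
    the tilted distribution p(u) ~ N(u | m_cav, v_cav) * site(u)
    and return the new site parameters (pi_j, b_j)."""
    s = np.sqrt(v_cav)
    cavity = lambda u: np.exp(-0.5 * (u - m_cav) ** 2 / v_cav)
    tilted = lambda u: cavity(u) * site(u)
    lo, hi = m_cav - width * s, m_cav + width * s
    Z = quad(tilted, lo, hi)[0]
    mean = quad(lambda u: u * tilted(u), lo, hi)[0] / Z
    var = quad(lambda u: u * u * tilted(u), lo, hi)[0] / Z - mean ** 2
    # site = tilted / cavity, expressed in natural parameters
    pi_new = 1.0 / var - 1.0 / v_cav   # >= 0 for log-concave sites
    b_new = mean / var - m_cav / v_cav
    return pi_new, b_new

# Example: Laplace prior site phi(u) = (rho / 2) * exp(-rho * |u|)
rho = 5.0
pi_j, b_j = ep_update_site(0.5, 1.0,
                           lambda u: 0.5 * rho * np.exp(-rho * np.abs(u)))
```

In the actual multivariate setting the cavity is obtained by removing a single rank-one site from $Q$, and the tilted moments for Laplace and likelihood sites can typically be computed analytically or by fast one-dimensional routines (Seeger et al., 2007); the generic quadrature above is purely illustrative.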
We omit the detailed update schemes here and refer to (Seeger et al., 2007; Seeger, 2005). Convergence guarantees for EP applied to non-Gaussian log-concave models have not been established so far. Nevertheless, EP is reported to behave stably at least in the log-concave case (e.g., Rasmussen & Williams (2006)), and we observe quick convergence in our application (at most 20 iterations over all sites are required). The model still contains hyperparameters, namely the prior variances $2/\rho_k^2$. In each experiment, these were determined via a standard cross-validation procedure (80% training data, 10% validation, 10% test).

3 Modeling retinal ganglion cells: Which cells are functionally coupled?

We applied the GLM neuron model to multi-electrode recordings of three rabbit retinal ganglion cells. The stimulus consisted of 32767 frames, each showing a random 16×16 checkerboard pattern, with a refresh rate of 50 Hz (data provided by G. Zeck; see (Zeck et al., 2005)).

First, in order to investigate the role of the Laplace prior, we trained a single-cell GLM neuron model on datasets of different sizes with either a Laplace prior or a Gaussian prior. The models, which have the same number of parameters, were compared by evaluating their negative log-likelihood on an independent test set.

[Inline figure: negative log-likelihood score on the test set as a function of training data-set size (% of the complete dataset), for the Laplace and the Gaussian prior.]

As can be seen in the figure, the choice of prior becomes less important for large training sets, as the weights are then sufficiently constrained by the data. For each training-set size a separate cross-validation was carried out. Error bars were obtained by drawing 100 samples from the posterior (this procedure is sketched below).

Fig. 1 shows the spatiotemporal receptive field of each neuron, as well as the filters describing the influence of spiking history and input from other cells. For conciseness, we only plot the filters for 80 and 120 ms time lags, but the fitted model included 60 and 140 ms time lags as well. The strongly positive weights on the diagonal of figure 1(c) for the spiking history can be interpreted as "self-excitation". In this way, it is possible to model the bursting behavior exhibited by the cells in our recordings (see also Fig. 2). The strongly negative weights at small time lags represent refractory periods. The red lines correspond to 3 standard deviations of the posterior. The first neuron seems to elicit "bursts" at lower frequencies. Note the different scaling of the y-axis for diagonal and off-diagonal terms. By analyzing the coupling terms, we can see that there is significant interaction between cells 2 and 3, but not between any other pair of cells. As our prior assumption is that the couplings are 0, this interaction term is not merely a consequence of our choice of prior.
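The posterior samples used for the error bars can be drawn directly from the Gaussian EP approximation; a minimal sketch (hypothetical names) of the procedure:

```python
import numpy as np

def posterior_error_bars(mu, Sigma, score_fn, n_samples=100, seed=0):
    """Propagate posterior uncertainty into a performance score.

    Draws weight vectors from the Gaussian EP approximation N(mu, Sigma)
    and returns mean and standard deviation of score_fn (for example
    the negative log-likelihood on held-out data) across the samples.
    """
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(mu, Sigma, size=n_samples)
    scores = np.array([score_fn(w) for w in W])
    return scores.mean(), scores.std()
```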
As a result of our cross-validation it turns out that the prior variance for the spike-history weights should be set to very large values (ρ = 0.1, i.e., variance 2/ρ² = 200), meaning that these weights are well determined by the data. In contrast, the prior variances for the stimulus weights should be more strongly biased towards zero (ρ = 150).

Figure 1: (a) Stimulus dependence inferred by the GLM for the three neurons (columns) at different time lags (rows); 2 of 4 time lags are plotted (60 and 140 ms not shown). (b) Spike-triggered average for the same neurons and time lags as in (a). (c) Causal dependencies between the three neurons. Each plot shows the value of the linear weight as a function of increasing time lag $\tau_l$ (in ms). Shown are the posterior mean and three standard deviations (indicated in red). Different scaling of the y-axis is used for diagonal and off-diagonal plots.

            STA      GLM      GLM with couplings
Neuron 1    0.2199   0.2442   0.3576
Neuron 2    0.1746   0.2348   0.3320
Neuron 3    0.1828   0.3319   0.4202
Mean        0.1924   0.2703   0.3699

Table 1: Prediction performance of different models. Entries are the correlation coefficient between the predicted rate of each model and the spikes on a test set. Both rate and spikes are binned in 5 ms bins. The first GLM models neither connections nor self-feedback.

Because of the regularization by the prior, the spatio-temporal receptive fields are much smoother than the spike-triggered average ones; see Fig. 1(a). The receptive fields of the STA seem to be more smeared out, which might be due to the fact that the STA cannot model bursting behavior.

Figure 2: Predicted rate for the GLM neuron model with and without spike history, and the predicted rate for the STA, for the same neurons as in the other plots. For the STA the linear response is rectified. The rate for the GLM with spike dependence is obtained by averaging over 1000 sampled spike trains. Rates are rescaled to have the same standard deviation.

The more conservative estimate of the sparse neuron model should increase the prediction performance. To verify this, we calculated the linear response from the spike-triggered average and the rate of our GLM neuron model. In order to have the same number of parameters, we neglected all connections. As a model-free performance measure we used the correlation coefficient between the spike trains and the rates (each binned in 5 ms bins). For the GLM with couplings, rates were estimated by sampling 1000 spike trains with the posterior mean as linear weights. As our model explicitly includes the nonlinearity during fitting, the rate is more sharply peaked around the spikes; see Fig. 2. The prediction performance can be increased even further by modeling couplings between neurons, as summarized in Tab. 1.

4 Modeling complex cells: How many filters do we need?

Complex cells in primary visual cortex exhibit strongly nonlinear response properties which cannot be well described by a single linear filter but rather require a set of filters. A common approach for finding these filters is based on the covariance of the spike-triggered ensemble: eigenvectors whose eigenvalues are much bigger (or smaller) than the eigenvalues of the whole stimulus ensemble indicate directions in stimulus space to which the cell is sensitive (a toy implementation of this step is sketched below).
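A minimal sketch of this STC step (hypothetical names; for simplicity it assumes whitened, unit-variance stimuli, so that the eigenvalues of the raw ensemble are all 1 and the log-ratio ranking reduces to ranking by |log eigenvalue|):

```python
import numpy as np

def stc_subspace(stimuli, spike_counts, n_filters=40):
    """Candidate filters from spike-triggered covariance.

    stimuli:      (T, d) array, assumed whitened (unit variance)
    spike_counts: (T,) spike counts per stimulus frame
    """
    n = spike_counts.sum()
    sta = spike_counts @ stimuli / n                 # spike-triggered average
    centered = stimuli - sta
    C_spk = (spike_counts[:, None] * centered).T @ centered / n
    evals, evecs = np.linalg.eigh(C_spk)
    # Eigenvalues much larger *or* much smaller than 1 indicate relevant
    # (excitatory or suppressive) directions; rank by |log eigenvalue|.
    order = np.argsort(-np.abs(np.log(evals + 1e-12)))
    keep = order[:n_filters]
    return evecs[:, keep], evals[keep]
```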
Usually, a statistical hypothesis test on the eigenvalue spectrum is used to decide how many of the eigenvectors $e_i$ are needed to model the cell (Simoncelli et al., 2004; Touryan et al., 2002; Rust et al., 2005; Steveninck & Bialek, 1988). Here, we take a different approach: we use the confidence intervals of our GLM neuron model to determine the relevant dimensions within the subspace revealed by STC. We first apply STC to find the space spanned by a set of eigenvectors that is substantially larger than the expected dimensionality of the relevant subspace. Next, we fit a nonlinear function $n_i$ to each filter output $f_i(X_t) = \langle X_t, e_i \rangle$. Finally, we linearly combine the outputs $n_i(f_i(X_t))$, resulting in a model of the same form as equation (1) with $(\psi_{\mathrm{st}})_i(X_t) = n_i(f_i(X_t))$.

Figure 3: (a) 24 out of 40 filters estimated by STC. The filters are ordered (from left to right) by the log-ratio of their eigenvalue to the corresponding eigenvalue of the complete stimulus ensemble. Highlighted filters are those with significantly non-zero weights, red indicating excitatory and blue inhibitory filters. (b) Upper: posterior mean +/- 3 standard deviations; filter indices are ordered as in (a). Lower: predicted rate on a test set for the STC model and for the GLM neuron model with spike-history dependence.

As the model is linear in the weights $w_i$, we can use the GLM neuron model to fit these weights and obtain confidence intervals. If a filter $f_i$ is not needed to explain the cell's response, its corresponding weight $w_i$ will automatically be set to zero by the model due to the sparsity prior. This provides an alternative, model-based method of determining the number of filters required to model the cell: the significance of each filter is not determined by a separate hypothesis test on the spectrum of the spike-triggered covariance, but rather by assessing its influence on the neural activity within the full model.

As in the previous application, we can model spike-history effects with an additional feature vector $\psi_{\mathrm{sp}}$ to take into account the temporal dynamics of single neurons or couplings.

Before applying our method to real data, we tested it on data generated from an artificial complex cell similar to the one in (Rust et al., 2005). On this simulated data we were able to recover the original filters. We then fitted the GLM neuron model to data recorded from a complex cell in the primary visual cortex of an anesthetized macaque monkey (same data as in (Rust et al., 2005)). We first extracted the 40 filters whose eigenvalues differed most from the corresponding eigenvalues of the complete stimulus ensemble. Any nonlinear regression procedure could be used to fit a nonlinearity to each filter output; we used a simple quadratic regression technique. Having fixed these nonlinearities, we approximated the posterior as above. The resulting confidence intervals for the linear weights are plotted in Fig. 3(b). The filters with significantly non-zero weights are highlighted in Fig. 3(a), where red indicates excitatory and blue inhibitory effects on the firing rate. Using 3 standard deviation confidence intervals, 9 excitatory and 8 inhibitory filters turned out to be significant in our model (this criterion is sketched below).
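The selection criterion itself reduces to checking whether zero lies outside each weight's posterior credible interval; a minimal sketch (hypothetical names):

```python
import numpy as np

def significant_filters(mu, Sigma, n_std=3.0):
    """Classify STC filters by their posterior weight distributions.

    A filter counts as significant if zero lies outside the interval
    mean +/- n_std posterior standard deviations; the sign of the mean
    distinguishes excitatory from inhibitory filters.
    """
    std = np.sqrt(np.diag(Sigma))
    excitatory = mu - n_std * std > 0
    inhibitory = mu + n_std * std < 0
    return np.flatnonzero(excitatory), np.flatnonzero(inhibitory)
```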
The number of filters is similar to that reported by Rust et al., who regarded 7 excitatory and 7 inhibitory filters as significant (Rust et al., 2005). The rank order of the linear weights is closely related, but not identical, to the order of the eigenvalues, as can be seen in Fig. 3(b), top.

5 Summary and Conclusions

We have shown how approximate Bayesian inference within the framework of generalized linear models can be used to address the problem of identifying relevant features of neural data. More precisely, the use of a sparsity prior favors sparse posterior solutions: non-zero weights are assigned only to those features which are critical for explaining the data. Furthermore, the explicit uncertainty information obtained from the posterior distribution enables us to identify ranges of statistical significance and therefore facilitates the interpretation of the solution. We used this technique to determine couplings between neurons in a multi-cell recording and demonstrated an increase in prediction performance due to regularization by the sparsity prior. Also, in the context of spike-triggered covariance analysis, we used our method to determine the relevant stimulus subspace within the space spanned by the eigenvectors. Our subspace selection method is directly linked to an explicit neuron model which also takes into account the spike-history dependence of the spike generation.

Acknowledgements

We would like to thank Günther Zeck and Nicole Rust for generously providing their data and for useful discussions.

References

Chornoboy, E., Schramm, L., & Karr, A. (1988). Maximum likelihood identification of neural point process systems. Biological Cybernetics, 59, 265-275.

Harris, K., Csicsvari, J., Hirase, H., Dragoi, G., & Buzsaki, G. (2003). Organization of cell assemblies in the hippocampus. Nature, 424(6948), 552-556.

Minka, T. (2001). Expectation propagation for approximate Bayesian inference. Uncertainty in Artificial Intelligence, 17, 362-369.

Okatan, M., Wilson, M. A., & Brown, E. N. (2005). Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17, 1927-1961.

Opper, M., & Winther, O. (2000). Gaussian processes for classification: mean-field algorithms. Neural Computation, 12(11), 2655-2684.

Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network, 15(4), 243-262.

Pillow, J. W., Paninski, L., Uzzell, V. J., Simoncelli, E. P., & Chichilnisky, E. J. (2005). Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. J Neurosci, 25(47), 11003-11013.

Rasmussen, C., & Williams, C. (2006). Gaussian processes for machine learning. MIT Press.

Rust, N., Schwartz, O., Movshon, J., & Simoncelli, E. (2005). Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46(6), 945-956.

Seeger, M. (2005). Expectation propagation for exponential families (Tech. Rep.). University of California at Berkeley. (See www.kyb.tuebingen.mpg.de/bs/people/seeger.)

Seeger, M., Steinke, F., & Tsuda, K. (2007). Bayesian inference and optimal design in the sparse linear model. AI and Statistics.

Simoncelli, E., Paninski, L., Pillow, J., & Schwartz, O. (2004). Characterization of neural responses with stochastic stimuli. In M. Gazzaniga (Ed.), The Cognitive Neurosciences (Vol. 3, pp. 327-338). MIT Press.
Snyder, D., & Miller, M. (1991). Random point processes in time and space. Springer Texts in Electrical Engineering. Springer.

Steveninck, R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and information transfer in short spike sequences. Proceedings of the Royal Society of London. Series B, Biological Sciences, 234(1277), 379-414.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267-288.

Touryan, J., Lau, B., & Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. Journal of Neuroscience, 22(24), 10811.

Zeck, G. M., Xiao, Q., & Masland, R. H. (2005). The spatial filtering properties of local edge detectors and brisk-sustained retinal ganglion cells. Eur J Neurosci, 22(8), 2016-2026.