{"title": "Bayesian binning beats approximate alternatives: estimating peri-stimulus time histograms", "book": "Advances in Neural Information Processing Systems", "page_first": 393, "page_last": 400, "abstract": "The peristimulus time historgram (PSTH) and its more continuous cousin, the spike density function (SDF) are staples in the analytic toolkit of neurophysiologists. The former is usually obtained by binning spiketrains, whereas the standard method for the latter is smoothing with a Gaussian kernel. Selection of a bin with or a kernel size is often done in an relatively arbitrary fashion, even though there have been recent attempts to remedy this situation \\cite{ShimazakiBinningNIPS2006,ShimazakiBinningNECO2007}. We develop an exact Bayesian, generative model approach to estimating PSHTs and demonstate its superiority to competing methods. Further advantages of our scheme include automatic complexity control and error bars on its predictions.", "full_text": "Bayesian binning beats approximate alternatives:\n\nestimating peristimulus time histograms\n\nDominik Endres, Mike Oram, Johannes Schindelin and Peter F\u00a8oldi\u00b4ak\n\nSchool of Psychology\n\nUniversity of St. Andrews\n\n{dme2,mwo,js108,pf2}@st-andrews.ac.uk\n\nKY16 9JP, UK\n\nAbstract\n\nThe peristimulus time histogram (PSTH) and its more continuous cousin, the\nspike density function (SDF) are staples in the analytic toolkit of neurophysiol-\nogists. The former is usually obtained by binning spike trains, whereas the stan-\ndard method for the latter is smoothing with a Gaussian kernel. Selection of a bin\nwidth or a kernel size is often done in an relatively arbitrary fashion, even though\nthere have been recent attempts to remedy this situation [1, 2]. We develop an\nexact Bayesian, generative model approach to estimating PSTHs and demonstate\nits superiority to competing methods. 
Further advantages of our scheme include\nautomatic complexity control and error bars on its predictions.\n\n1 Introduction\n\nPlotting a peristimulus time histogram (PSTH), or a spike density function (SDF), from spiketrains\nevoked by and aligned to a stimulus onset is often one of the \ufb01rst steps in the analysis of neurophys-\niological data. It is an easy way of visualizing certain characteristics of the neural response, such\nas instantaneous \ufb01ring rates (or \ufb01ring probabilities), latencies and response offsets. These measures\nalso implicitly represent a model of the neuron\u2019s response as a function of time and are important\nparts of its functional description. Yet PSTHs are frequently constructed in an unsystematic man-\nner, e.g. the choice of time bin size is driven by result expectations as much as by the data. Recently,\nthere have been more principled approaches to the problem of determining the appropriate temporal\nresolution [1, 2].\nWe develop an exact Bayesian solution, apply it to real neural data and demonstrate its superiority\nto competing methods. Note that we in no way claim that a PSTH is a complete generative\ndescription of spiking neurons. We are merely concerned with inferring that part of the generative\nprocess which can be described by a PSTH in a Bayes-optimal way.\n\n2 The model\n\nSuppose we wanted to model a PSTH on [tmin, tmax], which we discretize into T contiguous in-\ntervals of duration \u2206t = (tmax \u2212 tmin)/T (see \ufb01g.1, left). We select a discretization \ufb01ne enough\nso that we will not observe more than one spike in a \u2206t interval for any given spike train. This can\nbe achieved easily by choosing a \u2206t shorter than the absolute refractory period of the neuron under\ninvestigation. Spike train i can then be represented by a binary vector ~zi of dimensionality T . 
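This discretization step can be sketched as follows (an illustrative helper, not the authors' code; the function name and signature are our own):

```python
import numpy as np

def discretize(spike_times, t_min, t_max, dt=1.0):
    """Represent one spike train as the binary vector z_i of dimension T.
    dt should be shorter than the neuron's absolute refractory period,
    so that no interval ever receives more than one spike."""
    T = int(round((t_max - t_min) / dt))
    z = np.zeros(T, dtype=np.int8)
    for t in spike_times:
        if t_min <= t < t_max:  # spikes outside [t_min, t_max) are dropped
            z[int((t - t_min) // dt)] = 1
    return z
```

With tmin = \u2212100 ms, tmax = 500 ms and \u2206t = 1 ms, as in section 6.2, this yields T = 600.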
We\nmodel the PSTH by M + 1 contiguous, non-overlapping bins having inclusive upper boundaries km,\nwithin which the \ufb01ring probability P (spike|t \u2208 (tmin + \u2206t(km\u22121 + 1), tmin + \u2206t(km + 1)]) = fm\nis constant. M is the number of bin boundaries inside [tmin, tmax].\n\nFigure 1: Left: Top: A spike train, recorded between times tmin and tmax is represented by a binary\nvector ~zi. Bottom: The time span between tmin and tmax is discretized into T intervals of duration\n\u2206t = (tmax \u2212 tmin)/T , such that interval k lasts from k \u00d7 \u2206t + tmin to (k + 1) \u00d7 \u2206t + tmin. \u2206t\nis chosen such that at most one spike is observed per \u2206t interval for any given spike train. Then, we\nmodel the \ufb01ring probabilities P (spike|t) by M + 1 = 4 contiguous, non-overlapping bins (M is the\nnumber of bin boundaries inside the time span [tmin, tmax]), having inclusive upper boundaries km\nand P (spike|t \u2208 (tmin + \u2206t(km\u22121 + 1), tmin + \u2206t(km + 1)]) = fm. Right: The core iteration. To\ncompute the evidence contribution subEm[T \u22121] of a model with a bin boundary at T \u22121 and m bin\nboundaries prior to T \u2212 1, we sum over all evidence contributions of models with a bin boundary at\nk and m \u2212 1 bin boundaries prior to k, where k \u2265 m \u2212 1, because m bin boundaries must occupy\nat least time intervals 0, . . . , m \u2212 1. This takes O(T ) operations. Repeat the procedure to obtain\nsubEm[T \u22122], . . . , subEm[m]. Since we expect T \u226b m, computing all subEm[k] given subEm\u22121[k]\nrequires O(T^2) operations. For details, see text.\n\nThe probability of a spike train ~zi of independent spikes/gaps is then\n\nP (~zi|{fm},{km}, M) = \prod_{m=0}^{M} fm^{s(~zi,m)} (1 \u2212 fm)^{g(~zi,m)},   (1)\n\nwhere s(~zi, m) is the number of spikes and g(~zi, m) is the number of non-spikes, or gaps, in spike-\ntrain ~zi in bin m, i.e. between intervals km\u22121 + 1 and km (both inclusive). 
In other words, we model\nthe spiketrains by an inhomogeneous Bernoulli process with piecewise constant probabilities. We\nalso de\ufb01ne k\u22121 = \u22121 and kM = T \u2212 1. Note that there is no binomial factor associated with\nthe contribution of each bin, because we do not want to ignore the spike timing information within\nthe bins, but rather, we try to build a simpli\ufb01ed generative model of the spike train. Therefore, the\nprobability of a (multi)set of spiketrains {~zi} = {~z1, . . . , ~zN}, assuming independent generation, is\n\nP ({~zi}|{fm},{km}, M) = \prod_{i=1}^{N} \prod_{m=0}^{M} fm^{s(~zi,m)} (1 \u2212 fm)^{g(~zi,m)}\n= \prod_{m=0}^{M} fm^{s({~zi},m)} (1 \u2212 fm)^{g({~zi},m)},   (2)\n\nwhere s({~zi}, m) = \sum_{i=1}^{N} s(~zi, m) and g({~zi}, m) = \sum_{i=1}^{N} g(~zi, m).\n\n2.1 The priors\n\nWe will make a non-informative prior assumption for p({fm},{km}), namely\n\np({fm},{km}|M) = p({fm}|M)P ({km}|M),   (3)\n\ni.e. we have no a priori preferences for the \ufb01ring rates based on the bin boundary positions. Note\nthat the prior of the fm, being continuous model parameters, is a density. 
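The likelihood of eqn (2) can be sketched directly from these definitions (a minimal illustration; the array layout and names are our assumptions, not the authors' code):

```python
import numpy as np

def log_likelihood(spike_trains, boundaries, rates):
    """log P({z_i} | {f_m}, {k_m}, M) as in eqn (2).
    spike_trains: (N, T) binary array, one row per spike train.
    boundaries: inclusive upper bin edges k_0 < ... < k_M = T - 1.
    rates: firing probability f_m for each of the M + 1 bins."""
    z = np.asarray(spike_trains)
    n_trains = z.shape[0]
    ll, start = 0.0, 0
    for k, f in zip(boundaries, rates):
        s = z[:, start:k + 1].sum()           # spikes in bin m, all trains
        g = n_trains * (k + 1 - start) - s    # gaps in bin m
        ll += s * np.log(f) + g * np.log(1 - f)
        start = k + 1
    return ll
```

Note the absence of a binomial factor, as discussed above: spike timing within a bin is kept, not marginalized out.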
Given the form of eqn.(1)\nand the constraint fm \u2208 [0, 1], it is natural to choose a conjugate prior\n\np({fm}|M) = \prod_{m=0}^{M} B(fm; \u03c3m, \u03b3m).   (4)\n\nThe Beta density is de\ufb01ned in the usual way [3]:\n\nB(p; \u03c3, \u03b3) = [\u0393(\u03c3 + \u03b3) / (\u0393(\u03c3)\u0393(\u03b3))] p^{\u03c3\u22121} (1 \u2212 p)^{\u03b3\u22121}.   (5)\n\nThere are only \ufb01nitely many con\ufb01gurations of the km. Assuming we have no preferences for any of\nthem, the prior for the bin boundaries becomes\n\nP ({km}|M) = 1 / \binom{T \u2212 1}{M},   (6)\n\nwhere the denominator is just the number of possibilities in which M ordered bin boundaries can\nbe distributed across T \u2212 1 places (bin boundary M always occupies position T \u2212 1, see \ufb01g.1, left,\nhence there are only T \u2212 1 positions left).\n\n3 Computing the evidence P ({~zi}|M )\n\nTo calculate quantities of interest for a given M, e.g. predicted \ufb01ring probabilities and their variances\nor expected bin boundary positions, we need to compute averages over the posterior\n\np({fm},{km}|M,{~zi}) = p({~zi},{fm},{km}|M) / P ({~zi}|M),   (7)\n\nwhich requires the evaluation of the evidence, or marginal likelihood of a model with M bins:\n\nP ({~zi}|M) = \sum_{kM\u22121=M\u22121}^{T\u22122} \sum_{kM\u22122=M\u22122}^{kM\u22121\u22121} . . . \sum_{k0=0}^{k1\u22121} P ({~zi}|{km}, M) P ({km}|M),   (8)\n\nwhere the summation boundaries are chosen such that the bins are non-overlapping and contiguous,\nand\n\nP ({~zi}|{km}, M) = \int_0^1 df0 \int_0^1 df1 . . . \int_0^1 dfM P ({~zi}|{fm},{km}, M) p({fm}|M).   (9)\n\nBy virtue of eqn.(2) and eqn.(4), the integrals can be evaluated:\n\nP ({~zi}|{km}, M) = \prod_{m=0}^{M} [\u0393(s({~zi}, m) + \u03c3m)\u0393(g({~zi}, m) + \u03b3m) / \u0393(s({~zi}, m) + \u03c3m + g({~zi}, m) + \u03b3m)] \u00d7 [\u0393(\u03c3m + \u03b3m) / (\u0393(\u03c3m)\u0393(\u03b3m))].   (10)\n\nComputing the sums in eqn.(8) quickly is a little tricky. A na\u00efve approach would suggest that a\ncomputational effort of O(T^M ) is required. However, because eqn.(10) is a product with one factor\nper bin, and because each factor depends only on spike/gap counts and prior parameters in that bin,\nthe process can be expedited. We will use an approach very similar to that described in [4, 5] in the\ncontext of density estimation and in [6, 7] for Bayesian function approximation: de\ufb01ne the function\n\ngetIEC(ks, ke, m) := \u0393(s({~zi}, ks, ke) + \u03c3m)\u0393(g({~zi}, ks, ke) + \u03b3m) / \u0393(s({~zi}, ks, ke) + \u03c3m + g({~zi}, ks, ke) + \u03b3m),   (11)\n\nwhere s({~zi}, ks, ke) is the number of spikes and g({~zi}, ks, ke) is the number of gaps in {~zi}\nbetween the start interval ks and the end interval ke (both included). Furthermore, collect all contri-\nbutions to eqn.(8) that do not depend on the data (i.e. {~zi}) and store them in the array pr[M]:\n\npr[M] := [\prod_{m=0}^{M} \u0393(\u03c3m + \u03b3m) / (\u0393(\u03c3m)\u0393(\u03b3m))] / \binom{T \u2212 1}{M}.   (12)\n\nSubstituting eqn.(10) into eqn.(8) and using the de\ufb01nitions (11) and (12), we obtain\n\nP ({~zi}|M) \u221d \sum_{kM\u22121=M\u22121}^{T\u22122} . . . \sum_{k0=0}^{k1\u22121} [\prod_{m=1}^{M} getIEC(km\u22121 + 1, km, m)] getIEC(0, k0, 0),   (13)\n\nwith kM = T \u2212 1 and the constant of proportionality being pr[M]. Since the factors on the r.h.s.\ndepend only on two consecutive bin boundaries each, it is possible to apply dynamic programming\n[8]: rewrite the r.h.s. 
by \u2019pushing\u2019 the sums as far to the right as possible:\n\nP ({~zi}|M) \u221d \sum_{kM\u22121=M\u22121}^{T\u22122} getIEC(kM\u22121 + 1, T \u2212 1, M) \sum_{kM\u22122=M\u22122}^{kM\u22121\u22121} getIEC(kM\u22122 + 1, kM\u22121, M \u2212 1)\n\u00d7 . . . \sum_{k0=0}^{k1\u22121} getIEC(k0 + 1, k1, 1) getIEC(0, k0, 0).   (14)\n\nEvaluating the sum over k0 requires O(T ) operations (assuming that T \u226b M, which is likely to\nbe the case in real-world applications). As the summands depend also on k1, we need to repeat this\nevaluation O(T ) times, i.e. summing out k0 for all possible values of k1 requires O(T^2) operations.\nThis procedure is then repeated for the remaining M \u2212 1 sums, yielding a total computational\neffort of O(M T^2). Thus, initialize the array subE0[k] := getIEC(0, k, 0), and iterate for all m =\n1, . . . , M:\n\nsubEm[k] := \sum_{r=m\u22121}^{k\u22121} getIEC(r + 1, k, m) subEm\u22121[r].   (15)\n\nA close look at eqn.(14) reveals that while we sum over kM\u22121, we need subEM\u22121[k] for k =\nM \u2212 1, . . . , T \u2212 2 to compute the evidence of a model with its latest boundary at T \u2212 1. We can,\nhowever, compute subEM\u22121[T \u2212 1] with little extra effort, which is, up to a factor pr[M \u2212 1], equal\nto P ({~zi}|M \u2212 1), i.e. the evidence for a model with M \u2212 1 bin boundaries. Moreover, having\ncomputed subEm[k], we do not need subEm\u22121[k \u2212 1] anymore. Hence, the array subEm\u22121[k] can\nbe reused to store subEm[k], if overwritten in reverse order. In pseudo-code (E[m] contains the\nevidence of a model with m bin boundaries inside [tmin, tmax] after termination):\n\nTable 1: Computing the evidences of models with up to M bin boundaries\n1. for k := 0 . . . T \u2212 1 : subE[k] := getIEC(0, k, 0)\n2. E[0] := subE[T \u2212 1] \u00d7 pr[0]\n3. for m := 1 . . . M :\n(a) if m = M then l := T \u2212 1 else l := m\n(b) for k := T \u2212 1 . . . l :\nsubE[k] := \sum_{r=m\u22121}^{k\u22121} subE[r] \u00d7 getIEC(r + 1, k, m)\n(c) E[m] := subE[T \u2212 1] \u00d7 pr[m]\n4. return E[]\n\n4 Predictive \ufb01ring rates and variances\n\nWe will now calculate the predictive \ufb01ring rate P (spike|\u02dck,{~zi}, M). For a given con\ufb01guration of\n{fm} and {km}, we can write\n\nP (spike|\u02dck,{fm},{km}, M) = \sum_{m=0}^{M} fm 1(km\u22121 + 1 \u2264 \u02dck \u2264 km),   (16)\n\nwhere the indicator function 1(x) = 1 iff x is true and 0 otherwise. Note that the probability\nof a spike given {km} and {fm} does not depend on any observed data. Since the bins are non-\noverlapping, km\u22121 + 1 \u2264 \u02dck \u2264 km is true for exactly one summand and P (spike|\u02dck,{fm},{km}, M)\nevaluates to the corresponding \ufb01ring rate.\n\nTo \ufb01nish we average eqn.(16) over the posterior eqn.(7). The denominator of eqn.(7) is independent\nof {fm},{km} and is obtained by integrating/summing the numerator via the algorithm in table 1.\nThus, we only need to multiply the integrand of eqn.(9) (i.e. the numerator of the posterior) with\nP (spike|\u02dck,{fm},{km}, M), thereby replacing eqn.(11) with\n\ngetIEC(ks, ke, m) := \u0393(s({~zi}, ks, ke) + 1(ks \u2264 \u02dck \u2264 ke) + \u03c3m)\u0393(g({~zi}, ks, ke) + \u03b3m) / \u0393(s({~zi}, ks, ke) + 1(ks \u2264 \u02dck \u2264 ke) + \u03c3m + g({~zi}, ks, ke) + \u03b3m),   (17)\n\ni.e. we are adding an additional spike to the data at \u02dck. Call the array returned by this modi\ufb01ed\nalgorithm E\u02dck[]. By virtue of eqn.(7) we then \ufb01nd P (spike|\u02dck,{~zi}, M) = E\u02dck[M ]/E[M ]. To evaluate the\nvariance, we need the posterior expectation of f^2_m. This can be computed by adding two spikes at \u02dck.\n\n5 Model selection vs. 
model averaging\nTo choose the best M given {~zi}, or better, a probable range of Ms, we need to determine the model\nposterior\n\nP (M|{~zi}) = P ({~zi}|M)P (M) / \sum_{m} P ({~zi}|m)P (m),   (18)\n\nwhere P (M) is the prior over M, which we assume to be uniform. The sum in the denominator\nruns over all values of m which we choose to include, at most 0 \u2264 m \u2264 T \u2212 1.\nOnce P (M|{~zi}) is evaluated, we could use it to select the most probable M0. However, making this\ndecision means \u2019contriving\u2019 information, namely that all of the posterior probability is concentrated\nat M0. Thus we should rather average any predictions over all possible M, even if evaluating such\nan average has a computational cost of O(T^3), since M \u2264 T \u2212 1. If the structure of the data allow,\nit is possible, and useful given a large enough T , to reduce this cost by \ufb01nding a range of M, such\nthat the risk of excluding a model even though it provides a good description of the data is low. In\nanalogy to the signi\ufb01cance levels of orthodox statistics, we shall call this risk \u03b1. If the posterior of\nM is unimodal (which it has been in most observed cases, see \ufb01g.3, right, for an example), we can\nthen choose the smallest interval of Ms around the maximum of P (M|{~zi}) such that\n\nP (Mmin \u2264 M \u2264 Mmax|{~zi}) \u2265 1 \u2212 \u03b1,   (19)\n\nand carry out the averages over this range of M after renormalizing the model posterior.\n\n6 Examples and comparison to other methods\n\n6.1 Data acquisition\n\nWe obtained data through [9], where the experimental protocols have been described. Brie\ufb02y, extra-\ncellular single-unit recordings were made using standard techniques from the upper and lower banks\nof the anterior part of the superior temporal sulcus (STSa) and the inferior temporal cortex (IT) of\ntwo monkeys (Macaca mulatta) performing a visual \ufb01xation task. 
Stimuli were presented for 333\nms followed by a 333 ms inter-stimulus interval in random order. The anterior-posterior extent of\nthe recorded cells was from 7mm to 9mm anterior of the interaural plane, consistent with previous\nstudies showing visual responses to static images in this region [10, 11, 12, 13]. The recorded cells\nwere located in the upper bank (TAa, TPO), lower bank (TEa, TEm) and fundus (PGa, IPa) of STS\nand in the anterior areas of TE (AIT of [14]). These areas are rostral to FST and we collectively\ncall them the anterior STS (STSa), see [15] for further discussion. The recorded \ufb01ring patterns were\nturned into distinct samples, each of which contained the spikes from \u2212300 ms before to 600 ms\nafter the stimulus onset with a temporal resolution of 1 ms.\n\n6.2 Inferring PSTHs\n\nTo see the method in action, we used it to infer a PSTH from 32 spiketrains recorded from one of the\navailable STSa neurons (see \ufb01g.2, A). Spike times are relative to the stimulus onset. We discretized\nthe interval from \u2212100ms pre-stimulus to 500ms post-stimulus into \u2206t = 1ms time intervals and\ncomputed the model posterior (eqn.(18)) (see \ufb01g.3, right).\n\nFigure 2: Predicting a PSTH/SDF with 3 different methods. A: the dataset used in this comparison\nconsisted of 32 spiketrains recorded from an STSa neuron. Each tick mark represents a spike. B:\nPSTH inferred with our Bayesian binning method. The thick line represents the predictive \ufb01ring\nrate (section 4), the thin lines show the predictive \ufb01ring rate \u00b11 standard deviation. Models with\n4 \u2264 M \u2264 13 were included on a risk level of \u03b1 = 0.1 (see eqn.(19)). C: bar PSTH (solid lines),\noptimal binsize \u2248 26ms, and line PSTH (dashed lines), optimal binsize \u2248 78ms, computed by the\nmethods described in [1, 2]. D: SDF obtained by smoothing the spike trains with a 10ms Gaussian\nkernel. 
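Given the array of evidences E[m] produced by the algorithm in table 1, the model posterior of eqn (18) and the credible range of Ms of eqn (19) can be sketched as follows (working in the log domain for numerical stability and growing the interval greedily around the mode, a simple approximation to the smallest such interval; a hypothetical helper assuming a uniform prior over M, not the authors' code):

```python
import numpy as np

def model_posterior_and_range(log_evidence, alpha=0.1):
    """Model posterior of eqn (18) under a uniform prior P(M), and an
    interval of Ms around the posterior mode holding at least 1 - alpha
    of the probability mass (eqn (19))."""
    le = np.asarray(log_evidence, dtype=float)
    post = np.exp(le - le.max())        # subtract max before exponentiating
    post /= post.sum()                  # eqn (18): normalize over all M
    lo = hi = int(post.argmax())
    while post[lo:hi + 1].sum() < 1.0 - alpha:
        # extend toward the more probable neighbour
        left = post[lo - 1] if lo > 0 else -np.inf
        right = post[hi + 1] if hi < len(post) - 1 else -np.inf
        if left >= right:
            lo -= 1
        else:
            hi += 1
    return post, lo, hi
```

Renormalizing `post[lo:hi + 1]` then gives the weights for averaging predictions over the retained models, as described in section 5.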
The prior parameters were equal for all\nbins and set to \u03c3m = 1 and \u03b3m = 32. This choice corresponds to a \ufb01ring probability of \u2248 0.03 in\neach 1 ms time interval (30 spikes/s), which is typical for the neurons in this study1. Models with\n4 \u2264 M \u2264 13 (expected bin sizes between \u2248 23ms and 148ms) were included on an \u03b1 = 0.1 risk level\n(eqn.(19)) in the subsequent calculation of the predictive \ufb01ring rate (i.e. the expected \ufb01ring rate,\nhence the continuous appearance) and standard deviation (\ufb01g.2, B). Fig.2, C, shows a bar PSTH and\na line PSTH computed with the recently developed methods described in [1, 2]. Roughly speaking,\nthese methods try to optimize a compromise between minimal within-bin variance and maximal\nbetween-bin variance. In this example, the bar PSTH consists of 26 bins. Graph D in \ufb01g.2 depicts an\nSDF obtained by smoothing the spiketrains with a 10ms wide Gaussian kernel, which is a standard\nway of calculating SDFs in the neurophysiological literature.\n\n1Alternatively, one could search for the \u03c3m, \u03b3m which maximize P ({~zi}|\u03c3m, \u03b3m) =\n\sum_{M} P ({~zi}|M )P (M|\u03c3m, \u03b3m), where P ({~zi}|M ) is given by eqn.(8). Using a uniform P (M|\u03c3m, \u03b3m),\nwe found \u03c3m \u2248 2.3 and \u03b3m \u2248 37 for the data in \ufb01g.2, A.\n\nAll tested methods produce results which are, upon cursory visual inspection, largely consistent\nwith the spiketrains. However, Bayesian binning is better suited than Gaussian smoothing to model\nsteep changes, such as the transient response starting at \u2248 100ms. While the methods from [1, 2]\nshare this advantage, they suffer from two drawbacks: \ufb01rstly, the bin boundaries are evenly spaced,\nhence the peak of the transient is later than the scatterplots would suggest. Secondly, because the\nbin duration is the only parameter of the model, these methods are forced to put many bins even\nin intervals that are relatively constant, such as the baselines before and after the stimulus-driven\nresponse. In contrast, Bayesian binning, being able to put bin boundaries anywhere in the time span\nof interest, can model the data with fewer bins \u2013 the model posterior has its maximum at M = 6 (7\nbins), whereas the bar PSTH consists of 26 bins.\n\n6.3 Performance comparison\n\nFigure 3: Left: Comparison of Bayesian Binning with competing methods by 5-fold crossvalidation.\nThe CV error is the negative expected log-probability of the test data. The histograms show rela-\ntive frequencies of CV error differences between 3 competing methods and our Bayesian binning\napproach. Gaussian: SDFs obtained by Gaussian smoothing of the spiketrains with a 10 ms kernel.\nBar PSTH and line PSTH: PSTHs computed by the binning methods described in [1, 2]. 
Right:\nModel posterior P (M|{~zi}) (see eqn.(18)) computed from the data shown in \ufb01g.2. The shape is\nfairly typical for model posteriors computed from the neural data used in this paper: a sharp rise at\na moderately low M followed by a maximum (here at M = 6) and an approximately exponential\ndecay. Even though a maximum M of 699 would have been possible, P (M > 23|{~zi}) < 0.001.\nThus, we can accelerate the averaging process for quantities of interest (e.g. the predictive \ufb01ring\nrate, section 4) by choosing a moderately small maximum M.\n\nFor a more rigorous method comparison, we split the data into distinct sets, each of which contained\nthe responses of a cell to a different stimulus. This procedure yielded 336 sets from 20 cells with at\nleast 20 spiketrains per set. We then performed 5-fold crossvalidation; the crossvalidation error is\ngiven by the negative mean log-probability of the data (spike or gap) in the test sets:\n\nCV error = \u2212\u27e8log(P (spike|t))\u27e9.   (20)\n\nThus, we measure how well the PSTHs predict the test data. The Gaussian SDFs were discretized\ninto 1 ms time intervals prior to the procedure. We average the CV error over the 5 estimates to obtain\na single estimate for each of the 336 neuron/stimulus combinations. On average, the negative log\nlikelihood of our Bayesian approach predicting the test data (0.04556\u00b10.00029, mean \u00b1 SEM) was\nsigni\ufb01cantly better than any of the other methods (10ms Gaussian kernel: 0.04654 \u00b1 0.00028; Bar\nPSTH: 0.04739\u00b10.00029; Line PSTH: 0.04658\u00b10.00029). To directly compare the performance of\ndifferent methods we calculate the difference in the CV error for each neuron/stimulus combination.\nHere a positive value indicates that Bayesian binning predicts the test data more accurately than the\nalternative method. Fig.3, left, shows the relative frequencies of CV error differences between the\n3 other methods and our approach. 
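The crossvalidation score of eqn (20) can be sketched as follows (an illustrative implementation; it assumes the test spiketrains and the predicted PSTH have been discretized into the same 1 ms intervals, as described in the text):

```python
import numpy as np

def cv_error(test_trains, pred_rate):
    """Eqn (20): negative mean log-probability of the test data
    (spike or gap in each interval) under the predicted PSTH."""
    z = np.asarray(test_trains, dtype=float)   # (N, T) binary test data
    p = np.asarray(pred_rate, dtype=float)     # (T,) predicted P(spike|t)
    logp = z * np.log(p) + (1 - z) * np.log(1 - p)
    return -logp.mean()
```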
Bayesian binning predicted the data better than the three other\nmethods in at least 295/336 cases, with a minimal difference of \u2248 \u22120.0008, indicating the general\nutility of this approach.\n\n7 Summary\n\nWe have introduced an exact Bayesian binning method for the estimation of PSTHs. Besides treating\nuncertainty \u2013 a real problem with small neurophysiological datasets \u2013 in a principled fashion, it also\noutperforms competing methods on real neural data. It offers automatic complexity control because\nthe model posterior can be evaluated. While its computational cost is signi\ufb01cantly higher than that\nof the methods we compared it to, it is still fast enough to be useful: evaluating the predictive\nprobability takes less than 1s on a modern PC2, with a small memory footprint (<10MB for 512\nspiketrains).\nMoreover, our approach can easily be adapted to extract other characteristics of neural responses in\na Bayesian way, e.g. response latencies or expected bin boundary positions. Our method reveals\na clear and sharp initial response onset, a distinct transition from the transient to the sustained part\nof the response and a well-de\ufb01ned offset. An extension towards joint PSTHs from simultaneous\nmulti-cell recordings is currently being implemented.\n\nReferences\n[1] H. Shimazaki and S. Shinomoto. A recipe for optimizing a time-histogram. In B. Sch\u00f6lkopf,\nJ. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages\n1289\u20131296. MIT Press, Cambridge, MA, 2007.\n\n[2] H. Shimazaki and S. Shinomoto. A method for selecting the bin size of a time histogram.\nNeural Computation, 19(6):1503\u20131527, 2007.\n\n[3] J.O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, New York, 1985.\n\n[4] D. Endres and P. 
F\u00f6ldi\u00e1k. Bayesian bin distribution inference and mutual information. IEEE\nTransactions on Information Theory, 51(11), 2005.\n\n[5] D. Endres. Bayesian and Information-Theoretic Tools for Neuroscience. PhD thesis, School\nof Psychology, University of St. Andrews, U.K., 2006. http://hdl.handle.net/10023/162.\n\n[6] M. Hutter. Bayesian regression of piecewise constant functions. Technical Report\narXiv:math/0606315v1, IDSIA-14-05, 2006.\n\n[7] M. Hutter. Exact Bayesian regression of piecewise constant functions. Journal of Bayesian\nAnalysis, 2(4):635\u2013664, 2007.\n\n[8] D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scienti\ufb01c, 2000.\n\n[9] M. W. Oram, D. Xiao, B. Dritschel, and K.R. Payne. The temporal precision of neural signals:\nA unique role for response latency? Philosophical Transactions of the Royal Society, Series B,\n357:987\u20131001, 2002.\n\n[10] C.J. Bruce, R. Desimone, and C.G. Gross. Visual properties of neurons in a polysensory area in\nsuperior temporal sulcus of the macaque. Journal of Neurophysiology, 46:369\u2013384, 1981.\n\n[11] D.I. Perrett, E.T. Rolls, and W. Caan. Visual neurons responsive to faces in the monkey temporal\ncortex. Expl. Brain. Res., 47:329\u2013342, 1982.\n\n[12] G.C. Baylis, E.T. Rolls, and C.M. Leonard. Functional subdivisions of the temporal lobe\nneocortex. 1987.\n\n[13] M. W. Oram and D. I. Perrett. Time course of neural responses discriminating different views\nof the face and head. Journal of Neurophysiology, 68(1):70\u201384, 1992.\n\n[14] K. Tanaka, H. Saito, Y. Fukada, and M. Moriya. Coding visual images of objects in the infer-\notemporal cortex of the macaque monkey. Journal of Neurophysiology, pages 170\u2013189, 1991.\n\n[15] N.E. Barraclough, D. Xiao, C.I. Baker, M.W. Oram, and D.I. Perrett. 
Integration of visual and\nauditory information by superior temporal sulcus neurons responsive to the sight of actions.\nJournal of Cognitive Neuroscience, 17, 2005.\n\n2 3.2 GHz Intel Xeon\u2122, SuSE Linux 10.0\n\n", "award": [], "sourceid": 160, "authors": [{"given_name": "Dominik", "family_name": "Endres", "institution": null}, {"given_name": "Mike", "family_name": "Oram", "institution": null}, {"given_name": "Johannes", "family_name": "Schindelin", "institution": null}, {"given_name": "Peter", "family_name": "Foldiak", "institution": null}]}