{"title": "Fully Bayesian inference for neural models with negative-binomial spiking", "book": "Advances in Neural Information Processing Systems", "page_first": 1898, "page_last": 1906, "abstract": "Characterizing the information carried by neural populations in the brain requires accurate statistical models of neural spike responses. The negative-binomial distribution provides a convenient model for over-dispersed spike counts, that is, responses with greater-than-Poisson variability. Here we describe a powerful data-augmentation framework for fully Bayesian inference in neural models with negative-binomial spiking. Our approach relies on a recently described latent-variable representation of the negative-binomial distribution, which equates it to a Polya-gamma mixture of normals. This framework provides a tractable, conditionally Gaussian representation of the posterior that can be used to design efficient EM and Gibbs sampling based algorithms for inference in regression and dynamic factor models. We apply the model to neural data from primate retina and show that it substantially outperforms Poisson regression on held-out data, and reveals latent structure underlying spike count correlations in simultaneously recorded spike trains.", "full_text": "Fully Bayesian inference for neural models with\n\nnegative-binomial spiking\n\nJonathan W. Pillow\n\nCenter for Perceptual Systems\n\nDepartment of Psychology\n\nThe University of Texas at Austin\npillow@mail.utexas.edu\n\nJames G. Scott\n\nDivision of Statistics and Scienti\ufb01c Computation\n\nMcCombs School of Business\n\nThe University of Texas at Austin\n\njames.scott@mccombs.utexas.edu\n\nAbstract\n\nCharacterizing the information carried by neural populations in the brain requires\naccurate statistical models of neural spike responses. The negative-binomial dis-\ntribution provides a convenient model for over-dispersed spike counts, that is,\nresponses with greater-than-Poisson variability. 
Here we describe a powerful\ndata-augmentation framework for fully Bayesian inference in neural models with\nnegative-binomial spiking. Our approach relies on a recently described latent-\nvariable representation of the negative-binomial distribution, which equates it to\na Polya-gamma mixture of normals. This framework provides a tractable, con-\nditionally Gaussian representation of the posterior that can be used to design ef-\n\ufb01cient EM and Gibbs sampling based algorithms for inference in regression and\ndynamic factor models. We apply the model to neural data from primate retina\nand show that it substantially outperforms Poisson regression on held-out data,\nand reveals latent structure underlying spike count correlations in simultaneously\nrecorded spike trains.\n\n1\n\nIntroduction\n\nA central problem in systems neuroscience is to understand the probabilistic representation of infor-\nmation by neurons and neural populations. Statistical models play a critical role in this endeavor, as\nthey provide essential tools for quantifying the stochasticity of neural responses and the information\nthey carry about various sensory and behavioral quantities of interest.\nPoisson and conditionally Poisson models feature prominently in systems neuroscience, as they\nprovide a convenient and tractable description of spike counts governed by an underlying spike rate.\nHowever, Poisson models are limited by the fact that they constrain the ratio between the spike count\nmean and variance to one. This assumption does not hold in many brain areas, particularly cortex,\nwhere responses are often over-dispersed relative to Poisson [1].\nA second limitation of Poisson models in regression analyses (for relating spike responses to stimuli)\nor latent factor analyses (for \ufb01nding common sources of underlying variability) is the dif\ufb01culty of\nperforming fully Bayesian inference. 
The posterior formed under a Poisson likelihood and a Gaussian prior has no tractable representation, so most theorists resort to either fast, approximate methods based on Gaussians [2–9], or slower, sampling-based methods that may scale poorly with data or dimensionality [10–15].\nThe negative-binomial (NB) distribution generalizes the Poisson with a shape parameter that controls the tradeoff between mean and variance, providing an attractive alternative for over-dispersed spike count data. Although well known in statistics, it has only recently been applied to neural data [16–18]. Here we describe fully Bayesian inference methods for neural spike count data based on a recently developed representation of the NB as a Gaussian mixture model [19].\n\nFigure 1: Representations of the negative-binomial (NB) regression model. (A) Graphical model for the standard gamma-Poisson mixture representation of the NB. The linearly projected stimulus ψ_t = β^T x_t defines the scale parameter for a gamma random variable with shape parameter ξ, giving λ_t ∼ Ga(ξ, e^{ψ_t}), which is in turn the rate for a Poisson spike count: y_t ∼ Poiss(λ_t). (B) Graphical model illustrating the novel representation as a Polya-Gamma (PG) mixture of normals. Spike counts are represented as NB distributed with shape ξ and rate p_t = 1/(1 + e^{−ψ_t}). The latent variable ω_t is conditionally PG, while ψ_t (and β | x) is normal given (ω_t, ξ), which facilitates efficient inference. (C) Relationship between spike-count mean and variance for different settings of the shape parameter ξ, illustrating the super-Poisson variability of the NB model.\n\nIn the following, we review the conditionally Gaussian representation for the negative-binomial (Sec.
2), describe batch-EM, online-EM and Gibbs-sampling based inference methods for NB regression (Sec. 3), sampling-based methods for dynamic latent factor models (Sec. 4), and show applications to spiking data from primate retina.\n\n2 The negative-binomial model\n\nBegin with the single-variable case where the data Y = {y_t} are scalar counts observed at times t = 1, . . . , N. A standard Poisson generalized linear model (GLM) assumes that y_t ∼ Pois(e^{ψ_t}), where the log rate parameter ψ_t may depend upon the stimulus. One difficulty with this model is that the variance of the Poisson distribution is equal to its mean, an assumption that is violated in many data sets [20–22].\nTo relax this assumption, we can consider the negative binomial model, which can be described as a doubly-stochastic or hierarchical Poisson model [18]. Suppose that y_t arises according to:\n\n(y_t | λ_t) ∼ Pois(λ_t)\n(λ_t | ξ, ψ_t) ∼ Ga(ξ, e^{ψ_t}) ,\n\nwhere we have parametrized the gamma distribution in terms of its shape and scale parameters. By marginalizing over the top-level model for λ_t, we recover a negative-binomial distribution for y_t:\n\np(y_t | ξ, ψ_t) ∝ (1 − p_t)^ξ p_t^{y_t} ,\n\nwhere p_t is related to ψ_t via the logistic transformation:\n\np_t = e^{ψ_t} / (1 + e^{ψ_t}) .\n\nThe extra parameter ξ therefore allows for over-dispersion compared to the Poisson, with the count y_t having expected value ξ e^{ψ_t} and variance ξ e^{ψ_t}(1 + e^{ψ_t}). (See Fig. 1.)\nBayesian inference for models of this form has long been recognized as a challenging problem, due to the analytically inconvenient form of the likelihood function. To see the difficulty, suppose that ψ_t is a linear function of known inputs x_t = (x_{t1}, . . . , x_{tP})^T, so that ψ_t = x_t^T β.
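The doubly-stochastic construction above is easy to check by simulation. The sketch below (numpy; the values of ξ, ψ, and the sample size are illustrative choices of ours) draws from the gamma-Poisson mixture and from the equivalent negative binomial, and compares both against the stated mean ξe^ψ and variance ξe^ψ(1 + e^ψ):

```python
import numpy as np

rng = np.random.default_rng(0)
xi, psi = 4.0, 0.5          # shape parameter and linear predictor (illustrative)
n = 200_000

# hierarchical construction: lambda_t ~ Ga(xi, e^psi), then y_t ~ Pois(lambda_t)
lam = rng.gamma(shape=xi, scale=np.exp(psi), size=n)
y = rng.poisson(lam)

# direct NB draws: numpy's negative_binomial(n, p) has mean n(1-p)/p,
# so passing p = 1/(1 + e^psi) gives mean xi * e^psi, matching the text
y_nb = rng.negative_binomial(xi, 1.0 / (1.0 + np.exp(psi)), size=n)

mean_theory = xi * np.exp(psi)                        # xi e^psi
var_theory = xi * np.exp(psi) * (1.0 + np.exp(psi))   # super-Poisson variance
print(y.mean(), y_nb.mean(), mean_theory)
print(y.var(), y_nb.var(), var_theory)
```

Both samples share the stated moments, with the variance exceeding the mean by the factor (1 + e^ψ), which is the over-dispersion the Poisson GLM cannot capture.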
Then the conditional posterior distribution for β, up to a multiplicative constant, is\n\np(β | ξ, Y) ∝ p(β) · ∏_{t=1}^{N} {exp(x_t^T β)}^{y_t} / {1 + exp(x_t^T β)}^{ξ + y_t} ,   (1)\n\nwhere p(β) is the prior distribution, and where we have assumed for the moment that ξ is fixed. The two major issues are the same as those that arise in Bayesian logistic regression: the response depends non-linearly upon the parameters, and there is no natural conjugate prior p(β) to facilitate posterior computation.\nOne traditional approach for Bayesian inference in logistic models is to work directly with the discrete-data likelihood. A variety of tactics along these lines have been proposed, including numerical integration [23], analytic approximations to the likelihood [24–26], or Metropolis-Hastings [27]. A second approach is to assume that the discrete outcome is some function of an unobserved continuous quantity or latent variable. This is most familiar in the case of Bayesian inference for the probit or dichotomized-Gaussian model [28, 29], where binary outcomes y_i are assumed to be thresholded versions of a latent Gaussian quantity z_i. The same approach has also been applied to logistic and Poisson regression [e.g., 30]. Unfortunately, none of these schemes leads to a fully automatic approach to posterior inference, as they require either approximations (whose quality must be validated) or the careful selection of tuning constants (as is typically required when using, for example, the Metropolis–Hastings sampler in very high dimensions).\nTo proceed with Bayesian inference in the negative-binomial model, we appeal to a recent latent-variable construction (depicted in Fig. 1B) from [19] based on the theory of Polya-Gamma random variables. The basic result we exploit is that the negative binomial likelihood can be represented as a mixture of normals with Polya-Gamma mixing distribution.
The algorithms that result from this scheme are both exact (in the sense of avoiding analytic approximations) and fully automatic.\n\nDefinition 1. A random variable X has a Polya-Gamma distribution with parameters b > 0 and c ∈ R, denoted X ∼ PG(b, c), if\n\nX =_D (1/(2π^2)) Σ_{k=1}^{∞} g_k / [ (k − 1/2)^2 + c^2/(4π^2) ] ,   (2)\n\nwhere each g_k ∼ Ga(b, 1) is an independent gamma random variable, and where =_D denotes equality in distribution.\n\nWe make use of four important facts about Polya-Gamma variables from [19]. First, suppose that p(ω) denotes the density of the random variable ω ∼ PG(b, 0), for b > 0. Then for any choice of a,\n\n(e^ψ)^a / (1 + e^ψ)^b = 2^{−b} e^{κψ} ∫_0^∞ e^{−ωψ^2/2} p(ω) dω ,   (3)\n\nwhere κ = a − b/2. This integral identity allows us to rewrite each term in the negative binomial likelihood (eq. 1) as\n\n(1 − p_t)^ξ p_t^{y_t} = {exp(ψ_t)}^{y_t} / {1 + exp(ψ_t)}^{ξ + y_t} ∝ e^{κ_t ψ_t} ∫_0^∞ e^{−ω_t ψ_t^2/2} p(ω | ξ + y_t, 0) dω ,   (4)\n\nwhere κ_t = (y_t − ξ)/2, and where the mixing distribution is Polya-Gamma. Conditional upon ω_t, we have a likelihood proportional to e^{Q(ψ_t)} for some quadratic form Q, which will be conditionally conjugate to any Gaussian or mixture-of-Gaussians prior for ψ_t. This conditional Gaussianity can be exploited to great effect in MCMC, EM, and sequential Monte Carlo algorithms, as described in the next section.\nA second important fact is that the conditional distribution\n\np(ω | ψ) = e^{−ωψ^2/2} p(ω) / ∫_0^∞ e^{−ωψ^2/2} p(ω) dω\n\nis also in the Polya-Gamma class: (ω | ψ) ∼ PG(b, ψ). In this sense, the Polya-Gamma distribution is conditionally conjugate to the NB likelihood, which is very useful for Gibbs sampling.\nThird, although the density of a Polya-Gamma random variable can be expressed only as an infinite series, its expected value is known in closed form: if ω ∼ PG(b, c), then\n\nE(ω) = (b/(2c)) tanh(c/2) .   (5)\n\nAs we show in the next section, this expression comes up repeatedly when fitting negative-binomial models via expectation-maximization, where these moments of ω_t form a set of sufficient statistics for the complete-data log posterior distribution in β.\nFinally, despite the awkward form of the density function, it is still relatively easy to simulate random Polya-Gamma draws, avoiding entirely the need to truncate the infinite sum in Equation 2. As the authors of [19] show, this can be accomplished via a highly efficient accept-reject algorithm using ideas from [31]. The proposal distribution requires only exponential, uniform, and normal random variates, and the algorithm's acceptance probability is uniformly bounded below at 0.9992 (implying roughly 8 rejected draws out of every 10,000 proposals).\nAs we now describe, these four facts are sufficient to allow straightforward Bayesian inference for negative-binomial models. We focus first on regression models, for which we derive simple Gibbs sampling and EM algorithms. We then turn to negative-binomial dynamic factor models, which can be fit using a variant of the forward-filter, backwards-sample (FFBS) algorithm [32].\n\n3 Negative-binomial regression\n\n3.1 Fully Bayes inference via MCMC\n\nSuppose that ψ_t = x_t^T β for some p-vector of regressors x_t. Then, conditional upon ω_t, the contribution of observation t to the likelihood is\n\nL_t(β) ∝ exp{ κ_t x_t^T β − ω_t (x_t^T β)^2 / 2 }\n∝ exp{ −(ω_t/2) [ (y_t − ξ)/(2ω_t) − x_t^T β ]^2 } .\n\nLet Ω = diag(ω_1, . . . , ω_N); let z_t = (y_t − ξ)/(2ω_t); and let z denote the stacked vector of z_t terms. Combining all terms in the likelihood leads to a Gaussian linear-regression model in which\n\n(z | β, Ω) ∼ N(Xβ, Ω^{−1}) .\n\nIt is usually reasonable to assume a conditionally Gaussian prior, β ∼ N(c, C).
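Facts (2) and (5) can be checked against each other numerically: truncating the infinite sum in the definition and taking expectations term by term (E[g_k] = b for g_k ∼ Ga(b, 1)) should reproduce the closed-form mean. A small sketch (the truncation level is our choice):

```python
import numpy as np

def pg_mean_series(b, c, terms=1_000_000):
    # term-by-term expectation of the series in eq. (2), using E[g_k] = b
    k = np.arange(1, terms + 1)
    return np.sum(b / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

def pg_mean_closed(b, c):
    # eq. (5): E(omega) = b/(2c) * tanh(c/2)
    return b / (2 * c) * np.tanh(c / 2)

for b, c in [(1.0, 2.0), (3.0, 0.7), (10.0, 5.0)]:
    print(b, c, pg_mean_series(b, c), pg_mean_closed(b, c))
```

The truncation error decays like b/(2π²·terms), so a million terms agrees with the closed form to several decimal places.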
Note that C itself may be random, as in, for example, a Bayesian lasso or horseshoe prior [33–35]. Gibbs sampling proceeds in two simple steps:\n\n(ω_t | ξ, β) ∼ PG(y_t + ξ, x_t^T β)\n(β | Ω, z) ∼ N(m, V) ,\n\nwhere PG denotes a Polya-Gamma draw, and where\n\nV = (X^T Ω X + C^{−1})^{−1}\nm = V (X^T Ω z + C^{−1} c) .\n\nOne may update the dispersion parameter ξ via Gibbs sampling, using the method described in [36].\n\n3.2 Batch EM for MAP estimation\n\nWe may also use the same data-augmentation trick in an expectation-maximization (EM) algorithm to compute the maximum a-posteriori (MAP) estimate of β. Returning to the likelihood in (4) and ignoring constants of proportionality, we may write the complete-data log posterior distribution, given ω_1, . . . , ω_N, as\n\nQ(β) = log p(β | Y, ω_1, . . . , ω_N) = Σ_{t=1}^{N} { κ_t (x_t^T β) − ω_t (x_t^T β)^2 / 2 } + log p(β)\n\nfor some prior p(β), where κ_t = (y_t − ξ)/2. This expression is linear in ω_t. Therefore we may compute E{Q(β)} by substituting ω̂_t = E(ω_t | β), given the current value of β, into the above expression. Appealing to (5), with (ω_t | β) ∼ PG(y_t + ξ, x_t^T β), these conditional expectations are available in closed form:\n\nE(ω_t | β) = [ (y_t + ξ) / (2 x_t^T β) ] tanh(x_t^T β / 2) .\n\nIn the M step, we re-express E{Q(β)} as\n\nE{Q(β)} = −(1/2) β^T S β + β^T d + log p(β) ,\n\nwhere the complete-data sufficient statistics are\n\nS = X^T Ω̂ X\nd = X^T κ\n\nfor Ω̂ = diag(ω̂_1, . . . , ω̂_N) and κ = (κ_1, . . . , κ_N)^T. Thus the M step is a penalized weighted least squares problem, which can be solved using standard methods. In fact, it is typically unnecessary to maximize E{Q(β)} exactly at each iteration. As is well established in the literature on the EM algorithm, it is sufficient to move to a value of β that merely improves the observed-data objective function.
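Putting the E-step and M-step together, a minimal batch sketch on simulated data might look as follows (the ridge prior N(0, τ²I), the data-generating values, and the iteration count are all illustrative choices of ours, with ξ treated as known):

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, xi = 2000, 3, 4.0
X = rng.normal(size=(N, P))
beta_true = np.array([0.8, -0.5, 0.3])
y = rng.poisson(rng.gamma(xi, np.exp(X @ beta_true)))   # NB counts via gamma-Poisson

kappa = (y - xi) / 2.0          # kappa_t = (y_t - xi)/2
tau2 = 10.0                     # ridge prior variance (our choice)
beta = np.zeros(P)
for _ in range(200):
    psi = X @ beta
    # E-step: E(omega_t | beta) = ((y_t + xi)/(2 psi_t)) tanh(psi_t/2);
    # the limit as psi_t -> 0 is (y_t + xi)/4, handled by nudging psi
    psi_safe = np.where(np.abs(psi) < 1e-8, 1e-8, psi)
    w = (y + xi) / (2.0 * psi_safe) * np.tanh(psi_safe / 2.0)
    # M-step: penalized weighted least squares with S = X^T Omega X, d = X^T kappa
    S = X.T @ (w[:, None] * X) + np.eye(P) / tau2
    beta = np.linalg.solve(S, X.T @ kappa)
print(beta)   # close to beta_true for this seed
```

Each M-step here is solved exactly; as noted in the text, a single improving step (e.g. one conjugate-gradient step) would also suffice.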
We have found that it is much faster to take a single step of the conjugate-gradient algorithm (in which case it will be important to check for improvement over the previous iteration); see, e.g., [37] for details.\n\n3.3 Online EM\n\nFor very large data sets, the above batch algorithm may be too slow. In such cases, we recommend computing the MAP estimate via an online EM algorithm [38], as follows. Suppose that our current estimate of the parameter is β^{(t−1)}, and that the current estimate of the complete-data log posterior is\n\nQ(β) = −(1/2) β^T S^{(t−1)} β + β^T d^{(t−1)} + log p(β) ,   (6)\n\nwhere\n\nS^{(t−1)} = Σ_{i=1}^{t−1} ω̂_i x_i x_i^T\nd^{(t−1)} = Σ_{i=1}^{t−1} κ_i x_i ,\n\nrecalling that κ_i = (y_i − ξ)/2. After observing new data (y_t, x_t), we first compute the expected value of ω_t as\n\nω̂_t = E(ω_t | y_t, β^{(t−1)}) = [ (y_t + ξ) / (2ψ_t) ] tanh(ψ_t/2) ,\n\nwith ψ_t = x_t^T β^{(t−1)} denoting the linear predictor evaluated at the current estimate. We then update the sufficient statistics recursively as\n\nS^{(t)} = (1 − γ_t) S^{(t−1)} + γ_t ω̂_t x_t x_t^T\nd^{(t)} = (1 − γ_t) d^{(t−1)} + γ_t κ_t x_t ,\n\nwhere γ_t is the learning rate. We then plug these updated sufficient statistics into (6) and solve the M step to move to a new value of β. The data can also be processed in batches of size larger than 1, with obvious modifications to the updates for S^{(t)} and d^{(t)}; we have found that batch sizes of order √p tend to work well, although we are unaware of any theory to support this choice.\nIn high-dimensional problems, the usual practice is to impose sparsity via an ℓ1 penalty on the regression coefficients, leading to a lasso-type prior.
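The sufficient-statistic recursion above can be sketched as a stream of single observations; the version below uses a flat prior so the M-step is a plain linear solve (the learning-rate shift and the small initial S are our choices, made to keep the first few solves well-posed):

```python
import numpy as np

rng = np.random.default_rng(2)
P, xi = 3, 4.0
beta_true = np.array([0.8, -0.5, 0.3])

S = 0.1 * np.eye(P)    # small initial value keeps early solves non-singular
d = np.zeros(P)
beta = np.zeros(P)
for t in range(1, 20001):
    x = rng.normal(size=P)
    y = rng.poisson(rng.gamma(xi, np.exp(x @ beta_true)))   # one new NB count
    psi = x @ beta
    if abs(psi) < 1e-8:
        w = (y + xi) / 4.0                       # limit of the PG mean at psi = 0
    else:
        w = (y + xi) / (2.0 * psi) * np.tanh(psi / 2.0)
    g = (t + 1) ** -0.7                          # learning rate gamma_t ~ t^(-a), a = 0.7
    S = (1 - g) * S + g * w * np.outer(x, x)
    d = (1 - g) * d + g * ((y - xi) / 2.0) * x   # kappa_t = (y_t - xi)/2
    beta = np.linalg.solve(S, d)                 # M-step under a flat prior
print(beta)
```

Because Σγ_t diverges, the contribution of the arbitrary initialization is forgotten, and the estimate settles near the MAP/MLE.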
In this case, the M-step in the online algorithm can be solved very efficiently using the modified shooting algorithm, a coordinate-descent method described in a different context by [39] and [40].\nThis online EM is guaranteed to converge to a stationary point of the log posterior distribution if the learning rate decays in time such that Σ_{t=1}^{∞} γ_t = ∞ and Σ_{t=1}^{∞} γ_t^2 < ∞. (If the penalty function is concave and ξ is fixed, then this stationary point will be the global maximum.) A simple choice for the learning rate is γ_t = 1/t^a for a ∈ (0.5, 1), with a = 0.7 being our default choice.\n\n4 Factor analysis for negative-binomial spiking\n\nLet ψ_t = (ψ_{t1}, . . . , ψ_{tK}) denote a vector of K linear predictors at time t, corresponding to K different neurons with observed counts Y_t = (y_{t1}, . . . , y_{tK})^T. We propose a dynamic negative-binomial factor model for Y_t, with a vector autoregressive (VAR) structure for the latent factors:\n\ny_{tk} ∼ NB(ξ, e^{ψ_{tk}}) for k = 1, . . . , K\nψ_t = α + B f_t\nf_t = Φ f_{t−1} + ε_t ,   ε_t ∼ N(0, τ^2 I) .\n\nHere f_t denotes an L-vector of latent factors, with L typically much smaller than K, the number of neurons. The K × L factor-loadings matrix B is restricted to have zeroes above the diagonal, and to have positive diagonal entries. These restrictions are traditional in Bayesian factor analysis [41], and ensure that B is formally identified. We also assume that the transition matrix Φ is diagonal, and we impose conjugate inverse-gamma priors on τ^2 to ensure that, marginally over the latent factors f_t, the entries of ψ_t have approximately unit variance.
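A forward simulation of this generative model is straightforward; the sketch below (dimensions and parameter values are illustrative, chosen so the factors have roughly unit stationary variance) produces a K-neuron count matrix under the stated identification constraints on B:

```python
import numpy as np

rng = np.random.default_rng(3)
T, K, L = 1000, 11, 2
xi, tau, phi = 4.0, 0.2, 0.98    # shape, innovation sd, factor autocorrelation
# stationary factor variance tau^2 / (1 - phi^2) is approximately 1 here

# lower-triangular loadings with positive diagonal (identification constraints)
B = np.tril(rng.normal(size=(K, L)))
B[np.diag_indices(L)] = np.abs(B[np.diag_indices(L)])
alpha = rng.normal(scale=0.5, size=K)

f = np.zeros(L)
Y = np.empty((T, K), dtype=np.int64)
for t in range(T):
    f = phi * f + tau * rng.normal(size=L)     # f_t = Phi f_{t-1} + eps_t
    psi = alpha + B @ f                        # linear predictors for the K neurons
    p = 1.0 / (1.0 + np.exp(-psi))             # NB rate parameter p_tk
    # numpy's negative_binomial(n, q) has mean n(1-q)/q; q = 1-p gives xi * e^psi
    Y[t] = rng.negative_binomial(xi, 1.0 - p)
print(Y.shape, Y.mean(axis=0).round(2))
```

Data of this shape (counts driven by a low-dimensional autocorrelated latent process) is what the Gibbs sampler described next is designed to invert.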
Although we do not pursue the point here, the mean term α can incorporate the effect of known predictors with no additional complication to the analysis.\nBy exploiting the Polya-Gamma data-augmentation scheme, posterior inference in this model may proceed via straightforward Gibbs sampling—something not previously possible for count-data factor models. Prior work on latent variable modeling of spike data has relied on either Gaussian approximations [2–6, 8] or variants of particle filtering [10–13].\nGibbs sampling proceeds as follows. Conditional upon B, α, and f_t, we update the latent variables as ω_{tk} ∼ PG(y_{tk} + ξ, ψ_{tk}), where ψ_{tk} = α_k + B_k f_t and B_k denotes the kth row of the loadings matrix. The mean vector α and factor-loadings matrix B can both be updated in closed form via a Gaussian draw using the full conditional distributions given in, for example, [42] or [43].\nGiven all latent variables and other parameters of the model, the factors f_t can be updated in a single block using the forward-filter, backwards-sample (FFBS) algorithm from [32]. First, pass forwards through the data from y_1 to y_N, recursively computing the filtered moments of f_t as\n\nM_t = (V_t^{−1} + B^T Ω_t B)^{−1}\nm_t = M_t (B^T Ω_t z_t + V_t^{−1} Φ m_{t−1}) ,\n\nwhere\n\nV_t = Φ M_{t−1} Φ^T + τ^2 I\nz_t = (z_{t1}, . . . , z_{tK})^T ,   z_{tk} = (y_{tk} − ξ)/(2ω_{tk}) − α_k\nΩ_t = diag(ω_{t1}, . . . , ω_{tK}) .\n\nThen draw f_N ∼ N(m_N, M_N) from its conditional distribution.
Finally, pass backwards through the data, sampling f_t as (f_t | m_t, M_t, f_{t+1}) ∼ N(a_t, A_t), where\n\nA_t^{−1} = M_t^{−1} + τ^{−2} Φ^T Φ\na_t = A_t (M_t^{−1} m_t + τ^{−2} Φ^T f_{t+1}) .\n\nThis will result in a block draw of all N × L factors from their joint conditional distribution.\n\n5 Experiments\n\nTo demonstrate our methods, we performed regression and dynamic factor analyses on a dataset of 27 neurons recorded from primate retina (published in [44] and re-used with the authors' permission). Briefly, these data consist of spike responses from a simultaneously-recorded population of ON and OFF parasol retinal ganglion cells, stimulated with a flickering, 120-Hz binary white noise stimulus.\n\n5.1 Regression\n\nFigure 2 shows a comparison of a Poisson model versus a negative-binomial model for each of the 27 neurons in the retinal dataset. We binned spike counts in 8 ms bins, and regressed against a temporally lagged stimulus, resulting in a 100-element (10 × 10 pixel) spatial receptive field for each neuron. To benchmark the two methods, we created 50 random train/test splits from a full dataset of 30,000 points, with 7,500 points held out for validation.\n\nFigure 2: Boxplots of improvement in held-out log likelihoods (NB versus Poisson regression) for 50 train/test splits on each of the 27 neurons in the primate retinal data.\n\nUsing each training set, we applied our online maximum-likelihood method to fit an NB model to each of the 27 neurons, and then used these models to compute held-out log-likelihoods on the test set versus a standard Poisson GLM. As Figure 2 shows, the NB model has a higher average held-out log-likelihood than the Poisson model. In some cases it is dozens of orders of magnitude better (as in neurons 12–14 and 22–27), suggesting that there is substantial over-dispersion in the data that is not faithfully captured by the Poisson model.
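The flavor of this held-out comparison can be reproduced on synthetic data: generate over-dispersed counts, fit an intercept-only Poisson by maximum likelihood and an NB by the method of moments (a simplification of ours, not the paper's fitting procedure), and compare average test log-likelihoods:

```python
import numpy as np
from math import lgamma, log

rng = np.random.default_rng(4)
xi_true, psi_true = 2.0, 1.0     # illustrative over-dispersed regime
y = rng.poisson(rng.gamma(xi_true, np.exp(psi_true), size=20_000))  # NB counts
train, test = y[:10_000], y[10_000:]

# Poisson MLE: rate = training mean
rate = train.mean()
pois_ll = np.mean([k * log(rate) - rate - lgamma(k + 1) for k in test])

# NB by method of moments: mean = xi e^psi, variance = xi e^psi (1 + e^psi)
mu, s2 = train.mean(), train.var()
xi_hat = mu * mu / (s2 - mu)     # valid when s2 > mu (over-dispersion)
p_hat = (s2 - mu) / s2           # p = e^psi / (1 + e^psi)
nb_ll = np.mean([lgamma(k + xi_hat) - lgamma(xi_hat) - lgamma(k + 1)
                 + xi_hat * log(1.0 - p_hat) + k * log(p_hat) for k in test])
print(pois_ll, nb_ll)   # NB scores higher on this over-dispersed data
```

With variance several times the mean, the Poisson model pays a large per-observation penalty in held-out log-likelihood, mirroring the pattern in Figure 2.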
We emphasize that this is a “weak-signal” regime, and that overdispersion is likely to be less pronounced when the signal is stronger. Yet these results suggest, at the very least, that many of these neurons have marginal distributions that are quite far from Poisson. Moreover, regardless of the underlying signal strength, the regression problem can be handled quite straightforwardly using our online method, even in high dimensions, without settling for the restrictive Poisson assumption.\n\n5.2 Dynamic factor analysis\n\nTo study the factor-modeling framework, we conducted parallel experiments on both simulated and real data. First, we simulated two different data sets comprising 1000 time points and 11 neurons, each from a two-factor model: one with high factor autocorrelation (0.98) and one with low factor autocorrelation (0.5). The two questions of interest here are: how well does the fully Bayesian method reconstruct the correlation structure among the unobserved rate parameters ψ_{tk}; and how well does it distinguish between a high-autocorrelation and a low-autocorrelation regime in the underlying low-dimensional representation?\nThe results in Figure 3 suggest that the method, on both counts, is highly accurate. It is especially interesting to compare the left-most column of Figure 3 with the actual cross-sectional correlation of ψ_t, the systematic component of variation, in the second column. The correlation of the raw counts y_t shows a dramatic attenuation effect, compared to the real latent states. Yet this structure is uncovered easily by the model, together with a full assessment of posterior uncertainty. The approach behaves much like a model-based version of principal-components analysis, appropriate for non-Gaussian data.\nFinally, Figure 4 shows the results of fitting a two-factor model to the primate retinal data. We are able to uncover latent structure in the data in a completely unsupervised fashion.
As with the simulated data, it is interesting to compare the correlation of the raw counts y_t with the estimated correlation structure of the latent states. There is also strong support for a low-autocorrelation regime in the factors, in light of the posterior mean factor scores depicted in the right-most panel.\n\nFigure 3: Results for two simulated data sets with high factor autocorrelation (top row) and low factor autocorrelation (bottom row). The three left-most columns show the raw correlation among the counts y_t; the actual correlation, E(ψ_t ψ_t^T), of the latent states; and the posterior mean estimator for the correlation of the latent states. The right-most column shows the simulated spike trains for the 11 neurons, along with the factors f_t in blue (with 75% credible intervals), plotted over time.\n\nFigure 4: Results for factor analysis of the primate retinal data. (Panels: correlation among spike counts; estimated correlation of latent states; spike counts; posterior mean factor scores.)\n\n6 Discussion\n\nNegative-binomial models have only recently been explored in systems neuroscience, despite their favorable properties for handling data with larger-than-Poisson variation. Likewise, Bayesian inference for the negative binomial model has traditionally been a difficult problem, with the existence of a fully automatic Gibbs sampler only recently discovered [19]. Our paper has made three specific contributions to this literature. First, we have shown that negative-binomial models can lead to substantial improvements in fit, compared to the Poisson, for neural data exhibiting over-dispersion. Such models can be fit straightforwardly via MCMC for a wide class of prior distributions over model parameters (including sparsity-inducing choices, such as the lasso). Second, we have proposed a novel online-EM algorithm for sparse NB regression. This algorithm inherits all the convergence properties of EM, but is scalable to extremely large data sets. Finally, we have embedded a dynamic factor model inside a negative-binomial likelihood. This latter approach can be extended quite easily to spatial interactions, more general state-space models, or mixed models incorporating both regressors and latent variables. All of these extensions, as well as the model-selection question (how many factors?), form promising areas for future research.\n\nAcknowledgments\n\nWe thank E. J. Chichilnisky, A. M. Litke, A. Sher and J. Shlens for retinal data, J. Windle for PG sampling code, and J. H. Macke for helpful comments. This work was supported by a Sloan Research Fellowship, McKnight Scholar’s Award, and NSF CAREER Award IIS-1150186 (JP).\n\nReferences\n\n[1] Roland Baddeley, L. F. Abbott, Michael C. A. Booth, Frank Sengpiel, Tobe Freeman, Edward A. Wakeman, and Edmund T. Rolls. Proceedings of the Royal Society of London. Series B: Biological Sciences, 264(1389):1775–1783, 1997.\n[2] E. Brown, L. Frank, D. Tang, M. Quirk, and M. Wilson.
Journal of Neuroscience, 18:7411–7425, 1998.\n[3] L. Srinivasan, U. Eden, A. Willsky, and E. Brown. Neural Computation, 18:2465–2494, 2006.\n[4] B. M. Yu, J. P. Cunningham, G. Santhanam, S. I. Ryu, K. V. Shenoy, and M. Sahani. Journal of Neurophysiology, 102(1):614, 2009.\n[5] W. Wu, J. E. Kulkarni, N. G. Hatsopoulos, and L. Paninski. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 17(4):370–378, 2009.\n[6] Liam Paninski, Yashar Ahmadian, Daniel Gil Ferreira, Shinsuke Koyama, Kamiar Rahnama Rad, Michael Vidne, Joshua Vogelstein, and Wei Wu. J Comput Neurosci, Aug 2009.\n[7] J. W. Pillow, Y. Ahmadian, and L. Paninski. Neural Comput, 23(1):1–45, Jan 2011.\n[8] M. Vidne, Y. Ahmadian, J. Shlens, J. W. Pillow, J. Kulkarni, A. M. Litke, E. J. Chichilnisky, E. P. Simoncelli, and L. Paninski. J. Computational Neuroscience, pages 1–25, 2012. To appear.\n[9] John P. Cunningham, Krishna V. Shenoy, and Maneesh Sahani. Proceedings of the 25th International Conference on Machine Learning, ICML '08, pages 192–199, New York, NY, USA, 2008. ACM.\n[10] A. E. Brockwell, A. L. Rojas, and R. E. Kass. J Neurophysiol, 91(4):1899–1907, Apr 2004.\n[11] S. Shoham, L. Paninski, M. Fellows, N. Hatsopoulos, J. Donoghue, and R. Normann. IEEE Transactions on Biomedical Engineering, 52:1312–1322, 2005.\n[12] Ayla Ergun, Riccardo Barbieri, Uri T. Eden, Matthew A. Wilson, and Emery N. Brown. IEEE Trans Biomed Eng, 54(3):419–428, Mar 2007.\n[13] A. E. Brockwell, R. E. Kass, and A. B. Schwartz. Proceedings of the IEEE, 95:1–18, 2007.\n[14] R. P. Adams, I. Murray, and D. J. C. MacKay. Proceedings of the 26th Annual International Conference on Machine Learning. ACM New York, NY, USA, 2009.\n[15] Y. Ahmadian, J. W. Pillow, and L. Paninski. Neural Comput, 23(1):46–96, Jan 2011.\n[16] M. C. Teich and W. J. McGill.
Physical Review Letters, 36(13):754–758, 1976.\n[17] Arno Onken, Steffen Grünewälder, Matthias H. J. Munk, and Klaus Obermayer. PLoS Comput Biol, 5(11):e1000577, 11 2009.\n[18] R. Goris, E. P. Simoncelli, and J. A. Movshon. Computational and Systems Neuroscience (CoSyNe), Salt Lake City, Utah, February 2012.\n[19] N. G. Polson, J. G. Scott, and J. Windle. Arxiv preprint arXiv:1205.0310, 2012.\n[20] P. Lánský and J. Vaillant. Biosystems, 58(1):27–32, 2000.\n[21] V. Ventura, C. Cai, and R. E. Kass. Journal of Neurophysiology, 94(4):2928–2939, 2005.\n[22] Neural Comput, 18(11):2583–2591, Nov 2006.\n[23] A. M. Skene and J. C. Wakefield. Statistics in Medicine, 9:919–29, 1990.\n[24] J. Carlin. Statistics in Medicine, 11:141–58, 1992.\n[25] Eric T. Bradlow, Bruce G. S. Hardie, and Peter S. Fader. Journal of Computational and Graphical Statistics, 11(1):189–201, 2002.\n[26] A. Gelman, A. Jakulin, M. G. Pittau, and Y. Su. The Annals of Applied Statistics, 2(4):1360–83, 2008.\n[27] A. Dobra, C. Tebaldi, and M. West. Journal of Statistical Planning and Inference, 136(2):355–72, 2006.\n[28] James H. Albert and Siddhartha Chib. Journal of the American Statistical Association, 88(422):669–79, 1993.\n[29] M. Bethge and P. Berens. Advances in Neural Information Processing Systems, 20:97–104, 2008.\n[30] C. Holmes and L. Held. Bayesian Analysis, 1(1):145–68, 2006.\n[31] Luc Devroye. Statistics & Probability Letters, 79(21):2251–9, 2009.\n[32] Chris Carter and Robert Kohn. Biometrika, 81(3):541–553, 1994.\n[33] Trevor Park and George Casella. Journal of the American Statistical Association, 103(482):681–6, 2008.\n[34] Chris M. Hans. Biometrika, 96(4):835–45, 2009.\n[35] Carlos M. Carvalho, Nicholas G. Polson, and James G. Scott. Biometrika, 97(2):465–80, 2010.\n[36] Mingyuan Zhou, Lingbo Li, David Dunson, and Lawrence Carin.
International Conference on Machine Learning (ICML), 2012.\n[37] Nicholas G. Polson and James G. Scott. Technical report, University of Texas at Austin, http://arxiv.org/abs/1103.5407v3, 2011.\n[38] O. Cappé and E. Moulines. Journal of the Royal Statistical Society (Series B), 71(3):593–613, 2009.\n[39] Suhrid Balakrishnan and David Madigan. Journal of Machine Learning Research, 9:313–37, 2008.\n[40] Liang Sun and James G. Scott. Technical report, University of Texas at Austin, 2012.\n[41] H. Lopes and M. West. Statistica Sinica, 14:41–67, 2004.\n[42] Joyee Ghosh and David B. Dunson. Journal of Computational and Graphical Statistics, 18(2):306–20, 2009.\n[43] P. R. Hahn, Carlos M. Carvalho, and James G. Scott. Journal of the Royal Statistical Society, Series C, 2012.\n[44] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Nature, 454:995–999, 2008.\n", "award": [], "sourceid": 942, "authors": [{"given_name": "Jonathan", "family_name": "Pillow", "institution": ""}, {"given_name": "James", "family_name": "Scott", "institution": ""}]}