{"title": "Scaling the Poisson GLM to massive neural datasets through polynomial approximations", "book": "Advances in Neural Information Processing Systems", "page_first": 3517, "page_last": 3527, "abstract": "Recent advances in recording technologies have allowed neuroscientists to record simultaneous spiking activity from hundreds to thousands of neurons in multiple brain regions. Such large-scale recordings pose a major challenge to existing statistical methods for neural data analysis. Here we develop highly scalable approximate inference methods for Poisson generalized linear models (GLMs) that require only a single pass over the data. Our approach relies on a recently proposed method for obtaining approximate sufficient statistics for GLMs using polynomial approximations [Huggins et al., 2017], which we adapt to the Poisson GLM setting. We focus on inference using quadratic approximations to nonlinear terms in the Poisson GLM log-likelihood with Gaussian priors, for which we derive closed-form solutions to the approximate maximum likelihood and MAP estimates, posterior distribution, and marginal likelihood. We introduce an adaptive procedure to select the polynomial approximation interval and show that the resulting method allows for efficient and accurate inference and regularization of high-dimensional parameters. We use the quadratic estimator to fit a fully-coupled Poisson GLM to spike train data recorded from 831 neurons across five regions of the mouse brain for a duration of 41 minutes, binned at 1 ms resolution. Across all neurons, this model is fit to over 2 billion spike count bins and identifies fine-timescale statistical dependencies between neurons within and across cortical and subcortical areas.", "full_text": "Scaling the Poisson GLM to massive neural datasets\n\nthrough polynomial approximations\n\nDavid M. Zoltowski\n\nPrinceton Neuroscience Institute\n\nPrinceton University; Princeton, NJ 08544\n\nzoltowski@princeton.edu\n\nJonathan W. 
Pillow\n\nPrinceton Neuroscience Institute & Psychology\n\nPrinceton University; Princeton, NJ 08544\n\npillow@princeton.edu\n\nAbstract\n\nRecent advances in recording technologies have allowed neuroscientists to record\nsimultaneous spiking activity from hundreds to thousands of neurons in multiple\nbrain regions. Such large-scale recordings pose a major challenge to existing\nstatistical methods for neural data analysis. Here we develop highly scalable\napproximate inference methods for Poisson generalized linear models (GLMs)\nthat require only a single pass over the data. Our approach relies on a recently\nproposed method for obtaining approximate suf\ufb01cient statistics for GLMs using\npolynomial approximations [7], which we adapt to the Poisson GLM setting.\nWe focus on inference using quadratic approximations to nonlinear terms in the\nPoisson GLM log-likelihood with Gaussian priors, for which we derive closed-form\nsolutions to the approximate maximum likelihood and MAP estimates, posterior\ndistribution, and marginal likelihood. We introduce an adaptive procedure to\nselect the polynomial approximation interval and show that the resulting method\nallows for ef\ufb01cient and accurate inference and regularization of high-dimensional\nparameters. We use the quadratic estimator to \ufb01t a fully-coupled Poisson GLM to\nspike train data recorded from 831 neurons across \ufb01ve regions of the mouse brain\nfor a duration of 41 minutes, binned at 1 ms resolution. Across all neurons, this\nmodel is \ufb01t to over 2 billion spike count bins and identi\ufb01es \ufb01ne-timescale statistical\ndependencies between neurons within and across cortical and subcortical areas.\n\n1\n\nIntroduction\n\nThe Poisson GLM is a standard model of neural encoding and decoding that has proved useful\nfor characterizing heterogeneity and correlations in neuronal populations [12, 19, 23, 15, 13]. 
As\nnew large-scale recording technologies such as the Neuropixels probe are generating simultaneous\nrecordings of spiking activity from hundreds or thousands of neurons [8, 21, 4], Poisson GLMs\nwill be a useful tool for investigating encoding and statistical dependencies within and across brain\nregions. However, the size of these datasets makes inference computationally expensive. For example,\nit may not be possible to store the design matrix and data in local memory.\nIn this work, we develop scalable approximate inference methods for Poisson GLMs to analyze\nsuch data. Our approach follows from the polynomial approximate suf\ufb01cient statistics for GLMs\nframework (PASS-GLM) developed in [7], which allows for inference to be performed with only\na single pass over the dataset. This method substantially reduces computation time and storage\nrequirements for inference in Poisson GLMs without sacri\ufb01cing time resolution, as the suf\ufb01cient\nstatistics are computed as sums over time.\nOur speci\ufb01c contributions are the following. Using quadratic approximations to nonlinear terms in\nthe log-likelihood, we derive the closed-form approximate maximum likelihood and MAP estimates\nof the parameters in Poisson GLMs with general link functions and Gaussian priors. We introduce\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fFigure 1: a. The Poisson GLM as a model of spiking activity. b. Quadratic and 4th-order Chebyshev\napproximations to f (x) and log f (x) for f (x) = exp(x) and f (x) = log(1 + exp(x)) over two\nexample intervals. The exponential approximation is over an interval for modeling spikes per second\nand the softplus approximation is over an interval for modeling spikes per bin.\n\na procedure to adaptively select the interval of the quadratic approximation for each neuron. 
The\nquadratic case is the most scalable PASS-GLM because it has the smallest memory footprint, and we\nfound that adaptive interval selection was necessary to realize these benefits. We also show that fourth\norder approximations are usable for approximating the log-likelihood of Poisson GLMs. Finally,\nwe use the quadratic approximation to derive a fast, closed-form approximation of the marginal\nlikelihood in Poisson GLMs, enabling efficient evidence optimization.\nAfter validating these estimators on simulated spike train data and a spike train recording from a\nprimate retinal ganglion cell, we demonstrate the scalability of these methods by fitting a fully-coupled GLM to the responses of 831 neurons recorded across five different regions of the mouse\nbrain.\n\n2 Background\n\n2.1 Poisson GLM\n\nThe Poisson GLM in neuroscience is used to identify statistical dependencies between observed\nspiking activity and task-relevant variables such as environmental stimuli and recent spiking activity\nacross neurons (Figure 1a). The model is fit to binned spike counts yt for t = 1, ..., T with time bin\nsize \u2206. The spike counts are conditionally Poisson distributed given a vector of parameters w and a\ntime-dependent vector of covariates xt:\n\nyt | xt, w \u223c Poisson(yt; f(xt\u22a4w)\u2206).   (1)\n\nThe log-likelihood of w given the vector of all observed spike counts y is\n\nlog p(y|X, w) = \u2211_{t=1}^T log p(yt|xt, w) = \u2211_{t=1}^T ( yt log f(xt\u22a4w) \u2212 f(xt\u22a4w)\u2206 )   (2)\n\nwhere we have dropped terms constant in w and the t-th row of the design matrix X is xt. 
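As a concrete reference point, the exact log-likelihood in Eq. (2) can be evaluated directly with NumPy (a minimal sketch; the function name and arguments are illustrative, not from the paper):

```python
import numpy as np

def poisson_glm_loglik(w, X, y, dt=1.0, f=np.exp):
    # Linear predictor x_t^T w for every time bin at once.
    eta = X @ w
    rate = f(eta)
    # Eq. (2): sum_t y_t log f(x_t^T w) - f(x_t^T w) * dt,
    # with terms constant in w dropped.
    return np.sum(y * np.log(rate) - rate * dt)
```

Each evaluation touches the full design matrix X, which is exactly the cost the approximate sufficient statistics below avoid.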
The\nmethods in this paper apply both when using the canonical log link function such that the nonlinearity\nis f(x) = exp(x) and when using alternative nonlinearities.\n\n2.2 Polynomial approximate sufficient statistics for GLMs (PASS-GLM)\n\nWith moderate amounts of data, first or second order optimization techniques can be used to quickly\nfind point estimates of the parameters w. However, inference can be prohibitive for large datasets,\nas each evaluation of the log-likelihood requires passing through the entire design matrix. The\nauthors of [7] recently described a powerful approach to overcome these limitations using polynomial\napproximations in GLM log-likelihoods. This method leads to approximate sufficient statistics that\ncan be computed in a single pass over the dataset, and therefore is called PASS-GLM. Formally, [7]\nconsiders log-likelihoods that are sums over K functions\n\nlog p(yt|xt, w) = \u2211_{k=1}^K yt^{\u03b1k} \u03c6(k)(yt^{\u03b2k} xt\u22a4w \u2212 ak yt)   (3)\n\nwhere the values \u03b1k, \u03b2k, ak \u2208 {0, 1} and the functions \u03c6(k) depend on the specific GLM. In a Poisson\nGLM with nonlinearity f(x), we have K = 2 with \u03c6(1)(x) = log f(x), \u03c6(2)(x) = f(x), \u03b11 = 1,\nand the other values set to zero. When f(x) = exp(x), the first function simplifies to \u03c6(1)(x) = x.\nIn [7], each nonlinear \u03c6(k) is approximated with an M-th order polynomial \u03c6M(k) using a basis of\northogonal Chebyshev polynomials [11], and the authors show that this approximation leads to a\nlog-likelihood that is simply a sum over monomial terms. 
They provide theoretical guarantees on the\nquality of the MAP estimates using this method and on the quality of posterior approximations for\nspecific cases, including the Poisson GLM with an exponential nonlinearity. We use this approach to\nextend inference in Poisson GLMs to massive neural datasets.\n\n2.3 Computing Chebyshev polynomial approximations\n\nWe use systems of orthogonal Chebyshev polynomials to compute polynomial approximations. Here,\nwe describe our procedure for computing the coefficients of an M-th order polynomial approximation\nto a function f(x) over the interval [x0, x1]. We assume that f(x) has a Chebyshev expansion\nover [x0, x1] given by f(x) = \u2211_{m=0}^\u221e cm Tm, where cm are coefficients and Tm is the degree-m\nChebyshev polynomial of the first kind over [x0, x1] [11]. By truncating this expansion at the desired\norder M and collecting terms, we obtain an approximation f(x) \u2248 \u2211_{m=0}^M am x^m. We note that the\ncoefficients am for m = 0, ..., M can be estimated by solving a weighted least-squares problem,\nminimizing the squared error over a grid of points on [x0, x1] between f(x) and an approximation\nf\u0302(x) with monomial basis functions. The weighting function is w(x) = 1/\u221a(1 \u2212 x\u00b2) over [\u22121, 1] and is\nmapped to general intervals [x0, x1] via a change of variables.\n\n2.4 Related work\n\nAn alternative approach for efficient inference in Poisson GLMs is the expected log-likelihood\napproximation [14, 17], which replaces the nonlinear exponential term in the log-likelihood with\nits expectation across data points. This is justified using knowledge of the covariance structure of\nthe stimulus or by arguments invoking the central limit theorem. 
The benefits of the polynomial\napproximation approach are that it applies to arbitrary stimulus and covariate distributions, it does\nnot inherently require large amounts of data, and it can trade off storage costs for increased accuracy\nthrough higher order approximations. Importantly, in contrast to the expected log-likelihood\napproximation, the approach in this paper is easily extended to non-canonical link functions.\n\n3 Polynomial approximations for Poisson GLMs\n\n3.1 Quadratic approximation to exponential nonlinearity\n\nWe first apply the polynomial approximation framework to Poisson GLMs with an exponential\nnonlinearity using a quadratic approximation. With the canonical link function, the log f(x) term in\nthe log-likelihood is linear in the parameters and we only need to approximate the nonlinear term\nf(x) = exp(x)\u2206. We approximate this term as\n\nexp(x)\u2206 \u2248 a2 x\u00b2 + a1 x + a0   (4)\n\nwhere the coefficients a2, a1, and a0 are computed using a Chebyshev polynomial approximation\nover the interval [x0, x1] using the methods described in Section 2.3. We currently consider [x0, x1]\nto be a fixed approximation interval and in Section 4.1 we discuss selection of this interval. Example\napproximations are shown in Figure 1b. We use this approximation to rewrite the log-likelihood as\n\nlog p(y|X, w) = \u2211_{t=1}^T ( yt xt\u22a4w \u2212 exp(xt\u22a4w)\u2206 )   (5)\n\n\u2248 \u2211_{t=1}^T ( yt xt\u22a4w \u2212 a2(xt\u22a4w)\u00b2 \u2212 a1(xt\u22a4w) \u2212 a0 )   (6)\n\n= w\u22a4X\u22a4(y \u2212 a1) \u2212 a2 w\u22a4X\u22a4Xw   (7)\n\nwhere a1 is a vector with each element equal to a1 and throughout we have dropped terms that do not\ndepend on w. 
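The quadratic coefficients above can be computed as described in Section 2.3; the weighted least-squares fit might be sketched as follows (a NumPy sketch under our reading of that procedure; the function name and grid size are our own):

```python
import numpy as np

def poly_approx_coeffs(f, x0, x1, order=2, npts=500):
    # Interior grid on (-1, 1); endpoints are excluded so that the
    # Chebyshev weight 1/sqrt(1 - u^2) stays finite.
    u = np.linspace(-1.0, 1.0, npts + 2)[1:-1]
    # Change of variables mapping [-1, 1] onto [x0, x1].
    x = 0.5 * (x1 - x0) * u + 0.5 * (x1 + x0)
    wgt = 1.0 / np.sqrt(1.0 - u ** 2)
    # Monomial design matrix [1, x, ..., x^order]; the weighted
    # least-squares problem is solved by scaling both sides by sqrt(wgt).
    A = np.vander(x, order + 1, increasing=True)
    sw = np.sqrt(wgt)
    coeffs, _, _, _ = np.linalg.lstsq(sw[:, None] * A, sw * f(x), rcond=None)
    return coeffs  # a_0, ..., a_M as in Section 2.3
```

For example, poly_approx_coeffs(np.exp, 0.0, 6.0, order=2) would return one choice of (a0, a1, a2) for the interval [0, 6] used later in the experiments.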
This form has approximate sufficient statistics \u2211_{t=1}^T xt, \u2211_{t=1}^T yt xt, and \u2211_{t=1}^T xt xt\u22a4.\nThe approximate log-likelihood is a quadratic function in w and therefore is amenable to analytic\ninference. First, the closed-form maximum likelihood (ML) estimate of the parameters is\n\nw\u0302mle-pa2 = (2 a2 X\u22a4X)\u207b\u00b9 X\u22a4(y \u2212 a1).   (8)\n\nNext, with a zero-mean Gaussian prior on w with covariance C such that w \u223c N(0, C), the\napproximate MAP estimate and posterior distribution are\n\nw\u0302map-pa2 = (2 a2 X\u22a4X + C\u207b\u00b9)\u207b\u00b9 X\u22a4(y \u2212 a1)   (9)\n\np(w|X, y, C) \u2248 N(w; \u03a3X\u22a4(y \u2212 a1), \u03a3)   (10)\n\nwhere \u03a3 = (2 a2 X\u22a4X + C\u207b\u00b9)\u207b\u00b9 is the approximate posterior covariance. This enables efficient\nusage of a host of Bayesian regularization techniques. In our experiments, we implement ridge\nregression with C\u207b\u00b9 = \u03bbI, Bayesian smoothing with C\u207b\u00b9 = \u03bbL where L is the discrete Laplacian\noperator [16, 14], and automatic relevance determination (ARD) with (C\u207b\u00b9)ii = \u03bbi [10, 22, 18, 25]. 
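In code, the single pass of Eq. (7) and the closed-form estimates of Eqs. (8)-(9) might look like the following (a sketch; the batching interface is our own, not the paper's implementation):

```python
import numpy as np

def paglm2_suff_stats(batches, dim):
    # One pass over (X_batch, y_batch) chunks, accumulating the
    # approximate sufficient statistics of Eq. (7); the full design
    # matrix never needs to be held in memory.
    XtX = np.zeros((dim, dim))
    Xty = np.zeros(dim)
    Xsum = np.zeros(dim)
    for Xb, yb in batches:
        XtX += Xb.T @ Xb
        Xty += Xb.T @ yb
        Xsum += Xb.sum(axis=0)
    return XtX, Xty, Xsum

def paglm2_map(XtX, Xty, Xsum, a1, a2, Cinv):
    # Eq. (9): w = (2 a2 X^T X + C^{-1})^{-1} X^T (y - a1 1),
    # using X^T (y - a1 1) = Xty - a1 * Xsum.  Passing Cinv = 0
    # gives the ML estimate of Eq. (8).
    return np.linalg.solve(2.0 * a2 * XtX + Cinv, Xty - a1 * Xsum)
```

Only the accumulated statistics (quadratic in the parameter dimension, independent of T) are kept, which is what makes the single-pass fit possible.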
In\nSection 4.2, we introduce a fast approximate evidence optimization scheme for the Poisson GLM to\noptimize parameters in these priors.\n\n3.2 Extension to non-canonical link functions\n\nNonlinearities such as the softplus function f(x) = log(1 + exp(x)) are often used in Poisson GLMs.\nWe extend the above methods to general nonlinearities f(x) by approximating both terms involving\nf(xt\u22a4w) in the log-likelihood\n\nf(xt\u22a4w)\u2206 \u2248 a2(xt\u22a4w)\u00b2 + a1 xt\u22a4w + a0   (11)\n\nlog f(xt\u22a4w) \u2248 b2(xt\u22a4w)\u00b2 + b1 xt\u22a4w + b0.   (12)\n\nBoth sets of coefficients are computed using Chebyshev polynomials over the same interval [x0, x1].\nThe approximate log-likelihood is\n\nlog p(y|X, w) \u2248 \u2211_{t=1}^T ( yt(b2(xt\u22a4w)\u00b2 + b1 xt\u22a4w) \u2212 (a2(xt\u22a4w)\u00b2 + a1 xt\u22a4w) )   (13)\n\n= w\u22a4X\u22a4(b1 y \u2212 a1) \u2212 w\u22a4(a2 X\u22a4X \u2212 b2 X\u22a4 diag(y) X)w.   (14)\n\nWith non-canonical link functions we have one additional approximate sufficient statistic,\n\u2211_{t=1}^T yt xt xt\u22a4. As in the previous section, we can solve this equation to get closed-form approximations\nfor the maximum likelihood and MAP estimates of w and the posterior over w. In particular, with\na N(0, C) prior on w the MAP estimate (and posterior mean) is w\u0302map-pa2 = \u03a3X\u22a4(b1 y \u2212 a1)\nwhere \u03a3 = (2 a2 X\u22a4X \u2212 2 b2 X\u22a4 diag(y) X + C\u207b\u00b9)\u207b\u00b9 is the posterior covariance.\n\n3.3 Higher order approximations\n\nThe approximation accuracy increases with the order of the polynomial approximation (Figure 1b).\nHere, we investigate higher order approximations and return to the exponential nonlinearity. Unfortunately, a third order approximation of the exponential over intervals of interest has a negative\nleading coefficient, and therefore makes optimization of the log-likelihood trivial: the approximate\nlog-likelihood increases without bound as the inner product x\u22a4w goes to infinity. However, a fourth\norder approximation is usable, which is in contrast with logistic regression, for which a sixth order\napproximation was the next usable order [7]. We approximate the exponential using a fourth order\npolynomial over [x0, x1]\n\nexp(x)\u2206 \u2248 a4 x\u2074 + a3 x\u00b3 + a2 x\u00b2 + a1 x + a0   (15)\n\nusing Chebyshev polynomials and compute the approximate log-likelihood\n\nlog p(y|X, w) \u2248 \u2211_{t=1}^T ( yt xt\u22a4w \u2212 a4(xt\u22a4w)\u2074 \u2212 a3(xt\u22a4w)\u00b3 \u2212 a2(xt\u22a4w)\u00b2 \u2212 a1(xt\u22a4w) )   (16)\n\n= w\u22a4X\u22a4(y \u2212 a1) \u2212 a2 w\u22a4X\u22a4Xw \u2212 a3 X\u00b3 \u00d7\u03041 w \u00d7\u03042 w \u00d7\u03043 w \u2212 a4 X\u2074 \u00d7\u03041 w \u00d7\u03042 w \u00d7\u03043 w \u00d7\u03044 w   (17)\n\nwhere \u00d7\u0304n is the tensor n-mode vector product and X\u00b3 = \u2211_{t=1}^T xt \u25e6 xt \u25e6 xt and X\u2074 = \u2211_{t=1}^T xt \u25e6 xt \u25e6 xt \u25e6 xt are the third and fourth order moment tensors summed across data points. The approximate\nsufficient statistics are the third and fourth order moment tensors in addition to those from the quadratic\napproximation. We note that computing and storing these higher order moments is expensive and\nwe can no longer analytically compute the maximum likelihood and MAP solutions. 
Once the\napproximate suf\ufb01cient statistics are computed, point estimates of the parameters can be identi\ufb01ed via\noptimization of the paGLM-4 objective.\n\nt=1 xt\u25e6xt\u25e6xt and X 4 =(cid:80)T\n\nt w)3 \u2212 a2(x(cid:62)\n\nt w)2 \u2212 a1(x(cid:62)\n\nt w)\n\n(16)\n\n4 Optimizing hyperparameters\n\n4.1 Approximation interval selection\n\nWhen using quadratic approximations for Poisson GLMs we found that the parameter estimates were\nsensitive to the approximation interval [x0, x1], especially when using the exponential nonlinearity\n(Figure 3a,b). Further, different approximation intervals will be appropriate for different nonlinearities,\nbin sizes, and neurons (Figure 3). We therefore found it crucial to adaptively select the approximation\ninterval for each neuron and we provide a procedure to accomplish this.\nWe \ufb01rst generated a set of putative approximation intervals based on the nonlinearity and bin size,\nwhich determine the expected centers and lengths of the approximation intervals. For example,\nwith f (x) = exp(x)\u2206 the output of exp(x) is a rate in spikes per second, so the approximation\nintervals should be in the range x = \u22124 to x = 6, depending on the response properties of the\nneuron. We found that approximation intervals with lengths 4 through 8 provided a balance between\napproximation accuracy and coverage of a desired input range for the exponential nonlinearity, while\nwider intervals could be used for the softplus nonlinearity.\nFor each interval in this set, we computed the approximate ML or MAP estimate of the parameters. We\nthen computed the exact log-likelihood of the estimate given a random subset of training data, whose\nsize was small enough to store in memory. We selected the approximation interval that maximized\nthe log-likelihood of the random subset of data. 
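This selection loop can be sketched as follows (a minimal sketch of our reading of the procedure; the fit_estimate callable standing in for the closed-form paGLM-2 solve for a given interval is our own name, and the exponential nonlinearity is assumed):

```python
import numpy as np

def select_interval(candidates, fit_estimate, X_sub, y_sub, dt=1.0):
    # For each candidate interval, compute the quadratic-approximation
    # estimate (one cheap linear solve each) and score it with the
    # exact Poisson log-likelihood on a small stored subset of data.
    best = (None, -np.inf, None)
    for x0, x1 in candidates:
        w = fit_estimate(x0, x1)
        eta = X_sub @ w
        # Exact log-likelihood with f(x) = exp(x), constants dropped.
        ll = np.sum(y_sub * eta - np.exp(eta) * dt)
        if ll > best[1]:
            best = ((x0, x1), ll, w)
    return best[0], best[2]
```

Only the scoring subset and the per-interval linear solves are needed, so the search adds little to the single pass over the full dataset.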
We emphasize that different approximation intervals\ncan be efficiently tested in the quadratic case as this only requires solving a least-squares problem for\neach approximation interval, and the subset of data can be stored during the single pass through the\ndataset. We note that cross-validation could also be used to select the approximation interval.\nEmpirically, this procedure provided large improvements in accuracy of the parameter estimates\nand in the log-likelihood of training and held-out data (Figure 3a,b and Figure 5b). In general, we\nconjecture that procedures to adapt the approximation interval will be useful for other implementations\nof PASS-GLMs, for adapting the approximation interval to different datasets and models and for\nrefining the approximation interval post-hoc if the algorithm is making poor predictions in practice.\n\n4.2 Marginal likelihood approximation\n\nWe are often interested in fitting Poisson GLMs with high-dimensional parameters, correlated\ninput (e.g. arising from naturalistic stimuli), and/or sparse spiking observations. For these reasons,\nregularization of the parameters is important even when the number of spike count observations is\nlarge [18, 5, 20, 6, 14, 3, 9, 1]. In this section, we derive an approximation to the marginal likelihood\nin Poisson GLMs that follows directly from approximating the log-likelihood with a quadratic\npolynomial. The approximation is closed-form such that approximate evidence optimization can be\nperformed efficiently, and we use it to optimize ridge and ARD hyperparameters.\n\n\fFigure 2: Simulated data experiment. a. The exact ML and paGLM-2 filter estimates are similar to\nthe true filters. b. 
The paGLM-2 estimate provides comparable mean squared error (MSE) to the\nexact ML estimate (left) while the paGLM-2 method shows favorable computational scaling (right).\n\nWe restrict ourselves to the exponential nonlinearity, although the same approach can be used for\nalternative nonlinearities. With a zero-mean Gaussian prior on w and a quadratic approximate\nlog-likelihood as in Section 3.1, we recognize that we can analytically marginalize w from the joint\np(y, w|X, C) to obtain the following approximation to the marginal likelihood\n\nlog p(y|X, C) \u2248 (1/2) log |\u03a3| \u2212 (1/2) log |C| + (1/2) (y \u2212 a1)\u22a4 X \u03a3 X\u22a4 (y \u2212 a1)   (18)\n\nwhere \u03a3 = (2 a2 X\u22a4X + C\u207b\u00b9)\u207b\u00b9 is the covariance of the approximate posterior and we have\ndropped terms that do not depend on C. It is important to note that evidence optimization for\nPoisson GLMs already requires approximations such as the Laplace approximation; our approach\nis a computationally cheaper alternative, as it does not require a potentially expensive numerical\noptimization to find the MAP estimate. Instead, the approximate marginal likelihood can be directly\noptimized using standard techniques.\n\n5 Experiments\n\nThroughout our experiments, we refer to estimates obtained using quadratic approximations as\npaGLM-2, estimates obtained using fourth order approximations as paGLM-4, and estimates obtained\nthrough optimization of the exact log-likelihood as exact.\n\n5.1 Simulated data\n\nWe first tested the approximate maximum likelihood estimates when using a quadratic approximation\nto the exponential function on simulated data. We simulated spike count data from a Poisson GLM\nwith stimulus, post-spike, and coupling filters when stimulated with a binary stimulus. 
The stimulus\n\ufb01lter had 10 weights governing 10 basis functions, the post-spike and coupling \ufb01lters each had 5\nweights governing 5 basis functions, and a bias parameter was included. In a sample dataset, we\nfound that the exact ML and paGLM-2 estimates of the \ufb01lters were similar and close to the true \ufb01lters\n(Figure 2a, 1 million spike count observations). Next, we compared the scaling properties of the\npaGLM-2 and exact ML approaches across 25 simulated data sets at each of 5 different amounts\nof training data. We found that the mean squared error between the true weights and the estimated\nweights decreased as the number of observations increased for both estimators. The optimization time\nscaled more ef\ufb01ciently for paGLM-2 than the exact method (Figure 2b, quasi-Newton optimization\nfor exact vs. solving least squares equation for paGLM-2). We used an approximation interval of\n[0, 3] for each run of this simulation. Interestingly, for smaller amounts of data, the exact ML estimate\nhad an increased mean squared error between the true and \ufb01t parameters, as it sometimes over\ufb01t the\ndata. Empirically, the approximation appears to help regularize this over\ufb01tting.\n\n5.2 Retinal ganglion cell analysis\n\nWe next tested the paGLM-2 estimator using spike train data recorded from a single parasol retinal\nganglion cell (RGC) in response to a full \ufb01eld binary \ufb02icker stimulus binned at 8.66 ms [24]. First,\nwe \ufb01t a Poisson GLM with an exponential nonlinearity, a stimulus \ufb01lter, and a baseline \ufb01ring rate\nto the responses in approximately 144,000 spike count bins. The stimulus \ufb01lter was parameterized\nby a vector of 25 weights which linearly combine the previous 25 bins of the stimulus at each time\npoint, and we set \u2206 = 8.66 ms such that the \ufb01ring rate was in spikes per second. 
\fFigure 3: Analysis of paGLM estimators on RGC data with exponential (a.-d.) or softplus (e.-h.)\nnonlinearities. a. For the exponential nonlinearity with \u2206 = 8.66 ms, comparison of the exact ML\nand paGLM-2 estimates of the stimulus filter for different approximation intervals (upper left of each\npanel). The paGLM-2 estimate can be too large or small relative to the exact ML estimate. b. The\ntraining log-likelihood (indicated by color) of the paGLM-2 estimate for different approximation\nintervals. c. The distribution of the inner products between the covariates xt and the exact ML\nestimate wmle. The interval [0, 6] covers most of this distribution. d. The exact ML, paGLM-2, and\npaGLM-4 estimates for the exponential nonlinearity computed with approximation interval [0, 6].\ne.-h. Same as (a.-d.), except for the softplus nonlinearity and with \u2206 = 1 such that the rate is in\nspikes per bin. The approximation interval is [\u22126, 3].\n\nOn a grid of approximation intervals, we computed the paGLM-2 estimate of the parameters and\nevaluated the training log-likelihood of the data given the paGLM-2 estimate. The stimulus filter\nestimates and training log-likelihood varied considerably as a function of the approximation interval\n(Figure 3a,b). In particular, the paGLM-2 estimate was highly similar to the exact ML estimate for some intervals\nwhile too large or small for other intervals. Adaptive interval selection using either the full dataset\nor a random subset of the data identified the interval [0, 6], demonstrating the importance of this\napproach. This interval tightly covered the distribution of inner products between the covariates and\nthe exact ML estimate of the weights (Figure 3c). 
The paGLM-2 and paGLM-4 estimates of the\nstimulus \ufb01lter computed over [0, 6] closely matched the exact ML estimate (Figure 3d).\nThe performance of paGLM-2 was more robust to the approximation interval for the Poisson GLM\nwith a softplus nonlinearity and with \u2206 = 1, such that the rate is in spikes per bin (Figure 3e,f). Due to\nthe change in nonlinearity and in \u2206, the distribution of inner products between the covariates and the\nexact ML estimate shifted to the left and widened (Figure 3g). The interval [\u22126, 3] covered most of\nthis distribution, and the paGLM-2 estimate of the stimulus \ufb01lter computed using this approximation\ninterval was indistinguishable from the exact ML stimulus \ufb01lter (Figure 3h).\nTo investigate the quality of paGLM-2 MAP estimates and to verify our procedure for approximate\nevidence optimization, we increased the dimensionality of the stimulus \ufb01lter to 100 weights and \ufb01t the\n\ufb01lter to a smaller set of 21,000 spike counts binned at \u2206 = 1.66 ms, where the stimulus was upsampled.\nWe used ridge regression to regularize the weights. We selected the approximation interval using\na random subset of the data and optimized the ridge penalty by optimizing the approximate log-\nlikelihood (18). 
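The ridge case of the approximate evidence (18) used here is simple enough to state in code (a sketch; the argument names are ours, and terms constant in C are dropped as in the text):

```python
import numpy as np

def paglm2_log_evidence(XtX, Xr, lam, a2):
    # Eq. (18) with ridge prior C^{-1} = lam * I.  Xr denotes the
    # vector X^T (y - a1 1), so the quadratic term
    # (y - a1)^T X Sigma X^T (y - a1) equals Xr^T Sigma Xr.
    dim = XtX.shape[0]
    Sigma = np.linalg.inv(2.0 * a2 * XtX + lam * np.eye(dim))
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    # For the ridge prior, log|C| = -dim * log(lam).
    logdet_C = -dim * np.log(lam)
    return 0.5 * logdet_Sigma - 0.5 * logdet_C + 0.5 * Xr @ Sigma @ Xr
```

Evidence optimization then reduces to maximizing this scalar over lam, e.g. on a grid, with no per-candidate pass over the data.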
\fFigure 4: Analysis of paGLM-2 MAP estimators on RGC data binned at 1.66 ms with adaptive\ninterval selection and evidence optimization in a Poisson GLM with exponential nonlinearity. a. The\napproximate log marginal likelihood with a ridge prior computed via the Laplace approximation\nor via the quadratic approximation (18) (left), and the exact and paGLM-2 MAP estimates with the\noptimal ridge prior value (right). b. The exact and paGLM-2 MAP estimates with a smoothing prior.\n\nThe computation time for this procedure, including computing the final paGLM-2 MAP estimate,\nwas 0.7 seconds, compared to 0.7 seconds for computing the exact MAP once and\n30 seconds for identifying the optimal ridge parameter using the Laplace approximation. 
The two\nmethods provided similar estimates of the marginal likelihood and MAP stimulus filter (Figure 4a).\nFinally, we found that the exact MAP and paGLM-2 MAP estimates computed with a Bayesian\nsmoothing prior were also similar (Figure 4b).\n\n5.3 Fully-coupled GLM fit to 831 neurons\n\nWe fit a fully-coupled Poisson GLM to the spiking responses of N = 831 neurons simultaneously\nrecorded from the mouse thalamus, visual cortex, hippocampus, striatum, and motor cortex using two\nNeuropixels probes [8]. These responses were recorded during spontaneous activity for 46 minutes.\nTo maintain precise spike-timing information, we binned the data at 1 ms (Figure 5a). For each\nneuron, the GLM consisted of a baseline rate parameter, a post-spike filter, and coupling filters from\nall other neurons. We used an exponential nonlinearity such that the firing rate \u03bbt of a neuron at\ntime t was \u03bbt = exp(\u00b5 + \u2211_{n=1}^N hn \u2217 yn^hist(t)), where \u00b5 is the baseline log firing rate and yn^hist(t) is\nthe spike train history of the n-th neuron at time t. We parametrized each filter hn as a weighted\ncombination of three raised cosine bumps [15]. For each neuron we fit in total 2494 parameters for\nthe baseline rate, post-spike filter, and coupling filters.\nTo compare the performance of the exact and paGLM-2 MAP estimators on this dataset, we first\nrestricted ourselves to the first 11 minutes of the recording so that we could store the entire dataset and\ndesign matrix in memory. We held out the first minute as a validation set and used the next 10 minutes\nto compute the exact and paGLM-2 MAP estimates with a fixed ridge prior, as hyperparameter\noptimization was computationally infeasible in the exact MAP case. 
We used a random subset of\nthe training data to select the approximation interval for each neuron and we computed the exact\nMAP estimates using 50 iterations of quasi-Newton optimization. We performed this analysis on\n50 randomly selected neurons with firing rates above 0.5 Hz due to the cost of computing the exact\nMAP. On average, the fitting time was about 3 seconds for the paGLM-2 MAP estimates and about\n3 minutes for the exact MAP estimates (Figure 5b). Despite the computational difference, the two\nestimates provided highly similar training and held-out performance. The paGLM-2 MAP estimates\ncomputed using the adaptive interval outperformed the estimates using a fixed interval.\nWe then fit the model to the responses from the first 41 minutes of spiking responses, giving 2.46\nmillion time bins of observations for each neuron. In this case, the design matrix was too large to\nstore in memory (>30 GB). By storing only the approximate summary statistics and one minute of\ndata, we reduced storage costs by a factor of \u224840. We computed paGLM-2 MAP estimates using\nadaptive interval selection and evidence optimization. We placed an ARD prior over each set of 3\ncoupling weights incoming from other neurons, and optimized the ARD hyperparameters using the\nfixed-point update equations [2, 18]. We found that this method sometimes under-regularized and\nwe therefore thresholded the prior precision values from below at 26. Fitting the entire model took\nabout 3.6 seconds per neuron.\n\n\fFigure 5: Neuropixels recording analysis. a. Raster plot of spikes recorded from motor cortex (MCX),\nstriatum (STR), visual cortex (VCX), hippocampus (HC), and thalamus (TH). b. 
Computation time per neuron (left), training performance (middle), and held-out performance (right) for the exact and paGLM-2 MAP estimates fit to 10 minutes of Neuropixels data, with adaptive and fixed intervals of approximation. The fixed interval [−2, 6] was chosen to tightly cover the optimal approximation intervals across neurons. For c.-e., the model was fit to the first 41 minutes and validated on the last 5 minutes of responses. c. Histogram of spike prediction accuracy on validation data for neurons with firing rates greater than 0.5 Hz. d. Example paGLM-2 MAP estimates of post-spike and coupling filters. e. Coupling matrix of summed MAP estimates of the coupling filters before exponentiation.

The fit model had positive spike prediction accuracy for 79.6% of neurons (469 out of 589) whose firing rates were greater than 0.5 Hz in both the training and validation periods (Figure 5c). This measure quantifies the improvement in prediction obtained from the fit GLM parameters over predictions from the mean firing rate of the held-out spikes. Coupling filters for example neurons are shown in Figure 5d. To summarize the coupling filters across all of the neurons, we computed a coupling matrix whose (i, j)-th entry was the summed (pre-exponentiation) coupling filter for input neuron j in the model fit to the spiking responses of neuron i (Figure 5e). The rows of this matrix thus correspond to the set of incoming coupling filters for a neuron. We thresholded values with magnitudes larger than 10 (0.05% of coupling filters) to show the dynamic range.
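The spike prediction accuracy used above is a Poisson log-likelihood gain, in bits per spike, of the model's predicted rates over a constant rate matching the mean of the held-out counts. A minimal sketch (the function name and interface are ours) exploiting the fact that the log(y_t!) terms are identical under both models and cancel:

```python
import numpy as np

def bits_per_spike(y, rate):
    """Poisson log-likelihood gain (bits/spike) of predicted rates `rate`
    over a homogeneous rate matching the mean of the held-out counts `y`.
    The log(y_t!) terms appear in both log-likelihoods and cancel."""
    y = np.asarray(y, dtype=float)
    rate = np.asarray(rate, dtype=float)
    null_rate = np.full_like(rate, y.mean())          # mean-rate baseline
    ll_model = np.sum(y * np.log(rate) - rate)        # up to log(y!) terms
    ll_null = np.sum(y * np.log(null_rate) - null_rate)
    return (ll_model - ll_null) / (y.sum() * np.log(2))
```

Positive values indicate the fit parameters predict held-out spikes better than the neuron's mean rate alone, which is the criterion applied to the histogram in Figure 5c.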
The coupling in the fit model was often stronger within regions and between neurons that were anatomically closer to each other; the neurons in the coupling matrix are sorted by depth on the probes, with the thalamus, hippocampus, and visual cortex on probe one and the striatum and motor cortex on probe two [8].

6 Conclusion

We have developed a method for scalable inference in Poisson GLMs that is suitable for large-scale neural recordings.¹ The method is based on polynomial approximations for approximate sufficient statistics in GLMs [7]. It substantially reduces storage and computation costs yet retains the ability to model fine-timescale statistical dependencies. While we focused on Gaussian priors in this paper, optimizing the paGLM objective with non-Gaussian, sparsity-inducing priors is an interesting direction for future work. As the approximate sufficient statistics scale with the number of parameters, scaling the method to larger numbers of parameters may require low-rank approximations to the sufficient statistics. Finally, efficient computation and storage of higher-order moments would make the more accurate fourth-order methods appealing.

¹An implementation of paGLM is available at https://github.com/davidzoltowski/paglm.

Acknowledgments

DMZ was supported by NIH grant T32MH065214 and JWP was supported by grants from the Simons Foundation (SCGB AWD1004351 and AWD543027), the NIH (R01EY017366, R01NS104899) and a U19 NIH-NINDS BRAIN Initiative Award (NS104648-01).
The authors thank Nick Steinmetz for sharing the Neuropixels data, and thank Rob Kass, Stephen Keeley, Michael Morais, Neil Spencer, Nick Steinmetz, Nicholas Roy, and the anonymous reviewers for providing helpful comments.

References

[1] Mikio Aoi and Jonathan W Pillow. Scalable Bayesian inference for high-dimensional neural receptive fields. bioRxiv, page 212217, 2017.

[2] Christopher M Bishop. Bayesian PCA. In Advances in Neural Information Processing Systems, pages 382-388, 1999.

[3] Ana Calabrese, Joseph W Schumacher, David M Schneider, Liam Paninski, and Sarah MN Woolley. A generalized linear model for estimating spectrotemporal receptive fields from responses to natural sounds. PLoS ONE, 6(1):e16104, 2011.

[4] Jason E Chung, Hannah R Joo, Jiang Lan Fan, Daniel F Liu, Alex H Barnett, Supin Chen, Charlotte Geaghan-Breiner, Mattias P Karlsson, Magnus Karlsson, Kye Y Lee, Hexin Liang, Jeremy F Magland, Angela C Tooker, Leslie F Greengard, Vanessa M Tolosa, and Loren M Frank. High-density, long-lasting, and multi-region electrophysiological recordings using polymer electrode arrays. bioRxiv, 2018.

[5] Stephen V David, Nima Mesgarani, and Shihab A Shamma. Estimating sparse spectro-temporal receptive fields with natural stimuli. Network: Computation in Neural Systems, 18(3):191-212, 2007.

[6] Sebastian Gerwinn, Jakob H Macke, and Matthias Bethge. Bayesian inference for generalized linear models for spiking neurons. Frontiers in Computational Neuroscience, 4:12, 2010.

[7] Jonathan Huggins, Ryan P Adams, and Tamara Broderick. PASS-GLM: Polynomial approximate sufficient statistics for scalable Bayesian GLM inference. In Advances in Neural Information Processing Systems, pages 3614-3624, 2017.

[8] James J Jun, Nicholas A Steinmetz, Joshua H Siegle, Daniel J Denman, Marius Bauza, Brian Barbarits, Albert K Lee, Costas A Anastassiou, Alexandru Andrei, Çağatay Aydın, et al.
Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232, 2017.

[9] Scott Linderman, Ryan P Adams, and Jonathan W Pillow. Bayesian latent structure discovery from multi-neuron recordings. In Advances in Neural Information Processing Systems, pages 2002-2010, 2016.

[10] David JC MacKay et al. Bayesian nonlinear modeling for the prediction competition. ASHRAE Transactions, 100(2):1053-1062, 1994.

[11] John C Mason and David C Handscomb. Chebyshev Polynomials. CRC Press, 2002.

[12] Liam Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243-262, 2004.

[13] Il Memming Park, Miriam LR Meister, Alexander C Huk, and Jonathan W Pillow. Encoding and decoding in parietal cortex during sensorimotor decision-making. Nature Neuroscience, 17(10):1395-1403, 2014.

[14] Il Memming Park and Jonathan W Pillow. Bayesian spike-triggered covariance analysis. In Advances in Neural Information Processing Systems, pages 1692-1700, 2011.

[15] Jonathan W Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M Litke, EJ Chichilnisky, and Eero P Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207):995, 2008.

[16] Kamiar Rahnama Rad and Liam Paninski. Efficient, adaptive estimation of two-dimensional firing rate surfaces via Gaussian process methods. Network: Computation in Neural Systems, 21(3-4):142-168, 2010.

[17] Alexandro D Ramirez and Liam Paninski. Fast inference in generalized linear models via expected log-likelihoods. Journal of Computational Neuroscience, 36(2):215-234, 2014.

[18] Maneesh Sahani and Jennifer F Linden. Evidence optimization techniques for estimating stimulus-response functions.
In Advances in Neural Information Processing Systems, pages 317-324, 2003.

[19] Eero P Simoncelli, Liam Paninski, Jonathan Pillow, Odelia Schwartz, et al. Characterization of neural responses with stochastic stimuli. The Cognitive Neurosciences, 3:327-338, 2004.

[20] Ian H Stevenson, James M Rebesco, Nicholas G Hatsopoulos, Zach Haga, Lee E Miller, and Konrad P Kording. Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(3):203-213, 2009.

[21] Carsen Stringer, Marius Pachitariu, Nicholas Steinmetz, Charu Bai Reddy, Matteo Carandini, and Kenneth D Harris. Spontaneous behaviors drive multidimensional, brain-wide population activity. bioRxiv, page 306019, 2018.

[22] Michael E Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1(Jun):211-244, 2001.

[23] Wilson Truccolo, Uri T Eden, Matthew R Fellows, John P Donoghue, and Emery N Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93(2):1074-1089, 2005.

[24] VJ Uzzell and EJ Chichilnisky. Precision of spike trains in primate retinal ganglion cells. Journal of Neurophysiology, 92(2):780-789, 2004.

[25] David P Wipf and Srikantan S Nagarajan. A new view of automatic relevance determination. In Advances in Neural Information Processing Systems, pages 1625-1632, 2008.