{"title": "Unlocking neural population non-stationarities using hierarchical dynamics models", "book": "Advances in Neural Information Processing Systems", "page_first": 145, "page_last": 153, "abstract": "Neural population activity often exhibits rich variability. This variability is thought to arise from single-neuron stochasticity, neural dynamics on short time-scales, as well as from modulations of neural firing properties on long time-scales, often referred to as non-stationarity. To better understand the nature of co-variability in neural circuits and their impact on cortical information processing, we introduce a hierarchical dynamics model that is able to capture inter-trial modulations in firing rates, as well as neural population dynamics. We derive an algorithm for Bayesian Laplace propagation for fast posterior inference, and demonstrate that our model provides a better account of the structure of neural firing than existing stationary dynamics models, when applied to neural population recordings from primary visual cortex.", "full_text": "Unlocking neural population non-stationarity using a hierarchical dynamics model

Mijung Park1, Gergo Bohner1, Jakob H. Macke2

1 Gatsby Computational Neuroscience Unit, University College London
2 Research Center caesar, an associate of the Max Planck Society, Bonn
Max Planck Institute for Biological Cybernetics,
Bernstein Center for Computational Neuroscience Tübingen

{mijung, gbohner}@gatsby.ucl.ac.uk, jakob.macke@caesar.de

Abstract

Neural population activity often exhibits rich variability. This variability can arise from single-neuron stochasticity, neural dynamics on short time-scales, as well as from modulations of neural firing properties on long time-scales, often referred to as neural non-stationarity.
To better understand the nature of co-variability in neural circuits and their impact on cortical information processing, we introduce a hierarchical dynamics model that is able to capture both slow inter-trial modulations in firing rates as well as neural population dynamics. We derive a Bayesian Laplace propagation algorithm for joint inference of parameters and population states. On neural population recordings from primary visual cortex, we demonstrate that our model provides a better account of the structure of neural firing than stationary dynamics models.

1 Introduction

Neural spiking activity recorded from populations of cortical neurons can exhibit substantial variability in response to repeated presentations of a sensory stimulus [1]. This variability is thought to arise both from dynamics generated endogenously within the circuit [2] as well as from variations in internal and behavioural states [3, 4, 5, 6, 7]. An understanding of how the interplay between sensory inputs and endogenous dynamics shapes neural activity patterns is essential for our understanding of how information is processed by neuronal populations. Multiple statistical [8, 9, 10, 11, 12, 13] and mechanistic [14] models for characterising neuronal population dynamics have been developed. In addition to these dynamics, which take place on fast time-scales (milliseconds up to a few seconds), there are also processes modulating neural firing activity which take place on much slower time-scales (seconds to hours). Slow drifts in rates across an experiment can be caused by fluctuations in arousal, anaesthesia level or other physiological properties of the experimental preparation [15, 16, 17]. Furthermore, processes such as learning and short-term plasticity can lead to slow changes in neural firing properties [18].
The statistical structure of these slow fluctuations has been modelled using state-space models and related techniques [19, 20, 21, 22, 23]. Recent experimental findings have shown that slow, multiplicative fluctuations in neural excitability are a dominant source of neural covariability in extracellular multi-cell recordings from cortical circuits [5, 17, 24].

To accurately capture the structure of neural dynamics and to disentangle the contributions of slow and fast modulatory processes to neural variability and co-variability, it is therefore important to develop models that can capture neural dynamics both on fast (i.e., within experimental trials) and slow (i.e., across trials) time-scales. Few such models exist: Czanner et al. [25] presented a statistical model of single-neuron firing in which within-trial dynamics are modelled by (generalised) linear coupling from the recent spiking history of each neuron onto its instantaneous firing rate, and across-trial dynamics were modelled by defining a random walk model over parameters. More recently,

1

Mangion et al. [26] presented a latent linear dynamical system model with Poisson observations (PLDS, [8, 11, 13]) with a one-dimensional latent space, and used a heuristic filtering approach for tracking parameters, again based on a random-walk model. Rabinowitz et al. [27] presented a technique for identifying slow modulatory inputs from the recordings of single neurons using a Gaussian Process model and an efficient inference technique using evidence optimisation.

Here, we present a hierarchical model that consists of a latent dynamical system with Poisson observations (PLDS) to model neural population dynamics, combined with a Gaussian process (GP) [28] to model modulations in firing rates or model-parameters across experimental trials.
The use of an exponential nonlinearity implies that latent modulations have a multiplicative effect on neural firing rates. Compared to previous models using random walks over parameters, using a GP is a more flexible and powerful way of modelling the statistical structure of non-stationarity, and makes it possible to use hyper-parameters that model the variability and smoothness of parameter-changes across time.

In this paper, we focus on a concrete variant of this general model: we introduce a new set of variables which control neural firing rate on each trial to capture non-stationarity in firing rates. We derive a Bayesian Laplace propagation method for inferring the posterior distributions over the latent variables and the parameters from population recordings of spiking activity. Our approach generalises the 1-dimensional latent states in [26] to models with multi-dimensional states, as well as to a Bayesian treatment of non-stationarity based on Gaussian Process priors. The paper is organised as follows: In Sec. 2, we introduce our framework for constructing non-stationary neural population models, as well as the concrete model we will use for analyses. In Sec. 3, we derive the Bayesian Laplace propagation algorithm. In Sec. 4, we show applications to simulated data and neural population recordings from visual cortex.

2 Hierarchical non-stationary models of neural population dynamics

We start by introducing a hierarchical model for capturing short time-scale population dynamics as well as long time-scale non-stationarities in firing rates.
Although we use the term "non-stationary" to mean that the system is best described by parameters that change over time (which is how the term is often used in the context of neural data analysis), we note that the distribution over parameters can be described by a stochastic process which might be strictly stationary in the statistical sense¹.

Modelling framework We assume that the neural population activity of p neurons y_t ∈ R^p depends on a k-dimensional latent state x_t ∈ R^k and a modulatory factor h^(i) ∈ R^k which is different for each trial i = {1, . . . , r}. The latent state x models short-term co-variability of spiking activity, and the modulatory factor h models slowly varying mean firing rates across experimental trials. We model neural spiking activity as conditionally Poisson given the latent state x_t and a modulator h^(i), with a log firing rate which is linear in parameters and latent factors,

y_t | x_t, C, h^(i), d ∼ Poiss(y_t | exp(C(x_t + h^(i)) + d)),

where the loading matrix C ∈ R^{p×k} specifies how each neuron is related to the latent state and the modulator, d ∈ R^p is an offset term that controls the mean firing rate of each cell, and Poiss(y_t | w) means that the ith entry of y_t is drawn independently from a Poisson distribution with mean w_i (the ith entry of w). Because of the use of an exponential firing-rate nonlinearity, latent factors have a multiplicative effect on neural firing rates, as has been observed experimentally [17, 5].

Following [11, 13, 26], we assume that the latent dynamics evolve according to a first-order autoregressive process with Gaussian innovations,

x_t | x_{t−1}, A, B, Q ∼ N(x_t | A x_{t−1} + B u_t, Q).

Here, we allow for sensory stimuli (or experimental covariates) u_t ∈ R^d to influence the latent states linearly.
The dynamics matrix A ∈ R^{k×k} determines the state evolution, B ∈ R^{k×d} models the dependence of latent states on external inputs, and Q ∈ R^{k×k} is the covariance of the innovation noise. We set Q to be the identity matrix, Q = I_k, as in [29], and we assume x^(i)_0 ∼ N(0, I_k).

¹A stochastic process is strict-sense stationary if its joint distribution over any two time-points t and s only depends on the elapsed time t − s.

2

Figure 1: Schematic of hierarchical non-stationary Poisson observation Latent Dynamical System (N-PLDS) for capturing non-stationarity in mean firing rates. The parameter h slowly varies across trials and leads to fluctuations in mean firing rates.

The parameters in this model are θ = {A, B, C, d, h^(1:r)}. We refer to this general model as non-stationary PLDS (N-PLDS). Different variants of N-PLDS can be constructed by placing priors on individual parameters which allow them to vary across trials (in which case they would then depend on the trial index i) or by omitting different components of the model².

For the modulator h, we assume that it varies across trials according to a GP with mean m_h and (modified) squared exponential kernel, h^(i) ∼ GP(m_h, K(i, j)), where the (i, j)th block of K (of size k × k) is given by

K(i, j) = (σ² + ε δ_{i,j}) exp(−(i − j)² / (2τ²)) I_k.

Here, we assume the independent noise-variance on the diagonal (ε) to be constant and small, as in [30]. When σ² = ε = 0, the modulator vanishes, which corresponds to the conventional PLDS model with fixed parameters [11, 13]. When σ² > 0, the mean firing rates vary across trials, and the parameter τ determines the time-scale (in units of 'trials') of these fluctuations.
We impose ridge priors on the model parameters (see Appendix for details), so that the total set of hyperparameters of the model is Φ = {m_h, σ², τ², φ}, where φ is the set of ridge parameters.

3 Bayesian Laplace propagation

Our goal is to infer parameters and latent variables in the model. The exact posterior distribution is analytically intractable due to the use of a Poisson likelihood, and we therefore assume the joint posterior over the latent variables and parameters to be factorising,

p(θ, x^(1:r)_{1:T} | y^(1:r)_{1:T}, Φ) ∝ p(y^(1:r)_{1:T} | x^(1:r)_{1:T}, θ) p(x^(1:r)_{1:T} | θ, Φ) p(θ | Φ) ≈ q(θ, x^(1:r)_{1:T}) = q_θ(θ) ∏_{i=1}^{r} q_x(x^(i)_{0:T}).

This factorisation simplifies computing the integrals involved in calculating a bound on the marginal likelihood of the observations,

log p(y^(1:r)_{1:T} | Φ) = log ∫ dθ dx^(1:r)_{1:T} p(θ, x^(1:r)_{1:T}, y^(1:r)_{1:T} | Φ)
≥ ∫ dθ dx^(1:r)_{1:T} q(θ, x^(1:r)_{1:T}) log [ p(θ, x^(1:r)_{1:T}, y^(1:r)_{1:T} | Φ) / q(θ, x^(1:r)_{1:T}) ].   (1)

Similar to the variational Bayesian expectation maximization (VBEM) algorithm [29], our inference procedure consists of the following three steps: (1) we compute the approximate posterior over latent variables q_x(x^(1:r)_{0:T}) by integrating out the parameters,

q_x(x^(1:r)_{0:T}) ∝ exp [ ∫ dθ q_θ(θ) log p(x^(1:r)_{1:T}, y^(1:r)_{1:T} | θ) ],   (2)

which is performed by forward-backward message passing relying on the order-1 dependency in latent states.
Then, (2) we compute the approximate posterior over parameters q_θ(θ) by integrating out the latent variables,

q_θ(θ) ∝ p(θ) exp [ ∫ dx^(1:r)_{0:T} q_x(x^(1:r)_{0:T}) log p(x^(1:r)_{0:T}, y^(1:r)_{1:T} | θ) ],   (3)

and (3) we update the hyperparameters by computing the gradients of the bound in eq. 1 after integrating out both latent variables and parameters. We iterate the three steps until convergence.

Unfortunately, the integrals in both eq. 2 and eq. 3 are not analytically tractable, even with Gaussian distributions for q_x(x^(1:r)_{0:T}) and q_θ(θ). For tractability and fast computation of messages in the forward-backward algorithm for eq. 2, we utilise the so-called Laplace propagation or Laplace expectation propagation (Laplace-EP) [31, 32, 33], which makes a Gaussian approximation to each message based on the Laplace approximation, then propagates the messages forward and backward. While Laplace propagation in prior work is commonly coupled with point estimates of parameters, we consider the posterior distribution over parameters. For this reason, we refer to our inference method as Bayesian Laplace propagation. The use of approximate message passing in the Laplace propagation implies that there is no longer a guarantee that the lower bound will increase monotonically in each iteration, which is the main difference between our method and the VBEM algorithm. We therefore monitored the convergence of our algorithm by computing one-step-ahead prediction scores [13].

²A second variant of the model, in which the dynamics matrix determining the spatio-temporal correlations in the population varies across trials, is described in the Appendix.

3
The algorithm proceeds by iterating the following three steps:

(1) Approximating the posterior over latent states: Using the first-order dependency in latent states, we derive a sequential forward/backward algorithm to obtain q_x(x^(1:r)_{0:T}), generalising the approach of [26] to multi-dimensional latent states. Since this step decouples across trials, it is easy to parallelize, and we omit the trial indices for clarity. We note that computation of the approximate posterior in this step is not more expensive than Bayesian inference of the latent state in a 'fixed-parameter' PLDS. The forward message α(x_t) at time t is given by

α(x_t) ∝ ∫ dx_{t−1} α(x_{t−1}) exp [ ⟨log(p(x_t | x_{t−1}) p(y_t | x_t))⟩_{q_θ(θ)} ].   (4)

Assuming that the forward message at time t − 1, denoted by α(x_{t−1}), is Gaussian, the Poisson likelihood term will render the forward message at time t non-Gaussian, but we will approximate α(x_t) as a Gaussian using the first and second derivatives of the right-hand side of eq. 4 with respect to x_t.

Similarly, the backward message at time t − 1 is given by

β(x_{t−1}) ∝ ∫ dx_t β(x_t) exp ( ⟨log(p(x_t | x_{t−1}) p(y_t | x_t))⟩_{q_θ(θ)} ),   (5)

which we also approximate as a Gaussian for tractability in computing backward messages.

Using the forward/backward messages, we compute the posterior marginal distribution over latent variables (see Appendix). We need to compute the cross-covariance between neighbouring latent variables to obtain the sufficient statistics of the latent variables (which we will need for updating the posterior over parameters).
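To make the message approximation concrete, here is a minimal sketch of one such Gaussian (Laplace) approximation for a Poisson-likelihood update: the incoming Gaussian message plays the role of a prior N(m, V), and the mode and curvature of the log-density give the approximating Gaussian. For simplicity the sketch uses fixed C and d rather than the posterior expectations ⟨·⟩_{q_θ(θ)} appearing in eqs. 4–5:

```python
import numpy as np

def laplace_gaussian(m, V, y, C, d, n_iter=20):
    """Gaussian approximation to p(x) ∝ N(x | m, V) * Poiss(y | exp(C x + d)).

    Newton iterations on the (concave) log-density; returns the mode and the
    inverse negative Hessian at the mode, i.e. the Laplace mean and covariance.
    """
    Vinv = np.linalg.inv(V)
    x = m.copy()
    for _ in range(n_iter):
        rate = np.exp(C @ x + d)
        grad = C.T @ (y - rate) - Vinv @ (x - m)        # gradient of log-density
        hess = -C.T @ (rate[:, None] * C) - Vinv        # negative definite Hessian
        x = x - np.linalg.solve(hess, grad)             # Newton step
    rate = np.exp(C @ x + d)
    cov = np.linalg.inv(C.T @ (rate[:, None] * C) + Vinv)
    return x, cov

# Toy usage: 3 neurons, 2 latent dimensions (values are arbitrary).
m, V = np.zeros(2), np.eye(2)
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
d = np.zeros(3)
y = np.array([2.0, 1.0, 0.0])
mode, cov = laplace_gaussian(m, V, y, C, d)
```

Because the Poisson log-likelihood with an exponential link is concave in x, the Newton iteration converges reliably, which is what makes this per-time-step approximation cheap enough to run inside the forward and backward sweeps.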
The pairwise marginals of latent variables are given by

p(x_t, x_{t+1} | y_{1:T}) ∝ β(x_{t+1}) exp ( ⟨log(p(y_{t+1} | x_{t+1}) p(x_{t+1} | x_t))⟩_{q_θ(θ)} ) α(x_t),   (6)

which we approximate as a joint Gaussian distribution by using the first/second derivatives of eq. 6 and extracting the cross-covariance term from the joint covariance matrix.

(2) Approximating the posterior over parameters: After inferring the posterior over latent states, we update the posterior distribution over the parameters. The posterior over parameters factorizes as

q_θ(θ) = q_{a,b}(a, b) q_{c,d,h}(c, d, h^(1:r)),   (7)

where we used the vectorized notations b = vec(Bᵀ) and c = vec(Cᵀ). We set c, d to the maximum likelihood estimates ĉ, d̂ for simplicity in inference. The computational cost of this algorithm is dominated by the cost of calculating the posterior distribution over h^(1:r), which involves manipulation of an rk-dimensional Gaussian. While this was still tractable without further approximations for the data-set sizes used in our analyses below (hundreds of trials), a variety of approximate methods for GP inference exist which could be used to improve the efficiency of this computation. In particular, we will typically be dealing with systems in which τ ≫ 1, which means that the kernel matrix is smooth and could be approximated using low-rank representations [28].

(3) Estimating hyperparameters: Finally, after obtaining the approximate posterior q(θ, x^(1:r)_{0:T}), we update the hyperparameters of the prior by maximizing the lower bound with respect to the hyperparameters.
The variational lower bound simplifies to (see Ch. 5 in [29] for details; note that the use of Gaussian approximate posteriors ensures that this step is analogous to hyperparameter updating in a fully Gaussian LDS)

log p(y^(1:r)_{1:T} | Φ) ≥ −KL(Φ) + c,   (8)

4

Figure 2: Illustration of non-stationarity in firing rates (simulated data). A, B Spike rates of 40 neurons are influenced by two slowly varying firing rate modulators. The log mean firing rates of the two groups of neurons are z1 (red, group 1) and z2 (blue, group 2) across 100 trials. C, D Raster plots show the extreme cases, i.e. trials 25 and 75. The traces show the posterior mean of z estimated by N-PLDS (light blue for z2, light red for z1), independent PLDSs (fit a PLDS to each trial's data individually, dark gray), and PLDS (light gray). E Total and conditional (on each trial) covariance of recovered neural responses from each model (averaged across all neuron pairs, and then normalised for visualisation). The covariances recovered by our model (red) match the true ones (black) well, while those by independent PLDSs (gray) and a single PLDS (light gray) do not.

where c is a constant. Here, the KL divergence between the prior and posterior over parameters, denoted by N(µ_Φ, Σ_Φ) and N(µ, Σ), respectively, is given by

KL(Φ) = −(1/2) log |Σ_Φ^{−1} Σ| + (1/2) Tr[Σ_Φ^{−1} Σ] + (1/2) (µ − µ_Φ)ᵀ Σ_Φ^{−1} (µ − µ_Φ) + c,   (9)

where the prior mean and covariance depend on the hyperparameters. We update the hyperparameters by taking the derivative of the KL w.r.t. each hyperparameter. For the prior mean, the first-derivative expression provides a closed-form update. For τ (the time scale of inter-trial fluctuations in firing rates) and σ² (the variance of inter-trial fluctuations), the derivative expressions do not provide a closed-form update, in which case we compute the KL divergence on a grid defined in each hyperparameter space and choose the value that minimises the KL.

Predictive distributions for test data. In our model, different trials are no longer considered to be independent, so we can predict parameters for held-out trials. Using the GP model on h and our approximations, we have Gaussian predictive distributions on h∗ for test data D∗ given training data D:

p(h∗ | D, D∗) = N( m_h + K∗ K^{−1} (µ_h − m_h),  K∗∗ − K∗ (K + H_h^{−1})^{−1} K∗ᵀ ),   (10)

where K is the prior covariance matrix on D, K∗∗ is that on D∗, and K∗ is their prior cross-covariance, as introduced in Ch. 2 of [28], and the negative Hessian H_h is defined as

H_h = − ∂²/∂(h^(1:r))² [ Σ_{i=1}^{r} Σ_{t=1}^{T} ∫ dx^(i)_{0:T} q(x^(i)_{0:T}) log p(y^(i)_t | x^(i)_t, ĉ, d̂, h^(i)) ].   (11)

In the applications to simulated and neurophysiological data described in the following, we used this approach to predict the properties of neural dynamics on held-out trials.

4 Applications

Simulated data: We first illustrate the performance of N-PLDS on a simulated population recording from 40 neurons, consisting of 100 trials of length T = 200 time steps each. We used a 4-dimensional latent state and assumed that the population consisted of two homogeneous sub-populations of size 20 each, with one modulatory input controlling rate fluctuations in each group (see Fig. 2A).
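A minimal sketch of the predictive computation in eq. 10, specialised to a one-dimensional modulator (k = 1) for readability; the posterior mean µ_h on training trials, the negative Hessian H_h, and the kernel hyperparameters are all assumed given:

```python
import numpy as np

def predict_modulator(train_idx, test_idx, mh, mu_h, H_h, sigma2, tau, eps=1e-4):
    """Predictive mean and covariance of h on held-out trials (cf. eq. 10),
    for a scalar (k = 1) modulator under the squared exponential trial kernel."""
    def kern(a, b):
        K = sigma2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / tau ** 2)
        if a is b:                       # jitter on the diagonal, as in the prior
            K = K + eps * np.eye(len(a))
        return K
    K = kern(train_idx, train_idx)       # prior covariance on training trials
    Ks = kern(test_idx, train_idx)       # cross-covariance K_*
    Kss = kern(test_idx, test_idx)       # prior covariance K_** on test trials
    mean = mh + Ks @ np.linalg.solve(K, mu_h - mh)
    cov = Kss - Ks @ np.linalg.solve(K + np.linalg.inv(H_h), Ks.T)
    return mean, cov
```

Note that the uncertainty term uses K + H_h^{−1} rather than K alone: the better the data constrain h on the training trials (large H_h), the closer the predictive covariance comes to plain noise-free GP interpolation.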
In addition, we assumed that for half of each trial there was a time-varying stimulus ('drifting grating'), represented by a 3-dimensional vector which consisted of the sine and cosine of the time-varying phase of the stimulus (frequency 0.4 Hz), as well as an additional binary term which indicated whether the stimulus was active.

5

Figure 3: Non-stationary firing rates in a population of V1 neurons. A: Mean firing rates of neurons (black trace) across trials. Left: The 5 most non-stationary neurons. Right: The 5 most stationary neurons. The fitted (solid line) and the predicted (circles) mean firing rates are also shown for N-PLDS (in red) and PLDS (in gray). B Left: The RMSE in predicting single-neuron firing rates across the 5 most non-stationary neurons for varying latent dimensionalities k, where N-PLDS achieves significantly lower RMSE. Middle: RMSE for the 5 most stationary neurons, where there is no difference between the two methods (apart from an outlier at k=8). Right: RMSE for all 64 neurons.

We fit N-PLDS to the data, and found that it successfully captures the non-stationarity in (log) mean firing rates, defined by z = C(x + h) + d, as shown in Fig. 2, and recovers the total and trial-conditioned covariances (the across-trial mean of the single-trial covariances of z). For comparison, we also fit 100 separate PLDSs to the data from each trial, as well as a single PLDS to the entire data.
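The 3-dimensional stimulus regressor described above can be written out as follows; the bin duration is an assumption here (the V1 analysis below uses 50 ms bins), and whether the sinusoids are zeroed outside the stimulus window is not specified in the text:

```python
import numpy as np

def grating_regressors(T, bin_s, freq_hz, on_frac=0.5):
    """Per-bin stimulus vector u_t = [sin(phase), cos(phase), stimulus-on flag],
    with a drifting-grating phase at `freq_hz` and the stimulus active for the
    first `on_frac` of the trial."""
    t = np.arange(T) * bin_s
    on = (np.arange(T) < on_frac * T).astype(float)   # stimulus on for first half
    phase = 2 * np.pi * freq_hz * t
    return np.column_stack([np.sin(phase), np.cos(phase), on])

U = grating_regressors(T=200, bin_s=0.05, freq_hz=0.4)
print(U.shape)   # (200, 3)
```

Encoding the phase as a sine/cosine pair (rather than the raw phase) keeps the regressor continuous across cycles, so the linear input term B u_t can represent any phase offset of the response.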
The naive approach of fitting an individual PLDS to each trial can, in principle, follow the modulation. However, as each model is only fit to one trial, the parameter estimates are very noisy, since they are not sufficiently constrained by the data from each trial.

We note that a single PLDS with fixed parameters (as is conventionally used in neural data analysis) is able to track the modulations in firing rates in the posterior mean here; however, a single PLDS would not be able to extrapolate firing rates to unseen trials (as we will demonstrate in our analyses on neural data below). In addition, it will also fail to separate 'slow' and 'fast' modulations into different parameters. By comparing the total covariance of the data (averaged across neuron pairs) to the 'trial-conditioned' covariance (calculated by estimating the covariance on each trial individually, and averaging the covariances), one can calculate how much of the cross-neuron co-variability can be explained by across-trial fluctuations in firing rates (see e.g., [17]). In the simulation shown in Fig. 2 (which illustrates an extreme case dominated by strong across-trial effects), the conditional covariance is much smaller than the full covariance.

6

Neurophysiological data: How big are non-stationarities in neural population recordings, and can our model successfully capture them? To address these questions, we analyzed a population recording from anaesthetized macaque primary visual cortex consisting of 64 neurons stimulated by sine grating stimuli.
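The total-versus-conditioned covariance comparison described above can be sketched as follows (a plain empirical-covariance version; the paper applies it to the model-recovered z rather than directly to the data):

```python
import numpy as np

def total_and_conditioned_cov(Z):
    """Z: (trials, time, neurons) array of rates or spike counts.
    Returns the total covariance (pooling all time bins of all trials) and the
    trial-conditioned covariance (the across-trial mean of single-trial
    covariances); their gap reflects slow across-trial rate fluctuations."""
    r, T, p = Z.shape
    total = np.cov(Z.reshape(r * T, p), rowvar=False)
    conditioned = np.mean([np.cov(Z[i], rowvar=False) for i in range(r)], axis=0)
    return total, conditioned
```

On data with a strong shared per-trial offset, the total covariance is inflated relative to the conditioned one, which is the signature of across-trial non-stationarity.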
The details of data collection are described in [5], but our data-set also included units not used in the original study. We binned the spikes recorded during 100 trials of length 4 s (the stimulus was on for 2 s) of the same orientation using 50 ms bins, resulting in trials of length T = 80 bins. Analogously to the simulated dataset above, we parameterised the stimulus as a 3-dimensional vector of the sine and cosine with the same temporal frequency as the drifting grating, as well as an indicator that specifies whether there is a stimulus or not.

We used 10-fold cross-validation to evaluate the performance of the model, i.e. we repeatedly divided the data into test data (10 trials) and training data (the remaining 90 trials). We fit the model on each training set and, using the estimated parameters from the training data, made predictions of the modulator h on test data by using the mean of the predictive distribution over h. We note that, in contrast to conventional applications of cross-validation which assume i.i.d. trials, our model here also takes into account correlations in firing rates across trials; therefore, we had to keep the trial indices in order to compute predictive distributions for test data using the formulas in eq. 10. Using these parameters, we drew samples of spikes for entire trials to compute the mean firing rates of each neuron on each trial. For comparison, we also fit a single PLDS to the data. As this model does not allow for across-trial modulations of firing rates, we simply kept the parameters estimated from the training data. For visualisation of the results, we quantified the 'non-stationarity' of each neuron by first smoothing its firing rate across trials (using a kernel of size 10 trials), calculating the variance of the smoothed firing rate estimate, and displaying firing rates for the 5 most non-stationary neurons in the population (Fig.
3A, left) as well as the 5 most stationary neurons (Fig. 3A, right). Importantly, the firing rates were also correctly interpolated for held-out trials (circles in Fig. 3A).

To evaluate whether the additional parameters in N-PLDS result in a superior model compared to the conventional PLDS [13], we tested the model with different latent dimensionalities ranging from k = 1 to k = 8, and compared each model against a 'fixed' PLDS of matched dimensionality (Fig. 3B). We estimated predicted firing rates on held-out trials by sampling 1000 replicate trials from the predictive distribution for both models, and compared the median (across samples) of the mean firing rates of each neuron to those of the data. The RMSE values shown are the errors of the predicted firing rate (in Hz) per neuron per held-out trial (the population mean across all neurons and trials is 4.54 Hz). We found that N-PLDS outperformed PLDS provided that we had sufficiently many latent states, at least k > 3. For large latent dimensionalities (k > 8) performance degraded again, which could be a consequence of overfitting. Furthermore, we show that for non-stationary neurons there is a large gain in predictive power (Fig. 3B, left), whereas for stationary neurons PLDS and N-PLDS have similar prediction accuracy (Fig. 3B, middle). The RMSE on firing rates for all neurons (Fig. 3B, right) suggests that our model correctly identified the fluctuations in firing rates.

We also wanted to gain insights into the temporal scale of the underlying non-stationarities. We first looked at the recovered time-scales τ of the latent modulators, and found them to be highly preserved across multiple training folds and, importantly, across different values of the latent dimensionalities, consistently peaked near 10 trials (Fig. 4A).
We made sure that the peak near 10 trials is not merely a consequence of parameter initialisation: parameters were initialised by fitting a Gaussian Process with an exponentiated-quadratic one-dimensional kernel to each neuron's mean firing rate over trials individually, then taking the mean time-scale over neurons as the initial global time-scale for our kernel. The initial values were 8.12 ± 0.01, differing slightly between training sets. Similarly, we checked that the parameters of the final model (after 30 iterations of Bayesian Laplace propagation) were indeed superior to the initial values, by monitoring the prediction error on held-out trials. Furthermore, due to introducing a smooth change with the correct time scale in the latent space (e.g., the posterior mean of h across trials shown in Fig. 4B), we find that N-PLDS recovers more of the time-lagged covariance of neurons compared to the fixed PLDS model (Fig. 4C).

5 Discussion

Non-stationarities are ubiquitous in neural data: slow modulations in firing properties can result from diverse processes such as plasticity and learning, fluctuations in arousal, cortical reorganisation after injury, as well as development and aging. In addition, non-stationarities in neural data can also be a consequence of experimental artifacts, and can be caused by fluctuations in anaesthesia level, stability of the physiological preparation or electrode drift.

7

Figure 4: Non-stationary firing rates in a population of V1 neurons (continued). A: Histogram of time-constants across different latent dimensionalities and training sets. The mean at 10.4 is indicated by the vertical red line. B: Estimated 7-dimensional modulator (the posterior mean of h). The modulator, with an estimated length scale of approximately 10 trials, varies smoothly across trials. C: Comparison of normalized mean auto-covariance across neurons.
Whatever the origins of non-stationarities are, it is important to have statistical models which can identify them and disentangle their effects from correlations and dynamics on faster time-scales [16].

We here presented a hierarchical model for neural population dynamics in the presence of non-stationarity. Specifically, we concentrated on a variant of this model which focuses on non-stationarity in firing rates. Recent experimental studies have shown that slow fluctuations in neural excitability which have a multiplicative effect on neural firing rates are a dominant source of noise correlations in anaesthetized visual cortex [17, 5, 24]. Because of the exponential spiking nonlinearity employed in our model, the latent additive fluctuations in the modulator variables also have a multiplicative effect on firing rates. Applied to a data-set of neurophysiological recordings, we demonstrated that this modelling approach can successfully capture non-stationarities in neurophysiological recordings from primary visual cortex.

In our model, both neural dynamics and latent modulators are mediated by the same low-dimensional subspace (parameterised by C). We note, however, that this assumption does not imply that neurons with strong short-term correlations will also have strong long-term correlations, as different dimensions of this subspace (as long as it is chosen big enough) could be occupied by short- and long-term correlations, respectively. In our applications to neural data, we found that the latent state had to be at least three-dimensional for the non-stationary model to outperform a stationary dynamics model, and it might be the case that at least three dimensions are necessary to capture both fast and slow correlations.
It is an open question how correlations on fast and slow timescales are related [17], and the techniques presented here have the potential to be useful for mapping out their relationship.

There are limitations to the current study: (1) we did not address the question of how to select amongst multiple different models which could be used to model neural non-stationarity for a given dataset; (2) we did not present numerical techniques for scaling up the current algorithm to larger trial numbers (e.g., using low-rank approximations to the covariance matrix) or large neural populations; (3) we did not address the question of how to overcome the slow convergence properties of GP kernel parameter estimation [34]; and (4) while Laplace propagation is flexible, it is an approximate inference technique, and the quality of its approximations might vary across models and tasks. We believe that extending our method to address these questions provides an exciting direction for future research, and will result in a powerful set of statistical methods for investigating how neural systems operate in the presence of non-stationarity.

Acknowledgments

We thank Alexander Ecker and the lab of Andreas Tolias for sharing their data with us [5] (see http://toliaslab.org/publications/ecker-et-al-2014/), and for allowing us to use it in this publication, as well as Maneesh Sahani and Alexander Ecker for valuable comments. This work was funded by the Gatsby Charitable Foundation (MP and GB) and the German Federal Ministry of Education and Research (MP and JHM) through BMBF; FKZ:01GQ1002 (Bernstein Center Tübingen). Code available at http://www.mackelab.org/code.

References

[1] A. Renart and C. K. Machens. Variability in neural activity and behavior.
Curr Opin Neurobiol, 25:211–20, 2014.
[2] A. Destexhe. Intracellular and computational evidence for a dominant role of internal network activity in cortical computations. Curr Opin Neurobiol, 21(5):717–725, 2011.
[3] G. Maimon. Modulation of visual physiology by behavioral state in monkeys, mice, and flies. Curr Opin Neurobiol, 21(4):559–64, 2011.
[4] K. D. Harris and A. Thiele. Cortical state and attention. Nat Rev Neurosci, 12(9):509–523, 2011.
[5] Ecker et al. State dependence of noise correlations in macaque primary visual cortex. Neuron, 82(1):235–48, 2014.
[6] Ralf M. Haefner, Pietro Berkes, and József Fiser. Perceptual decision-making as probabilistic inference by neural sampling. arXiv preprint arXiv:1409.0257, 2014.
[7] Alexander S. Ecker, George H. Denfield, Matthias Bethge, and Andreas S. Tolias. On the structure of population activity under fluctuations in attentional state. bioRxiv, page 018226, 2015.
[8] A. C. Smith and E. N. Brown. Estimating a state-space model from point process observations. Neural Comput, 15(5):965–91, 2003.
[9] U. T. Eden, L. M. Frank, R. Barbieri, V. Solo, and E. N. Brown. Dynamic analysis of neural encoding by point process adaptive filtering. Neural Comput, 16(5):971–98, 2004.
[10] B. M. Yu, A. Afshar, G. Santhanam, S. I. Ryu, K. Shenoy, and M. Sahani. Extracting dynamical structure embedded in neural activity. In NIPS 18, pages 1545–1552. MIT Press, Cambridge, MA, 2006.
[11] J. E. Kulkarni and L. Paninski. Common-input models for multiple neural spike-train data. Network, 18(4):375–407, 2007.
[12] W. Truccolo, L. R. Hochberg, and J. P. Donoghue. Collective dynamics in human and monkey sensorimotor cortex: predicting single neuron spikes. Nat Neurosci, 13(1):105–111, 2010.
[13] J. H. Macke, L. Buesing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani.
Empirical models of spiking in neural populations. In NIPS, pages 1350–1358, 2011.
[14] C. van Vreeswijk and H. Sompolinsky. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293):1724–6, 1996.
[15] G. J. Tomko and D. R. Crapper. Neuronal variability: non-stationary responses to identical visual stimuli. Brain Res, 79(3):405–18, 1974.
[16] C. D. Brody. Correlations without synchrony. Neural Comput, 11(7):1537–51, 1999.
[17] R. L. T. Goris, J. A. Movshon, and E. P. Simoncelli. Partitioning neuronal variability. Nat Neurosci, 17(6):858–65, 2014.
[18] C. D. Gilbert and W. Li. Adult visual cortical plasticity. Neuron, 75(2):250–64, 2012.
[19] E. N. Brown, D. P. Nguyen, L. M. Frank, M. A. Wilson, and V. Solo. An analysis of neural receptive field plasticity by point process adaptive filtering. Proc Natl Acad Sci U S A, 98(21):12261–6, 2001.
[20] Frank et al. Contrasting patterns of receptive field plasticity in the hippocampus and the entorhinal cortex: an adaptive filtering approach. J Neurosci, 22(9):3817–30, 2002.
[21] N. A. Lesica and G. B. Stanley. Improved tracking of time-varying encoding properties of visual neurons by extended recursive least-squares. IEEE Trans Neural Syst Rehabil Eng, 13(2):194–200, 2005.
[22] V. Ventura, C. Cai, and R. E. Kass. Trial-to-trial variability and its effect on time-varying dependency between two neurons, 2005.
[23] C. S. Quiroga-Lombard, J. Hass, and D. Durstewitz. Method for stationarity-segmentation of spike train data with application to the Pearson cross-correlation. J Neurophysiol, 110(2):562–72, 2013.
[24] Schölvinck et al. Cortical state determines global variability and correlations in visual cortex. J Neurosci, 35(1):170–8, 2015.
[25] G. Czanner, U. T. Eden, S. Wirth, M. Yanike, W. A. Suzuki, and E. N. Brown.
Analysis of between-trial and within-trial neural spiking dynamics. Journal of Neurophysiology, 99(5):2672–2693, 2008.
[26] Mangion et al. Online variational inference for state-space models with point-process observations. Neural Comput, 23(8):1967–1999, 2011.
[27] Neil C. Rabinowitz, Robbe L. T. Goris, Johannes Ballé, and Eero P. Simoncelli. A model of sensory neural responses in the presence of unknown modulatory inputs. arXiv preprint arXiv:1507.01497, 2015.
[28] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, USA, 2006.
[29] M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Unit, University College London, 2003.
[30] Yu et al. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. 102(1):614–635, 2009.
[31] A. J. Smola, V. Vishwanathan, and E. Eskin. Laplace propagation. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, NIPS, pages 441–448. MIT Press, 2003.
[32] A. Ypma and T. Heskes. Novel approximations for inference in nonlinear dynamical systems using expectation propagation. Neurocomput., 69(1-3):85–99, 2005.
[33] B. M. Yu, K. V. Shenoy, and M. Sahani. Expectation propagation for inference in non-linear dynamical models with Poisson observations. In Proc IEEE Nonlinear Statistical Signal Processing Workshop, 2006.
[34] I. Murray and R. P. Adams. Slice sampling covariance hyperparameters of latent Gaussian models. In NIPS 23, pages 1723–1731, 2010.