{"title": "A Bayesian model for identifying hierarchically organised states in neural population activity", "book": "Advances in Neural Information Processing Systems", "page_first": 3095, "page_last": 3103, "abstract": "Neural population activity in cortical circuits is not solely driven by external inputs, but is also modulated by endogenous states which vary on multiple time-scales. To understand information processing in cortical circuits, we need to understand the statistical structure of internal states and their interaction with sensory inputs. Here, we present a statistical model for extracting hierarchically organised neural population states from multi-channel recordings of neural spiking activity. Population states are modelled using a hidden Markov decision tree with state-dependent tuning parameters and a generalised linear observation model. We present a variational Bayesian inference algorithm for estimating the posterior distribution over parameters from neural population recordings. On simulated data, we show that we can identify the underlying sequence of population states and reconstruct the ground truth parameters. Using population recordings from visual cortex, we find that a model with two levels of population states outperforms both a one-state and a two-state generalised linear model. Finally, we find that modelling of state-dependence also improves the accuracy with which sensory stimuli can be decoded from the population response.", "full_text": "A Bayesian model for identifying hierarchically\norganised states in neural population activity\n\nPatrick Putzky1,2,3, Florian Franzen1,2,3, Giacomo Bassetto1,3, Jakob H. 
Macke1,3\n\n1Max Planck Institute for Biological Cybernetics, T\u00a8ubingen\n\n2Graduate Training Centre of Neuroscience, University of T\u00a8ubingen\n\n3Bernstein Center for Computational Neuroscience, T\u00a8ubingen\n\npatrick.putzky@gmail.com, florian.franzen@tuebingen.mpg.de\ngiacomo.bassetto@tuebingen.mpg.de, jakob@tuebingen.mpg.de\n\nAbstract\n\nNeural population activity in cortical circuits is not solely driven by ex-\nternal inputs, but is also modulated by endogenous states which vary on\nmultiple time-scales. To understand information processing in cortical cir-\ncuits, we need to understand the statistical structure of internal states\nand their interaction with sensory inputs. Here, we present a statistical\nmodel for extracting hierarchically organised neural population states from\nmulti-channel recordings of neural spiking activity. Population states are\nmodelled using a hidden Markov decision tree with state-dependent tuning\nparameters and a generalised linear observation model. We present a varia-\ntional Bayesian inference algorithm for estimating the posterior distribution\nover parameters from neural population recordings. On simulated data, we\nshow that we can identify the underlying sequence of population states and\nreconstruct the ground truth parameters. Using population recordings from\nvisual cortex, we \ufb01nd that a model with two levels of population states out-\nperforms both a one-state and a two-state generalised linear model. Finally,\nwe \ufb01nd that modelling of state-dependence also improves the accuracy with\nwhich sensory stimuli can be decoded from the population response.\n\n1 Introduction\n\nIt has long been recognised that the \ufb01ring properties of cortical neurons are not constant\nover time, but that neural systems can exhibit multiple distinct \ufb01ring regimes. 
For example, cortical circuits can be in a ‘synchronised’ state during slow-wave sleep, exhibiting synchronised fluctuations of neural excitability [1], or in a ‘desynchronised’ state in which firing is irregular. Neural activity in anaesthetised animals exhibits distinct states which lead to widespread modulations of neural firing rates and contribute to cross-neural correlations [2]. Changes in network state can be brought about through the influence of inter-area interactions [3] and affect communication between cortical and subcortical structures [4].

Given the strong impact of cortical states on neural firing [3, 5, 4], an understanding of the interplay between internal states and external stimuli is essential for understanding how populations of cortical neurons collectively process information. Multi-cell recording techniques make it possible to record neural activity from dozens or even hundreds of neurons simultaneously, and thus to identify the signatures of underlying states by fitting appropriate statistical models to neural population activity.

It is thought that the state-dependence of neocortical circuits is not well described using a global bi-modal state. Instead, the structure of cortical states is more accurately described

Figure 1: Illustration of the model. A) Generative model. At time t, the cortical state st is determined using a Hidden Markov Decision Tree (HMDT) and depends on the previous state st−1, population activity yt−1 and on the current stimulus xt. In our simulations, we assumed that the first split of the tree determined whether to transition into an up- or down-state. Up-states contained transient periods of high firing across the population (up-high) as well as sustained periods of irregular firing (up-low). 
Each cortical state is\nthen associated with di\ufb00erent spike-generation dynamics, modelling state-dependence of\n\ufb01ring properties such as \u2018burstiness\u2019. B) State-transition probabilities depend on the tree-\nstructure. Transition matrices are depicted as Hinton diagrams where each block represents\na probability and each column sums to 1. Each row corresponds to the possible future state\nst (see colour), and each column to the current state.\n(1) A model in which transition-probabilities in the \ufb01rst level of the tree (up/down) are\nbiased towards the up-state (green squares are bigger than gray ones), and weakly depend on\nthe previous state st\u22121. In this example, both high/low phases are equally likely within up-\nstates (second level of tree, depicted in second column) and do not depend on the previous\nstate (all orange/red squares have same size). The resulting 3 \u00d7 3 matrix of transition\nprobabilities across all states can be calculated from the transition-probabilities in the tree.\n(2) Changing the properties of the second-level node only leads to a local change in the\ntransition matrix: It a\ufb00ects the proportion between the orange/red states, but leaves the\ngreen state unchanged.\n\nusing multiple states which vary both between and within brain regions [6]. In addition,\nthe \u2018state\u2019 of a neural population can vary across multiple time scales from milliseconds to\nseconds or more [6]: For example, cortical recordings can switch between up- and down-\nphases. During an up-phase cortical activity can exhibit \u2018volleys\u2019 of synchronised activity\n[7]\u2014sometimes referred to as population bursts\u2014which can be modelled as transient states.\n\nThese observations suggest that the structure of cortical states could be captured by a\nhierarchical organisation in which each state can give rise to multiple temporally nested\n\u2018sub-states\u2019. 
This structure naturally yields a binary tree: States can be divided into sub-\nclasses, with states further down the tree operating at faster time-scales determined by\ntheir parent node. We hypothesise that other cortical states also exhibit similar hierarchical\nstructure. Our goal here is to provide a statistical model which can identify cortical states\nand their hierarchical organisation from recordings of population activity. As a running\nexample of such a hierarchical organisation we use a model in which the population exhibits\nsynchronised population bursts during up-states, but not during down-states. This system\nis modelled using two levels of states (\ufb01rst: up/down), for which the up-state is further\ndivided into two sub-states (transient high-\ufb01ring events and normal \ufb01ring, see 1A).\n\nWe present an inhomogeneous hidden Markov model (HMM) [8] to model the temporal\ndynamics of state-transitions [9, 10]. Our approach is most closely related to [10], who\ndeveloped a state-dependent generalised linear model [11] in which both the tuning prop-\n\n2\n\n\ferties and state-transitions can be modelled to depend on external covariates. However,\nour formulation also allows for hierarchically organised state-structures. In addition, pre-\nvious population models based on discrete latent states [10, 12] used point-estimation for\nparameter learning. In contrast, we present algorithms for full Bayesian inference over the\nparameters of our model, making it possible to identify states in smaller or noisier data\n[13]. 
This is important for neural population recordings, which are typically characterised by short recording times relative to the dimensionality of the data and by high variability. In addition, estimates of posterior distributions are important for visualising uncertainty and for optimising experimental paradigms with active-learning methods [14, 15].

2 Methods

We use a hidden Markov decision tree (HMDT) [16] to model hierarchically organised states with binary splits, and a generalised linear observation model (GLM). An HMDT combines the properties of a hidden Markov model (to model temporal structure) with a hierarchical mixture of experts (HME, to model a hierarchy of latent states) [17]. In general, the hierarchical approach can represent richer dependence of states on external covariates, analogous to the difference between multi-class logistic regression and multi-class binary decision trees. For example, a two-level binary tree can separate four point clouds situated at the corners of a square, whereas a 4-class multinomial regression cannot. We use Bayesian logistic regression [18] to model transition gates and emissions. In the following, we describe the model structure and propose a variational algorithm [8, 19] for inferring its parameters.

2.1 Hierarchical hidden Markov model for multivariate binary data

We consider discrete time-series data of multivariate binary^1 neural spiking events y_t ∈ {0, 1}^C, where C is the number of cells. We assume that neural spiking can be influenced by (observed) covariates x_t ∈ R^D. The covariates x_t could represent external stimuli, spiking history of neurons or other measures such as the total population spike count. In our analyses below, we assume that correlations across neurons arise only from the joint coupling to the population state, and we do not include couplings between neurons as is sometimes done with GLMs [11]. Dependence of neural firing on internal states is modelled by including a 1-of-K latent state vector s_t, where K is the number of latent states. The emission probabilities for the observable vector y_t (i.e. the probability of spiking for each neuron) are thus given by

p(y_t \mid x_t, s_t, \Phi) = \prod_{i=1}^{K} \prod_{c=1}^{C} p\left(y_t^{(c)} \mid x_t^{(c)}, \phi_i^{(c)}\right)^{s_t^{(i)}},   (1)

where Φ is the set of model parameters. We allow the external covariate x_t to be different for each neuron c.

To model temporal dynamics over s_t, we use a hidden Markov model (HMM) [10], where the state transitions take the form

p(s_t \mid s_{t-1}, x_t, \Psi) = \prod_{i=1}^{K} \prod_{j=1}^{K} p\left(s_t^{(i)} \mid s_{t-1}^{(j)}, x_t, \Psi\right)^{s_t^{(i)} s_{t-1}^{(j)}},   (2)

where Ψ is the set of parameters of the transition model. The model allows state-transitions to depend on an external input x_t; this can, e.g., be used to model state-transitions caused by stimulation of subcortical structures involved in controlling cortical states [20]. Moving beyond this standard input-output HMM formulation [21], we introduce hierarchically organised auxiliary latent variables z_t which represent the current state s_t through a binary tree. Using HME terminology, we refer to the nodes representing z_t as ‘gates’. 
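To make the observation model concrete, Eq. (1) with Bernoulli-GLM emissions can be sketched as follows. This is our own NumPy illustration, not the authors' code; the function names and the (K, C, D) weight layout are assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def emission_probs(x_t, phi):
    """Spike probability of every neuron under every state (cf. Eq. 1).

    x_t : (C, D) covariates x_t^(c) for each of the C neurons
    phi : (K, C, D) GLM weights phi_i^(c) for state i and neuron c
    Returns a (K, C) matrix of Bernoulli spike probabilities.
    """
    # inner product phi[i, c] . x_t[c] for every state i and cell c
    return sigmoid(np.einsum('kcd,cd->kc', phi, x_t))

def loglik_given_state(y_t, x_t, phi):
    """log p(y_t | x_t, s_t = i, Phi) for every state i (product over cells)."""
    p = emission_probs(x_t, phi)                       # (K, C)
    return (y_t * np.log(p) + (1 - y_t) * np.log1p(-p)).sum(axis=1)
```

Conditioning on the active state then simply selects one row of these per-state quantities, which is what the exponent s_t^{(i)} in Eq. (1) does.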
Each of the K leaves of the tree (or, equivalently, each path through the tree) corresponds to one of the K entries of s_t, and we can thus represent s_t in the form

s_t^{(k)} = \prod_{l=1}^{L} \left(z_t^{(l)}\right)^{A_L^{(l,k)}} \left(1 - z_t^{(l)}\right)^{A_R^{(l,k)}},   (3)

where A_L and A_R are adjacency matrices which indicate whether state k is in the left or right branch of gate l, respectively (see [19]). Using this representation, s_t is deterministic given z_t, which significantly simplifies the inference process. The auxiliary latent variables z_t^{(l)} are Bernoulli random variables, and we chose their conditional probability distribution to be

p\left(z_t^{(l)} = 1 \mid x_t^{(l)}, s_{t-1}, v_l\right) = \sigma\left(v_l^\top u_t^{(l)}\right).   (4)

Here, σ(·) is the logistic sigmoid, v_l are the parameters of the l-th gate and u_t represents a concatenation of the previous state s_{t-1}, the input x_t (which could for example represent population firing rate, time in trial or an external stimulus) and a constant term of unit value to model the prior probability of z_t^{(l)} = 1. This parametrisation significantly reduces the number of parameters used for the transition probabilities as compared to [10]. 

^1 All derivations below can be generalised to model the emission probabilities by any kind of generalised linear model.
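As a concrete illustration of Eq. (3), the sketch below (our own NumPy rendering; the state ordering [up-high, up-low, down] and the left/right conventions are illustrative assumptions) builds the adjacency matrices for the three-state tree of Fig. 1 and recovers the one-hot state vector from the gate activations:

```python
import numpy as np

# Adjacency matrices for the example tree of Fig. 1: gate 0 chooses
# up (left) vs. down (right); gate 1 splits the up-branch into
# up-high (left) vs. up-low (right).  States: [up-high, up-low, down].
A_L = np.array([[1, 1, 0],
                [1, 0, 0]])
A_R = np.array([[0, 0, 1],
                [0, 1, 0]])

def state_from_gates(z, A_L, A_R):
    """Leaf indicator s_t from gate activations z_t (cf. Eq. 3).

    z : (L,) gate values in {0, 1}; one-hot output for binary z.
    Passing gate *probabilities* instead yields the product form of Eq. 3.
    """
    z = z[:, None]                        # broadcast each gate over the K states
    return np.prod(z ** A_L * (1 - z) ** A_R, axis=0)
```

For example, z = (1, 0) (take the up-branch, then the right child) selects the up-low state; any z with the first gate at 0 selects the down state regardless of the second gate, since that gate is not on the down path.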
To enforce stronger temporal locality and less jumping between states, we could also reduce this probability to be conditioned only on previous activations of a sub-tree of the HMDT instead of all population states.

2.2 Learning & Inference

For posterior inference over the model parameters we would need to infer the joint distribution over all stochastic variables conditioned on X,

p(Y, S, \Phi, \Psi, \lambda, \nu \mid X) = p(Y \mid S, X, \Phi)\, p(S \mid X, \Psi)\, p(\Phi \mid \lambda)\, p(\lambda)\, p(\Psi \mid \nu)\, p(\nu),   (5)

where Y is the set of y_t's, Φ and Ψ are the sets of parameters for the emission and gating distributions, respectively, and λ and ν are the hyperparameters for the parameter priors. Since there is no closed-form solution for this distribution, we use a variational approximation [8]. We assume that the posterior factorises as

q(S, \Phi, \Psi, \lambda, \nu) = q(S)\, q(\Phi)\, q(\Psi)\, q(\lambda)\, q(\nu)   (6)
= q(S) \prod_{k=1}^{K} \prod_{c=1}^{C} q\left(\phi_k^{(c)}\right) q\left(\lambda_k^{(c)}\right) \prod_{l=1}^{L} q(\psi_l)\, q(\nu_l),   (7)

and find the variational approximation to the posterior over parameters, q(S, Φ, Ψ, λ, ν), by optimising the variational lower bound L(q) to the evidence:

\mathcal{L}(q) := \iiiint \sum_{S} q(S, \Phi, \Psi, \lambda, \nu) \ln \frac{p(Y, S, \Phi, \Psi, \lambda, \nu \mid X)}{q(S, \Phi, \Psi, \lambda, \nu)}\, d\Phi\, d\Psi\, d\lambda\, d\nu   (8)
\leq \ln \iiiint \sum_{S} p(Y, S, \Phi, \Psi, \lambda, \nu \mid X)\, d\Phi\, d\Psi\, d\lambda\, d\nu   (9)
= \ln p(Y \mid X).   (10)

We use variational expectation-maximisation (VBEM) to perform alternating updates on the posterior over latent state variables and the posterior over model parameters. 
To infer the posterior over latent variables (i.e. responsibilities), we use a modified forward-backward algorithm as proposed in [22] (see also [8]). In order to perform the forward and backward steps, they propose the use of subnormalised probabilities of the form

\tilde{p}\left(s_t^{(i)} \mid s_{t-1}^{(j)}, x_t, \Psi\right) := \exp\left(\mathbb{E}_{\Psi}\left[\ln p\left(s_t^{(i)} \mid s_{t-1}^{(j)}, x_t, \Psi\right)\right]\right), \quad \tilde{p}(y_t \mid x_t, \Phi_i) := \exp\left(\mathbb{E}_{\Phi_i}\left[\ln p(y_t \mid x_t, \Phi_i)\right]\right)   (11)

for the state-transition and emission probabilities. Since all relevant probabilities in our model are over discrete variables, it would be straightforward to normalise those probabilities, but we found that normalisation did not noticeably change results.

With the approximations from above, the forward probability can thus be written as

\alpha\left(s_t^{(i)}\right) = \frac{1}{\tilde{C}_t}\, \tilde{p}\left(y_t \mid s_t^{(i)}, x_t, \phi\right) \sum_{j=1}^{K} \alpha\left(s_{t-1}^{(j)}\right) \tilde{p}\left(s_t^{(i)} \mid s_{t-1}^{(j)}, x_t, \Psi\right),   (12)

where α(s_t^{(i)}) is the probability-mass of state s_t^{(i)} given previous time steps and C̃_t is a normalisation constant. Similar to the forward step, the backward recursion takes the form

\beta\left(s_t^{(i)}\right) = \frac{1}{\tilde{C}_{t+1}} \sum_{j=1}^{K} \beta\left(s_{t+1}^{(j)}\right) \tilde{p}\left(y_{t+1} \mid s_{t+1}^{(j)}, x_{t+1}, \phi\right) \tilde{p}\left(s_{t+1}^{(j)} \mid s_t^{(i)}, x_t, \Psi\right).   (13)

Using the forward and backward steps we can infer the state posteriors [8]. 
Given the state posteriors, the logarithm of the approximate parameter posterior for each of the nodes takes the form

\ln q^{\star}(\omega_n) = \sum_{t=1}^{T} \eta_t^{(n)} \ln p\left(\mu_t^{(n)} \mid x_t^{(n)}, \omega_n, (\dots)\right) + \mathbb{E}_{\gamma_n}\left[\ln p(\omega_n \mid \gamma_n)\right] + \text{const.},   (14)

where ω_n are the parameters of the n-th node and p(ω_n | γ_n) is the prior over the parameters. Here, η_t^{(n)} is the posterior responsibility, or estimated influence, of node n on the t-th observation, and μ_t^{(n)} denotes the expected output (known for state nodes) of node n (see supplement for details). This equation also holds for a tree structure with multinomial gates and for non-binary emission models such as Poisson and linear models. The above equations are valid for maximum-likelihood inference, except that all parameter priors are removed and the expectations over log-likelihoods reduce to log-likelihoods. We use logistic regression for all emission probabilities and gates, and a local variational approximation to the logistic sigmoid as presented in [18].

As parameter priors we use anisotropic Gaussians with individual Gamma priors on each diagonal entry of the precision matrix. With this prior structure we can perform automatic relevance determination [23]. We chose shape parameter a_0 = 1 × 10^{-2} and rate parameter b_0 = 1 × 10^{-4}, leading to a broad Gamma hyperprior [19]. In many applications, it will be reasonable to assume that neurons in close-by states of the tree show similar response characteristics (similar parameters). The hierarchical organisation of the model yields a natural structure for hierarchical priors which can encourage parameter similarity².

2.3 Details of simulated and neurophysiological data

To assess and illustrate our model, we simulated a population recording with trials of 3 s length (20 neurons, 10 ms time bins). As illustrated in Fig. 
1 A, we modelled one low-\ufb01ring-\nrate down state (down, base \ufb01ring rate 0.5 Hz) and two up states (up-low and up-high, with\nbase \ufb01ring rates of 5, and 50 Hz respectively). The root node switched between up and\ndown states, whereas a second node controlled transitions between the two types of up-\nstates. Up-high states only occurred transiently, modelling synchronised bouts of activity.\nIn the down state, neurons have a 10 ms refractory period, during up states they exhibit\nbursting activity. Transitions from down to up go mainly via up-high to up-low, while down-\ntransitions go from up-low to down; stimulation increases the probability of being in one of\nthe up states. A pulse-stimulus occurred at time 1 s of each trial. Each model was \ufb01t on a\nset of 20 trials and evaluated on a di\ufb00erent test set of 20 trials. For each training set, 24\nrandom parameter initialisations were drawn and the one with highest evidence was chosen\nfor evaluation. State predictions were evaluated using the Viterbi algorithm [24, Ch. 13].\n\nWe analysed a recording from visual cortex (V1) of an anaesthetised macaque [2]. The\ndata-set consisted of 1600 presentations of drifting gratings (16 directions, 100 trials each),\neach lasting 2 s. Experimental details are described in [2]. For each trial, we kept a segment\nof 500 ms before and after a stimulus presentation, resulting in trials of length 3 s each. We\nbinned and binarised spike trains in 50 ms bins. Additional spikes (present in (5.45 \u00b1 1.56) %\nof bins) were discarded by the binarisation procedure. We chose the representation of\nthe stimulus to be the outer product of the two vectors [1, sin(\u03d1), cos(\u03d1)], where \u03d1 is\nthe phase of the grating, and [1, sin(\u03b8), cos(\u03b8), sin(2\u03b8), cos(2\u03b8)] for the direction \u03b8 of the\ngrating. 
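The stimulus parametrisation just described (an outer product of a phase part and a direction part) can be sketched as follows; the function name and the convention that angles are given in radians are our assumptions:

```python
import numpy as np

def stimulus_features(phase, direction):
    """15-dimensional stimulus representation for a drifting grating.

    Outer product of [1, sin(phase), cos(phase)] with
    [1, sin(theta), cos(theta), sin(2 theta), cos(2 theta)],
    flattened to a single feature vector.
    """
    phase_part = np.array([1.0, np.sin(phase), np.cos(phase)])
    dir_part = np.array([1.0, np.sin(direction), np.cos(direction),
                         np.sin(2 * direction), np.cos(2 * direction)])
    return np.outer(phase_part, dir_part).ravel()   # shape (15,)
```

The sin(2θ)/cos(2θ) terms are what allow orientation (180°-periodic) tuning in addition to direction (360°-periodic) tuning, and the products with the phase terms allow phase-modulated firing.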
This resulted in a 15-dimensional stimulus parametrisation, and made it possible to represent tuning-curves with orientation and direction selectivity, as well as modulation of firing rates by stimulus phase. The only gate input was chosen to be an indicator function with unit value during stimulus presentation and zero value otherwise. Post-spike filters were parametrised using five cubic b-splines for the last 10 bins with a bin width of 50 ms.

² See supplement for an example of how this could be implemented with Gaussian priors.

Figure 2: Performance of the model on simulated data. A) Example rasters sampled using ground-truth (GT) parameters; colours indicate the sequence of underlying population states. B) For the sample from (A), the state-sequence decoded with our variational Bayes (VB) method matches the decoded sequence using GT parameters. C) Comparison of state-decoding performance using GT parameters, VB and maximum-likelihood (ML) learning (Wilcoxon ranksum, * p < 0.05; *** p ≪ 0.001). D) Model performance quantified using per-data-point log-likelihood difference between estimated and GT model on the test-set. Our VB method outperforms ML (Wilcoxon ranksum, *** p ≪ 0.001), and both models considerably outperform a 1-state GLM (not shown). E) Estimated post-spike filters match the GT values well (depicted are the filters from one of the cross-validated models). F) Comparison of the autocorrelation of the ground-truth data and samples drawn from the VB fit as in (E). G) GT and VB estimated transition matrices in absence (left) or presence (right) of a stimulus.

3 Results

3.1 Results on simulated data

To illustrate our model and to evaluate the estimation procedure on data with known ground truth, we used a simulated population recording of 20 neurons by sampling from our model (details in Methods, see Fig. 2 A). 
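State predictions in these experiments are evaluated with the Viterbi algorithm [24]; a minimal log-space sketch for the inhomogeneous transition matrices of this model (our own implementation, with an assumed array layout, not the authors' code) is:

```python
import numpy as np

def viterbi(log_emiss, log_trans, log_init):
    """Most likely state sequence (MAP path) for an inhomogeneous HMM.

    log_emiss : (T, K) log-likelihood of each observation under each state
    log_trans : (T, K, K) log p(s_t = i | s_{t-1} = j), indexed [t, i, j]
    log_init  : (K,) log prior over the initial state
    """
    T, K = log_emiss.shape
    delta = log_init + log_emiss[0]          # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = log_trans[t] + delta[None, :]   # [i, j]: come from j, go to i
        back[t] = np.argmax(scores, axis=1)
        delta = log_emiss[t] + np.max(scores, axis=1)
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):            # trace the argmax pointers back
        path[t - 1] = back[t, path[t]]
    return path
```

With state-posterior expectations substituted for the exact parameters (as done for the VB model via the posterior mean), the same recursion yields the decoded state sequences shown in the figures.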
In this simulation, the up-state had much higher firing rates than the down-state. It was therefore possible to decode the underlying states from the population spike trains with high accuracy (Fig. 2 B). For the VB method, we used the posterior mean over parameters for state-inference. In addition, we compared both of these approaches to state-decoding based on a model estimated using maximum-likelihood learning. All three models showed similar performance, but the decoding advantage of the 3-state VB model was statistically significant (using pairwise comparisons, Fig. 2 C).

We also directly evaluated the performance of the VB and ML methods for parameter estimation by calculating the log-likelihood of the data on held-out test-data, and found that our VB method performed significantly better than the ML method (Fig. 2 D). Finally, we also compared the estimated post-spike filters (Fig. 2 E), auto-correlation functions (Fig. 2 F) and state-transition matrices (Fig. 2 G) and found an excellent agreement between the GT parameters and the estimates returned by VB.

To test whether the VB method is able to determine the correct model complexity, we fit an over-parameterised model with 3 layers and potentially 8 states to the simulation data. The best model fit from 200 random restarts (lower bound of −2.24 × 10^4, no cross-validation, results not shown) only used 3 out of the 8 possible states (the other 5 states had a probability of less than 0.5 %). Therefore, in this example, the best lower bound is achieved by a model with correct, and low, complexity.

Figure 3: Results for population recordings from V1. A) Raster plot of population response to a drifting grating with orientation 67.5°. Arrows indicate stimulus onset and offset; colours show the most likely state sequence inferred with the 3-state variational Bayes (3S-VB) model. 
B) Cross-validated log-likelihoods per trial, relative to the 3S-VB model. C) Stimulus decoding performance, in percentage of correctly decoded stimuli (16 discrete stimuli, chance level 6.25 %), using maximum-likelihood decoding. D) Tuning properties of an example neuron. i) Orientation tuning calculated from the tuning-parameters of 3S-VB (red, orange, green) or 1-state GLM (purple). ii) Orientation tuning measured from sampled data of the estimated model, each line representing one state; note that the firing rate also depends on state-transitions and post-spike filters. iii) Temporal component of tuning parameters. iv) Peri-stimulus time-histograms (PSTHs) estimated from samples of the estimated models. v) Post-spike filters for each state, and comparison with 1-state GLM (purple). E) Distributions of times spent in each state, i.e. inter-transition intervals (ITIs), estimated from the empirical data using 3S-VB. F) Comparison between the distribution of ITIs in samples from model 3S-VB and in the Viterbi-decoded path (from E). G) Histogram of population rates (i.e. number of synchronous spikes across the population in each 50 ms bin) for 3S-VB (blue), 1S (purple), and data (gray). H) Histograms of population rate for each state.

3.2 Results on neurophysiological recordings

We analysed a neural population recording from V1 to determine whether we could successfully identify cortical states by decoding the activity of the neural population, and whether accounting for state-dependence resulted in a more accurate statistical model of neural firing. While neurons generally responded robustly to the stimulus (Fig. 3 D), firing rates were strongly modulated by internal states [2] (Fig. 3 A). We fit different models to the data, and found that our 3-state model estimated with VB resulted in better cross-validation performance than either the 3-state model estimated with ML, the 2-state model or a 1-state GLM (i.e. 
a GLM without cross-neural couplings, Fig. 3 B). In addition, we fit a fully coupled GLM (with cross-history terms as in [11, 13]), as well as one in which the total population count was used as a history feature, using VB. These models were intermediate between the 1-state GLM and the 2-state model, i.e. both worse than the 3-state one. A ‘flat’ 3-state model with a single multinomial gate estimated with ML performed similarly to the hierarchical 3S-ML model. This is to be expected, as any differences in expressive power between the two models will only become substantial for a different choice of x_t or for larger models.

We also evaluated the ability of different models to decode the stimulus (i.e. the direction of the presented grating) from population spike trains. We evaluated the likelihood of each population spike train for each of the 16 stimulus directions, and decoded the stimulus which yielded the highest likelihood. The 3-state VB model shows the best decoding performance among all tested models (Fig. 3 C), and all models with state-dependence (3-state VB, 3-state ML, 2-state) outperformed the 1-state GLM. We sampled from the estimated 3S-VB model to evaluate to what extent the model captures the tuning properties of neurons (Fig. 3 D(ii & iv)). 
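The maximum-likelihood stimulus decoder described above reduces to scoring each trial under every candidate direction and taking the argmax. A sketch follows; the `loglik_fn` interface is a hypothetical stand-in for the fitted model's trial likelihood (which, for the state-dependent models, would itself marginalise over the latent state sequence):

```python
import numpy as np

def decode_direction(loglik_fn, spikes, n_directions=16):
    """Maximum-likelihood stimulus decoding over a discrete candidate set.

    loglik_fn : callable(spikes, direction) -> log-likelihood of the whole
                trial under the fitted model (hypothetical interface)
    spikes    : the population spike train of one trial
    Returns the best direction (radians) and the score for each candidate.
    """
    directions = np.arange(n_directions) * (2 * np.pi / n_directions)
    scores = np.array([loglik_fn(spikes, d) for d in directions])
    return directions[np.argmax(scores)], scores
```

Decoding accuracy is then simply the fraction of trials on which the returned direction matches the presented one, against a chance level of 1/16.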
The example neuron shows strong modulation of its base firing rate dependent on the population state, but not a qualitative change of its tuning properties (Fig. 3 D i-iv). The down-state post-spike filter (Fig. 3 D v) exhibits a small oscillatory component which is not present in the post-spike filters of the other states or the 1-state GLM.

Investigation of inter-transition-interval (ITI) distributions from the data (after Viterbi-decoding) shows heavy tails (Fig. 3 E). Comparison of the ITI-distributions estimated from the empirical data and from sampled data (3S-VB) shows good agreement, apart from small deficiencies of the model in capturing the heavy tails of the empirical ITI distribution (Fig. 3 F). Finally, population rates (i.e. the total number of spikes across the population) are often used as a summary-measure for characterising cortical states [6]. We found that the distribution of population rates in the data was well matched by the distribution estimated from our model (Fig. 3 G), with the three states having markedly different population rate distributions (Fig. 3 H). Although a 1-state GLM also captured the tuning-properties of this neuron (Fig. 3 D), it failed to recover the distribution of population rates (Fig. 3 G).

4 Discussion

We presented a statistical method for extracting cortical states from multi-cell recordings of spiking activity. Our model is based on a ‘state-dependent’ GLM [10] in which the states are organised hierarchically and evolve over time according to a hidden Markov model. Whether, and in which situations, the best descriptions of cortical states are multi-dimensional, discrete or continuous [25, 2] is an open question [6], and models like the one presented here will help shed light on these questions. We showed that the use of variational inference methods makes it possible to estimate the posterior over parameters. 
Bayesian inference\nprovides better model performance on limited data [13], uncertainty information, and is\nalso an important building block for active learning approaches [14]. Finally, it can be used\nto determine the best model complexity: For example, one could start inference with a\nmodel containing only one state and iteratively add states (as in divisive clustering) until\nthe variational bound stops increasing.\n\nCortical states can have a substantial impact on the \ufb01ring and coding properties of cortical\nneurons [6] and interact with inter-area communication [4, 3]. Therefore, a better under-\nstanding of the interplay between cortical states and sensory information, and the role of\ncortical states in gating information in local cortical circuits will be indispensable for our\nunderstanding of how populations of neurons collectively process information. Advances in\nexperimental technology enable us to record neural activity in large populations of neurons\ndistributed across brain areas. This makes it possible to empirically study how cortical\nstates vary across the brain, to identify pathways which in\ufb02uence state, and ultimately to\nunderstand their role in neural coding and computation. The combination of such data with\nstatistical methods for identifying the organisation of cortical states holds great promise for\nmaking progress on understanding state-dependent information processing in the brain.\n\nAcknowledgements\n\nWe are grateful to the authors of [2] for sharing their data (toliaslab.org/publications/ecker-\net-al-2014/) and to Alexander Ecker, William McGhee, Marcel Nonnenmacher and David\nJanssen for comments on the manuscript. This work was funded by the German Federal Min-\nistry of Education and Research (BMBF; FKZ: 01GQ1002, Bernstein Center T\u00a8ubingen) and\nthe Max Planck Society. Supplementary details and code are available at www.mackelab.org.\n\n8\n\n\fReferences\n\n[1] M. Steriade and R. W. 
McCarley, Brain Control of Wakefulness and Sleep. Kluwer Academic/Plenum Publishers, 2005.

[2] A. S. Ecker, P. Berens, R. J. Cotton, M. Subramaniyan, G. H. Denfield, C. R. Cadwell, S. M. Smirnakis, M. Bethge, and A. S. Tolias, “State dependence of noise correlations in macaque primary visual cortex,” Neuron, vol. 82, no. 1, 2014.

[3] E. Zagha, A. E. Casale, R. N. S. Sachdev, M. J. McGinley, and D. A. McCormick, “Motor cortex feedback influences sensory processing by modulating network state,” Neuron, vol. 79, no. 3, 2013.

[4] N. K. Logothetis, O. Eschenko, Y. Murayama, M. Augath, T. Steudel, H. C. Evrard, M. Besserve, and A. Oeltermann, “Hippocampal-cortical interaction during periods of subcortical silence,” Nature, vol. 491, no. 7425, 2012.

[5] T. Bezdudnaya, M. Cano, Y. Bereshpolova, C. R. Stoelzel, J.-M. Alonso, and H. A. Swadlow, “Thalamic burst mode and inattention in the awake LGNd,” Neuron, vol. 49, no. 3, 2006.

[6] K. D. Harris and A. Thiele, “Cortical state and attention,” Nature Reviews Neuroscience, vol. 12, no. 9, 2011.

[7] M. A. Kisley and G. L. Gerstein, “Trial-to-trial variability and state-dependent modulation of auditory-evoked responses in cortex,” J. Neurosci., vol. 19, no. 23, 1999.

[8] M. J. Beal, “Variational algorithms for approximate Bayesian inference,” PhD thesis, University College London, 2003.

[9] L. M. Jones, A. Fontanini, B. F. Sadacca, P. Miller, and D. B. Katz, “Natural stimuli evoke dynamic sequences of states in sensory cortical ensembles,” PNAS, vol. 104, no. 47, 2007.

[10] S. Escola, A. Fontanini, D. Katz, and L. Paninski, “Hidden Markov models for the stimulus-response relationships of multistate neural systems,” Neural Computation, vol. 23, no. 5, 2011.

[11] L. Paninski, J. Pillow, and J. Lewi, “Statistical models for neural encoding, decoding, and optimal stimulus design,” Progress in Brain Research, vol. 165, 2007.

[12] Z. Chen, S. Vijayan, R. Barbieri, M. A. Wilson, and E. N. Brown, “Discrete- and continuous-time probabilistic models and algorithms for inferring neuronal UP and DOWN states,” Neural Computation, vol. 21, no. 7, 2009.

[13] S. Gerwinn, J. H. Macke, and M. Bethge, “Bayesian inference for generalized linear models for spiking neurons,” Frontiers in Computational Neuroscience, vol. 4, no. 12, 2010.

[14] J. Lewi, R. Butera, and L. Paninski, “Sequential optimal design of neurophysiology experiments,” Neural Computation, vol. 21, no. 3, 2009.

[15] B. Shababo, B. Paige, A. Pakman, and L. Paninski, “Bayesian inference and online experimental design for mapping neural microcircuits,” in Advances in Neural Information Processing Systems 26, pp. 1304–1312, Curran Associates, Inc., 2013.

[16] M. I. Jordan, Z. Ghahramani, and L. K. Saul, “Hidden Markov decision trees,” in Advances in Neural Information Processing Systems 9, pp. 501–507, MIT Press, 1997.

[17] M. I. Jordan and R. A. Jacobs, “Hierarchical mixtures of experts and the EM algorithm,” Neural Computation, vol. 6, no. 2, 1994.

[18] T. S. Jaakkola and M. I. Jordan, “A variational approach to Bayesian logistic regression models and their extensions,” 1996.

[19] C. M. Bishop and M. Svensén, “Bayesian hierarchical mixtures of experts,” in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI ’03, (San Francisco, CA, USA), pp. 57–64, Morgan Kaufmann Publishers Inc., 2003.

[20] G. Aston-Jones and J. D. Cohen, “An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance,” in Annual Review of Neuroscience, vol. 28, pp. 403–450, Annual Reviews, 2005.

[21] Y. Bengio and P. Frasconi, “An input output HMM architecture,” in Advances in Neural Information Processing Systems 7, pp. 427–434, MIT Press, 1995.

[22] D. J. C. MacKay, “Ensemble learning for hidden Markov models,” tech. rep., Cavendish Laboratory, University of Cambridge, 1997.

[23] D. J. C. MacKay, “Bayesian non-linear modeling for the prediction competition,” ASHRAE Transactions, vol. 100, no. 2, pp. 1053–1062, 1994.

[24] C. M. Bishop, Pattern Recognition and Machine Learning. Information Science and Statistics, New York: Springer, 2006.

[25] J. H. Macke, L. Buesing, J. P. Cunningham, B. M. Yu, K. V. Shenoy, and M. Sahani, “Empirical models of spiking in neural populations,” in Advances in Neural Information Processing Systems, vol. 24, Curran Associates, Inc., 2011.