{"title": "Analysis of Brain States from Multi-Region LFP Time-Series", "book": "Advances in Neural Information Processing Systems", "page_first": 2483, "page_last": 2491, "abstract": "The local field potential (LFP) is a source of information about the broad patterns of brain activity, and the frequencies present in these time-series measurements are often highly correlated between regions. It is believed that these regions may jointly constitute a ``brain state,'' relating to cognition and behavior. An infinite hidden Markov model (iHMM) is proposed to model the evolution of brain states, based on electrophysiological LFP data measured at multiple brain regions. A brain state influences the spectral content of each region in the measured LFP. A new state-dependent tensor factorization is employed across brain regions, and the spectral properties of the LFPs are characterized in terms of Gaussian processes (GPs). The LFPs are modeled as a mixture of GPs, with state- and region-dependent mixture weights, and with the spectral content of the data encoded in GP spectral mixture covariance kernels. The model is able to estimate the number of brain states and the number of mixture components in the mixture of GPs. A new variational Bayesian split-merge algorithm is employed for inference. The model infers state changes as a function of external covariates in two novel electrophysiological datasets, using LFP data recorded simultaneously from multiple brain regions in mice; the results are validated and interpreted by subject-matter experts.", "full_text": "Analysis of Brain States from Multi-Region LFP Time-Series

Kyle Ulrich 1, David E. Carlson 1, Wenzhao Lian 1, Jana Schaich Borg 2, Kafui Dzirasa 2, and Lawrence Carin 1
1 Department of Electrical and Computer Engineering
2 Department of Psychiatry and Behavioral Sciences
Duke University, Durham, NC 27708
{kyle.ulrich, david.carlson, wenzhao.lian, jana.borg, kafui.dzirasa, lcarin}@duke.edu

Abstract

The local field potential (LFP) is a source of information about the broad patterns of brain activity, and the frequencies present in these time-series measurements are often highly correlated between regions. It is believed that these regions may jointly constitute a "brain state," relating to cognition and behavior. An infinite hidden Markov model (iHMM) is proposed to model the evolution of brain states, based on electrophysiological LFP data measured at multiple brain regions. A brain state influences the spectral content of each region in the measured LFP. A new state-dependent tensor factorization is employed across brain regions, and the spectral properties of the LFPs are characterized in terms of Gaussian processes (GPs). The LFPs are modeled as a mixture of GPs, with state- and region-dependent mixture weights, and with the spectral content of the data encoded in GP spectral mixture covariance kernels. The model is able to estimate the number of brain states and the number of mixture components in the mixture of GPs. A new variational Bayesian split-merge algorithm is employed for inference. The model infers state changes as a function of external covariates in two novel electrophysiological datasets, using LFP data recorded simultaneously from multiple brain regions in mice; the results are validated and interpreted by subject-matter experts.

1 Introduction

Neuroscience has made significant progress in learning how activity in specific neurons or brain areas correlates with behavior.
One of the remaining mysteries is how best to represent and understand the way whole-brain activity relates to cognition: in other words, how to describe brain states [1]. Although different brain regions have different functions, neural activity across brain regions is often highly correlated. It has been proposed that the specific way brain regions are correlated at any given time may represent a "state" designed specifically to optimize neural computations relevant to the behavioral context an organism is in [2]. Unfortunately, although there is great interest in the concept of global brain states, little progress has been made towards developing methods to identify or characterize them.

The study of arousal is an important area of research relating to brain states. Arousal is a hotly debated topic that generally refers to the way the brain dynamically responds to varying levels of stimulation [3]. One continuum of arousal used in the neuroscience literature is sleep (low arousal) to wakefulness (higher arousal). Another is calm (low arousal) to excited or stressed (high arousal) [4]. A common electrophysiological measurement used to determine arousal levels is local field potentials (LFPs), or low-frequency (< 200 Hz) extracellular neural oscillations that represent coordinated neural activity across distributed spatial and temporal scales.

Figure 1: Left: Graphical representation of our state space model. We first assign a sequence of brain states, {s_w^(a)}_{w=1}^W, with Markovian dynamics to animal a. Given state s_w^(a), each region is assigned to a cluster, z_w^(ar) = ℓ ∈ {1, . . . , L}, and the data y_w^(ar) is generated from a Gaussian process with covariance function k(τ; θ_ℓ, γ). Top: Example of two windows of an LFP time-series; we wish to classify each window based on spectral content. Spectral densities of known sleep states (REM, SWS, WK) in the hippocampus are shown.

LFPs are useful for describing overall brain states since they reflect activity across many neural networks. We examine brain states under different levels of arousal by recording LFPs simultaneously in multiple regions of the mouse brain, first, as mice pass through different stages of sleep, and second, as mice are moved from a familiar environment to a novel environment to induce interest and exploration.

In neuroscience, the analysis of electrophysiological time-series data is largely centered around dynamic causal modeling (DCM) [5], where continuous state-space models are formulated based on differential equations that are specifically crafted around knowledge of underlying neurobiological processes. However, DCM is not suitable for exploratory analysis of data, such as inferring unknown arousal levels, for two reasons: the differential equations are driven by inputs of experimental conditions, and the analysis is dependent on a priori hypotheses about which neuronal populations and interactions are important. This work focuses on methods suitable for exploratory analysis.

Previously published neuroscience studies distinguished between slow-wave sleep (SWS), rapid-eye-movement (REM), and wake (WK) using proportions of high-frequency (33-55 Hz) gamma oscillations and lower frequency theta (4-9 Hz) oscillations in a brain area called the hippocampus [6, 7].
As an alternative approach, recent statistical methods for tensor factorization [8] can be applied to short-time Fourier transform (STFT) coefficients by factorizing a 3-way LFP tensor, with dimensions of brain region, frequency band and time. Distinct sleep states may then be revealed by clustering the inferred sequence of time-varying score vectors.

Although these two methods are good first steps, they have several shortcomings: 1) They do not consider the time dependency of brain activity, and therefore cannot capture state-transition properties. 2) They cannot work directly on raw data, but require preprocessing that only considers spectral content in predefined frequency bins, thus leading to information loss. 3) They do not allow for individual brain regions to take on their own set of sub-state characteristics within a given global brain state. 4) Finally, they cannot leverage the shared information of LFP data across multiple animals.

In this paper we overcome the shortcomings of previously published brain-state methods by defining a sequence of brain states over a sliding window of raw, filtered LFP data, where we impose an infinite hidden Markov model (iHMM) [9] on these state assignments. Conditioned on this brain state, each brain region is assigned to a cluster in a mixture model. Each cluster is associated with a specific spectral content (or density) pattern, manifested through a spectral mixture kernel [10] of a Gaussian process. Each window of LFP data is generated as a draw from this mixture of Gaussian processes. Thus, all animals share an underlying brain state space, of which all brain regions share the underlying components of the mixture model.

2 Model

For each animal a ∈ {1, . . . , A}, we have time-series of the LFP in R different regions, measured simultaneously. These time-series are split into sequential, sliding windows, y_w^(ar) ∈ R^N for w ∈ {1, . . . , W}, such that windows are common across regions. These windows are chosen to be overlapping, thereby sharing data points between consecutive windows; nonoverlapping windows may also be used. Each window is considered as a single observation vector, and we wish to model the generative process of these observations, {y_w^(ar)}.

The proposed model aims to describe the spectral content in each of these LFP signals, as a function of brain region and time. This is done by first assigning a joint "brain state" to each time window, {s_1^(a), . . . , s_W^(a)}, shared across all brain regions {1, . . . , R}. The brain state is assumed to evolve in time as a latent Markov process. The LFP data from a particular brain region is assumed drawn from a mixture of Gaussian processes. The characteristics of each mixture component are shared across brain states and brain regions, with mixture weights that are dependent on these two entities.

2.1 Brain state assignment

Within the generative process, each animal has a latent brain state for every time window, w. This brain state is represented through a categorical latent variable s_w^(a), and an infinite hidden Markov model (iHMM) is placed on the state dynamics [9, 12]. This process is formulated as

β ∼ GEM(γ_0),   λ_g^(a) ∼ DP(α_0 β),   s_w^(a) ∼ Categorical(λ^(a)_{s_{w−1}^(a)}),   (1)

where GEM is the stick-breaking process β_h = β′_h ∏_{i=1}^{h−1} (1 − β′_i) with β′_h ∼ Beta(1, γ_0). Here, {β_h}_{h=1}^H represents global transition probabilities to each state in a potentially infinite state space. For the stick-breaking process, H → ∞, but in a finite collection of data only a finite number of state transitions will be used and H can be efficiently truncated.
Since the state space is shared across animals, we cannot predefine initial state assignments, s_1^(a). To remedy this, we allow s_1^(a) ∼ Categorical(ψ^(a)) and place a discrete uniform prior on ψ^(a) over the truncated state space.

Each animal is given a transition matrix Λ^(a), where each row of this matrix is a transition probability vector λ_g^(a), such that the transition from state g to state h for animal a is λ_gh^(a), each centered around the global transition vector β. Because each animal's brain can be structured differently (e.g., as an extreme case, consider a central nervous system disorder), we allow Λ^(a) to vary from animal to animal.

2.2 Assigning brain regions to clusters

For each brain state, mixture weights are drawn to define the distribution over clusters independently for each region r, centered around a global mixture η using a hierarchical Dirichlet process [12]:

η ∼ GEM(γ_1),   φ_h^(r) ∼ DP(α_1 η),   (2)

where φ_hℓ^(r) is the probability of assigning region r of a window with brain state h to cluster ℓ. This cluster assignment can be written as

z_w^(ar) | s_w^(a) ∼ Categorical(φ^(r)_{s_w^(a)}).   (3)

For each cluster ℓ there is a set of parameters, θ_ℓ, describing a Gaussian process (GP), detailed in Section 2.3. One could consider the joint probability over cluster assignments for all brain regions as an extension of a latent nonnegative PARAFAC tensor decomposition [11, 13]. We refer to the Supplemental Material for details.
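As a concrete illustration of the hierarchical construction above, the following sketch draws a truncated version of the generative process for one animal: a GEM stick-breaking draw for the global transition vector β, per-state transition rows centered on β, a global cluster mixture η with per-(state, region) cluster weights, a Markov sequence of brain states, and per-region cluster assignments. Each DP draw is approximated by a finite Dirichlet at the truncation level (a standard weak-limit approximation, not the paper's exact sampler), and all sizes and hyperparameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
H, L_CLUST, R, W = 15, 25, 5, 200          # truncation levels, regions, windows (illustrative)
gamma0, alpha0, gamma1, alpha1 = 0.01, 1.0, 0.01, 1.0

def gem(gamma, size, rng):
    """Truncated GEM stick-breaking weights, renormalized at the truncation."""
    bp = rng.beta(1.0, gamma, size)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - bp)[:-1]))
    w = bp * remaining
    return w / w.sum()

beta = gem(gamma0, H, rng)                  # global transition distribution over states
# Weak-limit approximation: DP(alpha0 * beta) ~ Dirichlet(alpha0 * beta) at truncation H.
# A tiny jitter keeps every Dirichlet parameter strictly positive.
Lam = rng.dirichlet(alpha0 * beta + 1e-6, H)            # per-state transition rows
eta = gem(gamma1, L_CLUST, rng)                         # global cluster weights
Phi = np.stack([rng.dirichlet(alpha1 * eta + 1e-6, H)   # (R, H, L) cluster weights
                for _ in range(R)])

s = np.zeros(W, dtype=int)
s[0] = rng.integers(H)                      # discrete-uniform initial state
for w in range(1, W):
    s[w] = rng.choice(H, p=Lam[s[w - 1]])   # Markov state dynamics
z = np.array([[rng.choice(L_CLUST, p=Phi[r, s[w]])      # region-wise cluster draws
               for r in range(R)] for w in range(W)])
```

Each row of `z` is the vector of region-wise cluster assignments for one window, conditioned on that window's shared brain state.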
Our clustering model differs from the infinite tensor factorization (ITF) model of [11] in three significant ways: we place Markovian dynamics on state assignments for each animal, we model separate draws from the prior jointly for each animal, and we share cluster atoms across all regions through use of an HDP.

2.3 Infinite mixture of Gaussian processes

2.3.1 Gaussian processes and the spectral mixture kernel

For a single window of data, y_w^(ar) ∈ R^N, we wish to model the data in the limit of a continuous-time function (allowing N → ∞), motivating a GP formulation, and we are interested in the spectral properties of the LFP signal in this window. Previous research has established a link between the kernel function of a GP and its spectral properties [10]. We write a distribution over the time-series:

y(t) ∼ GP(m(t), k(t, t′)),   (4)

where m(t) is known as the mean function, and k(t, t′) is the covariance function [14]. This framework provides a flexible, structured method to model time-series data. The structure of observations in the output space, y, is defined through a careful choice of the covariance function. Since this work aims to model the spectral content of the LFP signal, we set the mean function to 0, and use a recently proposed spectral mixture (SM) kernel [10]. This kernel is defined through a spectral-domain representation, S(s), of the stationary kernel, represented by a mixture of Q Gaussian components:

φ(s) = Σ_{q=1}^Q ω_q N(s; μ_q, ν_q),   S(s) = (1/2)[φ(s) + φ(−s)],   (5)

where φ(s) is reflected about the origin to obtain a valid spectral density, and μ_q, ν_q, and ω_q respectively define the mean, variance, and relative weight of the q-th Gaussian component in the spectral domain.
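The spectral density in (5) is straightforward to evaluate numerically. The sketch below implements the symmetrized Gaussian-mixture density S(s); the two components (peaks at 6 Hz and 40 Hz, loosely evoking theta and gamma bands) are illustrative choices, not parameters fitted in the paper.

```python
import numpy as np

def sm_spectral_density(s, weights, means, variances):
    """S(s) = 0.5 * [phi(s) + phi(-s)], where phi is a weighted sum of
    Q Gaussian densities N(s; mu_q, nu_q) -- the spectral mixture of eq. (5)."""
    s = np.atleast_1d(s)[:, None]                      # (n, 1) against (Q,) components
    def phi(x):
        return np.sum(weights
                      * np.exp(-0.5 * (x - means) ** 2 / variances)
                      / np.sqrt(2.0 * np.pi * variances), axis=-1)
    return 0.5 * (phi(s) + phi(-s))

# Illustrative example: spectral peaks at 6 Hz and 40 Hz
freqs = np.linspace(0.0, 50.0, 501)
S = sm_spectral_density(freqs,
                        weights=np.array([1.0, 0.5]),
                        means=np.array([6.0, 40.0]),
                        variances=np.array([1.0, 4.0]))
```

Evaluated on a frequency grid like this, each cluster's density can be plotted and compared directly, which is how the recovered kernels are visualized against ground truth later in the paper.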
Priors may be placed on these parameters; for example, we use the uninformative priors μ_q ∼ Uniform(μ_min, μ_max), ν_q ∼ Uniform(0, ν_max) and ω_q ∼ Gamma(e_0, f_0). A bandpass filter is applied to the LFP signal from μ_min to μ_max Hz as a preprocessing step, so this prior knowledge is justified. Also, ν_max is set to prevent overfitting, and e_0 and f_0 are set to manifest a broad prior. We assume that only a noisy version of the true function is observed, so the kernel is defined as the Fourier transform of the spectral density S(s) plus white Gaussian noise:

k(τ; θ, γ) = f(τ; θ) + γ^{−1} δ_τ,   f(τ; θ) = Σ_{q=1}^Q ω_q exp{−2π²τ²ν_q} cos(2πτμ_q),   (6)

where the set of parameters θ = {ω, μ, ν} and γ define the covariance kernel, τ = |t − t′|, and δ_τ is the Kronecker delta function, which equals one if τ = 0. We set the prior γ ∼ Gamma(e_1, f_1), where the hyperparameters e_1 and f_1 are chosen to manifest a broad prior. The formulation of (6) results in an interpretable kernel in the spectral domain, where the weights ω_q correspond to the relative contribution of each component, the means μ_q represent spectral peaks, and the variances ν_q play a role similar to an inverse length-scale.

Through a realization of this Gaussian process, an analytical representation is obtained for the marginal likelihood of the observed data y given the parameters {θ, γ} and the observation locations t, p(y|θ, γ, t). The optimal set of kernel parameters {θ, γ} can then be chosen as the set that maximizes the marginal likelihood.
Further discussion of inference for the Gaussian process parameters is presented in Section 3.

2.3.2 Generating observed data

To combine the clustering model with our SM kernel, each cluster ℓ is associated with a distinct set of kernel parameters θ_ℓ. To generate the observations {y_w^(ar)}, where each y_w^(ar) ∈ R^N has observation times t = {t_1, . . . , t_N} such that |t_i − t_j| = |i − j|τ for all i and j, we consider a draw from the multivariate normal distribution:

y_w^(ar) ∼ N(0, Σ_{z_w^(ar)}),   (Σ_ℓ)_ij = k(|t_i − t_j|; θ_ℓ, γ),   (7)

where each observation is generated from the cluster indicated by z_w^(ar) (described in Section 2.2), and each cluster is represented uniquely by a covariance matrix, Σ_ℓ, whose elements are defined through the covariance kernel k(τ; θ_ℓ, γ). Therefore, the parameters θ_{z_w^(ar)} describe the autocorrelation content associated with each y_w^(ar).

We address two concerns with this formulation. First, this observation model ignores complex cross-covariance functions between regions. Although LFP measurements exhibit coherence patterns across regions, the generative model in (7) only weakly couples the spectral densities of each region through the brain state. In principle, the generative model could be extended to incorporate this coherence information. Second, (7) does not model the time-series itself as a stochastic process, but rather the preprocessed, 'independent' observation vectors.
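To make the observation model concrete, the following sketch evaluates the spectral mixture kernel of (6) on a regular grid of observation times and draws one simulated window from the multivariate normal in (7). The sampling rate, window length, and kernel parameters are illustrative assumptions, not values from the datasets.

```python
import numpy as np

def sm_kernel(tau, weights, means, variances, gamma=100.0):
    """k(tau) = sum_q w_q exp(-2 pi^2 tau^2 nu_q) cos(2 pi tau mu_q) + gamma^{-1} delta_tau,
    the spectral mixture kernel of eq. (6) plus white observation noise."""
    tau = np.asarray(tau, dtype=float)
    k = np.sum(weights
               * np.exp(-2.0 * np.pi ** 2 * tau[..., None] ** 2 * variances)
               * np.cos(2.0 * np.pi * tau[..., None] * means), axis=-1)
    return k + (tau == 0) / gamma           # Kronecker-delta noise term on the diagonal

# Regular observation grid for one window (illustrative: fs = 200 Hz, N = 200 samples)
N, fs = 200, 200.0
t = np.arange(N) / fs
tau = np.abs(t[:, None] - t[None, :])       # |t_i - t_j| lag matrix
Sigma = sm_kernel(tau,
                  weights=np.array([1.0, 0.5]),
                  means=np.array([6.0, 40.0]),       # spectral peaks in Hz
                  variances=np.array([1.0, 4.0]))
y = np.random.default_rng(0).multivariate_normal(np.zeros(N), Sigma)  # one simulated window
```

Swapping in a different cluster's parameters θ_ℓ changes only `Sigma`, which is exactly how (7) makes each window's autocorrelation content depend on its cluster assignment.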
This shortcoming is not ideal, but the windowing process allows for efficient computation via the mixture of Gaussian processes.

3 Inference

In the following, latent model variables are represented by Ω = {Z, S, Φ, η, Λ, β, Ψ}, the kernel parameters to be optimized are Θ = {{θ_ℓ}_{ℓ=1}^L, γ}, and H and L are upper-limit truncations on the number of brain states and clusters, respectively. As described throughout this section, the proposed algorithm adaptively adjusts the truncation levels on the number of brain states, H, and clusters, L, through a series of split-merge moves. The joint probability of the proposed model is

p(Y, Ω, Θ) = p(Y|Z, Θ) p(Z, S|Φ, Λ, Ψ) p(Φ|η) p(η) p(Λ|β) p(β) p(Ψ) p(Θ)
= [∏_{a,r,w} p(y_w^(ar)|z_w^(ar), Θ) p(z_w^(ar)|s_w^(a), Φ)] [∏_a p(s_1^(a)|ψ^(a)) p(ψ^(a)) ∏_{w=2}^W p(s_w^(a)|s_{w−1}^(a), Λ^(a))]
× [p(η|γ_1) ∏_{r,h} p(φ_h^(r)|η, α_1)] [p(β|γ_0) ∏_{a,g} p(λ_g^(a)|β, α_0)]
× [p(γ|e_1, f_1) ∏_{q=1}^Q p(ω_q|e_0, f_0) p(μ_q|μ_min, μ_max) p(ν_q|ν_max)].   (8)

A variational inference scheme is developed to update Ω and Θ.

3.1 Variational inference

With variational inference, an approximate variational posterior distribution is sought that is similar to the true posterior distribution, q(Ω, Θ) ≈ p(Ω, Θ|Y). This variational posterior is assumed to factorize into simpler distributions, q(Ω, Θ) = q(Z) q(S) q(Φ) q(η) q(Λ) q(β) q(Ψ) q(Θ), with further factorization

q(Z) = ∏_{a,r,w} Cat(z_w^(ar); ζ_w^(ar)),   q(S) = ∏_a q({s_w^(a)}_{w=1}^W),   q(Ψ) = ∏_a δ_{ψ^(a)∗}(ψ^(a)),
q(Φ) = ∏_{h,r} Dir(φ_h^(r); ν_h^(r)),   q(η) = δ_{η∗}(η),   q(Λ) = ∏_{g,a} Dir(λ_g^(a); κ_g^(a)),   q(β) = δ_{β∗}(β),   q(Θ) = ∏_j δ_{Θ_j∗}(Θ_j),   (9)

where only necessary sufficient statistics of the latent factors q({s_w^(a)}_{w=1}^W) are required, and the approximate posteriors of η, β, {ψ^(a)} and {Θ_j} are represented by point estimates at η∗, β∗, {ψ^(a)∗} and {Θ_j∗}, respectively.

The degenerate distributions δ_{η∗}(η) and δ_{β∗}(β) are described in previous work on variational inference for HDPs [15, 16]. The idea is that the point estimates of the stick-breaking processes simplify the derivation of the variational posterior, and the authors of [16] show that obtaining a full posterior distribution on the stick-breaking weights has little impact on model fitting since the variational lower bound is not heavily influenced by the terms dependent on η and β. Furthermore, the Dirichlet process is truncated for both the number of states and the number of clusters, such that q(z_w^(ar) = ℓ) = 0 for ℓ > L and q(s_w^(a) = h) = 0 for h > H. This truncation method (see [17] for details) is notably different from other common truncation methods of the DP (e.g., [18] and [19]), and is primarily important for facilitating the split-merge inference techniques described in Section 3.2.

In mean-field variational inference, the variational distribution q(Ω, Θ) is chosen such that the Kullback-Leibler divergence of p(Ω, Θ|Y) from q(Ω, Θ), D_KL(q(Ω, Θ)||p(Ω, Θ|Y)), is minimized.
This is equivalent to maximizing the evidence lower bound (also known as the variational free energy in the DCM literature), L(q) = E_q[log p(Y, Ω, Θ)] − E_q[log q(Ω, Θ)], where both expectations are taken with respect to the variational distribution. The resulting lower bound is

L(q) = E[ln p(Y|Z, Θ)] + E[ln p(Z, S|Φ, Λ, Ψ)] + E[ln p(Φ|η)] + E[ln p(η)] + E[ln p(Λ|β)]
+ E[ln p(β)] + E[ln p(Ψ)] + E[ln p(Θ)] + H[q(Z)] + H[q(S)] + H[q(Φ)] + H[q(Λ)],   (10)

where all expectations are with respect to the variational distribution, the hyperparameters are excluded for notational simplicity, and we define H[q(·)] as the sum over the entropies of the individual factors of q(·). Due to the degenerate approximations for q(η), q(β), q(Ψ) and q(Θ), these full posterior distributions are not obtained, and, therefore, the terms H[q(η)], H[q(β)], H[q(Ψ)] and H[q(Θ)] are set to zero in the lower bound.

The updates for ζ_w^(ar) and ν_h^(r) are standard. Variational inference for the HDP-HMM is detailed in other work (e.g., see [20, 21]); using these methods, updates for κ_g^(a), ψ^(a) and the necessary expected sufficient statistics of the factors of q(S) are realized. Finally, updates for β∗, η∗ and {Θ_j} are non-conjugate, so a gradient-ascent method is performed to optimize these values. We use simple resilient back-propagation (Rprop), though most line-search methods should suffice. Details on all updates, and on taking the gradient of L(q) with respect to β, η and {Θ_j}, are found in the Supplemental Material.

3.2 Split-merge moves

During inference, a series of split and merge operations are used to help the algorithm jump out of local optima [22]. This work takes the viewpoint that two clusters (or states) should merge only if the variational lower bound increases, and, when a split is proposed for a cluster (or state), it should always be accepted, whether or not the split increases the variational lower bound. If the split is not appropriate, a future merge step is expected to undo this operation. In this way, the opportunity is provided for cluster and state assignments to jump out of local optima, allowing the inference algorithm to readjust assignments as desired.

Merge states: To merge states h′ and h″ into a new state h, new parameters are initialized as ρ_wh^(a) = ρ_wh′^(a) + ρ_wh″^(a), κ_gh^(a) = κ_gh′^(a) + κ_gh″^(a), β_h∗ = β_h′∗ + β_h″∗, and v_h^(a) = v_h′^(a) + v_h″^(a), such that the model now has a truncation at H_new = H − 1 states.
In order to account for problems with merging two states in an HMM, a single restricted iteration is allowed, in which only the state-dependent variational parameters in Ω_new are updated, producing a new distribution q(Ω_new). The merge is accepted (i.e., Ω = Ω_new) if L(q(Ω_new)) > L(q(Ω)). Since these computations are not excessive, all possible state merges are computed and a small number of merges are accepted per iteration.

Merge clusters: To merge clusters ℓ′ and ℓ″ into a new cluster ℓ, new parameters are initialized as ζ_wℓ^(ar) = ζ_wℓ′^(ar) + ζ_wℓ″^(ar), ν_hℓ^(r) = ν_hℓ′^(r) + ν_hℓ″^(r), η_ℓ∗ = η_ℓ′∗ + η_ℓ″∗, and θ_ℓ^new = θ∗, such that there is a truncation at L_new = L − 1 clusters. We set θ∗ = θ_ℓ′ for simplicity, and allow a restricted iteration of updates to Ω_new and θ_ℓ^new. The merge is accepted (i.e., Ω = Ω_new and Θ = Θ_new) if the lower bound is improved, L(q(Ω_new, Θ_new)) > L(q(Ω, Θ)). Since the restricted iteration for θ_ℓ^new is expensive, only a few cluster merges may be proposed at a time. Therefore, merges are proposed for clusters with the smallest earth mover's distance [23] between their spectral densities.

Split step: When splitting states and clusters, the opposite process to the initialization of the merging procedures described above is performed. For clusters, data points within a cluster ℓ are randomly chosen to stay in cluster ℓ or split to a new cluster ℓ′. For splitting state h, the cluster assignment vector φ_h^(r) is replicated, and windows within state h are randomly chosen to stay in state h or split to a new state h′.
Regardless of how this affects the lower bound, a split step is always accepted.

For implementation details, we allow the model to accept 3 state merges every third iteration, propose 5 cluster merges every third iteration, and split one state and one cluster every third iteration. Therefore, every iteration may affect the truncation level of either the number of states or clusters. A 'burn-in' period is allowed before split/merge proposals begin, and a 'burn-out' period is employed in which split proposals cease. In this way, the algorithm is guaranteed to improve the lower bound only during iterations when a split is not proposed, and convergence tests are only considered during the burn-out period.

4 Datasets

Three datasets are considered in this work, as follows:

Toy data: Data is generated for a single animal according to the proposed model in Section 2. The purpose of this dataset is to ensure the inference scheme can recover known ground truth, since ground-truth information is not known for the real datasets. We set L = 5 and H = 3. For each cluster, a spectral density was generated with Q = 4, μ_q ∼ Unif(4, 50), ν_q ∼ Unif(1, 50) and ω ∼ Dir(1, . . . , 1). The cluster usage probability vector was drawn φ_h^(r) ∼ Dir(1/10, . . . , 1/10). State transition probabilities were drawn according to λ_gh ∼ Unif(0, 1) + 10δ(g=h). States were assigned to W = 1000 windows according to an HMM with transition matrix Λ, and cluster assignments were drawn conditioned on this state. Data with N = 200 was drawn for each window.

Sleep data: Twelve hours of LFP data from sixteen different brain regions were recorded from three mice naturally transitioning through different levels of sleep arousal. Due to the high number of brain regions, we present only three hours of sleep data from a single mouse for simplicity.
The multi-animal analysis is reserved for the novel environment dataset.

Novel environment data: Thirty minutes of LFP data from five brain regions was recorded from five mice who were moved from their home cage to a novel environment approximately nine minutes into the recording. Placing animals into novel environments has been shown to increase arousal, and should therefore result in (at least one) network state change [3]. Data acquisition methods for the latter two datasets are discussed in [24].

5 Results

For all results, we set Q = 10, H = 15, L = 25, stop the 'burn-in' period after iteration 6, and start the subsequent computation period after iteration 25. Hyperparameters were set to γ_0 = γ_1 = .01, α_0 = α_1 = 1, μ_min = 0, μ_max = 50, ν_max = 10, and e_0 = f_0 = 10^{−6}. In all results, the model was seen to converge to a local optimum after 30 iterations, and each iteration took on the order of 20 seconds using Matlab code on a PC with a 2.30 GHz quad-core CPU and 8 GB RAM.

Figure 2 shows results on the toy data. The model correctly recovers exactly 3 states and 5 clusters, and, as seen in the figure, the state assignments and spectral densities of each cluster component are recovered almost perfectly. The model was implemented for different values of the noise variance, γ^{−1}, and, though not shown, in all cases the noise variance was recovered accurately during inference, implying the spectral mixture kernels are not overfitting the noise. In this way, we confirm that the inference scheme recovers the ground truth. For further model verification, ten-fold cross-validation was used to compute predictive probabilities for held-out data (reported in Table 1), where we compare to two simpler versions of our model: 1) the HDP-HMM on brain states in (1) is replaced with an HDP, and 2) a single brain state.
For the HDP-HMM, the hold-out data was considered as 'missing data' in the training data, and the window index was used to assign time-dependent probabilities over clusters, whereas in the HDP and Single State models it was simply withheld from the training data. We see large predictive performance gains when considering multiple brain states, and even more improvement on average (though modest) when considering an HDP-HMM.

Figure 2: Toy data results. Top row shows the generated toy data. From left to right: the five spectral functions, each associated with a component in the mixture model; the probability of each of these five components occurring for all five regions in each brain state; the generated brain state assignments from a 3-state HMM along with the generated cluster assignments for the five simulated regions. The bottom row shows the results of our model. On the left, a comparison of the recovered state vs. the true state for all time; on the right, an alignment of the five recovered kernels to the spectral density ground truth.

Figure 3: Sleep data results. Top: A comparison of brain state assignments from our method to two other methods. Bottom Left: Spectral density of the 7 inferred clusters. Middle Left: Cluster assignments over time for 16 different brain regions, sorted by similarity.
Middle Right: Given brain states 1, 2 and 3, we show cluster assignment probabilities for 4 different brain regions: the hippocampus (D Hipp), nucleus accumbens core (NAc core), orbitofrontal cortex (OFC) and ventral tegmental area (VTA), from left to right, respectively. Right: State assignments of our method and the tensor method conditioned on the method of [6].

[Figure 3 panel labels list the 16 recorded regions: DLS, DMS, FrA, M1, M_OFC_Cx, OFC, Basal_Amy, D_Hipp, L_Hb, NAc_Core, NAc_Shell, MD_Thal, PrL_Cx, SubNigra, V1, VTA.]

Figure 4: Novel environment data results. Left: The log spectral density of the 6 inferred clusters. Middle: State assignments for all 9 animals over a 30-minute period.
There are 7 inferred states, and each state has a distribution over clusters for each region, as seen on the right.

Dataset        HDP-HMM           HDP               Single State
Toy (×10^5)    −1.686 (±0.053)   −1.688 (±0.053)   −1.718 (±0.054)
Sleep (×10^6)  −1.677 (±0.030)   −1.682 (±0.020)   −1.874 (±0.019)
Novel (×10^5)  −5.932 (±0.040)   −5.973 (±0.034)   −6.962 (±0.063)

Table 1: Average held-out log predictive probability for different priors on brain states: HDP-HMM, HDP, and a single state. The data consist of W time-series windows for R regions of A animals; 10% of these windows were held out at random, and the predictive distribution was used to determine their likelihood.

The sleep and novel environment results are presented in Figures 3 and 4, respectively. On the sleep dataset, our results are compared with the two methods discussed in the Introduction: that of [6, 7], and the tensor method of [8]. We refer to the Supplemental Material for exact specifications of the tensor method.
For each of these datasets, we infer the intended arousal states. In the novel environment data, we observe broad arousal changes at 9 minutes for all animals, as expected. In the sleep data, we successfully uncover at least as many states as the simple approach of [6, 7], including SWS, REM and WK states. Thus far, neuroscientists have focused primarily on 2 stages of sleep (NREM and REM), but as many as 5 have been discussed (4 different stages of NREM sleep, and 1 stage of REM). Different stages of sleep affect memory and behavior in different ways (e.g., see [25]), as does the number of times animals transition between these states [26].
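The entries of Table 1 summarize held-out log predictive probabilities over ten cross-validation folds. A minimal sketch of how such a summary can be produced (a hypothetical helper, assuming the ± term denotes the spread of per-fold totals; the per-window scores here are synthetic placeholders, not the model's output):

```python
import numpy as np

def heldout_summary(log_probs, n_folds=10, seed=0):
    """Randomly split per-window held-out log predictive probabilities
    into folds and report (mean, spread) of the per-fold totals."""
    rng = np.random.default_rng(seed)
    log_probs = np.asarray(log_probs, dtype=float)
    idx = rng.permutation(log_probs.size)          # random fold assignment
    folds = np.array_split(log_probs[idx], n_folds)
    totals = np.array([f.sum() for f in folds])    # one total per fold
    return totals.mean(), totals.std(ddof=1)

# Synthetic placeholder scores standing in for the model's per-window
# held-out log predictive probabilities.
scores = -1.0 - 0.1 * np.random.default_rng(1).random(500)
mean, spread = heldout_summary(scores)
```

Because each window lands in exactly one fold, the per-fold totals sum to the total held-out log probability, so the reported mean scales directly with the overall predictive likelihood.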
Our results suggest that there may be even more levels of sleep that should be considered (e.g., transition states and sub-states). This is important for neuroscientists to know, because each of these newly observed states could affect memory and behavior in a different way. No other published method has provided evidence of these additional states.
In addition to brain states, we infer spectral information for each brain region through cluster assignments. Though not the primary focus of this work, it is interesting that groups of brain regions tend to share similar attributes. In Figure 3, we have sorted brain regions into groups based on cluster assignment similarity, essentially recovering a 'network' of the brain. This underscores the power of the proposed method: not only do we develop unsupervised methods to classify whole-brain activity into states, we also infer the cross-region/animal relationships within these states.

6 Conclusion

The contributions of this paper are three-fold. First, we design an extension of the infinite tensor mixture model, incorporating time dependency. Second, we develop variational inference for the proposed generative model, including an efficient inference scheme using split-merge moves for two general models: the ITM and the iHMM. To the authors' knowledge, neither of these inference schemes has been developed previously. Finally, with respect to the neuroscience application, we model brain states given multi-channel LFP data in a principled manner, showing significant advantages over other potential approaches to modeling brain states.
Using the proposed framework, we discover distinct brain states directly from the raw, filtered data, defined by their spectral content and network properties, and we can infer relationships between, and share statistical strength across, data from multiple animals.

Acknowledgments
The research reported here was funded in part by ARO, DARPA, DOE, NGA and ONR.

References
[1] C. D. Gilbert and M. Sigman, "Brain States: Top-down Influences in Sensory Processing," Neuron, vol. 54, no. 5, pp. 677–696, June 2007.
[2] A. Kohn, A. Zandvakili, and M. A. Smith, "Correlations and Brain States: from Electrophysiology to Functional Imaging," Curr. Opin. Neurobiol., vol. 19, no. 4, Aug. 2009.
[3] D. Pfaff, A. Ribeiro, J. Matthews, and L. Kow, "Concepts and Mechanisms of Generalized Central Nervous System Arousal," ANYAS, Jan. 2008.
[4] P. J. Lang and M. M. Bradley, "Emotion and the Motivational Brain," Biol. Psychol., vol. 84, no. 3, pp. 437–450, July 2010.
[5] K. J. Friston, L. Harrison, and W. Penny, "Dynamic Causal Modelling," NeuroImage, vol. 19, no. 4, pp. 1273–1302, 2003.
[6] K. Dzirasa, S. Ribeiro, R. Costa, L. M. Santos, S. C. Lin, A. Grosmark, T. D. Sotnikova, R. R. Gainetdinov, M. G. Caron, and M. A. L. Nicolelis, "Dopaminergic Control of Sleep–Wake States," J. Neurosci., vol. 26, no. 41, pp. 10577–10589, 2006.
[7] D. Gervasoni, S. C. Lin, S. Ribeiro, E. S. Soares, J. Pantoja, and M. A. L. Nicolelis, "Global Forebrain Dynamics Predict Rat Behavioral States and their Transitions," J. Neurosci., vol. 24, no. 49, pp.
11137–11147, 2004.
[8] P. Rai, Y. Wang, S. Guo, G. Chen, D. Dunson, and L. Carin, "Scalable Bayesian Low-Rank Decomposition of Incomplete Multiway Tensors," ICML, 2014.
[9] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen, "The Infinite Hidden Markov Model," NIPS, 2002.
[10] A. G. Wilson and R. P. Adams, "Gaussian Process Kernels for Pattern Discovery and Extrapolation," ICML, 2013.
[11] J. Murray and D. B. Dunson, "Bayesian Learning of Joint Distributions of Objects," AISTATS, 2013.
[12] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei, "Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes," NIPS, 2005.
[13] R. A. Harshman, "Foundations of the PARAFAC Procedure," Work. Pap. Phonetics, 1970.
[14] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[15] M. Bryant and E. B. Sudderth, "Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes," NIPS, 2012.
[16] P. Liang, S. Petrov, M. I. Jordan, and D. Klein, "The Infinite PCFG using Hierarchical Dirichlet Processes," EMNLP-CoNLL, pp. 688–697, 2007.
[17] Y. W. Teh, K. Kurihara, and M. Welling, "Collapsed Variational Inference for HDP," NIPS, 2007.
[18] D. M. Blei and M. I. Jordan, "Variational Inference for Dirichlet Process Mixtures," Bayesian Anal., 2004.
[19] K. Kurihara, M. Welling, and N. Vlassis, "Accelerated Variational Dirichlet Process Mixtures," NIPS, 2007.
[20] M. J. Beal, "Variational Algorithms for Approximate Bayesian Inference," Ph.D. dissertation, Univ. London, 2003.
[21] J. Paisley and L. Carin, "Hidden Markov Models with Stick-Breaking Priors," IEEE Trans. Signal Process., vol. 57, no. 10, pp.
3905–3917, 2009.
[22] S. Jain and R. M. Neal, "Splitting and Merging Components of a Nonconjugate Dirichlet Process Mixture Model," Bayesian Anal., Sept. 2007.
[23] Y. Rubner, C. Tomasi, and L. J. Guibas, "The Earth Mover's Distance as a Metric for Image Retrieval," Int. J. Comput. Vis., vol. 40, no. 2, pp. 99–121, 2000.
[24] K. Dzirasa, R. Fuentes, S. Kumar, J. M. Potes, and M. A. L. Nicolelis, "Chronic in Vivo Multi-Circuit Neurophysiological Recordings in Mice," J. Neurosci. Methods, vol. 195, no. 1, pp. 36–46, Jan. 2011.
[25] M. A. Tucker, Y. Hirota, E. J. Wamsley, H. Lau, A. Chaklader, and W. Fishbein, "A Daytime Nap Containing Solely Non-REM Sleep Enhances Declarative but not Procedural Memory," Neurobiol. Learn. Mem., vol. 86, no. 2, pp. 241–247, Sept. 2006.
[26] A. Rolls, D. Colas, A. Adamantidis, M. Carter, T. Lanre-Amos, H. C. Heller, and L. de Lecea, "Optogenetic Disruption of Sleep Continuity Impairs Memory Consolidation," PNAS, vol. 108, no. 32, pp. 13305–13310, Aug. 2011.