{"title": "Cross-Spectral Factor Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 6842, "page_last": 6852, "abstract": "In neuropsychiatric disorders such as schizophrenia or depression, there is often a disruption in the way that regions of the brain synchronize with one another. To facilitate understanding of network-level synchronization between brain regions, we introduce a novel model of multisite low-frequency neural recordings, such as local field potentials (LFPs) and electroencephalograms (EEGs). The proposed model, named Cross-Spectral Factor Analysis (CSFA), breaks the observed signal into factors defined by unique spatio-spectral properties. These properties are granted to the factors via a Gaussian process formulation in a multiple kernel learning framework. In this way, the LFP signals can be mapped to a lower dimensional space in a way that retains information of relevance to neuroscientists.  Critically, the factors are interpretable. The proposed approach empirically allows similar performance in classifying mouse genotype and behavioral context when compared to commonly used approaches that lack the interpretability of CSFA. We also introduce a semi-supervised approach, termed discriminative CSFA (dCSFA). CSFA and dCSFA provide useful tools for understanding neural dynamics, particularly by aiding in the design of causal follow-up experiments.", "full_text": "Cross-Spectral Factor Analysis\n\nNeil M. Gallagher1,*, Kyle Ulrich2,*, Austin Talbot3,\n\nKafui Dzirasa1,4, Lawrence Carin2 and David E. 
Carlson5,6\n\n1Department of Neurobiology, 2Department of Electrical and Computer Engineering, 3Department of\nStatistical Science, 4Department of Psychiatry and Behavioral Sciences, 5Department of Civil and\nEnvironmental Engineering, 6Department of Biostatistics and Bioinformatics, Duke University\n\n*Contributed equally to this work\n\n{neil.gallagher,austin.talbot,kafui.dzirasa,\n\nlcarin,david.carlson}@duke.edu\n\nAbstract\n\nIn neuropsychiatric disorders such as schizophrenia or depression, there is often a\ndisruption in the way that regions of the brain synchronize with one another. To facilitate\nunderstanding of network-level synchronization between brain regions, we\nintroduce a novel model of multisite low-frequency neural recordings, such as local\n\ufb01eld potentials (LFPs) and electroencephalograms (EEGs). The proposed model,\nnamed Cross-Spectral Factor Analysis (CSFA), breaks the observed signal into\nfactors de\ufb01ned by unique spatio-spectral properties. These properties are granted\nto the factors via a Gaussian process formulation in a multiple kernel learning\nframework. In this way, the LFP signals can be mapped to a lower dimensional\nspace in a way that retains information of relevance to neuroscientists. Critically,\nthe factors are interpretable. The proposed approach empirically allows similar\nperformance in classifying mouse genotype and behavioral context when compared\nto commonly used approaches that lack the interpretability of CSFA. We also introduce\na semi-supervised approach, termed discriminative CSFA (dCSFA). CSFA\nand dCSFA provide useful tools for understanding neural dynamics, particularly\nby aiding in the design of causal follow-up experiments.\n\n1 Introduction\n\nNeuropsychiatric disorders (e.g. schizophrenia, autism spectrum disorder, etc.) take an enormous\ntoll on our society [16]. 
In spite of this, the underlying neural causes of many of these diseases are\npoorly understood and treatments are developing at a slow pace [2]. Many of these disorders have\nbeen linked to a disruption of neural dynamics and communication between brain regions [10, 33].\nIn recent years, tools such as optogenetics [15, 26] have facilitated the direct probing of causal\nrelationships between neural activity in different brain regions and neural disorders [28]. Planning a\nwell-designed experiment to study spatiotemporal dynamics in neural activity can present a challenge\ndue to the high number of design choices, such as which region(s) to stimulate, what neuron types,\nand what stimulation pattern to use. In this manuscript we explore how a machine learning approach\ncan facilitate the design of these experiments by developing interpretable and predictive methods.\nThese two qualities are crucial because they allow exploratory experiments to be used more effectively\nin the design of causal studies.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nWe explore how to construct a machine learning approach to capture neural dynamics from raw\nneural data during changing behavioral and state conditions. A body of literature in theoretical and\nexperimental neuroscience has focused on linking synchronized oscillations, which are observable\nin LFPs and EEGs, to neural computation [18, 24]. Such oscillations are often quanti\ufb01ed by\nspectral power, coherence, and phase relationships in particular frequency bands; disruption of these\nrelationships has been observed in neuropsychiatric disorders [20, 33]. 
There are a number of\nmethods for quantifying synchrony between pairs of brain regions based on statistical correlation\nbetween recorded activity in those regions [36, 5], but current methods for effectively identifying\nsuch patterns on a multi-region network level, such as Independent Component Analysis (ICA), are\ndif\ufb01cult to translate into actionable hypotheses.\nThe motivating data considered here are local \ufb01eld potentials (LFPs) recorded from implanted depth\nelectrodes at multiple sites (brain regions). LFPs are believed to re\ufb02ect the combined local neural\nactivity of hundreds of thousands of neurons [9]. The unique combination of spatial and temporal\nprecision provided by LFPs allows for accurate representation of frequency and phase relationships\nbetween activity in different brain regions. Notably, LFPs do not carry the signal precision present in\nspiking activity from single neurons; however, LFP signal characteristics are more consistent between\nanimals, meaning that information gleaned from LFPs can be used to understand population level\neffects, just as in fMRI or EEG studies. Our empirical results further demonstrate this phenomenon.\nMulti-region LFP recordings produce relatively high-dimensional datasets. Basic statistical tests\ntypically perform poorly in such high dimensional spaces without being directed by prior knowledge\ndue to multiple comparisons, which diminish statistical power [27]. Furthermore, typical multi-site\nLFP datasets are both \u201cbig data\u201d in the sense that there are a large number of high-dimensional\nmeasurements and \u201csmall data\u201d in the sense that only a few animals are used to represent the\nentire population. A common approach to address this issue is to describe such data by a small\nnumber of factors (i.e. dimensionality reduction), which increases the statistical power when relevant\ninformation (e.g. relationship to behavior) is captured in the factors. 
Many methods for reducing the\ndimensionality of neural datasets exist [14], but are generally either geared towards spiking data or\nsimple general-purpose methods such as principal components analysis (PCA). Therefore, reducing\nthe dimensionality of multi-channel LFP datasets into a set of interpretable factors can facilitate the\nconstruction of testable hypotheses regarding the role of neural dynamics in brain function.\nThe end goal of this analysis is not simply to improve predictive performance, but to design meaningful\nfuture causal experiments. By identifying functional and interpretable networks, we can form educated\nhypotheses and design targeted manipulation of neural circuits. This approach has been previously\nsuccessful in the \ufb01eld of neuroscience [10]. The choice to investigate networks that span large portions\nof the brain is critical, as this is the scale at which most clinical and scienti\ufb01c in vivo interventions\nare applied. Additionally, decomposing complex signatures of brain activity into contributions from\nindividual functional networks (i.e. factors) allows for models and analyses that are more conceptually\nand technically tractable.\nHere, we introduce a new framework, denoted Cross-Spectral Factor Analysis (CSFA), which is able\nto accurately represent multi-region neural dynamics in a low-dimensional manifold while retaining\ninterpretability. The model de\ufb01nes a set of factors, each capturing the power, coherence, and phase\nrelationships for a distribution of neural signals. The learned parameters for each factor correspond\nto an interpretable representation of the network dynamics. Changes in the relative strengths of each\nfactor can relate neural dynamics to desired variables. 
Empirically, CSFA discovers networks that\nare highly predictive of response variables (behavioral context and genotype) for recordings from\nmice undergoing a behavioral paradigm designed to measure an animal\u2019s response to a challenging\nexperience. We further show that incorporating response variables in a supervised multi-objective\nframework can further map relevant information into a smaller set of features, as in [30], potentially\nincreasing statistical power.\n\n2 Model Description\n\nFigure 1: [left] Example of multi-site LFP data from seven brain regions, separated into time windows.\n[right] Visual description of the parameters of the dCSFA model. $y^w_c$: Signal from channel c in\nwindow w. $z^w$: Task-relevant side information. $s_{wl}$: Score for factor l in window w. $\\theta$: Parameters\ndescribing CSFA model. $\\phi$: Parameters of side-information classi\ufb01er. Shaded regions indicate\nobserved variables and clear regions represent inferred variables.\n\nHere, we describe a model to extract a low-dimensional \u201cbrain state\u201d representation from multi-channel\nLFP recordings. The states in this model are de\ufb01ned by a set of factors, each of which\ndescribes a speci\ufb01c distribution of observable signals in the network. The data is segmented into time\nwindows composed of N observations, equally spaced over time, from C distinct brain regions. We\nlet window w be represented by $Y^w = [y^w_1, \\ldots, y^w_N] \\in \\mathbb{R}^{C \\times N}$ (see Fig 1[left]). N is determined\nby the sampling rate and the duration of the window. The complete dataset is represented by the\nset $\\mathcal{Y} = \\{Y^w\\}_{w=1,\\ldots,W}$. Window lengths are typically chosen to be 1-5 seconds, as this temporal\nresolution is assumed to be suf\ufb01cient to capture the broad changes in brain state that we are interested\nin. 
We assume that window durations are short enough to make the signal approximately stationary.\nThis assumption, while only an approximation, is appropriate because we are interested in brain state\ndynamics that occur on a relatively long time scale (i.e. multiple seconds). Therefore, within a single\nwindow of LFP data the observation may be represented by a stationary Gaussian process (GP). It is\nimportant to distinguish between signal dynamics, which occur on a time scale of milliseconds, and\nbrain state dynamics, which are assumed to occur over a longer time scale.\nIn the following, the Cross-Spectral Mixture kernel [34], a key step in the proposed model, is reviewed\nin Section 2.1. The formulation of the CSFA model is given in Section 2.2. Model inference is\ndiscussed in Section 2.3. In Section 2.4, a joint CSFA and classi\ufb01cation model called discriminative\nCSFA (dCSFA) is introduced. Supplemental Section A discusses additional related work. Supple-\nmental Section B gives additional mathematical background on multi-region Gaussian processes.\nSupplemental Section C offers an alternative formulation of the CSFA model that models the ob-\nserved signal as the real component of a complex signal. For ef\ufb01cient calculations, computational\napproximations for the CSFA model are described in Supplemental Section D.\n\n2.1 Cross-Spectral Mixture Kernel\nCommon methods to characterize spectral relationships within and between signal channels are the\npower-spectral density (PSD) and cross-spectral density (CSD), respectively [29]. A set of multi-\nchannel neural recordings may be characterized by the set of PSDs for each channel and CSDs for\neach pair of channels, resulting in a quadratic increase in the number of parameters with the number\nof channels observed. 
In order to counteract the issues arising from multiple comparisons,\nneuroscientists typically preselect channels and frequencies of interest before testing experimental\nhypotheses about spectral relationships in neural datasets. Instead of directly calculating each of\nthese parameters, we use a modeling approach to estimate the PSDs and CSDs over all channels and\nfrequency bands by using the Cross-Spectral Mixture (CSM) covariance kernel [34]. In this way we\neffectively reduce the number of parameters required to obtain a good representation of the PSDs and\nCSDs for a multi-site neural recording.\nThe CSM multi-output kernel is given by\n\n$$K_{CSM}(t, t'; B_q, \\mu_q, \\nu_q) = \\mathrm{Real}\\left(\\sum_{q=1}^{Q} B_q k_q(t, t'; \\mu_q, \\nu_q)\\right), \\qquad (1)$$\n\nwhere the matrix $K_{CSM} \\in \\mathbb{C}^{C \\times C}$. This is the real component of a sum of Q separable kernels. Each\nof these kernels is given by the combination of a cross-spectral density matrix, $B_q \\in \\mathbb{C}^{C \\times C}$, and a\nstationary function of two time points that de\ufb01nes a frequency band, $k_q(\\cdot)$. Representing $\\tau = t - t'$,\nas all kernels used here are stationary and depend only on the difference between the two inputs, the\nfrequency band for each spectral kernel is de\ufb01ned by a spectral Gaussian kernel,\n\n$$k_q(\\tau; \\mu_q, \\nu_q) = \\exp\\left(-\\tfrac{1}{2} \\nu_q \\tau^2 + j \\mu_q \\tau\\right), \\qquad (2)$$\n\nwhich is equivalent to a Gaussian distribution in the frequency domain with variance $\\nu_q$, centered at\n$\\mu_q$. The matrix $B_q$ is a positive semi-de\ufb01nite matrix with rank R. (Note: The cross-spectral density\nmatrix $B_q$ is also known as the coregionalization matrix in spatial statistics [4].) Keeping R small for\nthe coregionalization matrices ameliorates over\ufb01tting by reducing the overall parameter space. 
This\nlow-rank structure is maintained and $B_q$ is updated by storing the full matrix as the outer product of a tall\nmatrix with itself:\n\n$$B_q = \\tilde{B}_q \\tilde{B}_q^{\\dagger}, \\qquad \\tilde{B}_q \\in \\mathbb{C}^{C \\times R}. \\qquad (3)$$\n\nPhase coherence between regions is given by the magnitudes of the complex off-diagonal entries in\n$B_q$. The phase offset is given by the complex angle of those off-diagonal entries.\n\n2.2 Cross-Spectral Factor Analysis\n\nOur proposed model creates a low-dimensional manifold by extending the CSM framework to a\nmultiple kernel learning framework [17]. Let $t_n$ represent the time point of the nth sample in the\nwindow and $t$ represent $[t_1, \\ldots, t_N]$. Each window of data is modeled as\n\n$$y^w_n = f^w(t_n) + \\epsilon^w_n, \\qquad \\epsilon^w_n \\sim \\mathcal{N}(0, \\eta^{-1} I_C), \\qquad (4)$$\n\n$$F^w(t) = \\sum_{l=1}^{L} s_{wl} F^l_w(t), \\qquad F^w(t) = [f^w(t_1), \\ldots, f^w(t_N)], \\qquad (5)$$\n\nwhere $F^w(t)$ is represented as a linear combination of functions drawn from L latent factors, given by\n$\\{F^l_w(t)\\}_{l=1}^{L}$. The l-th latent function is drawn independently for each task according to\n\n$$F^l_w(t) \\sim \\mathcal{GP}(0, K_{CSM}(\\cdot; \\theta_l)), \\qquad (6)$$\n\nwhere $\\theta_l$ is the set of parameters associated with the lth factor (i.e. $\\{B^l_q, \\mu^l_q, \\nu^l_q\\}_{q=1}^{Q}$). The GP here\nrepresents a multi-output Gaussian process due to the cross-correlation structure between the brain\nregions, as in [32]. Additional details on the multi-output Gaussian process formulation can be found\nin Supplemental Section B.\nIn CSFA, the latent functions $\\{F^l_w(t)\\}_{l=1}^{L}$ are not the same across windows; rather, the underlying\ncross-spectral content (power, coherence, and phase) of the signals is shared and the functional\ninstantiation differs from window to window. A marginalization of all latent functions results in a\ncovariance kernel that is a weighted superposition of the kernels for each latent factor, which is given\nmathematically as\n\n$$Y^w \\sim \\mathcal{GP}(0, K_{CSFA}(\\cdot; \\Theta, w)), \\qquad (7)$$\n\n$$K_{CSFA}(\\tau; \\Theta, w) = \\sum_{l=1}^{L} s_{wl}^2 K_{CSM}(\\tau; \\theta_l) + \\eta^{-1} \\delta_{\\tau} I_C. \\qquad (8)$$\n\nHere, $\\Theta = \\{\\theta_1, \\ldots, \\theta_L\\}$ is the set of parameters associated with all L factors and $\\delta_{\\tau}$ represents the\nDirac delta function and constructs the additive Gaussian noise. The use of this multi-output GP\nformulation within the CSFA kernel means that the latent variables can be directly integrated out,\nfacilitating inference.\nTo address multiplicative non-identi\ufb01ability, the maximum power in any frequency band is limited\nfor each CSM kernel (i.e. $\\max(\\mathrm{diag}(K_{CSM}(0; \\theta_l))) = 1$ for all l). In this way, the factor scores\nsquared, $s_{wl}^2$, may now be interpreted approximately as the variance associated with factor l in\nwindow w.\n\n2.3 Inference\n\nA maximum likelihood formulation for the zero-mean Gaussian process given by Eq. 7 is used to\nlearn the factor scores $\\{s_w\\}_{w=1}^{W}$ and CSM kernel parameters $\\Theta$, given the full dataset $\\mathcal{Y}$. If we let\n$\\Sigma^w_{CSFA} \\in \\mathbb{C}^{NC \\times NC}$ be the covariance matrix obtained from the kernel $K_{CSFA}(\\cdot; \\Theta, w)$ evaluated\nat time points t, we have\n\n$$(\\{s_w\\}_{w=1}^{W}, \\Theta) = \\arg\\max_{\\{\\tilde{s}_w\\}_{w=1}^{W}, \\tilde{\\Theta}} \\mathcal{L}(\\mathcal{Y}; \\{\\tilde{s}_w\\}_{w=1}^{W}, \\tilde{\\Theta}), \\qquad (9)$$\n\n$$\\mathcal{L}(\\mathcal{Y}; \\{s_w\\}_{w=1}^{W}, \\Theta) = \\prod_{w=1}^{W} \\mathcal{N}(\\mathrm{vec}(Y^w); 0, \\Sigma^w_{CSFA}), \\qquad (10)$$\n\nwhere vec(\u00b7) gives a column-wise vectorization of its matrix argument, and W is the total number\nof windows. As is common with many Gaussian processes, an analytic solution to maximize the\nlog-likelihood does not exist. We resort to a batch gradient descent algorithm based on the Adam\nformulation [22]. 
Fast calculation of gradients is accomplished via a discrete Fourier transform\n(DFT) approximation for the CSM kernel [34]. This approximation alters the formulation given in\nEq. 7 slightly; the modi\ufb01ed form is given in Supplemental Section D. The hyperparameters of the\nmodel are the number of factors (L), the number of spectral Gaussians per factor (Q), the rank of\nthe coregionalization matrix (R), and the precision of the additive white noise ($\\eta$). In applications\nwhere the generative properties of the model are most important, hyperparameters should be chosen\nusing cross-validation based on hold-out log-likelihood. In the results described below, we emphasize\nthe predictive aspects of the model, so hyperparameters are chosen by cross-validating on predictive\nperformance. In order to maximize the generalizability of the model to a population, validation and\ntest sets are composed of data from complete animals/subjects that were not included in the training\nset.\nIn all of the results described below, models were trained for 500 Adam iterations, with a learning\nrate of 0.01 and other learning parameters set to the defaults suggested in [22]. The kernel parameters\n$\\Theta$ were then \ufb01xed at their values from the 500th iteration and suf\ufb01cient additional iterations were\ncarried out until the factor scores, $\\{s_w\\}_{w=1}^{W}$, reached approximate convergence. Corresponding factor\nscores are learned for validation and test sets in a similar manner, by initializing the kernel parameters\n$\\Theta$ with those learned from the training set and holding them \ufb01xed while learning factor scores to\nconvergence as outlined above. Normalization to address multiplicative non-identi\ufb01ability, as described in\nSection 2.2, was applied to each model after all iterations were completed.\n\n2.4 Discriminative CSFA\n\nWe often wish to discover factors that are associated with some side information (e.g. behavioral\ncontext). 
More formally, given a set of labels, $\\{z_1, \\ldots, z_W\\}$, we wish to maximize the ability of the\nfactor scores, $\\{s_1, \\ldots, s_W\\}$, to predict the labels. This is accomplished by modifying the objective\nfunction to include a second term related to the performance of a classi\ufb01er that takes the factor\nscores as regressors. We term this modi\ufb01ed model discriminative CSFA, or dCSFA. We choose the\ncross-entropy error of a simple logistic regression classi\ufb01er to demonstrate this, giving\n\n$$\\{\\{s_w\\}_{w=1}^{W}, \\Theta\\} = \\arg\\max_{\\{\\tilde{s}_w\\}_{w=1}^{W}, \\tilde{\\Theta}} \\mathcal{L}(\\mathcal{Y}; \\{\\tilde{s}_w\\}_{w=1}^{W}, \\tilde{\\Theta}) + \\lambda \\sum_{w=1}^{W} \\sum_{k=0}^{1} \\mathbb{1}_{z_w = k} \\log\\left(\\frac{\\exp(\\phi_k \\tilde{s}_w)}{\\sum_{k'} \\exp(\\phi_{k'} \\tilde{s}_w)}\\right). \\qquad (11)$$\n\nThe \ufb01rst term of the RHS of (11) quanti\ufb01es the generative aspect of how well the model \ufb01ts the data\n(the log-likelihood of Section 2.2). The second term is the loss function of classi\ufb01cation. Here $\\lambda$ is a\nparameter that controls the relative importance of the classi\ufb01cation loss function to the generative\nlikelihood. It is straightforward to include alternative classi\ufb01ers or side information. For example,\nwhen there are multiple classes it is desirable to set the loss function to be the cross entropy loss\nassociated with multinomial logistic regression [23], which only involves modifying the second term\nof the RHS of (11).\nIn this dCSFA formulation, $\\lambda$ and the other hyperparameters are chosen based on cross-validation\nof the predictive accuracy of the factors, to produce factors that are as predictive as possible in a new\ndataset from other members of the population. The number of factors included in the classi\ufb01cation\nand corresponding loss function can be limited to a number less than L. One application of dCSFA is\nto \ufb01nd a few factors predictive of side information, embedded in a full set of factors that describe\na dataset [30]. 
In this way, the predictive factors maintain the desirable properties of a generative\nmodel, such as robustness to missing regressors. We assume that in many applications of dCSFA, the\ndescriptive properties of the remaining factors matter only in that they provide a larger generative\nmodel to embed the discriminative factors in. In applications where the descriptive properties of the\nremaining factors are of major importance, hyperparameters can instead be cross-validated using the\nobjective function from (11) applied to data from new members of the population.\n\nFigure 2: Factor scores learned in two different dCSFA models. Data shown corresponds to the test\nset described in 3.2. Score trajectories are smoothed over time for visualization. Bold lines give score\ntrajectory averaged over all 6 mice. (top) Scores for three factors that track with behavioral context\nover a two-day experiment. (bottom) Scores for a single factor that tracks with genotype.\nAxis labels: Homecage, Open Field, Tail Suspension Test; Day 1, Day 2.\n\n2.5 Handling Missing Channels\n\nElectrode and surgical failures resulting in unusable data channels are common when collecting the\nmulti-channel LFP datasets that motivate this work. Fortunately, accounting for missing channels is\nstraightforward within the CSFA model by taking advantage of the marginal properties of multivariate\nGaussian distributions. This is a standard approach in the Gaussian process literature [31]. Missing\nchannels are handled by marginalizing the missing channel out of the covariance matrix in Eq. 7.\nThis mechanism also allows for the application of CSFA to multiple datasets simultaneously, as long\nas there is some overlap in the set of regions recorded in each dataset. 
Similarly, the conditional\nproperties of multivariate Gaussian distributions provide a mechanism for simulating data from\nmissing channels. This is accomplished by \ufb01nding the conditional covariance matrix for the missing\nchannels given the original matrix (Eq. 8) and the recorded data.\n\n3 Results\n\n3.1 Synthetic Data\n\nIn order to demonstrate that CSFA is capable of accurately representing the true spectral characteristics\nassociated with some dataset, we tested it on a synthetic dataset. The synthetic dataset was simulated\nfrom a CSFA model with pre-determined kernel parameters and randomly generated score values\nat each window. In this way there is a known covariance matrix associated with each window of\nthe dataset. Details of the model used to generate this data are described in Supplemental Section E\nand Supplemental Table 2. The cross-spectral density was learned for each window of the dataset\nby training a randomly initialized CSFA model and the KL-divergence compared to the true\ncross-spectral density was computed. Hyperparameters for the learned CSFA model were chosen to match\nthe model from which the dataset was generated.\n\nA classical issue with many factor analysis approaches, such as probabilistic PCA [7], is the assumption\nof a constant covariance matrix. To emphasize the point that our method captures dynamics of\nthe covariance structure, we compare the results from CSFA to the KL-divergence from a constant\nestimate of the covariance matrix over all of the windows, as is assumed in traditional factor analysis\napproaches. CSFA had an average divergence of 5466.8 (std. dev. of 49.5) compared to 7560.2\n(std. dev. of 17.9) for the mean estimate. These distributions were signi\ufb01cantly different (p-value\n$< 2 \\times 10^{-308}$, Wilcoxon rank sum test). 
This indicates that, on average, CSFA provides a much\nbetter estimate of the covariance matrix associated with a window in this synthetic dataset compared\nto the classical constant covariance assumption.\n\n3.2 Mouse Data\n\nWe collected a dataset of LFPs recorded from 26 mice from two different genetic backgrounds (14\nwild type, 12 CLOCK\u039419). The CLOCK\u039419 line of mice has been proposed as a model of bipolar\ndisorder [35]. There are 20 minutes of recordings for each mouse: 5 minutes while the mouse was in\nits home cage, 5 minutes during open \ufb01eld exploration, and 10 minutes during a tail suspension test.\nThe tail suspension test is used as an assay of response to a challenging experience [1]. Eleven distinct\nbrain regions were recorded: Nucleus Accumbens Core, Nucleus Accumbens Shell, Basolateral\nAmygdala, Infralimbic Cortex, Mediodorsal Thalamus, Prelimbic Cortex, Ventral Tegmental Area,\nLateral Dorsal Hippocampus, Lateral Substantia Nigra Pars Compacta, Medial Dorsal Hippocampus,\nand Medial Substantia Nigra Pars Compacta. Following previous applications [34], the window\nlength was set to 5 seconds and data was downsampled to 250 Hz.\nWe learned CSFA and dCSFA models in two separate classi\ufb01cation tasks: prediction of animal\ngenotype and of the behavioral context of the recording (i.e. home cage, open \ufb01eld, or tail-suspension\ntest). Three mice of each genotype were held out as a testing set. We used a 5-fold cross-validation\napproach to select the number of factors, L, the number of spectral Gaussians per factor (i.e. factor\ncomplexity), Q, the rank of the cross-spectral density matrix, R, and the additive noise precision, $\\eta$.\nFor each validation set, CSFA models were trained for each combination of $L \\in \\{10, 20, 30\\}$,\n$Q \\in \\{3, 5, 8\\}$, $R \\in \\{1, 2\\}$, $\\eta \\in \\{5, 20\\}$, and the model giving the best classi\ufb01cation performance on the\nvalidation set was selected for testing (see Table 1). 
The hyperparameters above for each dCSFA\nmodel were chosen based on the best average performance over all validation sets using CSFA. The\nparameters for the dCSFA model corresponding to each validation set were initialized from a trained\nCSFA model for that validation set with the chosen hyperparameters. Three factors from the CSFA\nmodel were chosen to be included in the classi\ufb01er component of the dCSFA model. For the binary\nclassi\ufb01cation task, the 3 factors with the lowest p-value in a Wilcoxon rank-sum test between scores\nassociated with each class were chosen. For the multinomial classi\ufb01cation task, a rank-sum test was\nperformed between all pairs of classes, and the 3 factors with the lowest average log p-value were\nchosen. The $\\lambda$ hyperparameter for dCSFA was chosen from {1, 0.1, 0.01} based on validation set\nclassi\ufb01cation performance.\n\nFeatures | Genotype (AUROC) | Behavioral Context (% Accuracy)\nFFT + PCA | 0.632 [0.012] | 85.5 [0.2]\nWelch + PCA | 0.922 [0.013] | 87.5 [1.5]\nCSFA | 0.685 [0.067] | 82.8 [0.9]\ndCSFA | 0.731 [0.064] | 83.1 [0.6]\ndCSFA-3 | 0.741 [0.099] | 70.7 [1.9]\nWelch + PCA-3 | 0.528 [0.045] | 54.7 [0.4]\n\nTable 1: Classi\ufb01cation performance. For genotype, logistic regression with an L1 regularization\npenalty was used. For behavioral context, multinomial logistic regression with an L2 penalty was\nused. All results are reported as a mean, with standard error included in brackets. FFT+PCA: PCA\napplied to magnitude of FFT. Welch+PCA: PCA applied to Welch\u2019s estimated spectral densities.\nCSFA: CSFA factor scores. dCSFA: All dCSFA factor scores. dCSFA-3: Scores for 3 discriminative\ndCSFA factors. Welch + PCA-3: PCA applied to estimated spectral densities; 3 components selected\nusing the criteria described in 3.2.\n\nFigure 3: Visual representations of a dCSFA factor. 
[right] Relative power-spectral (diagonal) and\ncross-spectral (off-diagonal) densities associated with the covariance function de\ufb01ning a single\nfactor. Amplitude reported for each frequency within a power or cross-spectral density is normalized\nrelative to the total sum of powers or coherences, respectively, at that frequency for all factors. [left]\nSimpli\ufb01ed representation of the same factor. Each \u2019wedge\u2019 corresponds to a single brain region.\nColored regions along the \u2018hub\u2019 of the circle represent frequency bands with signi\ufb01cant power within\nthat corresponding region. Colored \u2018spokes\u2019 represent frequency bands with signi\ufb01cant coherence\nbetween the corresponding pair of regions.\n\nWe compare our CSFA and dCSFA models to two-stage modeling approaches that are representative\nof techniques commonly used in the analysis of neural oscillation data [21]. Each of these approaches\nbegins with a method for estimating the spectral content of a signal, followed by PCA to reduce\ndimensionality (see Supplemental Section F for details). CSFA models were trained as described\nin Section 2.3; dCSFA models were initialized as reported above and trained for an additional 500\niterations. Figure 2 demonstrates that the predictive features learned from dCSFA clearly track the\ndifferent behavioral paradigms. If we constrain our classi\ufb01er to use only a few of the learned features,\ndCSFA features signi\ufb01cantly outperform features from the best comparison method. Compressing\nrelevant predictive information into only a handful of factors here is desirable for a number of reasons;\nit reduces the necessary number of statistical tests for testing hypotheses and also offers a more\ninterpretable situation for neuroscientists. 
The dCSFA factor that is most strongly associated with\ngenotype is visualized in Figure 3.\n\n3.3 Visualization\n\nThe models generated by CSFA are easily visualized and interpreted in a way that allows neuro-\nscientists to generate testable hypotheses related to brain network dynamics. Figure 3 shows one\nway to visualize the latent factors produced by CSFA. The upper-right section shows the power and\ncross-spectra associated with the CSM kernel from a single factor. Together these plots de\ufb01ne a\ndistribution of multi-channel signals that are described by this one factor. Plots along the diagonal\ngive power spectra for each of the 11 brain regions included in the dataset. The off diagonal plots\nshow the cross spectra with the associated phase offset in orange. The phase offset implies that\noscillations may originate in one region and travel to another, given the assumption that another\n(observed or unobserved) region is not responsible for the observed phase offset. These assumptions\nare not true in general, so we emphasize that their use is in hypothesis generation.\n\n8\n\n\fThe circular plot on the bottom-left of Figure 3 visualizes the same factor in an alternative concise\nway. Around the edge of the circle are the names of the brain regions in the data set and a range\nof frequencies modeled for each region. Colored bands along the outside of the circle indicate that\nspectral power in the corresponding region and frequency bands is above a threshold value. Similarly,\nlines connecting one region to another indicate that the coherence between the two regions is above\nthe same threshold value at the corresponding frequency band. 
Given the assumption that coherence\nimplies communication between brain regions [5], this plot quickly shows which brain regions are\nbelieved to be communicating and at what frequency band in each functional network.\n\n4 Discussion and Conclusion\n\nMulti-channel LFP datasets have enormous potential for describing brain network dynamics at the\nlevel of individual regions. The dynamic nature and high dimensionality of such datasets make direct\ninterpretation quite difficult. In order to take advantage of the information in such datasets, techniques\nfor simplifying and detecting patterns in this context are necessary. Currently available techniques for\nsimplifying these types of high-dimensional datasets into a manageable size (e.g. ICA, PCA) generally\ndo not offer sufficient insight into the types of questions that neuroscientists are interested in. More\nspecifically, there is evidence that neural networks produce oscillatory patterns in LFPs as signatures\nof network activation [18]. Methods such as CSFA, which identify and interpret these signatures\nat a network level, are needed to form reasonable and testable hypotheses about the dynamics of\nwhole-brain networks. In this work, we show that CSFA detects signatures of multi-region network\nactivity that explain variables of interest to neuroscientists (i.e. animal genotype, behavioral context).\nThe proposed CSFA model explicitly targets known relationships in LFP data to map the\nhigh-dimensional data to a low-dimensional set of features. In direct contrast to many other dimensionality\nreduction methods, each factor maintains a high degree of interpretability, particularly in neuroscience\napplications. 
We emphasize that CSFA captures both spectral power and coherence across brain\nregions, both of which have been associated with neural information processing within the brain [19].\nIt is important to note that this model finds temporal precedence in observed signals, rather than\ntrue causality; there are many examples where temporal precedence does not imply true causation.\nTherefore, we emphasize that CSFA facilitates the generation of testable hypotheses rather than\ndemonstrating causal relationships by itself. In addition, CSFA can suggest ways of manipulating\nnetwork dynamics in order to directly test their role in mental processes. Such experiments might\ninvolve closed-loop stimulation using optogenetic or transcranial magnetic stimulation to manipulate\nthe complex temporal dynamics of neural activity captured by the learned factors.\n\nFuture work will focus on making these approaches broadly applicable, computationally efficient,\nand reliable. It is worth noting that CSFA describes the full cross-spectral density of the data, but\nthat there are additional signal characteristics of interest to neuroscientists that are not described,\nsuch as cross-frequency coupling [24]; another possible area of future work is the development of\nadditional kernel formulations that could capture these additional signal characteristics. CSFA will\nalso be generalized to include other measurement modalities (e.g. neural spiking, fMRI) to create\njoint generative models.\n\nIn summary, we believe that CSFA fulfills three important criteria: 1. It consolidates high-dimensional\ndata into an easily interpretable low-dimensional space. 2. It adequately represents the raw observed\ndata. 3. It retains information from the original dataset that is relevant to neuroscience researchers. 
All\nthree of these characteristics are necessary to enable neuroscience researchers to generate trustworthy\nhypotheses about network-level brain dynamics.\n\nAcknowledgements\n\nIn working on this project L.C. received funding from the DARPA HIST program; K.D., L.C., and\nD.C. received funding from the National Institutes of Health by grant R01MH099192-05S2; K.D.\nreceived funding from the W.M. Keck Foundation.\n\nReferences\n\n[1] H. M. Abelaira, G. Z. Reus, and J. Quevedo. Animal models as tools to study the pathophysiology of depression. Revista Brasileira de Psiquiatria, 2013.\n\n[2] H. Akil, S. Brenner, E. Kandel, K. S. Kendler, M.-C. King, E. Scolnick, J. D. Watson, and H. Y. Zoghbi. The future of psychiatric research: genomes and neural circuits. Science, 2010.\n\n[3] M. A. Alvarez, L. Rosasco, and N. D. Lawrence. Kernels for Vector-Valued Functions: a Review. Foundations and Trends in Machine Learning, 2012.\n\n[4] S. Banerjee, B. P. Carlin, and A. E. Gelfand. Hierarchical modeling and analysis for spatial data. CRC Press, 2014.\n\n[5] A. M. Bastos and J.-M. Schoffelen. A Tutorial Review of Functional Connectivity Analysis Methods and Their Interpretational Pitfalls. Front Syst Neurosci, 2016.\n\n[6] M. J. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, University of London, United Kingdom, 2003.\n\n[7] C. M. Bishop. Pattern recognition. Machine Learning, 2006.\n\n[8] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003.\n\n[9] G. Buzs\u00e1ki, C. A. Anastassiou, and C. Koch. The origin of extracellular fields and currents\u2014EEG, ECoG, LFP and spikes. Nature Reviews Neuroscience, 2012.\n\n[10] D. Carlson, L. K. David, N. M. Gallagher, M.-A. T. Vu, M. Shirley, R. Hultman, J. Wang, C. Burrus, C. A. McClung, S. Kumar, L. Carin, S. D. Mague, and K. Dzirasa. 
Dynamically Timed Stimulation of Corticolimbic Circuitry Activates a Stress-Compensatory Pathway. Biological Psychiatry, 2017.\n\n[11] R. Caruana. Multitask Learning. Machine Learning, 1997.\n\n[12] B. Chen, G. Polatkan, G. Sapiro, D. Blei, D. Dunson, and L. Carin. Deep learning with hierarchical convolutional factor analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.\n\n[13] Y. Cho and L. K. Saul. Kernel methods for deep learning. In Advances in Neural Information Processing Systems, 2009.\n\n[14] J. P. Cunningham and B. M. Yu. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 2014.\n\n[15] K. Deisseroth. Optogenetics. Nature Methods, 2011.\n\n[16] W. W. Eaton, S. S. Martins, G. Nestadt, O. J. Bienvenu, D. Clarke, and P. Alexandre. The burden of mental disorders. Epidemiologic Reviews, 2008.\n\n[17] M. G\u00f6nen and E. Alpayd\u0131n. Multiple kernel learning algorithms. Journal of Machine Learning Research, 2011.\n\n[18] A. Z. Harris and J. A. Gordon. Long-Range Neural Synchrony in Behavior. Annual Review of Neuroscience, 2015.\n\n[19] K. D. Harris and A. Thiele. Cortical state and attention. Nature Reviews Neuroscience, 2011.\n\n[20] R. Hultman, S. D. Mague, Q. Li, B. M. Katz, N. Michel, L. Lin, J. Wang, L. K. David, C. Blount, R. Chandy, and others. Dysregulation of prefrontal cortex-mediated slow-evolving limbic dynamics drives stress-induced emotional pathology. Neuron, 2016.\n\n[21] D. Iacoviello, A. Petracca, M. Spezialetti, and G. Placidi. A real-time classification algorithm for EEG-based BCI driven by self-induced emotions. Computer Methods and Programs in Biomedicine, 2015.\n\n[22] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs], 2014.\n\n[23] C. Kwak and A. Clayton-Matthews. Multinomial logistic regression. Nursing Research, 2002.\n\n[24] J. E. Lisman and O. Jensen. The Theta-Gamma Neural Code. 
Neuron, 2013.\n\n[25] J. Mairal, P. Koniusz, Z. Harchaoui, and C. Schmid. Convolutional kernel networks. In Advances in Neural Information Processing Systems, 2014.\n\n[26] G. Miesenb\u00f6ck. Genetic methods for illuminating the function of neural circuits. Current Opinion in Neurobiology, 2004.\n\n[27] M. D. Moran. Arguments for rejecting the sequential Bonferroni in ecological studies. Oikos, 2003.\n\n[28] E. J. Nestler and S. E. Hyman. Animal models of neuropsychiatric disorders. Nature Neuroscience, 2010.\n\n[29] A. V. Oppenheim. Discrete-time signal processing. Pearson Education India, 1999.\n\n[30] R. Raina, Y. Shen, A. McCallum, and A. Y. Ng. Classification with hybrid generative/discriminative models. In Advances in Neural Information Processing Systems, 2004.\n\n[31] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.\n\n[32] Y. W. Teh, M. Seeger, and M. I. Jordan. Semiparametric Latent Factor Models. AISTATS, 2005.\n\n[33] P. J. Uhlhaas, C. Haenschel, D. Nikoli\u0107, and W. Singer. The role of oscillations and synchrony in cortical networks and their putative relevance for the pathophysiology of schizophrenia. Schizophr Bull, 2008.\n\n[34] K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP Kernels for Cross-Spectrum Analysis. Advances in Neural Information Processing Systems, 2015.\n\n[35] J. van Enkhuizen, A. Minassian, and J. W. Young. Further evidence for clock19 mice as a model for bipolar disorder mania using cross-species tests of exploration and sensorimotor gating. Behavioural Brain Research, 2013.\n\n[36] H. E. Wang, C. G. B\u00e9nar, P. P. Quilichini, K. J. Friston, V. K. Jirsa, and C. Bernard. A systematic framework for functional connectivity measures. Front. Neurosci., 2014.\n\n[37] P. Welch. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. 
IEEE Transactions on Audio and Electroacoustics, 1967.\n\n[38] A. G. Wilson, E. Gilboa, A. Nehorai, and J. P. Cunningham. Fast Kernel Learning for Multidimensional Pattern Extrapolation. Advances in Neural Information Processing Systems, 2014.\n\n[39] A. Wilson and R. Adams. Gaussian process kernels for pattern discovery and extrapolation. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), 2013.\n\n[40] M. Zhou, H. Chen, L. Ren, G. Sapiro, L. Carin, and J. W. Paisley. Non-parametric Bayesian dictionary learning for sparse image representations. In Advances in Neural Information Processing Systems, 2009.\n", "award": [], "sourceid": 3435, "authors": [{"given_name": "Neil", "family_name": "Gallagher", "institution": "Duke University"}, {"given_name": "Kyle", "family_name": "Ulrich", "institution": null}, {"given_name": "Austin", "family_name": "Talbot", "institution": "Duke University"}, {"given_name": "Kafui", "family_name": "Dzirasa", "institution": "Duke University"}, {"given_name": "Lawrence", "family_name": "Carin", "institution": "Duke University"}, {"given_name": "David", "family_name": "Carlson", "institution": "Duke University"}]}