{"title": "Extracting Latent Structure From Multiple Interacting Neural Populations", "book": "Advances in Neural Information Processing Systems", "page_first": 2942, "page_last": 2950, "abstract": "Developments in neural recording technology are rapidly enabling the recording of populations of neurons in multiple brain areas simultaneously, as well as the identification of the types of neurons being recorded (e.g., excitatory vs. inhibitory). There is a growing need for statistical methods to study the interaction among multiple, labeled populations of neurons. Rather than attempting to identify direct interactions between neurons (where the number of interactions grows with the number of neurons squared), we propose to extract a smaller number of latent variables from each population and study how the latent variables interact. Specifically, we propose extensions to probabilistic canonical correlation analysis (pCCA) to capture the temporal structure of the latent variables, as well as to distinguish within-population dynamics from across-population interactions (termed Group Latent Auto-Regressive Analysis, gLARA). We then applied these methods to populations of neurons recorded simultaneously in visual areas V1 and V2, and found that gLARA provides a better description of the recordings than pCCA. This work provides a foundation for studying how multiple populations of neurons interact and how this interaction supports brain function.", "full_text": "Extracting Latent Structure From Multiple Interacting Neural Populations\n\nJo\u00e3o D. Semedo1,2,3, Amin Zandvakili4, Adam Kohn4, \u2217Christian K. Machens3, \u2217Byron M.
Yu1,5\n\n1Department of Electrical and Computer Engineering, Carnegie Mellon University\n2Department of Electrical and Computer Engineering, Instituto Superior T\u00e9cnico\n3Champalimaud Neuroscience Programme, Champalimaud Center for the Unknown\n4Dominick Purpura Department of Neuroscience, Albert Einstein College of Medicine\n5Department of Biomedical Engineering, Carnegie Mellon University\n\njsemedo@cmu.edu\nchristian.machens@neuro.fchampalimaud.org\n{amin.zandvakili,adam.kohn}@einstein.yu.edu\nbyronyu@cmu.edu\n\n\u2217 Denotes equal contribution.\n\nAbstract\n\nDevelopments in neural recording technology are rapidly enabling the recording of populations of neurons in multiple brain areas simultaneously, as well as the identification of the types of neurons being recorded (e.g., excitatory vs. inhibitory). There is a growing need for statistical methods to study the interaction among multiple, labeled populations of neurons. Rather than attempting to identify direct interactions between neurons (where the number of interactions grows with the number of neurons squared), we propose to extract a smaller number of latent variables from each population and study how these latent variables interact. Specifically, we propose extensions to probabilistic canonical correlation analysis (pCCA) to capture the temporal structure of the latent variables, as well as to distinguish within-population dynamics from between-population interactions (termed Group Latent Auto-Regressive Analysis, gLARA). We then apply these methods to populations of neurons recorded simultaneously in visual areas V1 and V2, and find that gLARA provides a better description of the recordings than pCCA.
This work provides a foundation for studying how multiple populations of neurons interact and how this interaction supports brain function.\n\n1 Introduction\n\nIn recent years, developments in neural recording technologies have enabled the recording of populations of neurons from multiple brain areas simultaneously [1\u20137]. In addition, it is rapidly becoming possible to identify the types of neurons being recorded (e.g., excitatory versus inhibitory [8]). Enabled by these experimental advances, a major and growing line of scientific inquiry is to ask how different populations of neurons interact, whether the populations correspond to different brain areas or different neuron types. To address such questions, we need statistical methods that are well-suited for assessing how different groups of neurons interact on a population level.\n\nOne way to characterize multi-population activity is to model direct interactions between neurons [9\u201311], then examine the properties of the interaction strengths. While this may be a reasonable approach for small populations of neurons, the number of interactions grows with the square of the number of recorded neurons, which may make it difficult to summarize how larger populations of neurons interact [12]. Instead, it may be possible to obtain a more succinct account by extracting latent variables for each population and asking how these latent variables interact.\n\nFigure 1: Directed graphical models for multi-population activity. (a) Probabilistic canonical correlation analysis (pCCA). (b) pCCA with auto-regressive latent dynamics (AR-pCCA). (c) Group latent auto-regressive analysis (gLARA). For clarity, we show only two populations in each panel and auto-regressive dynamics of order 1 in panel (c).\n\nDimensionality reduction methods have been widely used to extract succinct representations of population activity [13\u201317] (see [18] for a review).
Each observed dimension corresponds to the spike count (or firing rate) of a neuron, and the goal is to extract latent variables that describe how the population activity varies across experimental conditions, experimental trials, and/or time. These previous studies use dimensionality reduction methods that do not explicitly account for multiple populations of neurons. In other words, these methods are invariant to permutations of the ordering of the neurons (i.e., the observed dimensions).\n\nThis work focuses on latent variable methods designed explicitly for studying the interaction between labeled populations of neurons. To motivate the need for these methods, consider applying a standard dimensionality reduction method, such as factor analysis (FA) [19], to all neurons together by ignoring the population labels. The extracted latent variables would capture all modes of covariability across the neurons, without distinguishing between-population interaction (i.e., the quantity of interest) from within-population interaction. Alternatively, one might first apply a standard dimensionality reduction method to each population of neurons individually, then examine how the latent variables extracted from each population interact. However, important features of the between-population interaction may be eliminated by the dimensionality reduction step, whose sole objective is to preserve the within-population interaction.\n\nWe begin by considering canonical correlation analysis (CCA) and its probabilistic formulation (pCCA) [20], which identify a single set of latent variables that explicitly captures the between-population covariability.
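As a point of reference, classical (non-probabilistic) CCA can be computed by whitening each population's covariance and taking the singular values of the whitened cross-covariance; these singular values are the canonical correlations. The sketch below assumes NumPy and uses simulated data in which one hypothetical shared latent drives two small populations (5 and 3 neurons). It illustrates the textbook computation only, not the pCCA estimation procedure of [20].

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root of a positive definite matrix.
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def canonical_correlations(Y1, Y2):
    # Rows are samples (time points or trials); columns are neurons.
    Y1 = Y1 - Y1.mean(axis=0)
    Y2 = Y2 - Y2.mean(axis=0)
    n = Y1.shape[0]
    S11 = Y1.T @ Y1 / n
    S22 = Y2.T @ Y2 / n
    S12 = Y1.T @ Y2 / n
    # Singular values of the whitened cross-covariance, in descending order.
    M = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)  # hypothetical shared latent
Y1 = np.outer(z, np.ones(5)) + 0.5 * rng.standard_normal((n, 5))
Y2 = np.outer(z, np.ones(3)) + 0.5 * rng.standard_normal((n, 3))
rho = canonical_correlations(Y1, Y2)
print(np.round(rho, 2))
```

With a single shared latent, only the first canonical correlation is large, while the remaining min(5, 3) - 1 = 2 values stay near zero: CCA isolates the one between-population dimension even though each population is itself multi-dimensional.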
To understand how the different neural populations interact on different timescales, we propose extensions of pCCA that introduce a separate set of latent variables for each neural population, as well as dynamics on the latent variables to describe their interaction over time. We then apply the proposed methods to populations of neurons recorded simultaneously in visual areas V1 and V2 to demonstrate their utility.\n\n2 Methods\n\nWe consider the setting where many neurons are recorded simultaneously, and the neurons belong to distinct populations (either by brain area or by neuron type). Let $y^i_t \\in \\mathbb{R}^{q_i}$ represent the observed activity vector of population $i \\in \\{1, \\dots, M\\}$ at time $t \\in \\{1, \\dots, T\\}$, where $q_i$ denotes the number of neurons in population $i$. Below, we consider three different ways to study the interaction between the neural populations. To keep the notation simple, we only consider two populations ($M = 2$); the extension to more than two populations is straightforward.\n\n2.1 Factor analysis and probabilistic canonical correlation analysis\n\nConsider the following latent variable model, which defines a linear-Gaussian relationship between the observed variables, $y^1_t$ and $y^2_t$, and the latent state, $x_t \\in \\mathbb{R}^p$:\n\n$$x_t \\sim \\mathcal{N}(0, I) \\qquad (1)$$\n\n$$\\begin{bmatrix} y^1_t \\\\ y^2_t \\end{bmatrix} \\Big| \\, x_t \\sim \\mathcal{N}\\left( \\begin{bmatrix} C^1 \\\\ C^2 \\end{bmatrix} x_t + \\begin{bmatrix} d^1 \\\\ d^2 \\end{bmatrix}, \\begin{bmatrix} R^{11} & R^{12} \\\\ R^{12\\top} & R^{22} \\end{bmatrix} \\right) \\qquad (2)$$\n\nwhere $C^i \\in \\mathbb{R}^{q_i \\times p}$, $d^i \\in \\mathbb{R}^{q_i}$ and $\\begin{bmatrix} R^{11} & R^{12} \\\\ R^{12\\top} & R^{22} \\end{bmatrix} \\in \\mathbb{S}^q_{++}$, with $q = q_1 + q_2$. According to this model, the covariance of the observed variables is given by:\n\n$$\\operatorname{cov}\\left( \\begin{bmatrix} y^1_t \\\\ y^2_t \\end{bmatrix} \\right) = \\begin{bmatrix} C^1 \\\\ C^2 \\end{bmatrix} \\begin{bmatrix} C^{1\\top} & C^{2\\top} \\end{bmatrix} + \\begin{bmatrix} R^{11} & R^{12} \\\\ R^{12\\top} & R^{22} \\end{bmatrix} \\qquad (3)$$\n\nFactor analysis (FA) and probabilistic canonical correlation analysis (pCCA) can be seen as two special cases of the general model presented above. FA assumes the noise covariance to be diagonal, i.e., $R^{11} = \\operatorname{diag}(r^1_1, \\dots, r^1_{q_1})$, $R^{22} = \\operatorname{diag}(r^2_1, \\dots, r^2_{q_2})$ and $R^{12} = 0$. This noise covariance captures only the independent variance of each neuron, and not the covariance between neurons. As a result, the covariance between neurons is explained by the latent state through the observation matrices $C^1$ and $C^2$. pCCA, on the other hand, considers a block diagonal noise covariance, i.e., $R^{12} = 0$. This noise covariance accounts for the covariance observed between neurons in the same population. The latent state is therefore only used to explain the covariance between neurons in different populations. The directed graphical model for pCCA is shown in Fig. 1a.\n\n2.2 Auto-regressive probabilistic canonical correlation analysis (AR-pCCA)\n\nWhile pCCA offers a succinct picture of the covariance structure between populations of neurons, it does not capture any temporal structure. There are two main reasons why this temporal structure may be of interest. First, pCCA models the covariance structure at zero time lag, which may not capture all of the interactions of interest. If the two populations of neurons correspond to two different brain areas, there may be important interactions at non-zero time lags due to physical delays in information transmission. Second, the two populations of neurons may interact at more than one time delay, for example if multiple pathways exist between the neurons in these populations.
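The first point can be illustrated with a small simulation. The sketch below assumes NumPy; u and v are hypothetical one-dimensional summaries of two populations' activity, with v echoing u after a fixed delay of 3 time bins plus independent noise. The delay, gain and noise level are illustrative choices, not estimates from the recordings.

```python
import numpy as np

def cross_cov(a, b, lag):
    # Sample covariance between a[t] and b[t + lag], for lag >= 0.
    a = a - a.mean()
    b = b - b.mean()
    n = len(a) - lag
    return float(np.dot(a[:n], b[lag:lag + n]) / n)

rng = np.random.default_rng(1)
T = 20000
delay = 3                              # hypothetical transmission delay (bins)
u = rng.standard_normal(T)             # population-1 summary
v = np.zeros(T)
v[delay:] = 0.8 * u[:-delay]           # population 2 echoes population 1 later
v += 0.6 * rng.standard_normal(T)      # private variability of population 2

zero_lag = cross_cov(u, v, 0)
at_delay = cross_cov(u, v, delay)
print(round(zero_lag, 3), round(at_delay, 3))
```

The zero-lag covariance, which is all that a static model such as pCCA uses, comes out close to zero, whereas the covariance at the 3-bin delay is close to the gain of 0.8. A model with dynamics on the latent state can, in principle, pick up such delayed interactions.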
To take the temporal structure into account, we first extend pCCA by defining an auto-regressive linear-Gaussian model on the latent state:\n\n$$x_t \\sim \\mathcal{N}(0, I), \\qquad \\text{if } 1 \\le t \\le \\tau \\qquad (4)$$\n\n$$x_t \\mid x_{t-1}, x_{t-2}, \\dots, x_{t-\\tau} \\sim \\mathcal{N}\\left( \\sum_{k=1}^{\\tau} A_k x_{t-k}, \\, Q \\right), \\qquad \\text{if } t > \\tau \\qquad (5)$$\n\nwhere $A_k \\in \\mathbb{R}^{p \\times p}$ for all $k$, $Q \\in \\mathbb{S}^p_{++}$, and $\\tau$ denotes the order of the auto-regressive model. We term this model AR-pCCA; it is defined by the state model in Eqs. (4)-(5) and the observation model in Eq. (2) with $R^{12} = 0$. Although the observation model is the same as that for pCCA, the latent state here accounts for temporal dynamics as well as for the covariation structure between the populations. The corresponding directed graphical model is shown in Fig. 1b.\n\n2.3 Group latent auto-regressive analysis (gLARA)\n\nAccording to AR-pCCA, a single latent state drives the observed activity in both areas. As a result, it is not possible to distinguish the within-population dynamics from the between-population interactions. To allow for this, we propose using two separate latent states, one per population, that interact over time. We refer to the proposed model as group latent auto-regressive analysis (gLARA):\n\n$$x_t \\sim \\mathcal{N}(0, I), \\qquad \\text{if } 1 \\le t \\le \\tau \\qquad (6)$$\n\n$$x^i_t \\mid x_{t-1}, x_{t-2}, \\dots, x_{t-\\tau} \\sim \\mathcal{N}\\left( \\sum_{j=1}^{2} \\sum_{k=1}^{\\tau} A^{ij}_k x^j_{t-k}, \\, Q^i \\right), \\qquad \\text{if } t > \\tau \\qquad (7)$$\n\n$$\\begin{bmatrix} y^1_t \\\\ y^2_t \\end{bmatrix} \\Big| \\, x_t \\sim \\mathcal{N}\\left( \\begin{bmatrix} C^1 & 0 \\\\ 0 & C^2 \\end{bmatrix} \\begin{bmatrix} x^1_t \\\\ x^2_t \\end{bmatrix} + \\begin{bmatrix} d^1 \\\\ d^2 \\end{bmatrix}, \\begin{bmatrix} R^1 & 0 \\\\ 0 & R^2 \\end{bmatrix} \\right) \\qquad (8)$$\n\nwhere $x_t$ is obtained by stacking $x^1_t \\in \\mathbb{R}^{p_1}$ and $x^2_t \\in \\mathbb{R}^{p_2}$, the latent states for each population, $C^i \\in \\mathbb{R}^{q_i \\times p_i}$, $A^{ij}_k \\in \\mathbb{R}^{p_i \\times p_j}$ and $Q^i \\in \\mathbb{S}^{p_i}_{++}$, for all $k$ and $i, j \\in \\{1, 2\\}$. Note that the covariance observed between the populations now has to be fully reflected in the latent states (there are no shared latent variables in this model) and is therefore defined by the dynamics matrices $A^{ij}_k$, allowing for the separation of the within-population dynamics ($A^{11}_k$ and $A^{22}_k$) and the between-population interactions ($A^{12}_k$ and $A^{21}_k$). Furthermore, the interaction between the populations is asymmetrically defined by $A^{12}_k$ and $A^{21}_k$, allowing for a more in-depth study of the way in which the two areas interact by comparing these matrices across the various time delays considered. Note that gLARA represents a special case of the AR-pCCA model.\n\n2.4 Parameter estimation for gLARA\n\nThe parameters of gLARA can be fit to the training data using the expectation-maximization (EM) algorithm. To do so, we start by defining the augmented latent state $\\bar{x}_t \\in \\mathbb{R}^{p\\tau}$, with $p = p_1 + p_2$:\n\n$$\\bar{x}_t = \\begin{bmatrix} \\bar{x}^1_t \\\\ \\bar{x}^2_t \\end{bmatrix} = \\begin{bmatrix} x^{1\\top}_t & \\dots & x^{1\\top}_{t-\\tau+1} & x^{2\\top}_t & \\dots & x^{2\\top}_{t-\\tau+1} \\end{bmatrix}^\\top \\qquad (9)$$\n\nand the augmented observation vector $\\bar{y}_t \\in \\mathbb{R}^q$, with $q = q_1 + q_2$:\n\n$$\\bar{y}_t = \\begin{bmatrix} y^{1\\top}_t & y^{2\\top}_t \\end{bmatrix}^\\top \\qquad (10)$$\n\nfor $t \\in \\{\\tau, \\dots, T\\}$. Using the augmented latent state $\\bar{x}_t$, the dynamics equations (Eqs. (6) and (7)) can be rewritten as:\n\n$$\\bar{x}_t \\sim \\mathcal{N}(0, I), \\qquad \\text{if } t = \\tau \\qquad (11)$$\n\n$$\\bar{x}_t \\mid \\bar{x}_{t-1} \\sim \\mathcal{N}(\\bar{A} \\bar{x}_{t-1}, \\bar{Q}), \\qquad \\text{if } t > \\tau \\qquad (12)$$\n\nfor appropriately structured $\\bar{A} \\in \\mathbb{R}^{p\\tau \\times p\\tau}$ and $\\bar{Q} \\in \\mathbb{S}^{p\\tau}_{+}$ (the blocks of $\\bar{Q}$ for the copied, lagged states are zero). The observation model (Eq. (8)) can be rewritten as:\n\n$$\\bar{y}_t \\mid \\bar{x}_t \\sim \\mathcal{N}\\left( \\bar{C} \\begin{bmatrix} \\bar{x}_t \\\\ 1 \\end{bmatrix}, \\bar{R} \\right) \\qquad (13)$$\n\nfor appropriately structured $\\bar{C} \\in \\mathbb{R}^{q \\times (p\\tau + 1)}$ and $\\bar{R} \\in \\mathbb{S}^q_{++}$. Due to space constraints, we do not explicitly show the structure of the augmented parameters $\\bar{\\theta} = \\{\\bar{C}, \\bar{R}, \\bar{A}, \\bar{Q}\\}$; it is straightforward to derive them by inspection of Eqs. (9)-(13).\n\nIn the E-step, because the latent and observed variables are jointly Gaussian, $P(\\bar{x}_t \\mid \\bar{y}_1, \\dots, \\bar{y}_T)$ is also Gaussian and can be computed exactly by applying the forward-backward recursion of the Kalman smoother [21] to the augmented vectors. In the M-step, we directly estimate the original parameters $\\theta = \\{C^i, d^i, R^i, A^{ij}_k\\}$, as opposed to estimating the structured form of the augmented parameters $\\bar{\\theta} = \\{\\bar{C}, \\bar{R}, \\bar{A}\\}$ (without loss of generality, we set $Q^i = I$):\n\n$$\\begin{bmatrix} C^i & d^i \\end{bmatrix} = \\left( \\sum_{t=1}^{T} y^i_t \\begin{bmatrix} \\mathbb{E}(x^i_t)^\\top & 1 \\end{bmatrix} \\right) \\left( \\sum_{t=1}^{T} \\begin{bmatrix} \\mathbb{E}(x^i_t x^{i\\top}_t) & \\mathbb{E}(x^i_t) \\\\ \\mathbb{E}(x^i_t)^\\top & 1 \\end{bmatrix} \\right)^{-1} \\qquad (14)$$\n\n$$R^i = \\frac{1}{T} \\sum_{t=1}^{T} \\left\\{ (y^i_t - d^i)(y^i_t - d^i)^\\top - (y^i_t - d^i)\\, \\mathbb{E}(x^i_t)^\\top C^{i\\top} - C^i \\mathbb{E}(x^i_t)(y^i_t - d^i)^\\top + C^i \\mathbb{E}(x^i_t x^{i\\top}_t) C^{i\\top} \\right\\} \\qquad (15)$$\n\n$$\\begin{bmatrix} A^{11}_1 & \\dots & A^{11}_\\tau & A^{12}_1 & \\dots & A^{12}_\\tau \\\\ A^{21}_1 & \\dots & A^{21}_\\tau & A^{22}_1 & \\dots & A^{22}_\\tau \\end{bmatrix} = \\left( \\sum_{t=\\tau+1}^{T} \\mathbb{E}(x_t \\bar{x}^\\top_{t-1}) \\right) \\left( \\sum_{t=\\tau+1}^{T} \\mathbb{E}(\\bar{x}_{t-1} \\bar{x}^\\top_{t-1}) \\right)^{-1} \\qquad (16)$$\n\nwhere all expectations are taken with respect to the posterior computed in the E-step. To initialize the EM algorithm, we start by applying FA to each population individually, and use the estimated observation matrices $C^1$ and $C^2$, mean vectors $d^1$ and $d^2$, and observation covariance matrices $R^{11}$ and $R^{22}$ as initial values. The $A^{ij}_k$ matrices are initialized at 0.\n\nFigure 2: Comparing the optimal dimensionality for FA and pCCA. Cross-validated log-likelihood plotted as a function of the dimensionality of the latent state for FA (black) and pCCA (blue). pCCA was also applied to the same data after randomly shuffling the population labels (green). Note that the maximum possible dimensionality for pCCA is 31, which is the size of the smaller of the two populations (in this case, V2).\n\n2.5 Neural recordings\n\nThe methods described above were applied to multi-electrode recordings performed simultaneously in visual area 1 (V1) and visual area 2 (V2) of an anesthetized monkey, while the monkey was shown a set of oriented gratings with 8 different orientations. Each of the 8 orientations was shown 400 times for a period of 1.28 s, providing a total of 3200 trials. We used 1.23 s of data in each trial, from 50 ms after stimulus onset until the end of the trial, and binned the observed spikes with a 5 ms window. The recordings include a total of 97 units in V1 and 31 units in V2 (single- and multi-units). For model comparison, we performed 4-fold cross-validation, splitting the data into four non-overlapping test folds with 250 trials each. We chose to analyze this subset of 1000 trials for rapid iteration of the analyses, as the cross-validation procedure is computationally expensive for the full dataset. Given that 1000 trials provide a total of 246,000 time points (at 5 ms resolution), this is a reasonable amount of data to fit any of the models with the 128 observed neurons.\n\nIn this study, we sought to investigate how trial-to-trial population variability in V1 relates to the trial-to-trial population variability in V2.
For these grating stimuli (which are relatively simple compared to naturalistic stimuli [22]), there is likely richer structure in the V1-V2 interaction for the trial-to-trial variability than for the stimulus drive. To this end, we preprocessed the neural activity by computing the peristimulus time histogram (PSTH), representing the trial-averaged firing rate time course, for each neuron and experimental condition (grating orientation). For each spike train, we then subtracted the appropriate PSTH from the binned spike counts to obtain a single-trial \u201cresidual\u201d. The residuals across all neurons and conditions were considered together in the analyses shown in Section 3. Note that the methods considered in this study could also be applied to the PSTHs of sequentially recorded neurons in multiple areas.\n\n3 Results\n\nWe started by asking how many dimensions are needed to describe the between-population covariance, relative to the number of dimensions needed to describe the within-population covariance. This was assessed by applying pCCA to the labeled V1 and V2 populations, as well as FA to the two populations together (which ignores the V1 and V2 labels). In this analysis, the pCCA latents capture only the between-population covariance, whereas the FA latents capture both the between-population and within-population covariance. By comparing cross-validated data likelihoods for different dimensionalities, we found that pCCA required three latent dimensions, whereas FA required 40 latent dimensions (Fig. 2). This indicates that the zero time lag interaction between V1 and V2 is confined to a small number of dimensions (three) relative to the number of dimensions (40) needed to describe all covariance among the neurons. The difference between these two dimensionalities (37) describes covariance that is \u2018private\u2019 to each population (i.e., within-population covariance).
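The comparison of 40 versus three dimensions concerns latent dimensions, not model size: pCCA's block-diagonal noise covariance is itself expensive. The rough count below assumes the raw parameterization of Eq. (2) (entries of C, d and the free entries of the noise covariance, ignoring the rotational non-identifiability of C) and uses the 97 V1 and 31 V2 units; the helper names are hypothetical.

```python
def fa_param_count(q, p):
    # FA: loading matrix C (q x p), mean d (q), and q diagonal noise variances.
    return q * p + q + q

def pcca_param_count(q1, q2, p):
    # pCCA: loading matrix C (q x p), mean d (q), and a full symmetric noise
    # covariance block per population (R11 and R22), with R12 = 0.
    q = q1 + q2
    return q * p + q + q1 * (q1 + 1) // 2 + q2 * (q2 + 1) // 2

q1, q2 = 97, 31                           # V1 and V2 units
fa_40 = fa_param_count(q1 + q2, 40)       # 5376
pcca_3 = pcca_param_count(q1, q2, 3)      # 5761
# At equal latent dimensionality the difference does not depend on p.
surplus = pcca_param_count(q1, q2, 10) - fa_param_count(q1 + q2, 10)  # 5121
print(fa_40, pcca_3, surplus)
```

Under this count, the three-dimensional pCCA model already has slightly more raw parameters than the 40-dimensional FA model, almost all of them in the within-population noise blocks (4753 for V1 and 496 for V2). This is where pCCA absorbs the within-population covariance that its three latents leave out.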
The FA and pCCA curves peak at similar cross-validated likelihoods in Fig. 2 because the observation model for pCCA (Eq. (2)) accounts for the within-population covariance (which is not captured by the pCCA latents).\n\nFigure 3: Model selection for AR-pCCA and gLARA. (a) Comparing AR-pCCA and gLARA as a function of the latent dimensionality (defined as $p_1 + p_2$ for gLARA, where $p_2$ was fixed at 15), for $\\tau = 3$. (b) gLARA\u2019s cross-validated log-likelihood plotted as a function of the dimensionality of V1\u2019s latent state, $p_1$ (for $p_2 = 15$), for different choices of $\\tau$. (c) gLARA\u2019s cross-validated log-likelihood plotted as a function of the dimensionality of V2\u2019s latent state, $p_2$ (for $p_1 = 50$), for different choices of $\\tau$.\n\nThe distinction between within-population covariance and between-population covariance is further supported by re-applying pCCA, but now randomly shuffling the population labels. The cross-validated log-likelihood curve for these mixed populations now peaks at a larger dimensionality than three. The reason is that the shuffling procedure removes the distinction between the two types of covariance, such that the pCCA latents now capture both types of covariance (of the original unmixed populations). The peak for mixed pCCA occurs at a lower dimensionality than for FA for two reasons: i) because the mixed populations have the same number of neurons as the original populations (97 and 31), the maximum number of dimensions that can be identified by pCCA is 31, and ii) for the same latent dimensionality, pCCA has a larger number of parameters than FA, which makes pCCA more prone to overfitting.\n\nTogether, the analyses in Fig. 2 demonstrate two key points.
First, if the focus of the analysis lies in the interaction between populations, then pCCA provides a more parsimonious description, as it focuses exclusively on the covariance between populations. In contrast, FA is unable to distinguish within-population covariance from between-population covariance. Second, the neuron groupings for V1 and V2 are meaningful, as the number of dimensions needed to describe the covariance between V1 and V2 is small relative to that within each population.\n\nWe then analyzed the performance of the models with latent dynamics (AR-pCCA and gLARA). The cross-validated log-likelihood for these models depends jointly on the dimensionality of the latent state, $p$, and the order of the auto-regressive model, $\\tau$. For gLARA, $p$ is the sum of the dimensionalities of each population\u2019s latent state, $p_1 + p_2$, and we therefore want to jointly maximize the cross-validated log-likelihood with respect to both $p_1$ and $p_2$. AR-pCCA peaked at a latent dimensionality of $p = 70$, while gLARA peaked at a joint latent dimensionality of 65 ($p_1 = 50$ and $p_2 = 15$) (Fig. 3a). When computing the performance of AR-pCCA, we considered models with $p \\in \\{5, 10, \\dots, 75\\}$ and $\\tau \\in \\{1, 3, \\dots, 7\\}$ (Fig. 3a shows the $\\tau = 3$ case). To assess how gLARA\u2019s cross-validated log-likelihood varied with the latent dimensionalities and the model order, we plotted it in Fig. 3b for $p_2 = 15$ and $p_1 \\in \\{5, 10, \\dots, 50\\}$, for different choices of $\\tau$. This showed that performance is highest for an order-3 model, and that it saturates by the time $p_1$ reaches 50. In Fig. 3c, we performed a similar analysis for the dimensionality of V2\u2019s latent state, where $p_1$ was held constant at 50 and $p_2 \\in \\{5, 10, \\dots, 25\\}$. The cross-validated log-likelihood shows a clear peak at $p_2 = 15$ regardless of $\\tau$.
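The selection rule used in this paragraph, keeping whichever configuration maximizes the held-out log-likelihood, can be illustrated with a deliberately simplified stand-in: a fully observed vector auto-regressive (VAR) model fit by least squares, rather than gLARA's latent dynamics fit by EM. The sketch assumes NumPy; the simulated order-2 coefficients and all helper names are hypothetical.

```python
import numpy as np

def simulate_var(A_list, T, rng, noise_sd=1.0):
    # x_t = sum_k A_k x_{t-k} + noise, started from zeros.
    p, tau = A_list[0].shape[0], len(A_list)
    x = np.zeros((T, p))
    for t in range(tau, T):
        mean = sum(A @ x[t - k] for k, A in enumerate(A_list, start=1))
        x[t] = mean + noise_sd * rng.standard_normal(p)
    return x

def lagged_design(x, tau, max_tau):
    # Rows are t = max_tau, ..., T-1; columns are [x_{t-1}, ..., x_{t-tau}].
    # A common max_tau keeps the target rows identical across orders.
    T = x.shape[0]
    X = np.hstack([x[max_tau - k:T - k] for k in range(1, tau + 1)])
    return X, x[max_tau:]

def fit_var(x, tau, max_tau):
    # Least-squares estimate of the stacked [A_1 ... A_tau] and noise covariance.
    X, Y = lagged_design(x, tau, max_tau)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    return B, resid.T @ resid / len(Y)

def gaussian_loglik(resid, Q):
    # Sum over rows of log N(resid_t; 0, Q).
    n, p = resid.shape
    _, logdet = np.linalg.slogdet(Q)
    quad = np.sum((resid @ np.linalg.inv(Q)) * resid)
    return -0.5 * (n * (p * np.log(2 * np.pi) + logdet) + quad)

def heldout_loglik_by_order(x_train, x_test, orders):
    max_tau = max(orders)
    scores = {}
    for tau in orders:
        B, Q = fit_var(x_train, tau, max_tau)
        X, Y = lagged_design(x_test, tau, max_tau)
        scores[tau] = gaussian_loglik(Y - X @ B, Q)
    return scores

rng = np.random.default_rng(2)
A1 = np.array([[0.3, 0.1], [0.0, 0.3]])
A2 = 0.5 * np.eye(2)
x = simulate_var([A1, A2], 6000, rng)
scores = heldout_loglik_by_order(x[:4000], x[4000:], orders=[1, 2, 3, 4, 5])
best = max(scores, key=scores.get)
print(best)
```

With strong lag-2 coupling in the simulated data, order 1 scores clearly worse than order 2 on held-out data, while higher orders add parameters without adding predictive information, so their held-out scores typically level off or dip slightly. Applying the same rule to gLARA additionally requires the E-step of Section 2.4 to evaluate the held-out likelihood, which this sketch does not attempt.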
We found that, for both models, the cross-validated log-likelihood peaks for $\\tau = 3$ (see Fig. 3b and 3c for gLARA; results not shown for AR-pCCA).\n\nNext, we asked which model, AR-pCCA or gLARA, better describes the data. Note that gLARA is a special case of AR-pCCA, where the observation matrix in Eq. (8) is constrained to have a block diagonal structure (with blocks $C^1$ and $C^2$). The key difference between the two models is that gLARA assigns a non-overlapping set of latent variables to each population. We found that gLARA outperforms AR-pCCA (Fig. 3a). This suggests that the extra flexibility of the AR-pCCA model leads to overfitting and that the data are better explained by considering two separate sets of latent variables that interact.\n\nFigure 4: Leave-one-neuron-out prediction using gLARA. Observed activity (black) and the leave-one-neuron-out prediction of gLARA (blue) for a representative held-out trial, averaged over (a) the V1 population and (b) the V2 population. Note that the activity can be negative because we are analyzing the single-trial residuals (cf. Section 2.5).\n\nThe optimal latent dimensionalities found for AR-pCCA and gLARA are substantially higher than those found for pCCA, as the latent states now also capture non-zero time lag interactions between the populations, and the dynamics within each population. For gLARA, the between-population covariance must be accounted for by the interaction between the population-specific latents, $x^1_t$ and $x^2_t$, because there are no shared latents in this model. Thus, the interaction between V1 and V2 is summarized by the $A^{12}_k$ and $A^{21}_k$ matrices. Also, both AR-pCCA and gLARA outperform FA and pCCA (comparing the vertical axes in Figs. 2 and 3), showing that there is meaningful temporal structure in how V1 and V2 interact that can be captured by these models.\n\nHaving performed a systematic, relative comparison between AR-pCCA and gLARA models of different complexities, we asked how well the best gLARA model fit the data in an absolute sense. To do so, we used 3/4 of the data to fit the model parameters and performed leave-one-neuron-out prediction [15] on the remaining 1/4. This is done by estimating the latent states $\\mathbb{E}(x^1_{1,\\dots,T} \\mid y^1_{1,\\dots,T})$ and $\\mathbb{E}(x^2_{1,\\dots,T} \\mid y^2_{1,\\dots,T})$ using all but one neuron. This estimate of the latent state is then used to predict the activity of the neuron that was left out (the same procedure was repeated for each neuron). For visualization purposes, we averaged the predicted activity across neurons for a given trial and compared it to the recorded activity averaged across neurons for the same trial. We found that they indeed tracked each other, as shown in Fig. 4 for a representative trial.\n\nFinally, we asked whether gLARA reveals differences in the time structures of the within-population dynamics and the between-population interactions. We computed the Frobenius norm of both the within-population dynamics matrices $A^{11}_k$ and $A^{22}_k$ (Fig. 5a) and the between-population interaction matrices $A^{12}_k$ and $A^{21}_k$ (Fig. 5b), for $p_1 = 50$, $p_2 = 15$ and $\\tau = 3$ ($k \\in \\{1, 2, 3\\}$), which is the model for which the cross-validated log-likelihood was the highest. The time structure of the within-population dynamics appears to differ from that of the between-population interaction. In particular, the latents for each area depend more strongly on their own previous latents as the time delay increases up to 15 ms (Fig. 5a).
In contrast, the dependence between areas is stronger at time lags of 5 and 15 ms, compared to 10 ms (Fig. 5b). Note that the peak of the cross-validated log-likelihood for $\\tau = 3$ (Fig. 3) shows that delays longer than 15 ms do not contribute to an increase in the accuracy of the model and, therefore, the most significant interactions between these areas may occur within this time window. The structure seen in Fig. 5 is not present if the same analysis is performed on data that are shuffled across time (results not shown). Because the latent states may have different scales, it is not informative to compare the magnitude of $A^{12}_k$ and $A^{21}_k$, or of $A^{11}_k$ and $A^{22}_k$ ($A^{11}_k$ and $A^{22}_k$ also have different dimensions). Thus, we divided the norms for each $A^{ij}_k$ matrix by the respective maximum across $k$.\n\nFigure 5: Temporal structure of coupling matrices for gLARA. (a) Frobenius norm of the within-population dynamics matrices $A^{11}_k$ and $A^{22}_k$, for $k \\in \\{1, 2, 3\\}$. Each curve was divided by its maximum value. (b) Same as (a) for the between-population interaction matrices $A^{12}_k$ and $A^{21}_k$.\n\n4 Discussion\n\nWe started by applying standard methods, FA and pCCA, to neural activity recorded simultaneously from visual areas V1 and V2. We found that the neuron groupings by brain area are meaningful, as the covariance of the neurons across areas is lower dimensional than that within each area. We then proposed an extension to pCCA that takes temporal dynamics into account and allows for the separation of within-population dynamics from between-population interactions (gLARA).
This method was then shown to provide a better characterization of the two-population neural activity than FA and pCCA.\n\nIn the context of studying the interaction between populations of neurons, capturing the information flow is key to understanding how information is processed in the brain [3\u20137,23]. To do so, one must be able to characterize the directionality of these between-population interactions. Previous studies have sought to identify the directionality of interactions directly between neurons, using measures such as Granger causality [10] (and related extensions, such as the directed transfer function (DTF) [24]), and directed information [11]. Here, we proposed to study between-population interaction on the level of latent variables, rather than of the neurons themselves. The advantage is that this approach scales better with the number of recorded neurons and provides a more succinct picture of the structure of these interactions. To detect fine-timescale interactions, it may be necessary to replace the linear-Gaussian model with a point process model on the spike trains [25].\n\nAcknowledgments\n\nThis work was supported by NIH-EY016774 and Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia graduate scholarship SFRH/BD/52069/2012.\n\nReferences\n\n[1] Xiaoxuan Jia, Seiji Tanabe, and Adam Kohn. Gamma and the coordination of spiking activity in early visual cortex. Neuron, 77(4):762\u2013774, February 2013.\n\n[2] Misha B. Ahrens, Jennifer M. Li, Michael B. Orger, Drew N. Robson, Alexander F. Schier, Florian Engert, and Ruben Portugues. Brain-wide neuronal dynamics during motor adaptation in zebrafish. Nature, 485(7399):471\u2013477, May 2012.\n\n[3] David A. Crowe, Shikha J. Goodwin, Rachael K. Blackman, Sofia Sakellaridi, Scott R. Sponheim, Angus W. MacDonald III, and Matthew V. Chafee. Prefrontal neurons transmit signals to parietal neurons that reflect executive control of cognition.
Nature Neuroscience, 16(10):1484\u2013\n1491, October 2013.\n\n[4] Georgia G. Gregoriou, Stephen J. Gotts, Huihui Zhou, and Robert Desimone. High-\nfrequency, long-range coupling between prefrontal and visual cortex during attention. Science,\n324(5931):1207\u20131210, May 2009.\n\n8\n\n510150.41V1 \u0002 V1V2 \u0002 V2time delay (ms)Frobenius norm (a.u.)(a)510150.751V2 \u0002 V1V1 \u0002 V2time delay (ms)(b)\f[5] Yuri B. Saalmann, Mark A. Pinsk, Liang Wang, Xin Li, and Sabine Kastner. The pulvinar reg-\nulates information transmission between cortical areas based on attention demands. Science,\n337(6095):753\u2013756, August 2012.\n\n[6] R. F. Salazar, N. M. Dotson, S. L. Bressler, and C. M. Gray. Content-speci\ufb01c fronto-parietal\nsynchronization during visual working memory. Science, 338(6110):1097\u20131100, November\n2012.\n\n[7] Yuriria V\u00b4azquez, Emilio Salinas, and Ranulfo Romo. Transformation of the neural code for\ntactile detection from thalamus to cortex. Proceedings of the National Academy of Sciences,\n110(28):E2635\u2013E2644, July 2013.\n\n[8] Davi D. Bock, Wei-Chung Allen Lee, Aaron M. Kerlin, Mark L. Andermann, Greg Hood,\nArthur W. Wetzel, Sergey Yurgenson, Edward R. Soucy, Hyon Suk Kim, and R. Clay Reid.\nNetwork anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337):177\u2013\n182, March 2011.\n\n[9] Jonathan W. Pillow, Jonathon Shlens, Liam Paninski, Alexander Sher, Alan M. Litke, E. J.\nChichilnisky, and Eero P. Simoncelli. Spatio-temporal correlations and visual signalling in a\ncomplete neuronal population. Nature, 454(7207):995\u2013999, August 2008.\n\n[10] Sanggyun Kim, David Putrino, Soumya Ghosh, and Emery N. Brown. A granger causality\nmeasure for point process models of ensemble neural spiking activity. PLoS Comput Biol,\n7(3):e1001110, March 2011.\n\n[11] Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, and Nicholas G. Hatsopoulos. 
Estimating the directed information to infer causal relationships in ensemble neural spike train recordings. Journal of Computational Neuroscience, 30(1):17\u201344, February 2011.\n\n[12] Jakob H. Macke, Lars Buesing, John P. Cunningham, Byron M. Yu, Krishna V. Shenoy, and Maneesh Sahani. Empirical models of spiking in neural populations. In NIPS, pages 1350\u20131358, 2011.\n\n[13] Mark Stopfer, Vivek Jayaraman, and Gilles Laurent. Intensity versus identity coding in an olfactory system. Neuron, 39(6):991\u20131004, September 2003.\n\n[14] Jayant E. Kulkarni and Liam Paninski. Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18(4):375\u2013407, January 2007.\n\n[15] Byron M. Yu, John P. Cunningham, Gopal Santhanam, Stephen I. Ryu, Krishna V. Shenoy, and Maneesh Sahani. Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. In NIPS, pages 1881\u20131888, 2008.\n\n[16] Wieland Brendel, Ranulfo Romo, and Christian K. Machens. Demixed principal component analysis. In NIPS, pages 2654\u20132662, 2011.\n\n[17] Valerio Mante, David Sussillo, Krishna V. Shenoy, and William T. Newsome. Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78\u201384, November 2013.\n\n[18] John P. Cunningham and Byron M. Yu. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience, 17(11):1500\u20131509, November 2014.\n\n[19] B. S. Everitt. An Introduction to Latent Variable Models. Springer Netherlands, Dordrecht, 1984.\n\n[20] Francis R. Bach and Michael I. Jordan. A probabilistic interpretation of canonical correlation analysis. 2005.\n\n[21] Brian D. O. Anderson and John B. Moore. Optimal filtering. Courier Dover Publications, 2012.\n\n[22] Jeremy Freeman, Corey M. Ziemba, David J. Heeger, Eero P. Simoncelli, and J. Anthony Movshon.
A functional and perceptual signature of the second visual area in primates. Nature Neuroscience, 16(7):974\u2013981, July 2013.\n\n[23] Pascal Fries. A mechanism for cognitive dynamics: neuronal communication through neuronal coherence. Trends in Cognitive Sciences, 9(10):474\u2013480, October 2005.\n\n[24] M. J. Kaminski and K. J. Blinowska. A new method of the description of the information flow in the brain structures. Biological Cybernetics, 65(3):203\u2013210, July 1991.\n\n[25] Anne C. Smith and Emery N. Brown. Estimating a state-space model from point process observations. Neural Computation, 15(5):965\u2013991, May 2003.", "award": [], "sourceid": 1541, "authors": [{"given_name": "Joao", "family_name": "Semedo", "institution": "Carnegie Mellon University"}, {"given_name": "Amin", "family_name": "Zandvakili", "institution": "Albert Einstein College of Medicine"}, {"given_name": "Adam", "family_name": "Kohn", "institution": "Albert Einstein College of Medicine"}, {"given_name": "Christian", "family_name": "Machens", "institution": "Champalimaud Centre for the Unknown"}, {"given_name": "Byron", "family_name": "Yu", "institution": "Carnegie Mellon University"}]}