{"title": "Neural characterization in partially observed populations of spiking neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1161, "page_last": 1168, "abstract": null, "full_text": "Neural characterization in partially observed\n\npopulations of spiking neurons\n\nJonathan W. Pillow\nPeter Latham\nGatsby Computational Neuroscience Unit, UCL\n\n17 Queen Square, London WC1N 3AR, UK\n\npillow@gatsby.ucl.ac.uk\n\npel@gatsby.ucl.ac.uk\n\nAbstract\n\nPoint process encoding models provide powerful statistical methods for under-\nstanding the responses of neurons to sensory stimuli. Although these models have\nbeen successfully applied to neurons in the early sensory pathway, they have fared\nless well capturing the response properties of neurons in deeper brain areas, ow-\ning in part to the fact that they do not take into account multiple stages of pro-\ncessing. Here we introduce a new twist on the point-process modeling approach:\nwe include unobserved as well as observed spiking neurons in a joint encoding\nmodel. The resulting model exhibits richer dynamics and more highly nonlinear\nresponse properties, making it more powerful and more \ufb02exible for \ufb01tting neural\ndata. More importantly, it allows us to estimate connectivity patterns among neu-\nrons (both observed and unobserved), and may provide insight into how networks\nprocess sensory input. We formulate the estimation procedure using variational\nEM and the wake-sleep algorithm, and illustrate the model\u2019s performance using a\nsimulated example network consisting of two coupled neurons.\n\n1 Introduction\n\nA central goal of computational neuroscience is to understand how the brain transforms sensory\ninput into spike trains, and considerable effort has focused on the development of statistical models\nthat can describe this transformation. 
One of the most successful of these is the linear-nonlinear-Poisson (LNP) cascade model, which describes a cell\u2019s response in terms of a linear filter (or receptive field), an output nonlinearity, and an instantaneous spiking point process [1\u20135]. Recent efforts have generalized this model to incorporate spike-history and multi-neuronal dependencies, which greatly enhances the model\u2019s flexibility, allowing it to capture non-Poisson spiking statistics and joint responses of an entire population of neurons [6\u201310].\n\nPoint process models accurately describe the spiking responses of neurons in the early visual pathway to light, and of cortical neurons to injected currents. However, they perform poorly both in higher visual areas and in auditory cortex, and often do not generalize well to stimuli whose statistics differ from those used for fitting. Such failings are in some ways not surprising: the cascade model\u2019s stimulus sensitivity is described with a single linear filter, whereas responses in the brain reflect multiple stages of nonlinear processing, adaptation on multiple timescales, and recurrent feedback from higher-level areas. However, given its mathematical tractability and its accuracy in capturing the input-output properties of single neurons, the model provides a useful building block for constructing richer and more complex models of neural population responses.\n\nHere we extend the point-process modeling framework to incorporate a set of unobserved or \u201chidden\u201d neurons, whose spike trains are unknown and treated as hidden or latent variables. The unobserved neurons respond to the stimulus and to synaptic inputs from other neurons, and their spiking activity can in turn affect the responses of the observed neurons. Consequently, their functional properties and connectivity can be inferred from data [11\u201318]. 
However, the idea is not to simply build a more powerful statistical model, but to develop a model that can help us learn something about the underlying structure of networks deep in the brain.\n\nAlthough this expanded model offers considerably greater flexibility in describing an observed set of neural responses, it is more difficult to fit to data. Computing the likelihood of an observed set of spike trains requires integrating out the probability distribution over hidden activity, and we need sophisticated algorithms to find the maximum likelihood estimate of model parameters. Here we introduce a pair of estimation procedures based on variational EM (expectation maximization) and the wake-sleep algorithm. Both algorithms make use of a novel proposal density to capture the dependence of hidden spikes on the observed spike trains, which allows for fast sampling of hidden neurons\u2019 activity. In the remainder of this paper we derive the basic formalism and demonstrate its utility on a toy problem consisting of two neurons, one of which is observed and one of which is designated \u201chidden\u201d. We show that a single-cell model used to characterize the observed neuron performs poorly, while a coupled two-cell model estimated using the wake-sleep algorithm performs much more accurately.\n\n2 Multi-neuronal point-process encoding model\n\nWe begin with a description of the encoding model, which generalizes the LNP model to incorporate non-Poisson spiking and coupling between neurons. We refer to this as a generalized linear point-process (glpp) model\u00b9 [8, 9]. For simplicity, we formulate the model for a pair of neurons, although it can be tractably applied to data from a moderate-sized population (\u223c10\u2013100 neurons). 
In this section we do not distinguish between observed and unobserved spikes, but will do so in the next. Let xt denote the stimulus at time t, and yt and zt denote the number of spikes elicited by two neurons at t, where t \u2208 [0, T] is an index over time. Note that xt is a vector containing all elements of the stimulus that are causally related to the (scalar) responses yt and zt at time t. Furthermore, let us assume t takes on a discrete set of values, with bin size \u2206, i.e., t \u2208 {0, \u2206, 2\u2206, . . . , T}. Typically \u2206 is sufficiently small that we observe only zero or one spike in every bin: yt, zt \u2208 {0, 1}.\n\nThe conditional intensity (or instantaneous spike rate) of each cell depends on both the stimulus and the recent spiking history via a bank of linear filters. Let y[t\u2212\u03c4,t) and z[t\u2212\u03c4,t) denote the (vector) spike train histories at time t. Here [t \u2212 \u03c4, t) refers to times between t \u2212 \u03c4 and t \u2212 \u2206, so y[t\u2212\u03c4,t) \u2261 (yt\u2212\u03c4, yt\u2212\u03c4+\u2206, . . . , yt\u22122\u2206, yt\u2212\u2206) and similarly for z[t\u2212\u03c4,t). The conditional intensities for the two cells are then given by\n\n\u03bbyt = f(ky \u00b7 xt + hyy \u00b7 y[t\u2212\u03c4,t) + hyz \u00b7 z[t\u2212\u03c4,t))\n\u03bbzt = f(kz \u00b7 xt + hzz \u00b7 z[t\u2212\u03c4,t) + hzy \u00b7 y[t\u2212\u03c4,t))    (1)\n\nwhere ky and kz are linear filters representing each cell\u2019s receptive field, hyy and hzz are filters operating on each cell\u2019s own spike-train history (capturing effects like refractoriness and bursting), and hzy and hyz are filters coupling the spike train history of each neuron to the other (allowing the model to capture statistical correlations and functional interactions between neurons). 
The \u201c\u00b7\u201d notation represents the standard dot product (performing a summation over either index or time):\n\nk \u00b7 xt \u2261 \u2211i ki xit,        h \u00b7 y[t\u2212\u03c4,t) \u2261 \u2211t\u2032 ht\u2032 yt\u2032  (sum over t\u2032 = t\u2212\u03c4, . . . , t\u2212\u2206),\n\nwhere the index i runs over the components of the stimuli (which typically are time points extending into the past). The second expression generalizes to h \u00b7 z[t\u2212\u03c4,t).\n\nThe nonlinear function, f, maps the input to the instantaneous spike rate of each cell. We assume here that f is exponential, although any monotonic convex function that grows no faster than exponentially is suitable [9]. Equation 1 is equivalent to f applied to a linear convolution of the stimulus and spike trains with their respective filters; a schematic is shown in figure 1.\n\n\u00b9We adapt this terminology from \u201cgeneralized linear model\u201d (glm), a much more general class of models from the statistics literature [19]; this model is a glm whose distribution function is Poisson.\n\nFigure 1: Schematic of generalized linear point-process (glpp) encoding model. a, Diagram of model parameters for a pair of coupled neurons. For each cell, the parameters consist of a stimulus filter (e.g., ky), a spike-train history filter (hyy), and a filter capturing coupling from the spike train history of the other cell (hzy). The filter outputs are summed, pass through an exponential nonlinearity, and drive spiking via an instantaneous point process. b, Equivalent diagram showing just the parameters of the neuron y, as used for drawing a sample yt. Gray boxes highlight the stimulus vector xt and spike train history vectors that form the input to the model on this time step. c, Simplified graphical model of the glpp causal structure, which allows us to visualize how the likelihood factorizes. Arrows between variables indicate conditional dependence. For visual clarity, temporal dependence is depicted as extending only two time bins, though in real data it extends over many more. Red arrows highlight the dependency structure for a single time bin of the response y3.\n\nThe probability of observing yt spikes in a bin of size \u2206 is given by a Poisson distribution with rate parameter \u03bbyt\u2206,\n\nP(yt|\u03bbyt) = ((\u03bbyt\u2206)^yt / yt!) exp(\u2212\u03bbyt\u2206),    (2)\n\nand likewise for P(zt|\u03bbzt). The likelihood of the full set of spike times is the product of conditionally independent terms,\n\nP(Y, Z|X, \u03b8) = \u220ft P(yt|\u03bbyt) P(zt|\u03bbzt),    (3)\n\nwhere Y and Z represent the full spike trains, X denotes the full set of stimuli, and \u03b8 \u2261 {ky, kz, hyy, hzy, hzz, hyz} denotes the model parameters. This factorization is possible because \u03bbyt and \u03bbzt depend only on the process history up to time t, making yt and zt conditionally independent given the stimulus and spike histories up to t (see Fig. 1c). 
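As a concrete illustration, the generative model of eqs. (1)\u2013(4) can be simulated directly. The sketch below is not from the paper: the filter shapes, dimensions, and bin size are hypothetical stand-ins. It draws a pair of coupled spike trains with an exponential nonlinearity and accumulates the conditional log-likelihood (dropping the constant term).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T time bins, stimulus window d, history window tau.
T, d, tau = 1000, 20, 20
dt = 0.01  # bin size Delta, in seconds

# Hypothetical filters (the parameters theta of the glpp model).
ky = rng.normal(0.0, 0.2, d)             # stimulus filter, observed cell
kz = -ky                                 # opposite-sign filter, hidden cell
hyy = hzz = -np.linspace(0.0, 2.0, tau)  # refractory self-history filters
hzy = np.linspace(0.0, 1.0, tau)         # hidden -> observed coupling
hyz = np.zeros(tau)                      # no observed -> hidden coupling

x = rng.normal(size=T + d)               # white-noise stimulus
y = np.zeros(T)
z = np.zeros(T)
loglik = 0.0
for t in range(T):
    xt = x[t:t + d]                      # causal stimulus window at time t
    yh = y[max(0, t - tau):t]            # spike history y_[t-tau, t)
    zh = z[max(0, t - tau):t]            # spike history z_[t-tau, t)
    lam_y = np.exp(ky @ xt + hyy[tau - len(yh):] @ yh
                   + hyz[tau - len(zh):] @ zh)
    lam_z = np.exp(kz @ xt + hzz[tau - len(zh):] @ zh
                   + hzy[tau - len(yh):] @ yh)
    # Poisson spike counts, effectively 0/1 for small dt
    y[t] = float(rng.poisson(lam_y * dt) > 0)
    z[t] = float(rng.poisson(lam_z * dt) > 0)
    # accumulate the conditional log-likelihood, up to a constant
    loglik += (y[t] * np.log(lam_y) + z[t] * np.log(lam_z)
               - dt * (lam_y + lam_z))
```

Because f is exponential here, the negative of this log-likelihood is convex in the filters, which is what makes the maximum-likelihood fit discussed below tractable.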
If the response at time t were to depend on both the past and future response, we would have a causal loop, preventing factorization and making both sampling and likelihood evaluation very difficult.\n\nThe model parameters can be tractably fit to spike-train data using maximum likelihood. Although the parameter space may be high-dimensional (incorporating spike-history dependence over many time bins and stimulus dependence over a large region of time and space), the negative log-likelihood is convex with respect to the model parameters, making fast convex optimization methods feasible for finding the global maximum [9]. We can write the log-likelihood simply as\n\nlog P(Y, Z|X, \u03b8) = \u2211t (yt log \u03bbyt + zt log \u03bbzt \u2212 \u2206\u03bbyt \u2212 \u2206\u03bbzt) + c,    (4)\n\nwhere c is a constant that does not depend on \u03b8.\n\n3 Generalized Expectation-Maximization and Wake-Sleep\n\nMaximizing log P(Y, Z|X, \u03b8) is straightforward if both Y and Z are observed, but here we are interested in the case where Y is observed and Z is \u201chidden\u201d. Consequently, we have to average over Z. The log-likelihood of the observed data is given by\n\nL(\u03b8) \u2261 log P(Y|\u03b8) = log \u2211Z P(Y, Z|\u03b8),    (5)\n\nwhere we have dropped X to simplify notation (all probabilities can henceforth be taken to also depend on X). This sum over Z is intractable in many settings, motivating the use of approximate methods for maximizing likelihood. Variational expectation-maximization (EM) [20, 21] and the wake-sleep algorithm [22] are iterative algorithms for solving this problem by introducing a tractable approximation to the conditional probability over hidden variables,\n\nQ(Z|Y, \u03c6) \u2248 P(Z|Y, \u03b8),    (6)\n\nwhere \u03c6 denotes the parameter vector determining Q. The idea behind variational EM can be described as follows. 
Concavity of the log implies a lower bound on the log-likelihood:\n\nL(\u03b8) \u2265 \u2211Z Q(Z|Y, \u03c6) log [P(Y, Z|\u03b8) / Q(Z|Y, \u03c6)] = log P(Y|\u03b8) \u2212 DKL(Q(Z|Y, \u03c6), P(Z|Y, \u03b8)),    (7)\n\nwhere Q is any probability distribution over Z and DKL is the Kullback-Leibler (KL) divergence between Q and P (using P as shorthand for P(Z|Y, \u03b8)), which is always \u2265 0. In standard EM, Q takes the same functional form as P, so that by setting \u03c6 = \u03b8 (the E-step), DKL is 0 and the bound is tight, since the right-hand-side of eq. 7 equals L(\u03b8). Fixing \u03c6, we then maximize the r.h.s. for \u03b8 (the M-step), which is equivalent to maximizing the expected complete-data log-likelihood (expectation taken w.r.t. Q), given by\n\nE_{Q(Z|Y,\u03c6)}[log P(Y, Z|\u03b8)] \u2261 \u2211Z Q(Z|Y, \u03c6) log P(Y, Z|\u03b8).    (8)\n\nEach step increases a lower bound on the log-likelihood, which can always be made tight, so the algorithm converges to a fixed point that is a maximum of L(\u03b8). The variational formulation differs in allowing Q to take a different functional form than P (i.e., one for which eq. 8 is easier to maximize). The variational E-step involves minimizing DKL(Q, P) with respect to \u03c6, which remains positive if Q does not approximate P exactly; the variational M-step is unchanged from the standard algorithm.\n\nIn certain cases, it is easier to minimize the KL divergence DKL(P, Q) than DKL(Q, P), and doing so in place of the variational E-step above results in the wake-sleep algorithm [22]. In this algorithm, we fit \u03c6 by minimizing DKL(P, Q) averaged over Y, which is equivalent to maximizing the expectation\n\nE_{P(Y,Z|\u03b8)}[log Q(Z|Y, \u03c6)] \u2261 \u2211Y,Z P(Y, Z|\u03b8) log Q(Z|Y, \u03c6),    (9)\n\nwhich bears an obvious symmetry to eq. 8. 
Thus, both steps of the wake-sleep algorithm involve maximizing an expected log-probability. In the \u201cwake\u201d step (identical to the M-step), we fit the true model parameters \u03b8 by maximizing (an approximation to) the log-probability of the observed data Y. In the \u201csleep\u201d step, we fit \u03c6 by trying to find a distribution Q that best approximates the conditional dependence of Z on Y, averaged over the joint distribution P(Y, Z|\u03b8). We can therefore think of the wake phase as learning a model of the data (parametrized by \u03b8), and the sleep phase as learning a consistent internal description of that model (parametrized by \u03c6).\n\nBoth variational-EM and the wake-sleep algorithm work well when Q closely approximates P, but may fail to converge to a maximum of the likelihood if there is a significant mismatch. Therefore, the efficiency of these methods depends on choosing a good approximating distribution Q(Z|Y, \u03c6) \u2014 one that closely matches P(Z|Y, \u03b8). In the next section we show that considerations of the spike generation process can provide us with a good choice for Q.\n\nFigure 2: Schematic diagram of the (acausal) model for the proposal density Q(Z|Y, \u03c6), the conditional density on hidden spikes given the observed spike data. a, Conditional model schematic, which allows zt to depend on the observed response both before and after t. 
b, Graphical model showing causal structure of the acausal model, with arrows indicating dependency. The observed spike responses (gray circles) are no longer dependent variables, but regarded as fixed, external data, which is necessary for computing Q(zt|Y, \u03c6). Red arrows illustrate the dependency structure for a single bin of the hidden response, z3.\n\n4 Estimating the model with partially observed data\n\nTo understand intuitively why the true P(Z|Y, \u03b8) is difficult to sample, and to motivate a reasonable choice for Q(Z|Y, \u03c6), let us consider a simple example: suppose a single hidden neuron (whose full response is Z) makes a strong excitatory connection to an observed neuron (whose response is Y), so that if zt = 1 (i.e., the hidden neuron spikes at time t), it is highly likely that yt+1 = 1 (i.e., the observed neuron spikes at time t + 1). Consequently, under the true P(Z|Y, \u03b8), which is the probability over Z in all time bins given Y in all time bins, if yt+1 = 1 there is a high probability that zt = 1. In other words, zt exhibits an acausal dependence on yt+1. But this acausal dependence is not captured in Equation 3, which expresses the probability over zt as depending only on past events at time t, ignoring the future event yt+1 = 1.\n\nBased on this observation \u2014 essentially, that the effect of future observed spikes on the probability of unobserved spikes depends on the connection strength between the two neurons \u2014 we approximate P(Z|Y, \u03b8) using a separate point-process model Q(Z|Y, \u03c6), which contains a set of acausal linear filters from Y to Z. 
Thus we have\n\n\u02dc\u03bbzt = exp(\u02dckz \u00b7 xt + \u02dchzz \u00b7 z[t\u2212\u03c4,t) + \u02dchzy \u00b7 y[t\u2212\u03c4,t+\u03c4)).    (10)\n\nAs above, \u02dckz, \u02dchzz and \u02dchzy are linear filters; the important difference is that \u02dchzy \u00b7 y[t\u2212\u03c4,t+\u03c4) is a sum over past and future time: from t \u2212 \u03c4 to t + \u03c4 \u2212 \u2206. For this model, the parameters are \u03c6 = (\u02dckz, \u02dchzz, \u02dchzy). Figure 2 illustrates the model architecture.\n\nWe now have a straightforward way to implement the wake-sleep algorithm, using samples from Q to perform the wake phase (estimating \u03b8), and samples from P(Y, Z|\u03b8) to perform the sleep phase (estimating \u03c6). The algorithm works as follows:\n\n\u2022 Wake: Draw samples {Zi} \u223c Q(Z|Y, \u03c6), where Y are the observed spike trains and \u03c6 is the current set of parameters for the acausal point-process model Q. Evaluate the expected complete-data log-likelihood (eq. 8) using Monte Carlo integration:\n\nE_Q[log P(Y, Z|\u03b8)] = lim_{N\u2192\u221e} (1/N) \u2211_{i=1}^{N} log P(Y, Zi|\u03b8).    (11)\n\nThis is log-concave in \u03b8, meaning that we can efficiently find its global maximum to fit \u03b8.\n\n\u2022 Sleep: Draw samples {Yj, Zj} \u223c P(Y, Z|\u03b8), the true encoding distribution with current parameters \u03b8. (Note these samples are pure \u201cfantasy\u201d data, drawn without reference to the observed Y.) As above, compute the expected log-probability (eq. 9) using these samples:\n\nE_{P(Y,Z|\u03b8)}[log Q(Z|Y, \u03c6)] = lim_{N\u2192\u221e} (1/N) \u2211_{j=1}^{N} log Q(Zj|Yj, \u03c6),    (12)\n\nwhich is also log-concave and thus efficiently maximized for \u03c6.\n\nOne advantage of the wake-sleep algorithm is that each complete iteration can be performed using only a single set of samples drawn from Q and P. 
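The sampling step at the heart of both phases can be sketched as follows. This is illustrative only: the filter values standing in for \u03c6 are hypothetical, and the point is simply how the acausal filter of eq. (10) lets each hidden bin see observed spikes on both sides of t.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes and proposal parameters phi = (kz~, hzz~, hzy~).
T, d, tau = 500, 10, 10
dt = 0.01
kz_ac = rng.normal(0.0, 0.2, d)           # stimulus filter of Q
hzz_ac = -np.linspace(0.0, 2.0, tau)      # causal self-history filter of Q
hzy_ac = rng.normal(0.0, 0.3, 2 * tau)    # acausal filter over [t-tau, t+tau)

x = rng.normal(size=T + d)                # stimulus
y = (rng.random(T) < 0.05).astype(float)  # stand-in observed spike train Y

def sample_hidden(x, y, n_samples):
    '''Draw Z ~ Q(Z|Y, phi): each z_t depends causally on past hidden
    spikes but acausally on observed spikes both before and after t.'''
    ypad = np.concatenate([np.zeros(tau), y, np.zeros(tau)])  # zero-padded Y
    samples = []
    for _ in range(n_samples):
        z = np.zeros(T)
        for t in range(T):
            zh = z[max(0, t - tau):t]     # hidden history z_[t-tau, t)
            ywin = ypad[t:t + 2 * tau]    # observed window y_[t-tau, t+tau)
            lam = np.exp(kz_ac @ x[t:t + d]
                         + hzz_ac[tau - len(zh):] @ zh
                         + hzy_ac @ ywin)
            z[t] = float(rng.poisson(lam * dt) > 0)
        samples.append(z)
    return samples

Zs = sample_hidden(x, y, n_samples=5)
```

In the wake phase, samples like these would be plugged into the Monte Carlo average of eq. (11) and \u03b8 fit by convex optimization; the sleep phase is symmetric, fitting \u03c6 to fantasy pairs drawn from P(Y, Z|\u03b8).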
A theoretical drawback to wake-sleep, however, is that the sleep step is not guaranteed to increase a lower-bound on the log-likelihood, as in variational-EM (wake-sleep minimizes the \u201cwrong\u201d KL divergence). We can implement variational-EM using the same approximating point-process model Q, but we now require multiple steps of sampling for a complete E-step. To perform a variational E-step, we draw samples (as above) from Q and use them to evaluate both the KL divergence DKL(Q(Z|Y, \u03c6) || P(Z|Y, \u03b8)) and its gradient with respect to \u03c6. We can then perform noisy gradient descent to find a minimum, drawing a new set of samples for each evaluation of DKL(Q, P). The M-step is equivalent to the wake phase of wake-sleep, achievable with a single set of samples.\n\nOne additional use for the approximating point-process model Q is as a \u201cproposal\u201d distribution for Metropolis-Hastings sampling of the true P(Z|Y, \u03b8). Such samples can be used to evaluate the true log-likelihood, for comparison with the variational lower bound, and for noisy gradient ascent of the likelihood to examine how closely these approximate methods converge to the true ML estimate. For fully observed data, such samples also provide a useful means for measuring how much the entropy of one neuron\u2019s response is reduced by knowing the responses of its neighbors.\n\n5 Simulations: a two-neuron example\n\nTo verify the method, we applied it to a pair of neurons (as depicted in fig. 1), simulated using a stimulus consisting of a long presentation of white noise. We denoted one of the neurons \u201cobserved\u201d and the other \u201chidden\u201d. The parameters used for the simulation are depicted in fig. 3. The cells have similarly-shaped biphasic stimulus filters with opposite sign, like those commonly observed in ON and OFF retinal ganglion cells. We assume that the ON-like cell is observed, while the OFF-like cell is hidden. 
Both cells have spike-history filters that induce a refractory period following a spike, with a small peak during the relative refractory period that elicits burst-like responses. The hidden cell has a strong positive coupling filter hzy onto the observed cell, which allows spiking activity in the hidden cell to excite the observed cell (despite the fact that the two cells receive opposite-sign stimulus input). For simplicity, we assume no coupling from the observed to the hidden cell\u00b2. Both types of filters were defined in a linear basis consisting of four raised cosines, meaning that each filter is specified by four parameters, and the full model contains 20 parameters (i.e., 2 stimulus filters and 3 spike-train filters).\n\nFig. 3b shows rasters of the two cells\u2019 responses to repeated presentations of a 1s Gaussian white-noise stimulus with a framerate of 100Hz. Note that the temporal structure of the observed cell\u2019s response is strongly correlated with that of the hidden cell due to the strong coupling from hidden to observed (and the fact that the hidden cell receives slightly stronger stimulus drive).\n\nOur first task is to examine whether a standard, single-cell glpp model can capture the mapping from stimuli to spike responses. Fig. 3c shows the parameters obtained from such a fit to the observed data, using 10s of the response to a non-repeating white noise stimulus (1000 samples, 251 spikes). Note that the estimated stimulus filter (red) has much lower amplitude than the stimulus filter of the true model (gray). Fig. 3d shows the parameters obtained for an observed and a hidden neuron, estimated using wake-sleep as described in section 4. Fig. 
3e-f shows a comparison of the performance of the two models, indicating that the coupled model estimated with wake-sleep does a much better job of capturing the temporal structure of the observed neuron\u2019s response (accounting for 60% vs. 15% of the PSTH variance). The single-cell model, by contrast, exhibits much worse performance, which is unsurprising given that the standard glpp encoding model can capture only quasi-linear stimulus dependencies.\n\n\u00b2Although the stimulus and spike-history filters bear a rough similarity to those observed in retinal ganglion cells, the coupling used here is unlike coupling filters observed (to our knowledge) between ON and OFF cells in retinal data; it is assumed purely for demonstration purposes.\n\nFigure 3: Simulation results. a, Parameters used for generating simulated responses. The top row shows the filters determining the input to the observed cell, while the bottom row shows those influencing the hidden cell. b, Raster of spike responses of observed and hidden cells to a repeated, 1s Gaussian white noise stimulus (top). c, Parameter estimates for a single-cell glpp model fit to the observed cell\u2019s response, using just the stimulus and observed data (estimates in red; true observed-cell filters in gray). d, Parameters obtained using wake-sleep to estimate a coupled glpp model, again using only the stimulus and observed spike times. e, Response raster of true observed cell (obtained by simulating the true two-cell model), estimated single-cell model and estimated coupled model. f, Peri-stimulus time histogram (PSTH) of the above rasters showing that the coupled model gives much higher accuracy predicting the true response.\n\n6 Discussion\n\nAlthough most statistical models of spike trains posit a direct pathway from sensory stimuli to neuronal responses, neurons are in fact embedded in highly recurrent networks that exhibit dynamics on a broad range of time-scales. To take into account the fact that neural responses are driven by both stimuli and network activity, and to understand the role of network interactions, we proposed a model incorporating both hidden and observed spikes. We regard the observed spike responses as those recorded during a typical experiment, while the responses of unobserved neurons are modeled as latent variables (unrecorded, but exerting influence on the observed responses). 
The resulting model is tractable, as the latent variables can be integrated out using approximate sampling methods, and optimization using variational EM or wake-sleep provides an approximate maximum likelihood estimate of the model parameters. As shown by a simple example, certain settings of model parameters necessitate the incorporation of unobserved spikes, as the standard single-stage encoding model does not accurately describe the data.\n\nIn future work, we plan to examine the quantitative performance of the variational-EM and wake-sleep algorithms, to explore their tractability in scaling to larger populations, and to apply them to real neural data. The model offers a promising tool for analyzing network structure and network-based computations carried out in higher sensory areas, particularly in the context where data are only available from a restricted set of neurons recorded within a larger population.\n\nReferences\n\n[1] I. Hunter and M. Korenberg. The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics, 55:135\u2013144, 1986.\n\n[2] N. Brenner, W. Bialek, and R. de Ruyter van Steveninck. Adaptive rescaling optimizes information transmission. Neuron, 26:695\u2013702, 2000.\n\n[3] H. Plesser and W. Gerstner. Noise in integrate-and-fire neurons: From stochastic input to escape rates. Neural Computation, 12:367\u2013384, 2000.\n\n[4] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199\u2013213, 2001.\n\n[5] E. P. Simoncelli, L. Paninski, J. Pillow, and O. Schwartz. Characterization of neural responses with stochastic stimuli. In M. Gazzaniga, editor, The Cognitive Neurosciences, pages 327\u2013338. MIT Press, 3rd edition, 2004.\n\n[6] M. Berry and M. Meister. Refractoriness and neural precision. Journal of Neuroscience, 18:2200\u20132211, 1998.\n\n[7] K. Harris, J. Csicsvari, H. 
Hirase, G. Dragoi, and G. Buzsaki. Organization of cell assemblies in the hippocampus. Nature, 424:552\u2013556, 2003.\n\n[8] W. Truccolo, U. T. Eden, M. R. Fellows, J. P. Donoghue, and E. N. Brown. A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. J. Neurophysiol, 93(2):1074\u20131089, 2004.\n\n[9] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15:243\u2013262, 2004.\n\n[10] J. W. Pillow, J. Shlens, L. Paninski, A. Sher, A. M. Litke, E. J. Chichilnisky, and E. P. Simoncelli. Correlations and coding with multi-neuronal spike trains in primate retina. SFN abstracts, #768.9, 2007.\n\n[11] D. Nykamp. Reconstructing stimulus-driven neural networks from spike times. NIPS, 15:309\u2013316, 2003.\n\n[12] D. Nykamp. Revealing pairwise coupling in linear-nonlinear networks. SIAM Journal on Applied Mathematics, 65:2005\u20132032, 2005.\n\n[13] M. Okatan, M. Wilson, and E. Brown. Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17:1927\u20131961, 2005.\n\n[14] L. Srinivasan, U. Eden, A. Willsky, and E. Brown. A state-space analysis for reconstruction of goal-directed movements using neural signals. Neural Computation, 18:2465\u20132494, 2006.\n\n[15] D. Nykamp. A mathematical framework for inferring connectivity in probabilistic neuronal networks. Mathematical Biosciences, 205:204\u2013251, 2007.\n\n[16] J. E. Kulkarni and L. Paninski. Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18(4):375\u2013407, 2007.\n\n[17] B. Yu, A. Afshar, G. Santhanam, S. Ryu, K. Shenoy, and M. Sahani. Extracting dynamical structure embedded in neural activity. NIPS, 2006.\n\n[18] S. Escola and L. Paninski. 
Hidden Markov models applied toward the inference of neural states and the improved estimation of linear receptive fields. COSYNE07, 2007.\n\n[19] P. McCullagh and J. Nelder. Generalized linear models. Chapman and Hall, London, 1989.\n\n[20] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society, B, 39(1):1\u201338, 1977.\n\n[21] R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355\u2013368. MIT Press, Cambridge, 1999.\n\n[22] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The \u201cwake-sleep\u201d algorithm for unsupervised neural networks. Science, 268(5214):1158\u20131161, 1995.\n", "award": [], "sourceid": 995, "authors": [{"given_name": "Jonathan", "family_name": "Pillow", "institution": null}, {"given_name": "Peter", "family_name": "Latham", "institution": null}]}