{"title": "Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit", "book": "Advances in Neural Information Processing Systems", "page_first": 3486, "page_last": 3495, "abstract": "Population activity measurement by calcium imaging can be combined with cellular resolution optogenetic activity perturbations to enable the mapping of neural connectivity in vivo. This requires accurate inference of perturbed and unperturbed neural activity from calcium imaging measurements, which are noisy and indirect, and can also be contaminated by photostimulation artifacts. We have developed a new fully Bayesian approach to jointly inferring spiking activity and neural connectivity from in vivo all-optical perturbation experiments. In contrast to standard approaches that perform spike inference and analysis in two separate maximum-likelihood phases, our joint model is able to propagate uncertainty in spike inference to the inference of connectivity and vice versa. We use the framework of variational autoencoders to model spiking activity using discrete latent variables, low-dimensional latent common input, and sparse spike-and-slab generalized linear coupling between neurons. Additionally, we model two properties of the optogenetic perturbation: off-target photostimulation and photostimulation transients. Using this model, we were able to fit models on 30 minutes of data in just 10 minutes. We performed an all-optical circuit mapping experiment in primary visual cortex of the awake mouse, and use our approach to predict neural connectivity between excitatory neurons in layer 2/3. 
Predicted connectivity is sparse and consistent with known correlations with stimulus tuning, spontaneous correlation and distance.", "full_text": "Model-based Bayesian inference of neural activity\nand connectivity from all-optical interrogation of a\n\nneural circuit\n\nLaurence Aitchison\n\nUniversity of Cambridge\nCambridge, CB2 1PZ, UK\n\nlaurence.aitchison@gmail.com\n\nLloyd Russell\n\nUniversity College London\nLondon, WC1E 6BT, UK\nllerussell@gmail.com\n\nAdam Packer\n\nUniversity College London\nLondon, WC1E 6BT, UK\nadampacker@gmail.com\n\nJinyao Yan\n\nJanelia Research Campus\n\nAshburn, VA 20147\n\nyanj11@janelia.hhmi.org\n\nPhilippe Castonguay\n\nJanelia Research Campus\n\nAshburn, VA 20147\n\nph.castonguay@gmail.com\n\nMichael H\u00e4usser\n\nUniversity College London\nLondon, WC1E 6BT, UK\nm.hausser@ucl.ac.uk\n\nSrinivas C. Turaga\n\nJanelia Research Campus\n\nAshburn, VA 20147\n\nturagas@janelia.hhmi.org\n\nAbstract\n\nPopulation activity measurement by calcium imaging can be combined with cellu-\nlar resolution optogenetic activity perturbations to enable the mapping of neural\nconnectivity in vivo. This requires accurate inference of perturbed and unper-\nturbed neural activity from calcium imaging measurements, which are noisy and\nindirect, and can also be contaminated by photostimulation artifacts. We have\ndeveloped a new fully Bayesian approach to jointly inferring spiking activity and\nneural connectivity from in vivo all-optical perturbation experiments. In contrast\nto standard approaches that perform spike inference and analysis in two separate\nmaximum-likelihood phases, our joint model is able to propagate uncertainty in\nspike inference to the inference of connectivity and vice versa. We use the frame-\nwork of variational autoencoders to model spiking activity using discrete latent\nvariables, low-dimensional latent common input, and sparse spike-and-slab gen-\neralized linear coupling between neurons. 
Additionally, we model two properties of the optogenetic perturbation: off-target photostimulation and photostimulation transients. Using this model, we were able to fit models on 30 minutes of data in just 10 minutes. We performed an all-optical circuit mapping experiment in primary visual cortex of the awake mouse, and used our approach to predict neural connectivity between excitatory neurons in layer 2/3. Predicted connectivity is sparse and consistent with known correlations with stimulus tuning, spontaneous correlation and distance.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n1 Introduction\n\nQuantitative mapping of connectivity is an essential prerequisite for understanding the operation of neural circuits. Thus far, it has only been possible to perform neural circuit mapping by using electrophysiological [1, 2] or electron-microscopic [3, 4] techniques. In addition to being extremely involved, these techniques are difficult or impossible to perform in vivo. But a new generation of all-optical techniques enables the simultaneous optical recording and perturbation of neural activity with cellular resolution in vivo [5]. In principle, cellular-resolution perturbation experiments can enable circuit mapping in vivo; however, several challenges exist.\nFirst, while two-photon optogenetics can be used to drive spikes in neurons with cellular resolution, there can be variability in the number of spikes generated from trial to trial and from neuron to neuron. Second, there can be substantial off-target excitation of neurons whose dendrites might pass close to the targeted neurons. Third, there is a transient artifact from the laser pulse used for photostimulation which contaminates the activity imaging, preventing accurate estimates of changes in neural activity at the precise time of the perturbation, when accurate activity estimates are most useful. 
Fourth, the readout of activity in the stimulated neurons and their downstream neighbors is a noisy fluorescence measurement of the intracellular calcium concentration, which is itself an indirect measure of spiking activity. Fifth, the synaptic input from one neuron is rarely strong enough to generate action potentials on its own. Thus the optogenetic perturbation of single neurons is unlikely to generate changes in the suprathreshold activity of post-synaptic neurons which can be detected via calcium imaging on every trial.\nHighly sensitive statistical tools are needed to infer neural connectivity in the face of these unique challenges posed by modern all-optical experimental technology. To solve this problem, we develop a global Bayesian inference strategy, jointly inferring a distribution over spikes and unknown connections, and thus allowing uncertainty in the spikes to influence the inferred connections and vice versa. In the past, such methods have not been used because they were computationally intractable, but they are becoming increasingly possible due to three recent advances: the development of GPU computing [6], modern automatic differentiation libraries such as TensorFlow [7], and recent developments in variational autoencoders, including the reparameterization trick [8, 9]. By combining these techniques, we are able to perform inference in a large-scale model of calcium imaging data, including spike inference, photostimulation, low-dimensional activity, and generalized linear synaptic connectivity.\n\n1.1 Prior work\n\nBayesian models have been proposed to infer connectivity from purely observational neural datasets [10, 11]; however, such approaches do not recover connectivity in the common setting where the population neural activity is low-rank or driven by external unobserved inputs. 
Perturbations are essential to uncover connectivity in such scenarios, and a combination of electrophysiological readout and optogenetic perturbation has been used successfully [12, 13]. The analysis of such data is far simpler than our setting, as electrophysiological measurements of the sub-threshold membrane potential of a post-synaptic neuron can enable highly accurate detection of strong and weak incoming connections. In contrast, we are concerned with the more challenging setting of noisy calcium imaging measurements of suprathreshold post-synaptic spiking activity. Further, we are the first to accurately model artifacts associated with 2-photon optogenetic photostimulation and simultaneous calcium imaging, while performing joint inference of spiking neural activity and sparse connectivity.\n\n2 Methods\n\n2.1 Variational Inference\n\nWe seek to perform Bayesian inference, i.e. to compute the posterior over latent variables, z, (e.g. weights, spikes) given data, x (i.e. the fluorescence signal),\n\nP (z|x) = P (x|z) P (z) / P (x) ,  (1)\n\nand, for model comparison, we would like to compute the model evidence,\n\nP (x) = ∫ dz P (x|z) P (z) .  (2)\n\nHowever, the computation of these quantities is intractable, and this intractability has hindered the application of Bayesian techniques to large-scale data analysis, such as calcium imaging. Variational\n\nFigure 1: An overview of the data and generative model. A. A schematic diagram displaying the experimental protocol. All cells express a GCaMP calcium indicator, which fluoresces in response to spiking activity. 
A\nlarge subset of the excitatory cells also express channelrhodopsin, which, in combination with two-photon\nphotostimulation, allows cellular resolution activity perturbations [5]. B. A simpli\ufb01ed generative model, omitting\nunknown weights. The observed \ufb02uorescence signal, f, depends on spikes, s, at past times, and the external\noptogenetic perturbation, e (to account for the small photostimulation transient, which lasts only one or two\nframes). The spikes depend on previous spikes, external optogenetic stimulation, e, and on a low-dimensional\ndynamical system, l, representing the inputs coming from the rest of the brain. C. Results for spike inference\nbased on spontaneous data. Gray gives the original (very noisy) \ufb02uorescence trace, black gives the reconstructed\ndenoised \ufb02uorescence trace, based on inferred spikes, and red gives the inferred probability of spiking. D.\nAverage \ufb02uorescence signal for cells that are directly perturbed (triggered on the perturbation). We see a large\nincrease and slow decay in the \ufb02uorescence signal, driven by spiking activity. The small peaks at 0.5 s intervals\nare photostimulation transients. E. As in C, but for perturbed data. Note the small peaks in the reconstruction\ncoming from the modelled photostimulation transients.\ninference is one technique for circumventing this intractability [8, 9, 14], which, in combination with\nrecent work in deep neural networks (DNNs), has proven extremely effective [8, 9]. In variational\ninference, we create a recognition model/approximate posterior, Q (z|x), intended to approximate\nthe posterior, P (z|x) [14]. 
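As an illustration of the variational setup described here, the following sketch (not from the paper; a toy conjugate-Gaussian model with made-up values) estimates the bound E_Q[log P(x, z) - log Q(z|x)] by Monte Carlo using reparameterized samples, and shows that the bound is tight exactly when the recognition model Q matches the true posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_norm(x, mu, var):
    # log density of a univariate Gaussian N(x; mu, var)
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

x = 1.5  # a single (hypothetical) observation

# Toy generative model: z ~ N(0,1), x|z ~ N(z,1); the exact evidence is x ~ N(0,2)
log_evidence = log_norm(x, 0.0, 2.0)

def elbo(m, s, n_samples=50_000):
    """Monte Carlo ELBO for a Gaussian recognition model Q(z|x) = N(m, s^2)."""
    eps = rng.standard_normal(n_samples)
    z = m + s * eps                                          # reparameterization trick
    log_joint = log_norm(z, 0.0, 1.0) + log_norm(x, z, 1.0)  # log P(x, z)
    log_q = log_norm(z, m, s ** 2)                           # log Q(z|x)
    return np.mean(log_joint - log_q)

# For this conjugate model the exact posterior is N(x/2, 1/2): the bound is tight
tight = elbo(x / 2, np.sqrt(0.5))
# A mismatched recognition model gives a strictly looser bound
loose = elbo(0.0, 1.0)
```

When Q equals the posterior, the integrand log P(x, z) - log Q(z|x) is constant and equal to log P(x), so the estimator has zero variance; this is the sense in which "the bound becomes tight" in the text.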
This recognition model allows us to write down the evidence lower bound objective (ELBO),\n\nlog P (x) ≥ L = EQ(z|x) [log P (x, z) - log Q (z|x)] ,  (3)\n\nand optimizing this bound allows us to improve the recognition model, to the extent that, if Q (z|x) is sufficiently flexible, the bound becomes tight and the recognition model will match the posterior, Q (z|x) = P (z|x).\n\n2.2 Our model\n\nAt the broadest possible level, our experimental system has known inputs, observed outputs, and unknown latent variables. The input is optogenetic stimulation of randomly selected cells (Fig. 1A; i.e. we target the cell with a laser, which usually causes it to spike), represented by a binary vector, et, which is 1 if the cell is directly targeted, and 0 if it is not directly targeted. There are three unknown latent variables/parameters over which we infer an approximate posterior. First, there is a synaptic weight matrix, Wss, describing the underlying connectivity between cells. Second, there is a low-dimensional latent common input, lt, which represents input from other brain regions, and changes slowly over time (Fig. 1B). Third, there is a binary latent, st, representing spiking activity, which depends on previous spiking activity through the synaptic weight matrix, optogenetic stimulation and the low-rank latent (Fig. 1B). Finally, we observe spiking activity indirectly through a fluorescence signal, ft, which is in essence a noisy convolution of the underlying spikes. As such, the observations and latents can be written,\n\nx = f ,\nz = {l, s, Wss},\n\nrespectively. Substituting these into the ELBO (Eq. 
3), the full variational objective becomes,\n\nL = EQ(s,l,Wss|f ,e) [log P (f , s, l, Wss|e) - log Q (s, l, Wss|f , e)] ,  (4)\n\nwhere we have additionally conditioned everything on the known inputs, e.\n\n2.3 Generative model\n\nNeglecting initial states, we can factorize the generative model as\n\nP (f , s, l, Wss|e) = P (Wss) ∏t P (lt|lt-1) P (st|st-1:0, e, lt, Wss) P (ft|st:0, et) ,  (5)\n\ni.e., we first generate a synaptic weight matrix, Wss, then we generate the latent low-rank states, lt, based on their values at the previous time-step, then we generate the spikes based on past spikes, the synaptic weights, optogenetic stimulation, e, and the low-rank latents, and finally, we generate the fluorescence signal based on past spiking and optogenetic stimulation. To generate synaptic weights, we assume a sparse prior, where there is some probability p that the weight is generated from a zero-mean Gaussian, and there is probability 1 - p that the weight is zero,\n\nP (Wss_ij) = (1 - p) δ(Wss_ij) + p N (Wss_ij; 0, σ2) ,  (6)\n\nwhere δ is the Dirac delta, we set p = 0.1 based on prior information, and learn σ2. To generate the low-rank latent states, we use a simple dynamical system,\n\nP (lt|lt-1) = N (lt; Wll lt-1, Σl) ,  (7)\n\nwhere Wll is the dynamics matrix, and Σl is a diagonal covariance matrix, representing independent Gaussian noise. To generate spikes, we use,\n\nP (st|st-1:0, e, lt, Wss) = Bernoulli (st; σ(ut)) ,  (8)\n\nwhere σ is a vectorised sigmoid, σi (x) = 1/(1 + exp(-xi)), and the cell's inputs, ut, are given by,\n\nut = Wse et + Wss ∑_{t'=t-4}^{t-1} κs_{t-t'} st' + Wsl lt + bs .  (9)\n\nThe first term represents the drive from optogenetic input, et, (to reiterate, a binary vector representing whether a cell was directly targeted on this timestep), coupled by weights, Wse, representing the degree to which cells surrounding the targeted cell also respond to the optogenetic stimulation. Note that Wse is structured (i.e. written down in terms of other parameters), and we discuss this structure later. The second term represents synaptic connectivity: how spikes at previous timesteps, st', might influence spiking at this timestep, via a rapidly-decaying temporal kernel, κs, and a synaptic weight matrix, Wss. The third term represents the input from other brain-regions by allowing the low-dimensional latents, lt, to influence spiking activity according to a weight matrix, Wsl. Finally, to generate the observed fluorescence signal from the spiking activity, we use,\n\nP (ft) = N (ft; rt, Σf) ,  (10)\n\nwhere Σf is a learned, diagonal covariance matrix, representing independent noise in the fluorescence observations. For computational tractability, the mean fluorescence signal, or "reconstruction", is simply a convolution of the spikes,\n\nrt = A ∑_{t'=0}^{t} κ_{t-t'} ⊙ st' + br + Wre et ,  (11)\n\nwhere ⊙ represents an entrywise, or Hadamard, product. 
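A minimal forward simulation of this generative model can make the pieces concrete. The sketch below (toy sizes and made-up parameter values standing in for the learned ones, and a one-step synaptic interaction in place of the four-step kernel κs of Eq. 9) samples Wss from the spike-and-slab prior of Eq. (6), runs the latent dynamics of Eq. (7), draws Bernoulli spikes as in Eqs. (8)-(9), and convolves them with a difference-of-exponentials calcium kernel as in Eqs. (10)-(12):

```python
import numpy as np

rng = np.random.default_rng(2)
T, C, L = 200, 5, 2          # timesteps, cells, latent dimensions (toy sizes)

# Spike-and-slab prior on synaptic weights, Eq. (6): zero with prob. 1-p,
# otherwise Gaussian (p = 0.1 as in the paper; sigma here is a stand-in)
p, sigma = 0.1, 0.4
Wss = (rng.random((C, C)) < p) * rng.normal(0.0, sigma, (C, C))

Wsl = 0.3 * rng.normal(size=(C, L))   # latent-to-spike weights (made up)
Wll = 0.9 * np.eye(L)                 # slow latent dynamics matrix (made up)
bs = -2.0                             # baseline input, gives sparse spiking

# Calcium kernel, Eq. (12): difference of a decay and a rise exponential
tau_rise, tau_decay = 1.0, 10.0
t_ax = np.arange(50)
kern = np.exp(-t_ax / tau_decay) - np.exp(-t_ax / tau_rise)

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))

l = np.zeros((T, L))
s = np.zeros((T, C))
for t in range(1, T):
    l[t] = Wll @ l[t - 1] + 0.1 * rng.normal(size=L)   # Eq. (7)
    u = Wss @ s[t - 1] + Wsl @ l[t] + bs               # Eq. (9), one-step kernel
    s[t] = rng.random(C) < sigmoid(u)                  # Eq. (8)

# Fluorescence, Eqs. (10)-(11): spikes convolved with the kernel, plus noise
f = np.stack([np.convolve(s[:, c], kern)[:T] for c in range(C)], axis=1)
f += 0.05 * rng.normal(size=f.shape)
```

The optogenetic drive Wse et and the photostimulation transient Wre et are omitted here for brevity; they enter Eqs. (9) and (11) as additive terms.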
This expression takes a binary vector representing spiking activity, st', convolves it with a temporal kernel, κ, representing the temporal dynamics of fluorescence responses, then scales it with the diagonal matrix, A, and adds a bias, br. The last term models an artifact in which optogenetic photostimulation, represented by a binary vector et describing whether a cell was directly targeted by the stimulation laser on that timestep, directly affects the imaging system according to a weight matrix Wre. The temporal kernel, κc,t-t', is a difference of two exponentials unique to each cell, as is typical in e.g. [15],\n\nκc,t = exp(-t/τdecay_c) - exp(-t/τrise_c) .  (12)\n\n2.4 Recognition model\n\nThe recognition model factorises similarly,\n\nQ (s, l, Wss|f , e) = Q (Wss) Q (s|f , e) Q (l|f ) .  (13)\n\nTo approximate the posterior over weights we use,\n\nQ (Wss_ij) = (1 - pij) δ(Wss_ij) + pij N (Wss_ij; μij, σ2_ij) ,  (14)\n\nwhere pij is the inferred probability that the weight is non-zero, and μij and σ2_ij are the mean and variance of the inferred distribution over the weight, given that it is non-zero. 
As a recognition model for spikes, we use a multi-layer perceptron to map from the fluorescence signal back to an inferred probability of spiking,\n\nQ (s(t)|v(t)) = Bernoulli (s(t); σ(v(t))) ,  (15)\n\nwhere v(t) depends on the fluorescence trace, and the optogenetic input,\n\nv(t) = MLPs (f (t - T : t + T )) + De Wse e(t) + bs .  (16)\n\nHere, De is a diagonal matrix scaling the external input, and MLPs (f (t - T : t + T )) is a neural network that, for each cell, takes a window of the fluorescence trace from time t - T to t + T (for us, T = 100 frames, or about 3 seconds), linearly maps this window onto 20 features, then maps those 20 features through 2 standard neural-network layers with 20 units and Elu non-linearities [16], and finally linearly maps to a single value. To generate the low-rank latents, we use the same MLP, but allow for a different final linear mapping from 20 features to a single output,\n\nQ (l(t)|f ) = N (l(t); Wfl MLPl (f (t - T : t + T )) , Γl) .  (17)\n\nHere, we use a fixed diagonal covariance, Γl, and we use Wfl to reduce the dimensionality of the MLP output to the number of latents.\n\n2.5 Gradient-based optimization of generative and recognition model parameters\n\nWe used the automatic differentiation routines embedded within TensorFlow to differentiate the ELBO with respect to the parameters of both the generative and recognition models,\n\nL = L(σ, Wll, Σl, Wsl, bs, Σf, τdecay_c, τrise_c, br, Wre, pij, μij, σ2_ij, De, Wfl, MLP, respi, σk) ,  (18)\n\nwhere the final two variables are defined later. 
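The per-cell recognition network of Eq. (16) can be sketched as follows. The weights here are random stand-ins for trained parameters, and the stimulation term De Wse e(t) is omitted; the point is only the shape of the computation (window of the trace, a linear feature map, two ELU layers, a scalar logit):

```python
import numpy as np

rng = np.random.default_rng(3)
T_win = 100                        # window half-width in frames (paper uses T = 100)

def elu(x):
    # ELU non-linearity; clip the exponent to avoid overflow for large negatives
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1.0)

# Stand-in weights: window -> 20 features -> two 20-unit ELU layers -> scalar
W_in = rng.normal(0, 0.05, (2 * T_win, 20))
W_h1 = rng.normal(0, 0.3, (20, 20))
W_h2 = rng.normal(0, 0.3, (20, 20))
w_out = rng.normal(0, 0.3, 20)

def spike_logit(f_window):
    """Map a fluorescence window f(t-T : t+T) to the logit v(t) of Eq. (16)."""
    h = f_window @ W_in            # linear map of the window onto 20 features
    h = elu(h @ W_h1)
    h = elu(h @ W_h2)
    return h @ w_out               # final linear map to a single value

f_window = rng.normal(size=2 * T_win)                     # a toy trace window
q_spike = 1.0 / (1.0 + np.exp(-spike_logit(f_window)))    # Q(s(t) = 1)
```

In the paper the same feature layers are shared with the latent recognition model of Eq. (17), with only the final linear map differing.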
We then used Adam [17] to perform the optimization. Instead of using minibatches consisting of multiple short time-windows, we used a single, relatively large time-window (of 1000 frames, or around 30 s), which minimized any edge-effects at the start or end of the time-window.\n\n3 Results\n\n3.1 All-optical circuit mapping experimental protocol\n\nWe used a virus to express GCaMP6s pan-neuronally in layer 2/3 of mouse primary visual cortex (V1), and co-expressed C1V1 in excitatory neurons of the same layer. The mouse was awake, headfixed and on a treadmill. As in [5], we used a spatial light modulator to target 2-photon excitation of the C1V1 opsin in a subset of neurons, while simultaneously imaging neural activity in the local circuit by 2-photon calcium imaging of GCaMP6s. With this setup, we designed an experimental protocol to facilitate discovery of a large portion of the connections within a calcium-imaging field of view. In particular, twice every second we selected five cells at random, stimulated them, observed the activity in the rest of the network, and used this information to infer whether the stimulated cells projected to any of the other cells in the network (Fig. 1A). The optogenetic perturbation experiment consisted of 7200 trials and lasted one hour. We also mapped the orientation and direction tuning properties of the imaged neurons, and separately recorded spontaneous neural activity for 40 minutes. Our model was able to infer spikes in spontaneous data (Fig. 1C), and in photostimulation data, was able to both infer spikes and account for photostimulation transients (Fig. 1DE).\n\nFigure 2: Modeling off-target photostimulation, in which stimulating at one location activates surrounding cells. A. 
The change in average fluorescence based on 500 ms just before and just after stimulation (Δfc) for photostimulation of a target at a specified distance [5]. B. The modelled distance-dependent activation induced by photostimulation. The spatial extent of modelled off-target stimulation is broadly consistent with the raw data in A. Note that as each cell has a different spatial absorption profile and responsiveness, modelled stimulation is not a simple function of distance from the target cell. C. Modelled off-target photostimulation resulting from stimulation of an example cell.\n\nFigure 3: Inferred low-rank latent activity. A. Time course of lt for perturbed data. The different lines correspond to different modes. B. The projection weights from the first latent onto cells, where cells are plotted according to their locations on the imaging plane. C. As B but for the second latent. Note that all projection weights are very close to 0, so the points are all gray.\n\n3.2 Inferring the extent of off-target photostimulation\n\nSince photostimulation may also directly excite off-target neurons, we explicitly modelled this process (Fig. 2A). We used a sum of five Gaussians with different scales, σk, to flexibly model distance-dependent stimulation,\n\nWse_ij = respi ∑_{k=1}^{5} exp[-d2_i (xj)/(2 σ2_k)] ,  (19)\n\nwhere xj describes the x, y position of the "target" cell j, and each cell receiving off-target stimulation has its own degree of responsiveness, respi, and a metric, d2_i (xj), describing that cell's response to light stimulation in different spatial locations. 
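A sketch of the off-target stimulation weights in Eq. (19), using a plain squared Euclidean distance in place of the per-cell elliptic metric that the paper defines next; the positions, scales and responsiveness values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n_cells = 20
xy = rng.uniform(0, 400, (n_cells, 2))             # toy cell positions in microns
resp = rng.uniform(0.5, 1.5, n_cells)              # per-cell responsiveness resp_i
sigmas = np.array([5.0, 10.0, 20.0, 40.0, 80.0])   # the five Gaussian scales sigma_k

def w_se(i, j):
    """Off-target drive onto cell i when cell j is targeted, Eq. (19)."""
    d2 = np.sum((xy[j] - xy[i]) ** 2)              # squared distance d_i^2(x_j)
    return resp[i] * np.sum(np.exp(-d2 / (2 * sigmas ** 2)))

Wse = np.array([[w_se(i, j) for j in range(n_cells)]
                for i in range(n_cells)])
```

Directly targeted cells (the diagonal, distance zero) receive the maximal drive resp_i times five, and the drive decays smoothly with distance, which is the behaviour compared against the raw data in Fig. 2.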
The metric allows for stimulation to take on an elliptic pattern (given by the Pi's), and have a shifted center (given by x̂i),\n\nd2_i (xj) = (xj - x̂i)T Pi (xj - x̂i) .  (20)\n\nAfter inference, this model gives a similar spatial distribution of perturbation-triggered activity (Fig. 2B). Furthermore, it should be noted that because each cell has its own responsiveness and spatial light absorption profile, if we stimulate in one location, a cell's responsiveness is not a simple function of distance (Fig. 2BC). Finally, we allow small modifications around this strict spatial profile using a dense weight matrix.\n\n3.3 Joint inference of latent common inputs\n\nOur model was able to jointly infer neural activity, latent common inputs (Fig. 3A) and sparse synaptic connectivity. As expected, we found one critical latent variable describing overall activation of all cells (Fig. 3B) [18], and a second, far less important latent (Fig. 3C). Given the considerable difference in magnitude between the impact of these two latents on the system, we can infer that only one latent variable is required to describe the system effectively. However, further work is needed to implement flexible yet interpretable low-rank latent variables in this system.\n\nFigure 4: Performance of various models for spontaneous (A) and perturbed (B) data. 
We consider "Sparse GLM + LR" (the full model), "Dense GLM + LR" (the full model, but with dense GLM weights), "LR" (a model with no GLM, only the low-rank component), "Independent" (a model with no higher-level structure) and finally "Separate" (the spikes are extracted using the independent model, then the full model is fitted to those spikes).\n\n3.4 The model recovers known properties of biological activity\n\nThe ELBO forms only a lower bound on the model evidence, so it is possible for models to appear better/worse simply because of changes in the tightness of the bound. As such, it is important to check that the learned model recovers known properties of biological connectivity. We thus compared a group of models, including the full model, a model with dense (as opposed to the usual sparse) synaptic connectivity, a model with only low-rank latents, and a simple model with no higher-level structure, for both spontaneous (Fig. 4A) and perturbed (Fig. 4B) data. We found that the sparse GLM offered a dramatic improvement over the dense GLM, which in turn offered little benefit over a model with only low-rank activity. (Note that the reported values are ELBO per cell per timestep, so must be multiplied by 348 cells and around 100,000 time-steps to obtain the raw ELBO values, which are then highly significant.) Thus, the ELBO is able to recover features of real biological connectivity (biological connectivity is also sparse [1, 2]).\n\n3.5 Joint inference is better than a "pipeline"\n\nFurthermore, we compared our joint approach, where we jointly infer spikes, low-rank activity, and weights, to a more standard "pipeline" in which one infers spikes using a simple Bayesian model lacking low-rank activity and GLM connectivity, then infers the low-rank activity and weights based on those spikes, similar to [11]. 
We found that performing inference jointly — allowing information about low-rank activity, GLM connectivity and external stimulation to influence spike inferences — greatly improved the quality of our inferences for both spontaneous (Fig. 4A) and perturbed data (Fig. 4B). This improvement is entirely expected within the framework of variational inference, as the "pipeline" has two objectives, one for spike extraction, and another for the high-level generative model, and without the single, unified objective, it is even possible for the ELBO to decrease with more training (Fig. 4B).\n\n3.6 The inferred sparse weights are consistent with known properties of neural circuits\n\nNext, we plotted the synaptic "GLM" weights for spontaneous (Fig. 5A–D) and perturbed (Fig. 5E–H) data. These weights are negatively correlated with distance (p < 0.0001; Fig. 5BF), suggesting that short-range connections are predominantly excitatory (though this may be confounded by cells overlapping, such that activity in one cell is recorded as activity in a different cell). The short-range excitatory connections can be seen as the diagonal red bands in Fig. 5AE, as the neurons are roughly sorted by proximity, with the first 248 being perturbed, and the remainder never being perturbed. The weights are strongly correlated with spontaneous correlation (p < 0.0001; Fig. 5CG), as measured using raw fluorescence traces; a result which is expected, given that the model should use these weights to account for some aspects of the spontaneous correlation. Finally, the weights are positively correlated with signal correlation (p < 0.0001; Fig. 5DH), as measured using 8 drifting gratings, a finding that is consistent with previous results [1, 2].\n\nFigure 5: Inferred connection weights. A. 
Weight matrix inferred from spontaneous data (in particular, the expected value of the weight, under the recognition model, with red representing positive connectivity, and blue representing negative connectivity), plotted against distance (B), spontaneous correlation (C), and signal correlation (D). E–H. As A–D for perturbed data.\n\n3.7 Perturbed data supports stronger inferences than spontaneous data\n\nConsistent with our expectations, we found that perturbations considerably increased the number of discovered connections. Our spike-and-slab posterior over weights can be interpreted to yield an estimated confidence probability that a given connection exists. We can use this probability to estimate the number of highly confident connections. In particular, we were able to find 50% more connections in the perturbed dataset than the spontaneous dataset with a greater than 0.95 probability (1940 vs 1204); twice as many highly confident connections with probability 0.99 or higher (1107 vs 535); and five times as many with probability 0.999 or higher (527 vs 101). These results highlight the importance of perturbations to uncovering connections which would otherwise have been missed when analyzing purely observational datasets.\n\n3.8 Simulated data\n\nUsing the above methods, it is difficult to assess the effectiveness of the model because we do not have ground truth. While the ideal approach would be to obtain ground-truth data experimentally, this is very difficult in practice. 
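The confidence-thresholded counts reported in Section 3.7 reduce to a simple computation on the posterior inclusion probabilities pij of Eq. (14). A toy sketch (the probabilities below are made up; in the paper they come from the fitted recognition model Q(Wss)):

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in posterior inclusion probabilities p_ij for a 348-cell circuit
p_ij = rng.beta(0.05, 0.5, size=(348, 348))

def n_confident(p, threshold):
    """Number of connections whose posterior probability of existing
    exceeds the given confidence threshold (self-connections excluded)."""
    off_diag = ~np.eye(p.shape[0], dtype=bool)
    return int(np.sum(p[off_diag] > threshold))

counts = {t: n_confident(p_ij, t) for t in (0.95, 0.99, 0.999)}
```

Raising the threshold can only shrink the count, which is why the paper reports a nested series of increasingly confident connection sets.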
An alternative approach is thus to simulate data from the generative model, in which case the ground-truth weights are simply those used to perform the initial simulation. To perform a quantitative comparison, we used the correlation between a binary variable representing whether the true weights were greater than 0.1 (because it is extremely difficult to distinguish between zero and very small but non-zero weights), and the inferred probability of the weight being greater than 0.1, based on a combination of the inferences over the discrete and continuous components. We chose a threshold of 0.1 because it was relatively small in comparison with the standard deviation for the non-zero weights of around 0.4. We started by trying to replicate our experiments as closely as possible (Fig. 6), i.e. we inferred all the parameters, noise-levels, timescales, priors on weights etc. based on real data, and resampled the weight matrix based on the inferred prior over weights. We then considered repeating the same stimulation pattern 50 times (frozen), as against using 50 times more entirely random simulated data (unfrozen), and found that, as expected, using random stimulation patterns is more effective. As computational constraints prevent us from increasing the data further, we considered reducing the noise by a factor of 40 (low-noise), and then additionally reduced the timescales of the calcium transients by a factor of 10 (fast decay), which improved the correlation to 0.85.\nThese results indicate the model is functioning correctly, but raise issues for future work. 
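The weight-recovery score just described can be sketched as follows, on synthetic weights drawn from a spike-and-slab prior (p = 0.1, slab standard deviation 0.4, matching the values in the text) and a made-up "inferred" probability standing in for the model's posterior:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000

# Ground-truth weights from a spike-and-slab prior: mostly zero, slab N(0, 0.4^2)
w_true = rng.normal(0, 0.4, n) * (rng.random(n) < 0.1)
is_large = (w_true > 0.1).astype(float)        # binary: true weight above threshold

# A stand-in for the inferred P(w > 0.1): informative but noisy logits
logits = 4.0 * is_large - 2.0 + rng.normal(0, 1.0, n)
p_large = 1.0 / (1.0 + np.exp(-logits))

# The paper's score: correlation between the binary variable and the probability
score = np.corrcoef(is_large, p_large)[0, 1]
```

A score of 1 would mean the inferred probabilities perfectly separate above-threshold weights from the rest; the paper's best simulated condition reaches 0.85 on this measure.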
In particular, the considerable improvement achieved by reducing the timescales indicates that careful modeling of the calcium transient is essential, and that faster calcium indicators have the potential to dramatically improve the ultimate accuracy of weight inferences.\n\nFigure 6: Effectiveness of various variants of the model at finding the underlying ground-truth weights. The correlation compares a binary variable reporting whether the ground-truth weight is above or below 0.1 with a continuous measure reporting the inferred probability of the weight being larger than 0.1. The first condition, raw, uses simulated data that matches the real data as closely as possible, including the same length of photostimulated and spontaneous data as we obtained, and matching the parameters such as the noise level to those used in data. The frozen/unfrozen conditions represent using 50 times more data, where, for the "frozen" condition, we repeat the same optogenetic stimulation 50 times, and for the "unfrozen" condition we always use fresh, randomly chosen stimulation patterns. The final pair of conditions are photostimulated data, with 50 times more unfrozen data. For the "low noise" condition we reduce the noise level by a factor of 40, and for the "fast decay" condition, we additionally reduce the calcium decay time constants by a factor of 10.\n\n4 Discussion\n\nWe applied modern variational autoencoder and GPU computing techniques to create a fully Bayesian model of calcium imaging and perturbation data. 
This model simultaneously and efficiently extracted Bayesian approximate posteriors over spikes, the extent of two optogenetic perturbation artifacts, low-rank activity, and sparse synaptic (GLM) weights. This is the first model designed for perturbation data, and we are not aware of any other model able to extract posteriors over such a wide range of parameters with such efficiency.

Our inferred weights are consistent with studies using electrophysiological means to measure connectivity in mouse V1 [1, 2]. Further, model selection gives biologically expected results, identifying sparseness, suggesting that these models are identifying biologically relevant structure in the data. However, simply identifying broad properties such as sparseness does not imply that our inferences about individual weights are correct: for this, we need validation using complementary experimental approaches.

Finally, we have shown that recent developments in variational autoencoders make it possible to perform inference in "ideal" models: large-scale models describing noisy data-generating processes and complex biological phenomena simultaneously.

References

[1] H. Ko, S. B. Hofer, B. Pichler, K. A. Buchanan, P. J. Sjöström, and T. D. Mrsic-Flogel, "Functional specificity of local synaptic connections in neocortical networks," Nature, vol. 473, no. 7345, pp. 87-91, 2011.

[2] L. Cossell, M. F. Iacaruso, D. R. Muir, R. Houlton, E. N. Sader, H. Ko, S. B. Hofer, and T. D. Mrsic-Flogel, "Functional organization of excitatory synaptic strength in primary visual cortex," Nature, vol. 518, no. 7539, pp. 399-403, 2015.

[3] S.-y. Takemura, A. Bharioke, Z. Lu, A. Nern, S. Vitaladevuni, P. K. Rivlin, W. T. Katz, D. J. Olbris, S. M. Plaza, P. Winston, T. Zhao, J. A. Horne, R. D. Fetter, S. Takemura, K. Blazek, L.-A. Chang, O. Ogundeyi, M. A. Saunders, V. Shapiro, C. Sigmund, G. M. Rubin, L. K.
Scheffer, I. A. Meinertzhagen, and D. B. Chklovskii, "A visual motion detection circuit suggested by Drosophila connectomics," Nature, vol. 500, pp. 175-181, Aug. 2013.

[4] W.-C. A. Lee, V. Bonin, M. Reed, B. J. Graham, G. Hood, K. Glattfelder, and R. C. Reid, "Anatomy and function of an excitatory network in the visual cortex," Nature, vol. 532, no. 7599, pp. 370-374, 2016.

[5] A. M. Packer, L. E. Russell, H. W. Dalgleish, and M. Häusser, "Simultaneous all-optical manipulation and recording of neural circuit activity with cellular resolution in vivo," Nature Methods, vol. 12, no. 2, pp. 140-146, 2015.

[6] R. Raina, A. Madhavan, and A. Y. Ng, "Large-scale deep unsupervised learning using graphics processors," in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 873-880, ACM, 2009.

[7] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015. Software available from tensorflow.org.

[8] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," ICLR, 2014.

[9] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," ICML, 2014.

[10] Y. Mishchenko, J. T. Vogelstein, and L. Paninski, "A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data," The Annals of Applied Statistics, vol. 5, pp.
1229-1261, June 2011.

[11] D. Soudry, S. Keshri, P. Stinson, M.-H. Oh, G. Iyengar, and L. Paninski, "Efficient "shotgun" inference of neural connectivity from highly sub-sampled activity data," PLoS Computational Biology, vol. 11, p. e1004464, Oct. 2015.

[12] A. M. Packer, D. S. Peterka, J. J. Hirtz, R. Prakash, K. Deisseroth, and R. Yuste, "Two-photon optogenetics of dendritic spines and neural circuits," Nature Methods, vol. 9, pp. 1202-1205, Dec. 2012.

[13] B. Shababo, B. Paige, A. Pakman, and L. Paninski, "Bayesian inference and online experimental design for mapping neural microcircuits," in Advances in Neural Information Processing Systems 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, eds.), pp. 1304-1312, Curran Associates, Inc., 2013.

[14] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, 1999.

[15] J. T. Vogelstein, A. M. Packer, T. A. Machado, T. Sippy, B. Babadi, R. Yuste, and L. Paninski, "Fast nonnegative deconvolution for spike train inference from population calcium imaging," Journal of Neurophysiology, vol. 104, no. 6, pp. 3691-3704, 2010.

[16] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv preprint arXiv:1511.07289, 2015.

[17] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," ICLR, 2015.

[18] M. Okun, N. A. Steinmetz, L. Cossell, M. F. Iacaruso, H. Ko, P. Barthó, T. Moore, S. B. Hofer, T. D. Mrsic-Flogel, M. Carandini, et al., "Diverse coupling of neurons to populations in sensory cortex," Nature, vol. 521, no. 7553, pp. 511-515, 2015.

[19] A. Mnih and D. J.
Rezende, "Variational inference for Monte Carlo objectives," ICML, 2016.

[20] C. J. Maddison, A. Mnih, and Y. W. Teh, "The concrete distribution: A continuous relaxation of discrete random variables," arXiv preprint arXiv:1611.00712, 2016.

[21] E. Jang, S. Gu, and B. Poole, "Categorical reparameterization with Gumbel-Softmax," arXiv preprint arXiv:1611.01144, 2016.