{"title": "Flexible statistical inference for mechanistic models of neural dynamics", "book": "Advances in Neural Information Processing Systems", "page_first": 1289, "page_last": 1299, "abstract": "Mechanistic models of single-neuron dynamics have been extensively studied in computational neuroscience. However, identifying which models can quantitatively reproduce empirically measured data has been challenging. We propose to overcome this limitation by using likelihood-free inference approaches (also known as Approximate Bayesian Computation, ABC) to perform full Bayesian inference on single-neuron models. Our approach builds on recent advances in ABC by learning a neural network which maps features of the observed data to the posterior distribution over parameters. We learn a Bayesian mixture-density network approximating the posterior over multiple rounds of adaptively chosen simulations. Furthermore, we propose an efficient approach for handling missing features and parameter settings for which the simulator fails, as well as a strategy for automatically learning relevant features using recurrent neural networks. On synthetic data, our approach efficiently estimates posterior distributions and recovers ground-truth parameters. On in-vitro recordings of membrane voltages, we recover multivariate posteriors over biophysical parameters, which yield model-predicted voltage traces that accurately match empirical data. Our approach will enable neuroscientists to perform Bayesian inference on complex neuron models without having to design model-specific algorithms, closing the gap between mechanistic and statistical approaches to single-neuron modelling.", "full_text": "Flexible statistical inference for mechanistic models of\n\nneural dynamics\n\nJan-Matthis Lueckmann\u2217 1, Pedro J. Gon\u00e7alves\u2217 1, Giacomo Bassetto1,\n\nKaan \u00d6cal1,2, Marcel Nonnenmacher1, Jakob H. 
Macke\u20201\n\n1 research center caesar, an associate of the Max Planck Society, Bonn, Germany\n\n2 Mathematical Institute, University of Bonn, Bonn, Germany\n\n{jan-matthis.lueckmann, pedro.goncalves, giacomo.bassetto,\nkaan.oecal, marcel.nonnenmacher, jakob.macke}@caesar.de\n\nAbstract\n\nMechanistic models of single-neuron dynamics have been extensively studied in\ncomputational neuroscience. However, identifying which models can quantitatively\nreproduce empirically measured data has been challenging. We propose to over-\ncome this limitation by using likelihood-free inference approaches (also known\nas Approximate Bayesian Computation, ABC) to perform full Bayesian inference\non single-neuron models. Our approach builds on recent advances in ABC by\nlearning a neural network which maps features of the observed data to the poste-\nrior distribution over parameters. We learn a Bayesian mixture-density network\napproximating the posterior over multiple rounds of adaptively chosen simulations.\nFurthermore, we propose an ef\ufb01cient approach for handling missing features and\nparameter settings for which the simulator fails, as well as a strategy for automati-\ncally learning relevant features using recurrent neural networks. On synthetic data,\nour approach ef\ufb01ciently estimates posterior distributions and recovers ground-truth\nparameters. On in-vitro recordings of membrane voltages, we recover multivariate\nposteriors over biophysical parameters, which yield model-predicted voltage traces\nthat accurately match empirical data. Our approach will enable neuroscientists to\nperform Bayesian inference on complex neuron models without having to design\nmodel-speci\ufb01c algorithms, closing the gap between mechanistic and statistical\napproaches to single-neuron modelling.\n\nIntroduction\n\n1\nBiophysical models of neuronal dynamics are of central importance for understanding the mechanisms\nby which neural circuits process information and control behaviour. 
However, identifying which\nmodels of neural dynamics can (or cannot) reproduce electrophysiological or imaging measurements\nof neural activity has been a major challenge [1]. In particular, many models of interest \u2013 such as\nmulti-compartment biophysical models [2], networks of spiking neurons [3] or detailed simulations\nof brain activity [4] \u2013 have intractable or computationally expensive likelihoods, and statistical\ninference has only been possible in selected cases and using model-speci\ufb01c algorithms [5, 6, 7]. Many\nmodels are de\ufb01ned implicitly through simulators, i.e. a set of dynamical equations and possibly a\ndescription of sources of stochasticity [1]. In addition, it is often of interest to identify models which\ncan reproduce particular features in the data, e.g. a \ufb01ring rate or response latency, rather than the full\ntemporal structure of a neural recording.\n\n\u2217Equal contribution\n\u2020Current primary af\ufb01liation: Centre for Cognitive Science, Technical University Darmstadt\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\n\fFigure 1: Flexible likelihood-free inference for models of neural dynamics. A. We want to\n\ufb02exibly and ef\ufb01ciently infer the posterior over model parameters given observed data, on a wide\nrange of models of neural dynamics. B. Our method approximates the true posterior on \u03b8 around the\nobserved data xo by performing density estimation on data simulated using a proposal prior. C. 
We\ntrain a Bayesian mixture-density network (MDN) for posterior density estimation.\n\nIn the absence of likelihoods, the standard approach in neuroscience has been to use heuristic\nparameter-\ufb01tting methods [2, 8, 9]: distance measures are de\ufb01ned on multiple features of interest,\nand brute-force search [10, 11] or evolutionary algorithms [2, 9, 12, 13] (neither of which scales to\nhigh-dimensional parameter spaces) are used to minimise the distances between observed and model-\nderived features. As it is dif\ufb01cult to trade off distances between different features, the state-of-the-art\nmethods optimise multiple objectives and leave the \ufb01nal choice of a model to the user [2, 9]. As\nthis approach is not based on statistical inference, it does not provide estimates of the full posterior\ndistribution \u2013 thus, while this approach has been of great importance for identifying \u2018best \ufb01tting\u2019\nparameters, it does not allow one to identify the full space of parameters that are consistent with data\nand prior knowledge, or to incrementally re\ufb01ne and reject models.\nBayesian inference for likelihood-free simulator models, also known as Approximate Bayesian\nComputation [14, 15, 16], provides an attractive framework for overcoming these limitations: like\nparameter-\ufb01tting approaches in neuroscience [2, 8, 9], it is based on comparing summary features\nbetween simulated and empirical data. However, unlike them, it provides a principled framework for\nfull Bayesian inference and can be used to determine how to trade off goodness-of-\ufb01t across summary\nstatistics. However, to the best of our knowledge, this potential has not been realised yet, and ABC\napproaches are not used for linking mechanistic models of neural dynamics with experimental data\n(for an exception, see [17]). Here, we propose to use ABC methods for statistical inference of\nmechanistic models of single neurons. 
We argue that ABC approaches based on conditional density\nestimation [18, 19] are particularly suited for neuroscience applications.\nWe present a novel method (Sequential Neural Posterior Estimation, SNPE) in which we sequentially\ntrain a mixture-density network across multiple rounds of adaptively chosen simulations1. Our\napproach is directly inspired by prior work [18, 19], but overcomes critical limitations: \ufb01rst, a\n\ufb02exible mixture-density network trained with an importance-weighted loss function enables us to\nuse complex proposal distributions and approximate complex posteriors. Second, we represent a full\nposterior over network parameters of the density estimator (i.e. a \u201cposterior on posterior-parameters\u201d)\nwhich allows us to take uncertainty into account when adjusting weights. This enables us to perform\n\u2018continual learning\u2019, i.e. to effectively utilise all simulations without explicitly having to store them.\nThird, we introduce an approach for ef\ufb01ciently dealing with simulations that return missing values,\nor which break altogether \u2013 a common situation in neuroscience and many other applications of\nsimulator-based models \u2013 by learning a model that predicts which parameters are likely to lead to\nbreaking simulations, and using this knowledge to modify the proposal distribution. We demonstrate\nthe practical effectiveness and importance of these innovations on biophysical models of single\nneurons, on simulated and neurophysiological data. 
Finally, we show how recurrent neural networks can be used to directly learn relevant features from time-series data.

1Code available at https://github.com/mackelab/delfi

1.1 Related work using likelihood-free inference for simulator models
Given experimental data xo (e.g. intracellular voltage measurements of a single neuron, or extracellular recordings from a neural population), a model p(x|θ) parameterised by θ (e.g. biophysical parameters, or connectivity strengths in a network simulation) and a prior distribution p(θ), our goal is to perform statistical inference, i.e. to find the posterior distribution p̂(θ|x = xo). We assume that the model p(x|θ) is only defined through a simulator [14, 15]: we can generate samples xn ∼ x|θ from it, but not evaluate p(x|θ) (or its gradients) explicitly. In neural modelling, many models are defined through specification of a dynamical system with external or intrinsic noise sources, or even through a black-box simulator (e.g. using the NEURON software [20]).
In addition, and in line with parameter-fitting approaches in neuroscience and most ABC techniques [14, 15, 21], we are often interested in capturing summary statistics of the experimental data (e.g. firing rate, spike latency, resting potential of a neuron). Therefore, we can think of x as resulting from applying a feature function f to the raw simulator output s, x = f(s), with dim(x) ≪ dim(s).
Classical ABC algorithms simulate from multiple parameters, and reject parameter sets which yield data that are not within a specified distance from the empirically observed features.
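This simulate-and-reject scheme can be sketched on a toy problem (all names and numbers below are illustrative, not from the paper): a scalar parameter θ, a Gaussian simulator, and the sample mean as the single summary feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=20):
    # Toy simulator: n noisy observations around an unknown mean theta.
    return rng.normal(theta, 1.0, size=n)

def feature(s):
    # Single summary feature: the sample mean of the raw simulator output.
    return s.mean()

x_o = 1.5       # observed summary feature
eps = 0.1       # rejection tolerance

accepted = []
while len(accepted) < 500:
    theta = rng.normal(0.0, 3.0)         # propose from the prior
    x = feature(simulator(theta))
    if abs(x - x_o) < eps:               # keep only simulations close to x_o
        accepted.append(theta)

posterior_samples = np.array(accepted)
```

The accepted parameters approximate the posterior; note how exactness requires a small eps, and hence many rejected (wasted) simulations.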
In their basic\nform, proposals are drawn from the prior (\u2018rejection-ABC\u2019 [22]). More ef\ufb01cient variants make\nuse of a Markov-Chain Monte-Carlo [23, 24] or Sequential Monte-Carlo (SMC) samplers [25, 26].\nSampling-based ABC approaches require the design of a distance metric on summary features, as\nwell as a rejection criterion (\u03b5), and are exact only in the limit of small \u03b5 (i.e. many rejections) [27],\nimplying strong trade-offs between accuracy and scalability. In SMC-ABC, importance sampling is\nused to sequentially sample from more accurate posteriors while \u03b5 is gradually decreased.\nSynthetic-likelihood methods [28, 21, 29] approximate the likelihood p(x|\u03b8) using multivariate\nGaussians \ufb01tted to repeated simulations given \u03b8 (see [30, 31] for generalisations). While the\nGaussianity assumption is often motivated by the central limit theorem, distributions over features can\nin practice be complex and highly non-Gaussian [32]. For example, neural simulations sometimes\nresult in systematically missing features (e.g. spike latency is unde\ufb01ned if there are no spikes), or\ndiverging \ufb01ring rates.\nFinally, methods originating from regression correction [33, 18, 19] simulate multiple data xn from\ndifferent \u03b8n sampled from a proposal distribution \u02dcp(\u03b8), and construct a conditional density estimate\nq(\u03b8|x) by performing a regression from simulated data xn to \u03b8n. Evaluating this density model at\nthe observed data xo, q(\u03b8|xo) yields an estimate of the posterior distribution. These approaches do\nnot require parametric assumptions on likelihoods or the choice of a distance function and a tolerance\n(\u03b5) on features. Two approaches are used for correcting the mismatch between prior and proposal\ndistributions: Blum and Fran\u00e7ois [18] proposed the importance weights p(\u03b8)/\u02dcp(\u03b8), but restricted\nthemselves to proposals which were truncated priors (i.e. 
all importance weights were 0 or 1), and did\nnot sequentially optimise proposals over multiple rounds. Papamakarios and Murray [19] recently\nused stochastic variational inference to optimise the parameters of a mixture-density network, and a\npost-hoc division step to correct for the effect of the proposal distribution. While highly effective in\nsome cases, this closed-form correction step can be numerically unstable and is restricted to Gaussian\nand uniform proposals, limiting both the robustness and \ufb02exibility of this approach. SNPE builds on\nthese approaches, but overcomes their limitations by introducing four innovations: a highly \ufb02exible\nproposal distribution parameterised as a mixture-density network, a Bayesian approach for continual\nlearning from multiple rounds of simulations, and a classi\ufb01er for predicting which parameters will\nresult in aborted simulations or missing features. Fourth, we show how this approach, when applied\nto time-series data of single-neuron activity, can automatically learn summary features from data.\n2 Methods\n2.1 Sequential Neural Posterior Estimation for likelihood-free inference\nIn SNPE, our goal is to learn the parameters \u03c6 of a posterior model q\u03c6(\u03b8|x = f (s)) which, when\nevaluated at xo, approximates the true posterior p(\u03b8|xo) \u2248 q\u03c6(\u03b8|x = xo). 
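The role of the importance weights p(θ)/p̃(θ) in correcting for sampling from a proposal rather than the prior can be illustrated with a toy calculation (the distributions here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Prior p(theta) = N(0, 1); proposal ptilde(theta) = N(1, 2) concentrates
# samples elsewhere, so raw averages over proposal draws are biased.
theta = rng.normal(1.0, 2.0, size=200_000)       # draws from the proposal
w = normal_pdf(theta, 0.0, 1.0) / normal_pdf(theta, 1.0, 2.0)

# Self-normalised importance-weighted estimate of E_p[theta^2] (exactly 1).
estimate = np.sum(w * theta**2) / np.sum(w)
naive = np.mean(theta**2)                        # ignores the mismatch (about 5)
```

The weighted estimate recovers the expectation under the prior even though no sample was drawn from it; this is the same correction that SNPE folds into its loss.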
Given a prior p(θ), a proposal prior p̃(θ), pairs of samples (θn, xn) generated from the proposal prior and the simulator, and a calibration kernel Kτ, the posterior model can be trained by minimising the importance-weighted log-loss

$$\mathcal{L}(\phi) = -\frac{1}{N} \sum_n \frac{p(\theta_n)}{\tilde{p}(\theta_n)} \, K_\tau(x_n, x_o) \, \log q_\phi(\theta_n \mid x_n), \qquad (1)$$

as is shown by extending the argument in [19] with importance weights p(θn)/p̃(θn) and a kernel Kτ in Appendix A.
Sampling from a proposal prior can be much more effective than sampling from the prior. By including the importance weights in the loss, the analytical correction step of [19] (i.e. division by the proposal prior) becomes unnecessary: SNPE directly estimates the posterior density rather than a conditional density that is reweighted post-hoc. The analytical step of [19] has the advantage of side-stepping the additional variance brought about by importance weights, but has the disadvantages of (1) being restricted to Gaussian proposals, and (2) the division being unstable if the proposal prior has higher precision than the estimated conditional density.
The calibration kernel Kτ(x, xo) can be used to calibrate the loss function by focusing it on simulated data points x which are close to xo [18]. Calibration kernels Kτ(x, xo) are to be chosen such that Kτ(xo, xo) = 1 and that Kτ decreases with increasing distance ‖x − xo‖, given a bandwidth τ (see footnote 2). Here, we only used calibration kernels to exclude bad simulations by assigning them kernel value zero. An additional use of calibration kernels would be to limit the accuracy of the posterior density estimation to a region near xo. Choice of the bandwidth implies a bias-variance trade-off [18]. For the problems we consider here, we assumed our posterior model qφ(θ|x), based on a multi-layer neural network, to be sufficiently flexible, such that limiting the bandwidth was not necessary.
We sequentially optimise the density estimator $q_\phi(\theta|x) = \sum_k \alpha_k \, \mathcal{N}(\theta \mid \mu_k, \Sigma_k)$ by training a mixture-density network (MDN) [19] with parameters φ over multiple 'rounds' r with adaptively chosen proposal priors p̃(r)(θ) (see Fig. 1). We initialise the proposal prior at the prior, p̃(1)(θ) = p(θ), and subsequently take the posterior of the previous round as the next proposal prior (Appendix B). Our approach is not limited to Gaussian proposals, and in particular can utilise multi-modal and heavy-tailed proposal distributions.
2.2 Training the posterior model with stochastic variational inference
To make efficient use of simulation time, we want the posterior network qφ(θ|x) to use all simulations, including ones from previous rounds. For computational and memory efficiency, it is desirable to avoid having to store all old samples, or having to train a new model at each round. To achieve this goal, we perform Bayesian inference on the weights w of the MDN across rounds. We approximate the distribution over weights as independent Gaussians [34, 35]. Note that the parameters φ of this Bayesian MDN are means and standard deviations for each weight, i.e., φ = {φm, φs}. As an extension to the approach of [19], rather than assuming a zero-centred prior over weights, we use the posterior over weights of the previous round, πφ(r−1)(w), as a prior for the next round.
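For concreteness, the importance-weighted log-loss of Eq. (1) can be evaluated directly. The sketch below uses a deliberately simple Gaussian "posterior network" and toy prior/proposal/simulator choices; all of these are illustrative assumptions, not the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_normal_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def log_q(theta, x, phi):
    # Toy "posterior network": a Gaussian whose mean is linear in the feature x.
    a, b, log_sigma = phi
    return log_normal_pdf(theta, a * x + b, np.exp(log_sigma))

def snpe_loss(theta, x, x_o, phi, tau=1.0):
    # Eq. (1): importance weights p/ptilde times a calibration kernel K_tau.
    w = np.exp(log_normal_pdf(theta, 0.0, 1.0)      # prior    p(theta)      = N(0, 1)
               - log_normal_pdf(theta, 0.0, 2.0))   # proposal ptilde(theta) = N(0, 2)
    k = np.exp(-0.5 * ((x - x_o) / tau) ** 2)       # Gaussian calibration kernel
    return -np.mean(w * k * log_q(theta, x, phi))

theta = rng.normal(0.0, 2.0, size=10_000)           # draws from the proposal prior
x = theta + 0.5 * rng.normal(size=theta.size)       # toy simulated features
loss_good = snpe_loss(theta, x, x_o=1.0, phi=(1.0, 0.0, np.log(0.5)))
loss_bad = snpe_loss(theta, x, x_o=1.0, phi=(0.0, 0.0, 0.0))
```

A posterior model that tracks the simulated data (loss_good) attains a lower loss than one that ignores it (loss_bad), which is what gradient descent on Eq. (1) exploits.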
Using stochastic variational inference, in each round we optimise the modified loss

$$\mathcal{L}(\phi^{(r)}) = -\frac{1}{N} \sum_n \frac{p(\theta_n)}{\tilde{p}^{(r)}(\theta_n)} \, K_\tau(x_n, x_o) \, \big\langle \log q_w(\theta_n \mid x_n) \big\rangle_{\pi_{\phi^{(r)}}(w)} + \frac{1}{N} D_{\mathrm{KL}}\big(\pi_{\phi^{(r)}}(w) \,\|\, \pi_{\phi^{(r-1)}}(w)\big). \qquad (2)$$

Here, the distributions π(w) are approximated by multivariate normals with diagonal covariance. The continuity penalty ensures that MDN parameters that are already well constrained by previous rounds are less likely to be updated than parameters with large uncertainty (see Appendix C). In practice, gradients of the expectation over networks are approximated using the local reparameterisation trick [36].
2.3 Dealing with bad simulations and bad features, and learning features from time series
Bad simulations: Simulator-based models, and single-neuron models in particular, frequently generate nonsensical data (which we name 'bad simulations'), especially in early rounds in which the relevant region of parameter space has not yet been found. For example, models of neural dynamics can easily run into self-excitation loops with diverging firing rates [37] (Fig. 4A). We introduce an indicator feature b(s), with b(s) = 1 marking that s and x correspond to a bad simulation. We set Kτ(xn, xo) = 0 whenever b(xn) = 1, since the density estimator should not spend resources on approximating the posterior for bad data.

2While we did not investigate this here, an attractive idea would be to base the kernel of the distance between xn and xo on the divergence between the associated posteriors, e.g. Kτ(xn, xo) = exp(−1/τ · DKL(q(r−1)(θ|xn) ‖ q(r−1)(θ|xo))) – in this case, two data points would be regarded as similar if the current estimate of the density network assigns similar posterior distributions to them, which is a natural measure of similarity in this context.
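For the diagonal-Gaussian weight distributions used here, the continuity penalty DKL in Eq. (2) is available in closed form. A minimal sketch (toy numbers, two "weights" instead of a full network):

```python
import numpy as np

def kl_diag_gaussians(m_new, s_new, m_old, s_old):
    # D_KL( N(m_new, diag(s_new^2)) || N(m_old, diag(s_old^2)) ), summed over
    # all network weights -- the continuity penalty of Eq. (2).
    return np.sum(np.log(s_old / s_new)
                  + (s_new**2 + (m_new - m_old)**2) / (2 * s_old**2)
                  - 0.5)

m_old = np.array([0.0, 0.0])
s_old = np.array([0.1, 1.0])   # weight 0 well constrained, weight 1 uncertain

# Shifting the well-constrained weight by 1 is penalised 100x more strongly
# than shifting the uncertain one by the same amount.
cost_constrained = kl_diag_gaussians(np.array([1.0, 0.0]), s_old, m_old, s_old)
cost_uncertain = kl_diag_gaussians(np.array([0.0, 1.0]), s_old, m_old, s_old)
```

This is exactly the behaviour described in the text: parameters already pinned down by previous rounds resist updating, while uncertain ones remain free to move.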
With this choice of calibration kernel, bad simulations are ignored when\nupdating the posterior model \u2013 however, this results in inef\ufb01cient use of simulations.\nWe propose to learn a model \u02c6g : \u03b8 \u2192 [0, 1] to predict the probability that a simulation from \u03b8 will\nbreak. While any probabilistic classi\ufb01er could be used, we train a binary-output neural network with\nlog-loss on (\u03b8n, b(sn)). For each proposed \u03b8, we reject \u03b8 with probability \u02c6g(\u03b8), and do not carry out\nthe expensive simulation3. The rejections could be incorporated into the importance weights (which\nwould require estimating the corresponding partition function, or assuming it to be constant across\nrounds), but as these rejections do not depend on the data xo, we interpret them as modifying the\nprior: from an initially speci\ufb01ed prior p(\u03b8), we obtain a modi\ufb01ed prior excluding those parameters\nwhich likely will lead to nonsensical simulations. Therefore, the predictive model \u02c6g(\u03b8) does not only\nlead to more ef\ufb01cient inference (especially in strongly under-constrained scenarios), but is also useful\nin identifying an effective prior \u2013 the space of parameters deemed plausible a priori intersected with\nthe space of parameters for which the simulator is well-behaved.\nBad features:\nIt is frequently observed that individual features of interest for \ufb01tting single-neuron\nmodels cannot be evaluated: for example, the spike latency cannot be evaluated if a simulation\ndoes not generate spikes, but the fact that this feature is missing might provide valuable information\n(Fig. 4C). SNPE can be extended to handle \u2018bad features\u2019 by using a carefully designed posterior\nnetwork. For each feature fi(s), we introduce a binary feature mi(s) which indicates whether fi\nis missing. 
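A minimal sketch of such an imputation step follows (feature values and imputation constants are made up; in the actual network the constants would be learned alongside the other parameters):

```python
import numpy as np

def impute_features(f, m, c):
    # h_i = f_i * (1 - m_i) + c_i * m_i : present features pass through,
    # missing ones (m_i = 1) are replaced by a per-feature constant c_i.
    return f * (1 - m) + c * m

# Toy feature vector: firing rate, spike latency (missing), resting potential.
f_raw = np.array([12.0, np.nan, -65.0])
m = np.isnan(f_raw).astype(float)      # missingness indicators m_i(s)
f = np.nan_to_num(f_raw)               # zero out missing entries so they drop out
c = np.array([0.0, 50.0, 0.0])         # imputation values (learned in practice)

h = impute_features(f, m, c)           # [12.0, 50.0, -65.0]
```

The network thus always receives a complete input vector, while the indicators m themselves remain available as informative features.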
We parameterise the input layer of the posterior network with multiplicative terms of\nthe form hi(s) = fi(s) \u00b7 (1 \u2212 mi(s)) + ci \u00b7 mi(s) where the term ci is to be learned. This approach\neffectively learns an imputation value ci for each missing feature. For a more expressive model, one\ncould also include terms which learn interactions across different missing-feature indicators and/or\nfeatures, but we did not explore this here.\nLearning features: Finally, we point out that using a neural network for posterior estimation yields\na straightforward way of learning relevant features from data [38, 39, 40]. Rather than feeding\nsummary features f (s) into the network, we directly feed time-series recordings of neural activity\ninto the network. The \ufb01rst layer of the MDN becomes a recurrent layer instead of a fully-connected\none. By minimising the variational objective (Eq.2), the network learns informative summary features\nabout posterior densities.\n3 Results\nWhile SNPE is in principle applicable to any simulator-based model, we designed it for performing\ninference on models of neural dynamics. In our applications, we concentrate on single-neuron models.\nWe demonstrate the ability of SNPE to recover ground-truth posteriors in Gaussian Mixtures and\nGeneralised Linear Models (GLMs) [41], and apply SNPE to a Hodgkin-Huxley neuron model and\nan autapse model, which can have parameter regimes of unstable behaviour and missing features.\n3.1 Statistical inference on simple models\nGaussian mixtures: We \ufb01rst demonstrate the effectiveness of SNPE for inferring the posterior of\nmixtures of two Gaussians, for which we can analytically compute true posteriors. We are interested\nin the numerical stability of the method (\u2018robustness\u2019) and the \u2018\ufb02exibility\u2019 to approximate multi-modal\nposteriors. 
To illustrate the robustness of SNPE, we apply SNPE and the method proposed by [19]\n(which we refer to by Conditional Density Estimation for Likelihood-free Inference, CDE-LFI) to\ninfer the common mean of a mixture of two Gaussians, given samples from the mixture distribution\n(Fig. 2A; details in Appendix D.1). Whereas SNPE works robustly across multiple algorithmic\nrounds, CDE-LFI can become unstable: its analytical correction requires a division by a Gaussian\nwhich becomes unstable if the precision of the Gaussian does not increase monotonically across\nrounds (see 2.1). Constraining the precision-matrix to be non-decreasing \ufb01xes the numerical issue,\nbut leads to biased estimates of the posterior. Second, we apply both SNPE and CDE-LFI to infer\nthe two means of a mixture of two Gaussians, given samples x from the mixture distribution (Fig.\n2B; Appendix D.1). While SNPE can use bi-modal proposals, CDE-LFI cannot, implying reduced\nef\ufb01ciency of proposals on strongly non-Gaussian or multi-modal problems.\n\n3An alternative approach would be to \ufb01rst learn p(\u03b8|b(s) = 0) by applying SNPE to a single feature,\nf1(s) = b(s), and to subsequently run SNPE on the full feature-set, but using p(\u03b8|b(s) = 0) as prior \u2013 however,\nthis would \u2018waste\u2019 simulations for learning p(\u03b8|b(s) = 1).\n\n5\n\n\fFigure 2: Inference on simple statistical models. A. Robustness of posterior inference on 1-D\nGaussian Mixtures (GMs). Left: true posterior given observation at xo = 0. Middle: percentage\nof completed runs as a function of number of rounds; SNPE is robust. Right: Gaussian proposal\npriors tend to underestimate tails of posterior (red). B. Flexibility of posterior inference. Left: True\nposterior for 1-D bimodal GM and observation xo. Middle and right: First round proposal priors\n(dotted), second round proposal priors (dashed) and estimated posteriors (solid) for CDE-LFI and\nSNPE respectively (true posterior red). 
SNPE allows multi-modal proposals. C, F. Application to\nGLM. Posterior means and variances are recovered well by both CDE-LFI and SNPE. For reference,\nwe approximate the posterior using likelihood-based PG-MCMC. D. Covariance matrices for SNPE\nand PG-MCMC. E. Partial view of the posterior for 3 out of 10 parameters (all 10 parameters in\nAppendix G). Ground-truth parameters in red. 2-D marginals for SNPE (lines) and PG-MCMC\n(histograms). White and yellow contour lines correspond to 68% and 95% of the mass, respectively.\n\nGeneralised linear models: Generalised linear models (GLM) are commonly used to model\nneural responses to sensory stimuli. For these models, several techniques are available to estimate the\nposterior distribution over parameters, making them ideally suited to test SNPE in a single-neuron\nmodel. We evaluated the posterior distribution over the parameters of a GLM using a P\u00f3lya-Gamma\nsampler (PG-MCMC, [42, 43]) and compared it to the posterior distributions estimated by SNPE\n(Appendix D.2 for details). We found a good agreement of the posterior means and variances (Fig.\n2C), covariances (Fig. 2D), as well as pairwise marginals (Fig. 2E). We note that, since GLMs have\nclose-to-Gaussian posteriors, the CDE-LFI method works extremely well on this problem (Fig. 2F).\nIn summary, SNPE leads to accurate and robust estimation of the posterior in simple models. It works\neffectively even on multi-modal posteriors on which CDE-LFI exhibits worse performance. On a\nGLM-example with an (almost) Gaussian posterior, the CDE-LFI method works extremely well,\nbut SNPE yields very similar posterior estimates (see Appendix F for additional comparison with\nSMC-ABC).\n\n3.2 Statistical inference on Hodgkin-Huxley neuron models\nSimulated data:\nThe Hodgkin-Huxley equations [44] describe the dynamics of a neuron\u2019s mem-\nbrane potential and ion channels given biophysical parameters (e.g. 
concentration of sodium and potassium channels) and an injected input current (Fig. 3A, see Appendix D.3). We applied SNPE to a Hodgkin-Huxley model with channel kinetics as in [45] and inferred the posterior over 12 biophysical parameters, given 20 voltage features of the simulated data. The true parameter values are close to the mode of the inferred posterior (Fig. 3B, D), and lie in a region of high posterior probability. Samples from the posterior lead to voltage traces that are similar to the original data, supporting the correctness of the approach (Fig. 3C).

Figure 3: Application to Hodgkin-Huxley model. A. Simulation of the Hodgkin-Huxley model with current injection. B. Posterior over 3 out of 12 parameters inferred with SNPE (all 12 parameters in Appendix G). True parameters have high posterior probabilities (red). C. Traces for the mode of (cyan), and samples from (orange), the inferred posterior match the original data (blue). D. Comparison between SNPE and a standard parameter-fitting procedure based on a genetic algorithm, IBEA: difference between the mode of SNPE or the IBEA best parameter set and the ground-truth parameters, normalised by the standard deviations obtained by SNPE. E-G. Application to real data from the Allen Cell Type Database. Inference over 12 parameters for cell 464212183. Results presented as in A-C.
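The forward model itself is cheap to sketch. Below is a minimal forward-Euler Hodgkin-Huxley simulator with standard textbook rate functions and parameters; this is an illustrative stand-in, not the 12-parameter variant with kinetics as in [45] used for the experiments.

```python
import numpy as np

def hh_simulate(g_na=120.0, g_k=36.0, g_l=0.3, i_inj=10.0, t_max=50.0, dt=0.01):
    """Forward-Euler simulation of a single-compartment Hodgkin-Huxley neuron.

    Conductances in mS/cm^2, current in uA/cm^2, time in ms, voltage in mV.
    """
    e_na, e_k, e_l, c_m = 50.0, -77.0, -54.4, 1.0
    # Standard HH rate functions (voltage shifted so rest is near -65 mV).
    a_m = lambda v: 0.1 * (v + 40.0) / (1.0 - np.exp(-(v + 40.0) / 10.0))
    b_m = lambda v: 4.0 * np.exp(-(v + 65.0) / 18.0)
    a_h = lambda v: 0.07 * np.exp(-(v + 65.0) / 20.0)
    b_h = lambda v: 1.0 / (1.0 + np.exp(-(v + 35.0) / 10.0))
    a_n = lambda v: 0.01 * (v + 55.0) / (1.0 - np.exp(-(v + 55.0) / 10.0))
    b_n = lambda v: 0.125 * np.exp(-(v + 65.0) / 80.0)

    n_steps = int(t_max / dt)
    v, m, h, n = -65.0, 0.05, 0.6, 0.32       # resting state
    trace = np.empty(n_steps)
    for t in range(n_steps):
        i_na = g_na * m**3 * h * (v - e_na)   # sodium current
        i_k = g_k * n**4 * (v - e_k)          # potassium current
        i_l = g_l * (v - e_l)                 # leak current
        v += dt * (i_inj - i_na - i_k - i_l) / c_m
        m += dt * (a_m(v) * (1.0 - m) - b_m(v) * m)
        h += dt * (a_h(v) * (1.0 - h) - b_h(v) * h)
        n += dt * (a_n(v) * (1.0 - n) - b_n(v) * n)
        trace[t] = v
    return trace

v = hh_simulate()
n_spikes = int(np.sum((v[1:] > 0.0) & (v[:-1] <= 0.0)))  # upward zero-crossings
```

Summary features such as spike count, resting potential, or spike latency would then be computed from `v`, and the conductances (here `g_na`, `g_k`, `g_l`) are among the parameters over which a posterior is inferred.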
We compared the performance of SNPE\nwith a commonly used genetic algorithm (Indicator Based Evolutionary Algorithm, IBEA, from the\nBluePyOpt package [9]), given the same number of model simulations (Fig. 3D). SNPE is comparable\nto IBEA in approximating the ground-truth parameters \u2013 note that de\ufb01ning an objective measure to\ncompare the two approaches is dif\ufb01cult, as they both minimise different criteria. However, unlike\nIBEA, SNPE also returns a full posterior distribution, i.e. the space of all parameters consistent with\nthe data, rather than just a \u2018best \ufb01t\u2019.\nIn-vitro recordings: We also applied the approach to in vitro recordings from the mouse visual\ncortex (see Appendix D.4, Fig. 3E-G). The posterior mode over 12 parameters of a Hodgkin-Huxley\nmodel leads to a voltage trace which is similar to the data, and the posterior distribution shows the\nspace of parameters for which the output of the model is preserved. These posteriors could be used to\nmotivate further experiments for constraining parameters, or to study invariances in the model.\n3.3 Dealing with bad simulations and features\nBad simulations: We demonstrate our approach (see Section 2.3) for dealing with \u2018bad simulations\u2019\n(e.g. for which \ufb01ring rates diverge) using a simple, two-parameter \u2018autapse\u2019 model for which the region\nof stability is known. During SNPE, we concurrently train a classi\ufb01er to predict \u2018bad simulations\u2019 and\nupdate the prior accordingly. This approach does not only lead to a more ef\ufb01cient use of simulations,\nbut also identi\ufb01es the parameter space for which the simulator is well-de\ufb01ned, information that could\nbe used for further model analysis (Fig. 4A, B).\nBad features: Many features of interest in neural models, e.g. the latency to \ufb01rst spike after the\ninjection of a current input, are only well de\ufb01ned in the presence of other features, e.g. the presence\nof spikes (Fig. 
4C). Given that large parts of the parameter space can lead to non-spiking behaviour, missing features occur frequently and cannot simply be ignored. We enriched our MDN with an extra layer which imputes values for the absent features; these imputation values are optimised alongside the rest of the parameters of the network (Fig. 4D; Appendix E). Such imputation has marginal computational cost and grants us the convenience of not having to hand-tune imputation values, or to reject all simulations for which any individual feature might be missing.

Figure 4: Inference on neural dynamics has to deal with diverging simulations and missing features. A. Firing rate of a model neuron connected to itself (autapse). If the strength of the self-connection (parameter J) is bigger than 1, the dynamics are unstable (orange line: bad simulation). B. Portion of parameter space leading to diverging simulations learned by the classifier (yellow: low probability of a bad simulation, blue: high probability), and comparison with analytically computed boundaries (white, see Appendix D.5). C. Illustration of a model neuron in two parameter regimes, spiking (grey trace) and non-spiking (blue). When the neuron does not spike, features that depend on the presence of spiking, such as the latency to first spike, are not defined. D. Our MDN is augmented with a multiplicative layer which imputes values for missing features.

Learning features with recurrent neural networks (RNNs): In neural modelling, it is often of interest to work with hand-designed features that are thought to be particularly important or informative for particular analysis questions [2]. For instance, the shape of the action potential is intimately related to the dynamics of sodium and potassium channels in the Hodgkin-Huxley model. However, the space of possible features is immense, and given the highly non-linear nature of many of the neural models in question, it can be of interest to perform statistical inference without having to hand-design features. Our approach provides a straightforward means of doing so: we augment the MDN with an RNN which runs along the recorded voltage trace (and the stimulus, here a coloured-noise input) to learn appropriate features for constraining the model parameters. As illustrated in Fig. 5B, the first layer of the network, which previously received pre-computed summary statistics as inputs, is replaced by a recurrent layer that receives the full voltage and current traces as inputs. In order to capture long-term dependencies in the sequence input, we use gated recurrent units (GRUs) for the RNN [47]. Since we use 25 GRU units and keep only the final output of the unrolled RNN (many-to-one), we introduce a bottleneck. The RNN thus transforms the voltage trace and stimulus into a set of 25 features, which allow SNPE to recover the posterior over the 12 parameters (Fig. 5C).
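The many-to-one GRU bottleneck can be sketched as follows. This is a minimal numpy illustration of the unrolled forward pass with random, untrained weights; in the paper the recurrent layer is trained jointly with the MDN, so everything here apart from the 25-unit hidden state and the (voltage, current) input pair is illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUFeatureExtractor:
    """Many-to-one GRU encoder: reduces a (T, 2) trace of (voltage, current)
    samples to an N-dimensional feature vector, its final hidden state."""

    def __init__(self, n_in=2, n_hidden=25, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(n_hidden)
        def mat(rows, cols):
            return rng.uniform(-s, s, (rows, cols))
        # one (W, U, b) triple per gate: update z, reset r, candidate h
        self.p = {g: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), np.zeros(n_hidden))
                  for g in "zrh"}
        self.n_hidden = n_hidden

    def __call__(self, x):
        (Wz, Uz, bz), (Wr, Ur, br), (Wh, Uh, bh) = (self.p[g] for g in "zrh")
        h = np.zeros(self.n_hidden)
        for xt in x:                                   # unrolled forward pass
            z = sigmoid(Wz @ xt + Uz @ h + bz)         # update gate
            r = sigmoid(Wr @ xt + Ur @ h + br)         # reset gate
            hc = np.tanh(Wh @ xt + Uh @ (r * h) + bh)  # candidate state
            h = z * h + (1.0 - z) * hc                 # standard GRU update
        return h  # the N features that replace hand-designed summary statistics

# encode a 240-step (voltage, current) trace into 25 features
t = np.linspace(0.0, 10.0, 240)
trace = np.column_stack([np.sin(t), np.cos(t)])  # stand-in for a real recording
features = GRUFeatureExtractor(n_hidden=25)(trace)
```

Keeping only the final hidden state is what makes the network many-to-one: a trace of arbitrary length T is squeezed through a fixed N-dimensional bottleneck before reaching the mixture-density layers.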
As expected, the presence of spikes in the observed data leads to a tighter posterior for the parameters associated with the main ion channels involved in spike generation, ENa, EK, gNa and gK.
4 Discussion
Quantitatively linking models of neural dynamics to data is a central problem in computational neuroscience. We showed that likelihood-free inference is at least as general and efficient as 'black-box' parameter-fitting approaches in neuroscience, while providing full statistical inference, suggesting that it is the method of choice for inference on single-neuron models. We argued that ABC approaches based on density estimation are particularly useful for neuroscience, and introduced a novel algorithm (SNPE) for estimating posterior distributions. We can flexibly and robustly estimate posterior distributions, even when large regions of the parameter space correspond to unstable model behaviour, or when features of choice are missing. Furthermore, we have extended our approach with RNNs to automatically define features, thus increasing the potential for capturing salient aspects of the data with highly non-linear models. SNPE is therefore equipped to estimate posterior distributions under common constraints in neural models.
Our approach builds directly on a recent approach to density-estimation ABC (CDE-LFI, [19]). While we found CDE-LFI to work well on problems with unimodal, close-to-Gaussian posteriors and stable simulators, our approach extends the range of possible applications, and these extensions are critical for the application to neuron models. A key component of SNPE is the proposal prior, which guides the sampling on each round of the algorithm. Here, we used the posterior from the previous round as the proposal for the next one, as in CDE-LFI and in many Sequential-MC approaches.
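The round structure just described (sample from the proposal, simulate, fit a posterior, use it as the next proposal) can be sketched with a toy one-parameter simulator. This is an illustration only: a kernel-weighted Gaussian fit stands in for the trained mixture-density network, the simulator, bandwidth and round count are made up, and the correction needed when sampling from a proposal rather than the prior is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    """Toy stand-in for a neuron simulator: one noisy summary feature."""
    return theta + rng.standard_normal(theta.shape)

x_o = 2.0                        # 'observed' summary feature
prop_mean, prop_std = 0.0, 10.0  # round-1 proposal = a broad prior

for round_idx in range(3):
    theta = rng.normal(prop_mean, prop_std, size=2000)  # sample from proposal
    x = simulate(theta)
    # stand-in for the trained MDN posterior: a Gaussian fitted to parameters
    # whose simulations land near x_o (kernel-weighted moments)
    w = np.exp(-0.5 * ((x - x_o) / 0.5) ** 2)
    prop_mean = np.sum(w * theta) / np.sum(w)
    prop_std = np.sqrt(np.sum(w * (theta - prop_mean) ** 2) / np.sum(w))
    # the fitted 'posterior' becomes the proposal for the next round

print(prop_mean, prop_std)  # the proposal contracts from std 10 towards x_o
```

After a few rounds the proposal concentrates where simulations reproduce the observed feature, which is what makes adaptively chosen simulations far more sample-efficient than drawing from the broad prior throughout.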
Figure 5: We can learn informative features using a recurrent mixture-density network (R-MDN). A. We consider a neuron driven by a coloured-noise input current. B. Rather than engineering summary features to reduce the dimensionality of observations, we provide the complete voltage trace and input current as input to an R-MDN. The unrolled forward pass is illustrated, where a many-to-one recurrent network reduces the dimensionality of the inputs (T time steps long) to a feature vector of dimensionality N. C. Our goal is to infer the posterior density for two different observations: (1) the full 240 ms trace shown in panel A; and (2) the initial 60 ms of its duration, which do not show any spike. We show the obtained marginal posterior densities for the two observations, using a 25-dimensional feature vector learned by the RNN. In the presence of spikes, the posterior uncertainty gets tighter around the true parameters related to spiking.

Our method could be extended by alternative approaches to designing proposal priors [48, 49], e.g. by exploiting the fact that we also represent a posterior over MDN parameters: for example, one could design proposals that guide sampling towards regions of the parameter space where the uncertainty about the parameters of the posterior model is highest. We note that, while here we concentrated on models of single neurons, ABC methods and our approach will also be applicable to models of populations of neurons.
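To make the 'bad simulation' machinery concrete, here is a minimal numpy sketch in the spirit of the autapse example of Fig. 4A-B. The rate equation, thresholds and the hand-rolled logistic classifier are illustrative stand-ins for the model and classifier actually used (see Appendix D.5 and Section 2.3), but the logic is the same: simulate, flag divergent runs, and learn the probability of a bad simulation as a function of the parameters so that the proposal can avoid those regions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_autapse(J, tau, dt=0.1, steps=1000, r0=1.0):
    """Linear rate neuron coupled to itself: tau * dr/dt = -r + J * r.
    Analytically, the dynamics are unstable for J > 1."""
    r = r0
    for _ in range(steps):
        r += dt / tau * (J - 1.0) * r
        if not np.isfinite(r) or abs(r) > 1e6:
            return None  # 'bad simulation': the firing rate diverged
    return r

# label simulations drawn from a broad proposal over (J, tau)
J_s = rng.uniform(0.0, 2.0, 500)
tau_s = rng.uniform(0.5, 2.5, 500)
bad = np.array([simulate_autapse(J, tau) is None for J, tau in zip(J_s, tau_s)],
               dtype=float)

# logistic classifier g(J, tau) approximating P(bad | J, tau), fitted by
# gradient descent; such a classifier lets the algorithm down-weight
# parameter regions predicted to produce diverging simulations
X = np.column_stack([J_s, tau_s, np.ones_like(J_s)])
w = np.zeros(3)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - bad) / len(bad)

def p_bad(J, tau):
    return 1.0 / (1.0 + np.exp(-(w[0] * J + w[1] * tau + w[2])))

# the learned boundary should sit near the analytic one, J = 1
p_stable, p_unstable = p_bad(0.2, 1.5), p_bad(1.8, 1.5)
```

Because divergence is only detected within the simulated window, the learned boundary sits slightly above J = 1 for slow (large tau) neurons, which is exactly the kind of discrepancy the comparison with analytic boundaries in Fig. 4B is meant to expose.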
Our approach will enable neuroscientists to perform Bayesian inference on\ncomplex neuron models without having to design model-speci\ufb01c algorithms, closing the gap between\nmechanistic and statistical models, and enabling theory-driven data-analysis [50].\n\nAcknowledgements\nWe thank Maneesh Sahani, David Greenberg and Balaji Lakshminarayanan for useful comments\non the manuscript. This work was supported by SFB 1089 (University of Bonn) and SFB 1233\n(University of T\u00fcbingen) of the German Research Foundation (DFG) to JHM and by the caesar\nfoundation.\n\nReferences\n\n[1] W Gerstner, W M Kistler, R Naud, and L Paninski. Neuronal dynamics: From single neurons to networks\n\nand models of cognition. Cambridge University Press, 2014.\n\n[2] S Druckmann, Y Banitt, A Gidon, F Sch\u00fcrmann, H Markram, and I Segev. A novel multiple objective\noptimization framework for constraining conductance-based neuron models by experimental data. Front\nNeurosci, 1, 2007.\n\n[3] C van Vreeswijk and H Sompolinsky. Chaos in neuronal networks with balanced excitatory and inhibitory\n\nactivity. Science, 274(5293), 1996.\n\n[4] H Markram et al. Reconstruction and Simulation of Neocortical Microcircuitry. Cell, 163(2), 2015.\n[5] Q J M Huys and L Paninski. Smoothing of, and parameter estimation from, noisy biophysical recordings.\n\nPLoS Comput Biol, 5(5), 2009.\n\n[6] L Meng, M A Kramer, and U T Eden. A sequential monte carlo approach to estimate biophysical neural\n\nmodels from spikes. J Neural Eng, 8(6), 2011.\n\n[7] C D Meliza, M Kostuk, H Huang, A Nogaret, D Margoliash, and H D I Abarbanel. Estimating parameters\nand predicting membrane voltages with conductance-based neuron models. Biol Cybern, 108(4), 2014.\n[8] C Rossant, D F M Goodman, B Fontaine, J Platkiewicz, A K Magnusson, and R Brette. Fitting neuron\n\nmodels to spike trains. 
Front Neurosci, 5:9, 2011.

[9] W Van Geit, M Gevaert, G Chindemi, C Rössert, J Courcol, E B Muller, F Schürmann, I Segev, and H Markram. Bluepyopt: Leveraging open source software and cloud infrastructure to optimise model parameters in neuroscience. Front Neuroinform, 10:17, 2016.

[10] A A Prinz, C P Billimoria, and E Marder. Alternative to hand-tuning conductance-based models: Construction and analysis of databases of model neurons. J Neurophysiol, 90(6), 2003.

[11] C Stringer, M Pachitariu, N A Steinmetz, M Okun, P Bartho, K D Harris, M Sahani, and N A Lesica. Inhibitory control of correlated intrinsic variability in cortical networks. Elife, 5, 2016.

[12] K D Carlson, J M Nageswaran, N Dutt, and J L Krichmar. An efficient automated parameter tuning framework for spiking neural networks. Front Neurosci, 8:10, 2014. doi:10.3389/fnins.2014.00010.

[13] P Friedrich, M Vella, A I Gulyás, T F Freund, and S Káli. A flexible, interactive software tool for fitting the parameters of neuronal models. Front Neuroinform, 8, 2014.

[14] P J Diggle and R J Gratton. Monte carlo methods of inference for implicit statistical models. J R Stat Soc B Met, 1984.

[15] F Hartig, J M Calabrese, B Reineking, T Wiegand, and A Huth. Statistical inference for stochastic simulation models – theory and application. Ecol Lett, 14(8), 2011.

[16] J Lintusaari, M U Gutmann, R Dutta, S Kaski, and J Corander. Fundamentals and recent developments in approximate bayesian computation. Syst Biol, 2016.

[17] A C Daly, D J Gavaghan, C Holmes, and J Cooper.
Hodgkin–huxley revisited: reparametrization and identifiability analysis of the classic action potential model with approximate bayesian methods. Royal Society open science, 2(12):150499, 2015.

[18] M G B Blum and O François. Non-linear regression models for approximate bayesian computation. Stat Comput, 20(1), 2010.

[19] G Papamakarios and I Murray. Fast epsilon-free inference of simulation models with bayesian conditional density estimation. In Adv Neur In, 2017.

[20] N T Carnevale and M L Hines. The NEURON Book. Cambridge University Press, 2009.

[21] E Meeds and M Welling. Gps-abc: Gaussian process surrogate approximate bayesian computation. UAI, 2014.

[22] J K Pritchard, M T Seielstad, A Perez-Lezaun, and M W Feldman. Population growth of human y chromosomes: a study of y chromosome microsatellites. Mol Biol Evol, 16(12), 1999.

[23] P Marjoram, J Molitor, V Plagnol, and S Tavare. Markov chain monte carlo without likelihoods. Proc Natl Acad Sci U S A, 100(26), 2003.

[24] E Meeds, R Leenders, and M Welling. Hamiltonian abc. arXiv preprint arXiv:1503.01916, 2015.

[25] M A Beaumont, J Cornuet, J Marin, and C P Robert. Adaptive approximate bayesian computation. Biometrika, 2009.

[26] F V Bonassi and M West. Sequential monte carlo with adaptive weights for approximate bayesian computation. Bayesian Anal, 10(1), 2015.

[27] R Wilkinson. Accelerating abc methods using gaussian processes. In AISTATS, 2014.

[28] S N Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310), 2010.

[29] V M H Ong, D J Nott, M Tran, S A Sisson, and C C Drovandi. Variational bayes with synthetic likelihood. arXiv:1608.03069, 2016.

[30] Y Fan, D J Nott, and S A Sisson. Approximate bayesian computation via regression density estimation. Stat, 2(1), 2013.

[31] B M Turner and P B Sederberg. A generalized, likelihood-free method for posterior estimation. Psychonomic Bulletin & Review, 21(2), 2014.

[32] L F Price, C C Drovandi, A Lee, and D J Nott. Bayesian synthetic likelihood. J Comput Graph Stat, (just-accepted), 2017.

[33] M Beaumont, W Zhang, and D J Balding. Approximate bayesian computation in population genetics. Genetics, 162(4), 2002.

[34] G E Hinton and D Van Camp. Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the sixth annual conference on Computational learning theory, 1993.

[35] A Graves. Practical variational inference for neural networks. In Adv Neur In, 2011.

[36] D P Kingma, T Salimans, and M Welling. Variational dropout and the local reparameterization trick. In Adv Neur In, pages 2575–2583, 2015.

[37] F Gerhard, M Deger, and W Truccolo. On the stability and dynamics of stochastic spiking neuron models: Nonlinear hawkes process and point process glms. PLoS Comput Biol, 13(2), 2017.

[38] K Cho, B Van Merriënboer, C Gulcehre, D Bahdanau, F Bougares, H Schwenk, and Y Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

[39] M G B Blum, M A Nunes, D Prangle, and S A Sisson. A comparative review of dimension reduction methods in approximate bayesian computation. Statistical Science, 28(2), 2013.

[40] B Jiang, T Wu, C Zheng, and W H Wong. Learning summary statistic for approximate bayesian computation via deep neural network. arXiv preprint arXiv:1510.02175, 2015.

[41] J W Pillow, J Shlens, L Paninski, A Sher, A M Litke, E J Chichilnisky, and E P Simoncelli. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature, 454(7207), 2008.

[42] N G Polson, J G Scott, and J Windle. Bayesian inference for logistic models using pólya–gamma latent variables.
J Am Stat Assoc, 108(504), 2013.

[43] S Linderman, R P Adams, and J W Pillow. Bayesian latent structure discovery from multi-neuron recordings. In Advances in Neural Information Processing Systems, 2016.

[44] A L Hodgkin and A F Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol, 117(4), 1952.

[45] M Pospischil, M Toledo-Rodriguez, C Monier, Z Piwkowska, T Bal, Y Frégnac, H Markram, and A Destexhe. Minimal hodgkin-huxley type models for different classes of cortical and thalamic neurons. Biol Cybern, 99(4-5), 2008.

[46] E Hay, S Hill, F Schürmann, H Markram, and I Segev. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Comput Biol, 7(7), 2011.

[47] J Chung, C Gulcehre, K H Cho, and Y Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.

[48] M Järvenpää, M U Gutmann, A Vehtari, and P Marttinen. Efficient acquisition rules for model-based approximate bayesian computation. arXiv preprint arXiv:1704.00520, 2017.

[49] S Gu, Z Ghahramani, and R E Turner. Neural adaptive sequential monte carlo. In Advances in Neural Information Processing Systems, pages 2629–2637, 2015.

[50] S W Linderman and S J Gershman. Using computational theory to constrain statistical models of neural data. bioRxiv, 2017.

[51] G De Nicolao, G Sparacino, and C Cobelli. Nonparametric input estimation in physiological systems: problems, methods, and case studies. Automatica, 33(5), 1997.