{"title": "Variational inference for Markov jump processes", "book": "Advances in Neural Information Processing Systems", "page_first": 1105, "page_last": 1112, "abstract": "Markov jump processes play an important role in a large number of application domains. However, realistic systems are analytically intractable and they have traditionally been analysed using simulation based techniques, which do not provide a framework for statistical inference. We propose a mean field approximation to perform posterior inference and parameter estimation. The approximation allows a practical solution to the inference problem, {while still retaining a good degree of accuracy.} We illustrate our approach on two biologically motivated systems.", "full_text": "Variational inference for Markov jump processes\n\nDepartment of Computer Science\n\nTechnische Universit\u00a4at Berlin\nD-10587 Berlin, Germany\n\nManfred Opper\n\nopperm@cs.tu-berlin.de\n\nGuido Sanguinetti\n\nDepartment of Computer Science\n\nUniversity of Shef(cid:2)eld, U.K.\nguido@dcs.shef.ac.uk\n\nAbstract\n\nMarkov jump processes play an important role in a large number of application\ndomains. However, realistic systems are analytically intractable and they have tra-\nditionally been analysed using simulation based techniques, which do not provide\na framework for statistical inference. We propose a mean (cid:2)eld approximation to\nperform posterior inference and parameter estimation. The approximation allows\na practical solution to the inference problem, while still retaining a good degree of\naccuracy. We illustrate our approach on two biologically motivated systems.\n\nIntroduction\n\nMarkov jump processes (MJPs) underpin our understanding of many important systems in science\nand technology. 
They provide a rigorous probabilistic framework to model the joint dynamics of groups (species) of interacting individuals, with applications ranging from information packets in a telecommunications network to epidemiology and population levels in the environment. These processes are usually non-linear and highly coupled, giving rise to non-trivial steady states (often referred to as emergent properties). Unfortunately, this also means that exact statistical inference is infeasible and approximations must be made in the analysis of these systems.\nA traditional approach, which has been very successful throughout the past century, is to ignore the discrete nature of the processes and to approximate the stochastic process with a deterministic process whose behaviour is described by a system of non-linear, coupled ODEs. This approximation relies on the stochastic fluctuations being negligible compared to the average population counts. There are many important situations where this assumption is untenable: for example, stochastic fluctuations are reputed to be responsible for a number of important biological phenomena, from cell differentiation to pathogen virulence [1]. Researchers are now able to obtain accurate estimates of the number of macromolecules of a certain species within a cell [2, 3], prompting a need for practical statistical tools to handle discrete data.\nSampling approaches have been extensively used to simulate the behaviour of MJPs. Gillespie's algorithm and its generalisations [4, 5] form the basis of many simulators used in systems biology studies. The simulations can be viewed as individual samples taken from a completely specified MJP, and can be very useful to reveal possible steady states. However, it is not clear how observed data can be incorporated in a principled way, which renders this approach of limited use for posterior inference and parameter estimation. 
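The forward-simulation idea behind Gillespie's algorithm can be sketched in a few lines. The following is a minimal illustration (not the paper's code) for a one-dimensional birth-death MJP with birth rate alpha and death rate beta*x; the function and parameter names are our own choices.

```python
import random

def gillespie(x0, alpha, beta, t_max, rng):
    """Gillespie's direct method for a birth-death MJP with
    rates f(x+1|x) = alpha and f(x-1|x) = beta*x."""
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        total = alpha + beta * x          # total exit rate from state x
        t += rng.expovariate(total)       # waiting time to the next jump
        if t >= t_max:
            break
        # pick the transition with probability proportional to its rate
        x = x + 1 if rng.random() < alpha / total else x - 1
        path.append((t, x))
    return path

path = gillespie(x0=10, alpha=0.5, beta=0.1, t_max=50.0, rng=random.Random(0))
```

Each call produces one sample path; as the text notes, such samples reveal typical behaviour but give no direct handle on posterior inference.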
A Markov chain Monte Carlo (MCMC) approach to incorporate observations has recently been proposed by Boys et al. [6]. While this approach holds a lot of promise, it is computationally very intensive. Despite several simplifying approximations, the correlations between samples mean that several million MCMC iterations are needed even in simple examples. In this paper we present an alternative, deterministic approach to posterior inference and parameter estimation in MJPs. We extend the mean-field (MF) variational approach (cf. e.g. [7]) to approximate a probability distribution over an (infinite dimensional) space of discrete paths, representing the time-evolving state of the system. In this way, we replace the couplings between the different species by their average, mean-field (MF) effect. The result is an iterative algorithm that allows parameter estimation and prediction with reasonable accuracy and very contained computational costs.\nThe rest of this paper is organised as follows: in sections 1 and 2 we review the theory of Markov jump processes and introduce our general strategy to obtain a MF approximation. In section 3 we introduce the Lotka-Volterra model which we use as an example to describe how our approach works. In section 4 we present experimental results on simulated data from the Lotka-Volterra model and from a simple gene regulatory network. Finally, we discuss the relationship of our study to other stochastic models, as well as further extensions and developments of our approach.\n\n1 Markov jump processes\n\nWe start off by establishing some notation and basic definitions. A D-dimensional discrete stochastic process is a family of D-dimensional discrete random variables x(t) indexed by the continuous time t. In our examples, the values taken by x(t) will be restricted to the non-negative integers N_0^D. The dimensionality D represents the number of (molecular) species present in the system; the components of the vector x(t) then represent the number of individuals of each species present at time t. Furthermore, the stochastic processes we will consider will always be Markovian, i.e. given any sequence of observations for the state of the system (x_{t_1}, ..., x_{t_N}), the conditional probability of the state of the system at a subsequent time x_{t_{N+1}} depends only on the last of the previous observations. A discrete stochastic process which exhibits the Markov property is called a Markov jump process (MJP).\nA MJP is characterised by its process rates f(x'|x), defined for all x' ≠ x; in an infinitesimal time interval δt, the quantity f(x'|x) δt represents the infinitesimal probability that the system will make a transition from state x at time t to state x' at time t + δt. Explicitly,\n\np(x'|x) ≈ δ_{x',x} + δt f(x'|x)    (1)\n\nwhere δ_{x',x} is the Kronecker delta and the equation becomes exact in the limit δt → 0. Equation (1) implies by normalisation that f(x|x) = −Σ_{x'≠x} f(x'|x). The interpretation of the process rates as infinitesimal transition probabilities highlights the simple relationship between the marginal distribution p_t(x) and the process rates. The probability of finding the system in state x at time t + δt will be given by the probability that the system was already in state x at time t, minus the probability that the system was in state x at time t and jumped to state x', plus the probability that the system was in a different state x'' at time t and then jumped to state x. 
In formulae, this is given by\n\np_{t+δt}(x) = p_t(x) [ 1 − Σ_{x'≠x} f(x'|x) δt ] + Σ_{x'≠x} p_t(x') f(x|x') δt.\n\nTaking the limit for δt → 0 we obtain the (forward) Master equation for the marginal probabilities\n\ndp_t(x)/dt = Σ_{x'≠x} [ −p_t(x) f(x'|x) + p_t(x') f(x|x') ].    (2)\n\n2 Variational approximate inference\n\nLet us assume that we have noisy observations y_l, l = 1, ..., N of the state of the system at a discrete number of time points; the noise model is specified by a likelihood function p̂(y_l|x(t_l)). We can combine this likelihood with the prior process to obtain a posterior process. As the observations happen at discrete time points, the posterior process is clearly still a Markov jump process. Given the Markovian nature of the processes, one could hope to obtain the posterior rate functions g(x'|x) by a forward-backward procedure similar to the one used for Hidden Markov Models. While this is possible in principle, the computations would require simultaneously solving a very large system of coupled linear ODEs (the number of equations is of order S^D, S being the number of states accessible to the system), which is not feasible even in simple systems.\nIn the following, we will use the variational mean field (MF) approach to approximate the posterior process by a factorising process, minimising the Kullback-Leibler (KL) divergence between processes. The inference problem is then reduced to the solution of D one-dimensional Master and backward equations of size S. This is still nontrivial because the KL divergence requires the joint probabilities of variables x(t) at infinitely many different times t, i.e. probabilities over entire paths of a process rather than the simpler marginals p_t(x). 
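For a truncated state space, the Master equation (2) is just a linear system of ODEs and can be integrated directly. Below is a minimal sketch for a one-dimensional birth-death process (birth rate α, death rate βx); the truncation level, the rate values and the explicit Euler stepping are illustrative choices of ours, not from the paper.

```python
import numpy as np

def master_step(p, f, dt):
    """One Euler step of the master equation
    dp_t(x)/dt = sum_{x'!=x} [ -p_t(x) f(x'|x) + p_t(x') f(x|x') ],
    with the rates given as a matrix f[x_new, x_old] (zero diagonal)."""
    gain = f @ p                    # sum over x' of f(x|x') p_t(x')
    loss = f.sum(axis=0) * p        # p_t(x) times total exit rate from x
    return p + dt * (gain - loss)

S = 50                              # truncation of the state space
alpha, beta = 0.5, 0.1
f = np.zeros((S, S))
for x in range(S - 1):
    f[x + 1, x] = alpha             # birth: x -> x+1
for x in range(1, S):
    f[x - 1, x] = beta * x          # death: x -> x-1

p = np.zeros(S); p[10] = 1.0        # start with 10 individuals
for _ in range(2000):               # integrate up to T = 20
    p = master_step(p, f, dt=0.01)
```

Probability mass is conserved by construction (the gain and loss terms sum to the same total), which is a useful numerical check.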
We will circumvent this problem by working with time-discretised trajectories and then passing to the continuous-time limit. We denote such a trajectory as x_{0:K} = (x(t_0), ..., x(t_0 + Kδt)) where δt is a small time interval and K is very large. Hence, we write the joint posterior probability as\n\np_post(x_{0:K}) = (1/Z) p_prior(x_{0:K}) × Π_{l=1}^N p̂(y_l|x(t_l)),  with  p_prior(x_{0:K}) = p(x_0) Π_{k=0}^{K−1} p(x_{k+1}|x_k)\n\nand Z = p(y_1, ..., y_N). Note that x(t_l) ∈ x_{0:K}. In the rest of this section, we will show how to compute the posterior rates and marginals by minimising the KL divergence. We notice in passing that a similar framework for continuous stochastic processes was proposed recently in [8].\n\n2.1 KL divergence between MJPs\n\nThe KL divergence between two MJPs defined by their path probabilities p(x_{0:K}) and q(x_{0:K}) is\n\nKL[q, p] = Σ_{x_{0:K}} q(x_{0:K}) ln [q(x_{0:K})/p(x_{0:K})] = Σ_{k=0}^{K−1} Σ_{x_k} q(x_k) Σ_{x_{k+1}} q(x_{k+1}|x_k) ln [q(x_{k+1}|x_k)/p(x_{k+1}|x_k)] + K_0\n\nwhere K_0 = Σ_{x_0} q(x_0) ln [q(x_0)/p(x_0)] will be set to zero in the following. We can now use equation (1) for the conditional probabilities; letting δt → 0 and simultaneously K → ∞ so that Kδt → T, we obtain\n\nKL[q, p] = ∫_0^T dt Σ_x q_t(x) Σ_{x'≠x} { g(x'|x) ln [g(x'|x)/f(x'|x)] + f(x'|x) − g(x'|x) }    (3)\n\nwhere f(x'|x) and g(x'|x) are the rates of the p and q process respectively. Notice that we have moved from the path-probability framework to an expression that depends solely on the process rates and marginals.\n\n2.2 MF approximation to posterior MJPs\n\nWe will now consider the case where p is a posterior MJP and q is an approximating process. The prior process will be denoted as p_prior and its rates will be denoted by f. 
The KL divergence then is\n\nKL(q, p_post) = ln Z + KL(q, p_prior) − Σ_{l=1}^N E_q[ln p̂(y_l|x(t_l))].\n\nTo obtain a tractable inference problem, we will assume that, in the approximating process q, the joint path probability for all the species factorises into the product of path probabilities for individual species. This gives the following equations for the species marginals and transition rates\n\nq_t(x) = Π_{i=1}^D q_it(x_i),   g_t(x'|x) = Σ_{i=1}^D Π_{j≠i} δ_{x'_j, x_j} g_it(x'_i|x_i).    (4)\n\nNotice that we have emphasised that the process rates for the approximating process may depend explicitly on time, even if the process rates of the original process do not. Exploiting these assumptions, we obtain that the KL divergence between the approximating process and the posterior process is given by\n\nKL[q, p_post] = ln Z − Σ_{l=1}^N E_q[ln p̂(y_l|x(t_l))] + ∫_0^T dt Σ_i Σ_x q_it(x) Σ_{x'≠x} { g_it(x'|x) ln [g_it(x'|x)/f̂_i(x'|x)] + f̃_i(x'|x) − g_it(x'|x) }    (5)\n\nwhere we have defined\n\nf̂_i(x'|x) = exp( E_{x\\i}[ ln f_i(x'|x : x'_j = x_j, ∀j ≠ i) ] ),   f̃_i(x'|x) = E_{x\\i}[ f_i(x'|x : x'_j = x_j, ∀j ≠ i) ]    (6)\n\nand E_{x\\i}[...] denotes an expectation over all components of x except x_i (using the measure q). In order to find the MF approximation to the posterior process we must optimise the KL divergence (5) with respect to the marginals q_it(x) and the rates g_it(x'|x). 
These, however, are not independent but fulfill the Master equation (2). We will take care of this constraint by using a Lagrange multiplier function λ_i(x, t) and computing the stationary values of the Lagrangian\n\nL = KL(q, p_post) − Σ_i ∫_0^T dt Σ_x λ_i(x, t) ( ∂_t q_it(x) − Σ_{x'≠x} { g_it(x|x') q_it(x') − g_it(x'|x) q_it(x) } ).    (7)\n\nWe can now compute functional derivatives of (7) to obtain\n\nδL/δq_it(x) = Σ_{x'≠x} [ g_it(x'|x) ln [g_it(x'|x)/f̂_i(x'|x)] − g_it(x'|x) + f̃_i(x'|x) ] + ∂_t λ_i(x, t) + Σ_{x'} g_it(x'|x) { λ_i(x', t) − λ_i(x, t) } − Σ_l ln p̂(y_l|x(t)) δ(t − t_l) = 0    (8)\n\nδL/δg_it(x'|x) = q_it(x) ( ln [g_it(x'|x)/f̂_i(x'|x)] + λ_i(x', t) − λ_i(x, t) ) = 0.    (9)\n\nDefining r_i(x, t) = e^{−λ_i(x,t)} and inserting (9) into (8), we arrive at the linear differential equation\n\ndr_i(x, t)/dt = Σ_{x'≠x} ( f̃_i(x'|x) r_i(x, t) − f̂_i(x'|x) r_i(x', t) )    (10)\n\nvalid for all times outside of the observations. To include the observations, we assume for simplicity that the noise model factorises across the species, so that p̂(y_l|x(t)) = Π_i p̂_i(y_il|x_i(t_l)) ∀l. Then equation (8) yields\n\nlim_{t→t_l^−} r_i(x, t) = p̂_i(y_il|x_i(t_l)) lim_{t→t_l^+} r_i(x, t).\n\nWe can then optimise the Lagrangian (7) using an iterative strategy. Starting with an initial guess for q_t(x) and selecting a species i, we can compute f̂_i(x'|x) and f̃_i(x'|x). Using these, we can solve equation (10) backwards starting from the condition r_i(x, T) = 1 ∀x (i.e., the constraint becomes void at the end of the time window under consideration). This allows us to update our estimate of the rates g_it(x'|x) using equation (9), which can then be used to solve the Master equation (2) and update our guess of q_it(x). 
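A sketch of the backward sweep for a single species may make this update concrete. Assuming the mean-field rates f̂_i and f̃_i have already been tabulated as matrices indexed [new state, old state] on a truncated state space, equation (10) can be integrated with explicit Euler steps and the posterior rates recovered from (9). All names, the discretisation and the sanity check are illustrative choices of ours, not the authors' implementation.

```python
import numpy as np

def solve_backward(f_hat, f_tilde, ts, obs=None):
    """Euler integration of eq. (10) backwards from r(x, T) = 1:
    dr_i(x,t)/dt = sum_{x'!=x} [ f~_i(x'|x) r_i(x,t) - f^_i(x'|x) r_i(x',t) ].
    f_hat/f_tilde are matrices [x_new, x_old] with zero diagonal; obs maps an
    observation time t_l to the likelihood vector applied as the jump
    condition r(., t_l^-) = p^_i(y_l | .) * r(., t_l^+)."""
    obs = obs or {}
    r = np.ones(f_hat.shape[0])               # terminal condition r(x, T) = 1
    out = {ts[-1]: r.copy()}
    for k in range(len(ts) - 1, 0, -1):
        drdt = f_tilde.sum(axis=0) * r - f_hat.T @ r
        r = r - (ts[k] - ts[k - 1]) * drdt    # step backwards in time
        if ts[k - 1] in obs:
            r = obs[ts[k - 1]] * r            # likelihood jump at t_l
        out[ts[k - 1]] = r.copy()
    return out

def posterior_rates(f_hat, r):
    """Eq. (9) rearranged: g(x'|x) = f^(x'|x) r(x',t) / r(x,t)."""
    return f_hat * (r[:, None] / r[None, :])

# Sanity check: with no observations and f^ = f~ = prior birth-death rates,
# r stays 1 everywhere and the posterior rates reduce to the prior ones.
S, alpha, beta = 30, 0.5, 0.1
f = np.zeros((S, S))
f[np.arange(1, S), np.arange(S - 1)] = alpha                    # births
f[np.arange(S - 1), np.arange(1, S)] = beta * np.arange(1, S)   # deaths
ts = np.linspace(0.0, 20.0, 401)
r0 = solve_backward(f, f, ts)[0.0]
g = posterior_rates(f, r0)
```

The sanity check mirrors the interpretation in the text: without data the constraint is void, so the approximating process coincides with the prior.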
This procedure can be followed sequentially for all the species; as each step leads to a decrease in the value of the Lagrangian, this guarantees that the algorithm will converge to a (local) minimum.\n\n2.3 Parameter estimation\n\nSince KL[q, p_post] ≥ 0, we obtain as a useful by-product of the MF approximation a tractable variational lower bound on the log-likelihood of the data log Z = log p(y_1, ..., y_N) from (5). As usual [e.g. 7], such a bound can be used in order to optimise model parameters using a variational E-M algorithm.\n\n3 Example: the Lotka-Volterra process\n\nThe Lotka-Volterra (LV) process is often used as perhaps the simplest non-trivial MJP [6, 4]. Introduced independently by Alfred J. Lotka in 1925 and Vito Volterra in 1926, it describes the dynamics of a population composed of two interacting species, traditionally referred to as predator and prey. The process rates for the LV system are given by\n\nf_prey(x + 1|x, y) = αx,   f_prey(x − 1|x, y) = βxy,\nf_predator(y + 1|x, y) = δxy,   f_predator(y − 1|x, y) = γy    (11)\n\nwhere x is the number of prey and y is the number of predators. All other rates are zero: individuals can only be created or destroyed one at a time. Rate sparsity is a characteristic of very many processes, including all chemical kinetic processes (indeed, the LV model can be interpreted as a chemical kinetic model). An immediate difficulty in implementing our strategy is that some of the process rates are identically zero when one of the species is extinct (i.e. its numbers have reached zero); this will lead to infinities when computing the expectation of the logarithm of the rates in equation (6). 
To avoid this, we will “regularise” the process by adding a small constant to the rates f(1|0); it can be proved that, on average over the data-generating process, the variational approximation to the regularised process still optimises a bound analogous to (3) on the original process [9].\nThe variational estimates for the parameters of the LV process are obtained by inserting the process rates (11) into the MF bound and taking derivatives w.r.t. the parameters. Setting them to zero, we obtain a set of fixed point equations\n\nα = ∫_0^T dt ⟨g_prey,t(x + 1|x)⟩_prey,t / ∫_0^T dt ⟨x⟩_prey,t,\nβ = ∫_0^T dt ⟨g_prey,t(x − 1|x)⟩_prey,t / ∫_0^T dt ⟨x⟩_prey,t ⟨y⟩_predator,t,\nγ = ∫_0^T dt ⟨g_predator,t(y − 1|y)⟩_predator,t / ∫_0^T dt ⟨y⟩_predator,t,\nδ = ∫_0^T dt ⟨g_predator,t(y + 1|y)⟩_predator,t / ∫_0^T dt ⟨y⟩_predator,t ⟨x⟩_prey,t.    (12)\n\nEquations (12) have an appealing intuitive meaning in terms of the physics of the process: for example, α is given by the average total increase rate of the approximating process divided by the average total number of prey.\nWe generated 15 counts of predator and prey numbers at regular intervals from a LV process with parameters α = 5 × 10^-4, β = 1 × 10^-4, γ = 5 × 10^-4 and δ = 1 × 10^-4, starting from initial population levels of seven predators and nineteen prey. These counts were then corrupted according to the following noise model\n\np̂_i(y_il|x_i(t_l)) ∝ [ 2^{−|y_il − x_i(t_l)|} + 10^-6 ]    (13)\n\nwhere x_i(t_l) is the (discrete) count for species i at time t_l before the addition of noise. Notice that, since population numbers are constrained to be positive, the noise model is not symmetric. The original count is placed at the mode, rather than the mean, of the noise model. 
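The noise model (13) is simple to tabulate. A minimal sketch, where the helper name and the explicit normalisation over a truncated support are our own choices:

```python
import numpy as np

def noise_pmf(x_true, support):
    """Noise model (13): p(y|x) proportional to 2^{-|y - x|} + 1e-6,
    normalised over a truncated support of admissible counts."""
    w = 0.5 ** np.abs(support - x_true) + 1e-6
    return w / w.sum()

support = np.arange(0, 201)       # counts truncated at 200, as in the text
pmf = noise_pmf(12, support)
```

The mode sits exactly at the true count, while the bounded support and the uniform floor term leave the distribution slightly asymmetric about it.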
This asymmetry is unavoidable when dealing with quantities that are constrained to be positive.\nWhile in theory each species can have an arbitrarily large number of individuals, in order to solve the differential equations (2) and (10) we have to truncate the process. While the truncation threshold could be viewed as another parameter and optimised variationally, in these experiments we took a more heuristic approach and limited the maximum number of individuals of each species to 200. This was justified by considering that an exponential growth pattern fitted to the available data led to an estimate of approximately 90 individuals in the most abundant species, well below the truncation threshold.\nThe results of the inference are shown in Figure 1. The solid line is the mean of the approximating distribution, the dashed lines are the 90% confidence intervals, the dotted line is the true path from which the data was obtained. The diamonds are the noisy observations.\n\nFigure 1: MF approximation to posterior LV process: (a) predator population and (b) prey population. Diamonds are the (noisy) observed data points, solid line the mean, dashed lines 90% confidence intervals, dotted lines the true path from which the data was sampled.\n\nThe parameter values inferred are reasonably close to the real parameter values: α = 1.35 × 10^-3, β = 2.32 × 10^-4, γ = 1.57 × 10^-3 and δ = 1.78 × 10^-4. While the process is well approximated in the area where data is present, the free-form prediction is less good, especially for the predator population. This might be due to the inaccuracies in the estimates of the parameters. 
The approximate posterior displays nontrivial emergent properties: for example, we predict that there is a 10% chance that the prey population will become extinct at the end of the period of interest. These results were obtained in approximately fifteen minutes on an Intel Pentium M 1.7GHz laptop computer.\nTo check the reliability of our inference results and the rate with which the estimated parameter values converge to the true values, we repeated our experiments for 5, 10, 15 and 20 available data points. For each sample size, we drew five independent samples from the same LV process. Figure 2(a) shows the average and standard deviation of the mean squared error (MSE) in the estimate of the parameters as a function of the number of observations N; as expected, this decreases uniformly with the sample size.\n\n4 Example: gene autoregulatory network\n\nAs a second example we consider a gene autoregulatory network. This simple network motif is one of the most important building blocks of the transcriptional regulatory networks found in cells because of its ability to increase robustness in the face of fluctuations in external signals [10]. Because of this, it is one of the best studied systems, both at the experimental and at the modelling level [11, 3]. The system consists again of two species, mRNA and protein; the process rates are given by\n\nf_RNA(x + 1|x, y) = α (1 − 0.99 × Θ(y − y_c)),   f_RNA(x − 1|x, y) = βx,\nf_p(y + 1|x, y) = γx,   f_p(y − 1|x, y) = δy    (14)\n\nwhere Θ is the Heaviside step function, y the protein number and x the mRNA number. The intuitive meaning of these equations is simple: both protein and mRNA decay exponentially. Proteins are produced through translation of mRNA with a rate proportional to the mRNA abundance. 
On the other hand, mRNA production depends on protein concentration levels through a logical function: as soon as protein numbers increase beyond a certain critical parameter y_c, mRNA production drops dramatically, by a factor of 100.\nThe optimisation of the variational bound w.r.t. the parameters α, β, γ and δ is straightforward and yields fixed point equations similar to the ones for the LV process. The dependence of the MF bound on the critical parameter y_c is less straightforward and is given by\n\nL_{y_c} = const + { 2 ∫_0^T dt ḡ h(y_c) + log[ 1 − 0.99 (1/T) ∫_0^T h(y_c) dt ] ∫_0^T dt ḡ }    (15)\n\nwhere ḡ = ⟨g_RNA(x + 1|x)⟩_{q_RNA} and h(y_c) = Σ_{y≥y_c} q_p(y). A plot of this function obtained during the inference task below can be seen in Figure 2(b). We can determine the minimum of (15) by searching over the possible (discrete) values of y_c.\n\nFigure 2: (a) Mean squared error (MSE) in the estimate of the parameters as a function of the number of observations N for the LV process. (b) Negative variational likelihood bound for the gene autoregulatory network as a function of the critical parameter y_c.\n\nFigure 3: MF approximation to posterior autoregulatory network process: (a) protein population and (b) mRNA population. 
Diamonds are the (noisy) observed data points, solid line the mean, dashed lines 90% confidence intervals, dotted lines the true path from which the data was taken.\n\nAgain, we generated data by simulating the process with parameter values y_c = 20, α = 2 × 10^-3, β = 6 × 10^-5, γ = 5 × 10^-4 and δ = 7 × 10^-5. Fifteen counts were generated for both mRNA and proteins, with initial counts of 17 protein and 12 mRNA molecules. These were then corrupted with noise generated from the distribution shown in equation (13). The results of the approximate posterior inference are shown in Figure 3. The inferred parameter values are in good agreement with the true values: y_c = 19, α = 2.20 × 10^-3, β = 1.84 × 10^-5, γ = 4.01 × 10^-4 and δ = 1.54 × 10^-4. Interestingly, if the data is such that the protein count never exceeds the critical parameter y_c, this parameter becomes unidentifiable (the likelihood bound is optimised by y_c = ∞ or y_c = 0), as may be expected. The likelihood bound then loses the sharp optimum evident from Figure 2(b) (results not shown).\n\n5 Discussion\n\nIn this contribution we have shown how a MF approximation can be used to perform posterior inference in MJPs from discretely observed noisy data. The MF approximation has been shown to perform well and to retain much of the richness of these complex systems. The proposed approach is conceptually very different from existing MCMC approaches [6]. While these focus on sampling from the distribution of reactions happening in a small interval in time, we compute an approximation to the probability distribution over possible paths of the system. 
This allows us to easily factorise across species; by contrast, sampling the number of reactions happening in a certain time interval is difficult, and not amenable to simple techniques such as Gibbs sampling. While it is possible that future developments will lead to more efficient sampling strategies, our approach outstrips current MCMC based methods in terms of computational efficiency. A further strength of our approach is the ease with which it can be scaled to more complex systems involving larger numbers of species. The factorisation assumption implies that the computational complexity grows linearly in the number of species D; it is unclear how MCMC would scale to larger systems.\nAn alternative suggestion, proposed in [11], was to seek a middle way between a MJP and a deterministic, ODE based approach by approximating the MJP with a continuous stochastic process, i.e. by using a diffusion approximation. While these authors show that this approximation works reasonably well for inference purposes, it is worth pointing out that the population sizes in their experimental results were approximately one order of magnitude larger than in ours. It is arguable that a diffusion approximation might be suitable for population sizes as low as a few hundred, but it cannot be expected to be reasonable for population sizes of the order of 10.\nThe availability of a practical tool for statistical inference in MJPs opens a number of important possible developments for modelling. It would be of interest, for example, to develop mixed models where one species with low counts interacts with another species with high counts that can be modelled using a deterministic or diffusion approximation. This situation would be of particular importance for biological applications, where different proteins can have very different copy numbers in a cell but still be equally important. 
Another interesting extension is the possibility of introducing a spatial dimension which influences how likely interactions are. Such an extension would be very important, for example, in an epidemiological study. All of these extensions rely centrally on the possibility of estimating posterior probabilities, and we expect that the availability of a practical tool for the inference task will be very useful to facilitate this.\n\nReferences\n\n[1] Harley H. McAdams and Adam Arkin. Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences USA, 94:814-819, 1997.\n[2] Long Cai, Nir Friedman, and X. Sunney Xie. Stochastic protein expression in individual cells at the single molecule level. Nature, 440:580-586, 2006.\n[3] Yoshito Masamizu, Toshiyuki Ohtsuka, Yoshiki Takashima, Hiroki Nagahara, Yoshiko Takenaka, Kenichi Yoshikawa, Hitoshi Okamura, and Ryoichiro Kageyama. Real-time imaging of the somite segmentation clock: revelation of unstable oscillators in the individual presomitic mesoderm cells. Proceedings of the National Academy of Sciences USA, 103:1313-1318, 2006.\n[4] Daniel T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry, 81(25):2340-2361, 1977.\n[5] Eric Mjolsness and Guy Yosiphon. Stochastic process semantics for dynamical grammars. To appear in Annals of Mathematics and Artificial Intelligence, 2006.\n[6] Richard J. Boys, Darren J. Wilkinson, and Thomas B. L. Kirkwood. Bayesian inference for a discretely observed stochastic kinetic model. Available from http://www.staff.ncl.ac.uk/d.j.wilkinson/pub.html, 2004.\n[7] Manfred Opper and David Saad (editors). Advanced Mean Field Methods. MIT Press, Cambridge, MA, 2001.\n[8] Cedric Archambeau, Dan Cornford, Manfred Opper, and John Shawe-Taylor. Gaussian process approximations of stochastic differential equations. 
Journal of Machine Learning Research Workshop and Conference Proceedings, 1(1):1-16, 2007.\n[9] Manfred Opper and David Haussler. Bounds for predictive errors in the statistical mechanics of supervised learning. Physical Review Letters, 75:3772-3775, 1995.\n[10] Uri Alon. An Introduction to Systems Biology. Chapman and Hall, London, 2006.\n[11] Andrew Golightly and Darren J. Wilkinson. Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics, 61(3):781-788, 2005.\n", "award": [], "sourceid": 67, "authors": [{"given_name": "Manfred", "family_name": "Opper", "institution": null}, {"given_name": "Guido", "family_name": "Sanguinetti", "institution": null}]}