{"title": "Probabilistic Linear Multistep Methods", "book": "Advances in Neural Information Processing Systems", "page_first": 4321, "page_last": 4328, "abstract": "We present a derivation and theoretical investigation of the Adams-Bashforth and Adams-Moulton family of linear multistep methods for solving ordinary differential equations, starting from a Gaussian process (GP) framework. In the limit, this formulation coincides with the classical deterministic methods, which have been used as higher-order initial value problem solvers for over a century. Furthermore, the natural probabilistic framework provided by the GP formulation allows us to derive probabilistic versions of these methods, in the spirit of a number of other probabilistic ODE solvers presented in the recent literature. In contrast to higher-order Runge-Kutta methods, which require multiple intermediate function evaluations per step, Adams family methods make use of previous function evaluations, so that increased accuracy arising from a higher-order multistep approach comes at very little additional computational cost. We show that through a careful choice of covariance function for the GP, the posterior mean and standard deviation over the numerical solution can be made to exactly coincide with the value given by the deterministic method and its local truncation error respectively. 
We provide a rigorous proof of the convergence of these new methods, as well as an empirical investigation (up to fifth order) demonstrating their convergence rates in practice.", "full_text": "Probabilistic Linear Multistep Methods\n\nOnur Teymur\nDepartment of Mathematics\nImperial College London\no@teymur.uk\n\nKonstantinos Zygalakis\nSchool of Mathematics\nUniversity of Edinburgh\nk.zygalakis@ed.ac.uk\n\nBen Calderhead\nDepartment of Mathematics\nImperial College London\nb.calderhead@imperial.ac.uk\n\nAbstract\n\nWe present a derivation and theoretical investigation of the Adams-Bashforth and Adams-Moulton family of linear multistep methods for solving ordinary differential equations, starting from a Gaussian process (GP) framework. In the limit, this formulation coincides with the classical deterministic methods, which have been used as higher-order initial value problem solvers for over a century. Furthermore, the natural probabilistic framework provided by the GP formulation allows us to derive probabilistic versions of these methods, in the spirit of a number of other probabilistic ODE solvers presented in the recent literature [1, 2, 3, 4]. In contrast to higher-order Runge-Kutta methods, which require multiple intermediate function evaluations per step, Adams family methods make use of previous function evaluations, so that increased accuracy arising from a higher-order multistep approach comes at very little additional computational cost. We show that through a careful choice of covariance function for the GP, the posterior mean and standard deviation over the numerical solution can be made to exactly coincide with the value given by the deterministic method and its local truncation error respectively. 
We provide a rigorous proof of the convergence of these new methods, as well as an empirical investigation (up to fifth order) demonstrating their convergence rates in practice.\n\n1 Introduction\n\nNumerical solvers for differential equations are essential tools in almost all disciplines of applied mathematics, due to the ubiquity of real-world phenomena described by such equations, and the lack of exact solutions to all but the most trivial examples. The performance – speed, accuracy, stability, robustness – of the numerical solver is of great relevance to the practitioner. This is particularly the case if the computational cost of accurate solutions is significant, either because of high model complexity or because a high number of repeated evaluations is required (which is typical if an ODE model is used as part of a statistical inference procedure, for example). A field of work has emerged which seeks to quantify this performance – or indeed lack of it – by modelling the numerical errors probabilistically, and thence trace the effect of the chosen numerical solver through the entire computational pipeline [5]. The aim is to be able to make meaningful quantitative statements about the uncertainty present in the resulting scientific or statistical conclusions.\nRecent work in this area has resulted in the development of probabilistic numerical methods, first conceived in a very general way in [6]. A recent summary of the state of the field is given in [7]. The particular case of ODE solvers was first addressed in [8], formalised and extended in [1, 2, 3] with a number of theoretical results recently given in [4]. The present paper modifies and extends the constructions in [1, 4] to the multistep case, improving the order of convergence of the method but avoiding the simplifying linearisation of the model required by the approaches of [2, 3]. 
Furthermore we offer extensions of the convergence results in [4] to our proposed method and give empirical results confirming convergence rates which point to the practical usefulness of our higher-order approach without significantly increasing computational cost.\n\n30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.\n\n1.1 Mathematical setup\n\nWe consider an Initial Value Problem (IVP) defined by an ODE\n\n$\frac{d}{dt} y(t, \theta) = f(y(t, \theta), t), \qquad y(t_0, \theta) = y_0 \qquad (1)$\n\nHere $y(\cdot, \theta) : \mathbb{R}_+ \to \mathbb{R}^d$ is the solution function, $f : \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d$ is the vector-valued function that defines the ODE, and $y_0 \in \mathbb{R}^d$ is a given vector called the initial value. The dependence of $y$ on an $m$-dimensional parameter $\theta \in \mathbb{R}^m$ will be relevant if the aim is to incorporate the ODE into an inverse problem framework, and this parameter is of scientific interest. Bayesian inference under this setup (see [9]) is covered in most of the other treatments of this topic but is not the main focus of this paper; we therefore suppress $\theta$ for the sake of clarity.\nSome technical conditions are required in order to justify the existence and uniqueness of solutions to (1). We assume that $f$ is evaluable point-wise given $y$ and $t$ and also that it satisfies the Lipschitz condition in $y$, namely $\|f(y_1, t) - f(y_2, t)\| \leq L_f \|y_1 - y_2\|$ for some $L_f \in \mathbb{R}_+$ and all $t$, $y_1$ and $y_2$; and also is continuous in $t$. These conditions imply the existence of a unique solution, by a classic result usually known as the Picard-Lindelöf Theorem [10].\nWe consider a finite-dimensional discretisation of the problem, with our aim being to numerically generate an $N$-dimensional vector¹ $y_{1:N}$ approximating the true solution $y(t_{1:N})$ in an appropriate sense. 
Following [1], we consider the joint distribution of $y_{1:N}$ and the auxiliary variables $f_{0:N}$ (obtained by evaluating the function $f$), with each $y_i$ obtained by sequentially conditioning on previous evaluations of $f$. A basic requirement is that the marginal mean of $y_{1:N}$ should correspond to some deterministic iterative numerical method operating on the grid $t_{1:N}$. In our case this will be a linear multistep method (LMM) of specified type.²\nFirstly we telescopically factorise the joint distribution as follows:\n\n$p(y_{1:N}, f_{0:N} \mid y_0) = p(f_0 \mid y_0) \prod_{i=0}^{N-1} p(y_{i+1} \mid y_{0:i}, f_{0:i}) \, p(f_{i+1} \mid y_{0:i+1}, f_{0:i}) \qquad (2)$\n\nWe can now make simplifying assumptions about the constituent distributions. Firstly, since we have assumed that $f$ is evaluable point-wise given $y$ and $t$,\n\n$p(f_i \mid y_i, \ldots) = p(f_i \mid y_i) = \delta_{f_i}(f(y_i, t_i)), \qquad (3)$\n\nwhich is a Dirac-delta measure equivalent to simply performing this evaluation deterministically. Secondly, we assume a finite moving window of dependence for each new state – in other words $y_{i+1}$ is only allowed to depend on $y_i$ and $f_i, f_{i-1}, \ldots, f_{i-(s-1)}$ for some $s \in \mathbb{N}$. This corresponds to the inputs used at each iteration of the $s$-step Adams-Bashforth method. For $i < s$ we will assume dependence on only those derivative evaluations up to $i$; this initialisation detail is discussed briefly in Section 4. Strictly speaking, $f_N$ is superfluous to our requirements (since we already have $y_N$) and thus we can rewrite (2) as\n\n$p(y_{1:N}, f_{0:N-1} \mid y_0) = \prod_{i=0}^{N-1} p(f_i \mid y_i) \, p(y_{i+1} \mid y_i, f_{\max(0,i-s+1):i}) \qquad (4)$\n\n$= \prod_{i=0}^{N-1} \delta_{f_i}(f(y_i, t_i)) \, \underbrace{p(y_{i+1} \mid y_i, f_{\max(0,i-s+1):i})}_{\star} \qquad (5)$\n\nThe conditional distributions $\star$ are the primary objects of our study – we will define them by constructing a particular Gaussian process prior over all variables, then identifying the appropriate (Gaussian) conditional distribution. 
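The sequential structure of the factorisation (4)-(5) can be sketched directly: each $f_i$ is evaluated deterministically (the Dirac-delta factor), and each $y_{i+1}$ is drawn from a conditional distribution given $y_i$ and a finite window of past derivative evaluations. The snippet below is a structural sketch only; `step_dist`, which returns a (mean, sd) pair for the conditional, is our placeholder for the Gaussian conditional derived later, not an object defined in the paper:

```python
import random
from collections import deque

def sequential_solve(f, y0, t, step_dist, s, rng):
    """Structural sketch of (4)-(5): p(f_i | y_i) is a point mass at f(y_i, t_i),
    and y_{i+1} is sampled from a supplied conditional p(y_{i+1} | y_i, f-window).
    `step_dist(y_i, window, h)` must return a (mean, sd) pair (our convention)."""
    ys = [y0]
    window = deque(maxlen=s)            # finite moving window f_{max(0,i-s+1):i}
    for i in range(len(t) - 1):
        window.append(f(ys[i], t[i]))   # deterministic evaluation of f_i
        mean, sd = step_dist(ys[i], list(window), t[i + 1] - t[i])
        ys.append(mean + sd * rng.gauss(0.0, 1.0))
    return ys

# With a zero-variance Euler conditional this recovers the deterministic Euler method:
euler = lambda y, fw, h: (y + h * fw[-1], 0.0)
print(sequential_solve(lambda y, t: 1.0, 0.0, [0.0, 0.5, 1.0], euler, 3, random.Random(0)))
```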
Note that a simple modification to the decomposition (2) allows the same set-up to generate an $(s+1)$-step Adams-Moulton iterator³ – the implicit multistep method where $y_{i+1}$ depends in addition on $f_{i+1}$. At various stages of this paper this extension is noted but omitted for reasons of space – the collected results are given in Appendix C.\n\n¹The notation $y_{0:N}$ denotes the vector $(y_0, \ldots, y_N)$, and analogously $t_{0:N}$, $f_{0:N}$ etc.\n²We argue that the connection to some specific deterministic method is a desirable feature, since it aids interpretability and allows much of the well-developed theory of IVP solvers to be inherited by the probabilistic solver. This is a particular strength of the formulation in [4] which was lacking in all previous works.\n³The convention is that the number of steps is equal to the total number of derivative evaluations used in each iteration, hence the $s$-step AB and $(s+1)$-step AM methods both go 'equally far back'.\n\nLinear multistep methods\n\nWe give a very short summary of Adams family LMMs and their conventional derivation via interpolating polynomials. For a fuller treatment of this well-studied topic we refer the reader to the comprehensive references [10, 11, 12]. Using the usual notation we write $y_i$ for the numerical estimate of the true solution $y(t_i)$, and $f_i$ for the estimate of $f(t_i) \equiv y'(t_i)$.\nThe classic $s$-step Adams-Bashforth method calculates $y_{i+1}$ by constructing the unique polynomial $P_i(\omega) \in \mathbb{P}_{s-1}$ interpolating the points $\{f_{i-j}\}_{j=0}^{s-1}$. This is given by Lagrange's method as\n\n$P_i(\omega) = \sum_{j=0}^{s-1} \ell_j^{0:s-1}(\omega) f_{i-j}, \qquad \ell_j^{0:s-1}(\omega) = \prod_{\substack{k=0 \\ k \neq j}}^{s-1} \frac{\omega - t_{i-k}}{t_{i-j} - t_{i-k}} \qquad (6)$\n\nThe $\ell_j^{0:s-1}(\omega)$ are known as Lagrange polynomials, have the property that $\ell_j^{0:s-1}(t_{i-q}) = \delta_{jq}$, and form a basis for the space $\mathbb{P}_{s-1}$ known as the Lagrange basis. 
The Adams-Bashforth iteration then proceeds by writing the integral version of (1) as $y(t_{i+1}) - y(t_i) \equiv \int_{t_i}^{t_{i+1}} f(y, t) \, dt$ and approximating the function under the integral by the extrapolated interpolating polynomial to give\n\n$y_{i+1} - y_i \approx \int_{t_i}^{t_{i+1}} P_i(\omega) \, d\omega = h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j} \qquad (7)$\n\nwhere $h = t_{i+1} - t_i$ and the $\beta^{AB}_{j,s} \equiv h^{-1} \int_0^h \ell_j^{0:s-1}(\omega) \, d\omega$ are the Adams-Bashforth coefficients for order $s$, all independent of $h$ and summing to 1. Note that if $f$ is a polynomial of degree $s-1$ (so $y(t)$ is a polynomial of degree $s$) this procedure will give the next solution value exactly. Otherwise the extrapolation error in $f_{i+1}$ is of order $O(h^s)$ and in $y_{i+1}$ (after an integration) is of order $O(h^{s+1})$. So the local truncation error is $O(h^{s+1})$ and the global error $O(h^s)$ [10].\nAdams-Moulton methods are similar except that the polynomial $Q_i(\omega) \in \mathbb{P}_s$ interpolates the $s+1$ points $\{f_{i-j}\}_{j=-1}^{s-1}$. The resulting equation analogous to (7) is thus an implicit one, with the unknown $y_{i+1}$ appearing on both sides. Typically AM methods are used in conjunction with an AB method of one order lower, in a 'predictor-corrector' arrangement. Here, a predictor value $y^*_{i+1}$ is calculated using an AB step; this is then used to estimate $f^*_{i+1} = f(y^*_{i+1})$; and finally an AM step uses this value to calculate $y_{i+1}$. We again refer the reader to Appendix C for details of the AM construction.\n\n2 Derivation of Adams family LMMs via Gaussian processes\n\nWe now consider a formulation of the Adams-Bashforth family starting from a Gaussian process framework and then present a probabilistic extension. We fix a joint Gaussian process prior over $y_{i+1}, y_i, f_i, f_{i-1}, \ldots, f_{i-s+1}$ as follows. We define two vectors of functions $\phi(\omega)$ and $\Phi(\omega)$ in terms of the Lagrange polynomials $\ell_j^{0:s-1}(\omega)$ defined in (6) as\n\n$\phi(\omega) = \begin{pmatrix} 0 & \ell_0^{0:s-1}(\omega) & \ell_1^{0:s-1}(\omega) & \cdots & \ell_{s-1}^{0:s-1}(\omega) \end{pmatrix}^T \qquad (8)$\n\n$\Phi(\omega) = \int \phi(\omega) \, d\omega = \begin{pmatrix} 1 & \int \ell_0^{0:s-1}(\omega) \, d\omega & \cdots & \int \ell_{s-1}^{0:s-1}(\omega) \, d\omega \end{pmatrix}^T \qquad (9)$\n\nThe elements (excluding the first) of $\phi(\omega)$ form a basis for $\mathbb{P}_{s-1}$ and the elements of $\Phi(\omega)$ form a basis for $\mathbb{P}_s$. The initial 0 in $\phi(\omega)$ is necessary to make the dimensions of the two vectors equal, so we can correctly define products such as $\Phi(\omega)^T \phi(\omega)$ which will be required later. The first element of $\Phi(\omega)$ can be any non-zero constant $C$; the analysis later is unchanged and we therefore take $C = 1$.\nSince we will solely be interested in values of the argument $\omega$ corresponding to discrete equispaced time-steps $t_j - t_{j-1} = h$ indexed relative to the current time-point $t_i = 0$, we will make our notation more concise by writing $\phi_{i+k}$ for $\phi(t_{i+k})$, and similarly $\Phi_{i+k}$ for $\Phi(t_{i+k})$. We now use these vectors of basis functions to define a joint Gaussian process prior as follows:\n\n$\begin{pmatrix} y_{i+1} \\ y_i \\ f_i \\ f_{i-1} \\ \vdots \\ f_{i-s+1} \end{pmatrix} \sim N \left[ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} \Phi_{i+1}^T \Phi_{i+1} & \Phi_{i+1}^T \Phi_i & \Phi_{i+1}^T \phi_i & \cdots & \Phi_{i+1}^T \phi_{i-s+1} \\ \Phi_i^T \Phi_{i+1} & \Phi_i^T \Phi_i & \Phi_i^T \phi_i & \cdots & \Phi_i^T \phi_{i-s+1} \\ \phi_i^T \Phi_{i+1} & \phi_i^T \Phi_i & \phi_i^T \phi_i & \cdots & \phi_i^T \phi_{i-s+1} \\ \phi_{i-1}^T \Phi_{i+1} & \phi_{i-1}^T \Phi_i & \phi_{i-1}^T \phi_i & \cdots & \phi_{i-1}^T \phi_{i-s+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi_{i-s+1}^T \Phi_{i+1} & \phi_{i-s+1}^T \Phi_i & \phi_{i-s+1}^T \phi_i & \cdots & \phi_{i-s+1}^T \phi_{i-s+1} \end{pmatrix} \right] \qquad (10)$\n\nThis construction works because $y' = f$ and differentiation is a linear operator; the rules for the transformation of the covariance elements are given in Section 9.4 of [13] and can easily be seen to correspond to the defined relationship between $\phi(\omega)$ and $\Phi(\omega)$.\nRecalling the decomposition in (5), we are interested in the conditional distribution $p(y_{i+1} \mid y_i, f_{i-s+1:i})$. This is also Gaussian, with mean and covariance given by the standard formulae for Gaussian conditioning. This construction now allows us to state the following result:\nProposition 1. 
The conditional distribution $p(y_{i+1} \mid y_i, f_{i-s+1:i})$ under the Gaussian process prior given in (10), with covariance kernel basis functions as in (8) and (9), is a $\delta$-measure concentrated on the $s$-step Adams-Bashforth predictor $y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j}$.\n\nThe proof of this proposition is given in Appendix A.\nBecause of the natural probabilistic structure provided by the Gaussian process framework, we can augment the basis function vectors $\phi(\omega)$ and $\Phi(\omega)$ to generate a conditional distribution for $y_{i+1}$ that has non-zero variance. By choosing a particular form for this augmented basis we can obtain an expression for the standard deviation of $y_{i+1}$ that is exactly equal to the leading-order local truncation error of the corresponding deterministic method.\nWe will expand the vectors $\phi(\omega)$ and $\Phi(\omega)$ by one component, chosen so that the new vector comprises elements that span a polynomial space of order one greater than before. Define the augmented bases $\phi^+(\omega)$ and $\Phi^+(\omega)$ as\n\n$\phi^+(\omega) = \begin{pmatrix} 0 & \ell_0^{0:s-1}(\omega) & \ell_1^{0:s-1}(\omega) & \cdots & \ell_{s-1}^{0:s-1}(\omega) & \alpha h^s \ell_{-1}^{-1:s-1}(\omega) \end{pmatrix}^T \qquad (11)$\n\n$\Phi^+(\omega) = \begin{pmatrix} 1 & \int \ell_0^{0:s-1}(\omega) \, d\omega & \cdots & \int \ell_{s-1}^{0:s-1}(\omega) \, d\omega & \int \alpha h^s \ell_{-1}^{-1:s-1}(\omega) \, d\omega \end{pmatrix}^T \qquad (12)$\n\nThe additional term at the end of $\phi^+(\omega)$ is the polynomial of order $s$ which arises from interpolating $f$ at $s+1$ points (with the additional point at $t_{i+1}$) and choosing the basis function corresponding to the root at $t_{i+1}$, scaled by $\alpha h^s$ with $\alpha$ a positive constant whose role will be explained in the next section. The elements of these vectors span $\mathbb{P}_s$ and $\mathbb{P}_{s+1}$ respectively. With this new basis we can give the following result:\nProposition 2. 
The conditional distribution $p(y_{i+1} \mid y_i, f_{i-s+1:i})$ under the Gaussian process prior given in (10), with covariance kernel basis functions as in (11) and (12), is Gaussian with mean equal to the $s$-step Adams-Bashforth predictor $y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j}$ and, setting $\alpha = y^{(s+1)}(\eta)$ for some $\eta \in (t_{i-s+1}, t_{i+1})$, standard deviation equal to its local truncation error.\n\nThe proof is given in Appendix B. In order to de-mystify the construction, we now exhibit a concrete example for the case $s = 3$. The conditional distribution of interest is $p(y_{i+1} \mid y_i, f_i, f_{i-1}, f_{i-2}) \equiv p(y_{i+1} \mid y_i, f_{i:i-2})$. In the deterministic case, the vectors of basis functions become\n\n$\phi(\omega)_{s=3} = \begin{pmatrix} 0 & \frac{(\omega+h)(\omega+2h)}{2h^2} & -\frac{\omega(\omega+2h)}{h^2} & \frac{\omega(\omega+h)}{2h^2} \end{pmatrix}^T$\n\n$\Phi(\omega)_{s=3} = \begin{pmatrix} 1 & \frac{\omega(2\omega^2+9h\omega+12h^2)}{12h^2} & -\frac{\omega^2(\omega+3h)}{3h^2} & \frac{\omega^2(2\omega+3h)}{12h^2} \end{pmatrix}^T$\n\nand simple calculations give that\n\n$E(y_{i+1} \mid y_i, f_{i:i-2}) = y_i + h\left(\tfrac{23}{12} f_i - \tfrac{4}{3} f_{i-1} + \tfrac{5}{12} f_{i-2}\right), \qquad \mathrm{Var}(y_{i+1} \mid y_i, f_{i:i-2}) = 0$\n\nThe probabilistic version follows by setting\n\n$\phi^+(\omega)_{s=3} = \begin{pmatrix} 0 & \frac{(\omega+h)(\omega+2h)}{2h^2} & -\frac{\omega(\omega+2h)}{h^2} & \frac{\omega(\omega+h)}{2h^2} & \frac{\alpha\,\omega(\omega+h)(\omega+2h)}{6} \end{pmatrix}^T$\n\n$\Phi^+(\omega)_{s=3} = \begin{pmatrix} 1 & \frac{\omega(2\omega^2+9h\omega+12h^2)}{12h^2} & -\frac{\omega^2(\omega+3h)}{3h^2} & \frac{\omega^2(2\omega+3h)}{12h^2} & \frac{\alpha\,\omega^2(\omega+2h)^2}{24} \end{pmatrix}^T$\n\nand further calculation shows that\n\n$E(y_{i+1} \mid y_i, f_{i:i-2}) = y_i + h\left(\tfrac{23}{12} f_i - \tfrac{4}{3} f_{i-1} + \tfrac{5}{12} f_{i-2}\right), \qquad \mathrm{Var}(y_{i+1} \mid y_i, f_{i:i-2}) = \left(\tfrac{3h^4\alpha}{8}\right)^2$\n\nAn entirely analogous argument can be shown to reproduce and probabilistically extend the implicit Adams-Moulton scheme. The Gaussian process prior now includes $f_{i+1}$ as an additional variable and the correlation structure and vectors of basis functions are modified accordingly. 
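The numbers in the $s = 3$ example above are easy to verify numerically: the Adams-Bashforth weights are the integrals of the Lagrange basis polynomials over one step, and integrating the extra basis element (the Lagrange polynomial for the added node $t_{i+1}$) over the same step recovers the local-truncation-error constant $3/8$. The following is an illustrative sketch of ours (not code from the paper), using exact rational arithmetic with $h = 1$:

```python
from fractions import Fraction

def lagrange_poly(nodes, j):
    # Ascending-power coefficients of the j-th Lagrange basis polynomial.
    coeffs = [Fraction(1)]
    for k, xk in enumerate(nodes):
        if k == j:
            continue
        new = [Fraction(0)] * (len(coeffs) + 1)
        for p, c in enumerate(coeffs):
            new[p + 1] += c               # multiply by x ...
            new[p] -= Fraction(xk) * c    # ... and subtract x_k times
        coeffs = new
    denom = Fraction(1)
    for k, xk in enumerate(nodes):
        if k != j:
            denom *= Fraction(nodes[j] - xk)
    return [c / denom for c in coeffs]

def integral_0_1(coeffs):
    # Integral of the polynomial over one step [0, 1] (i.e. h = 1).
    return sum(c / (p + 1) for p, c in enumerate(coeffs))

def ab_coefficients(s):
    # Interpolation nodes t_i = 0, t_{i-1} = -1, ..., t_{i-s+1} = -(s-1).
    nodes = [-j for j in range(s)]
    return [integral_0_1(lagrange_poly(nodes, j)) for j in range(s)]

print(ab_coefficients(3))   # AB3 weights 23/12, -4/3, 5/12

# Local truncation error constant: integrate the added basis polynomial,
# the Lagrange element for the extra node t_{i+1} = 1 over nodes {1, 0, -1, -2}.
print(integral_0_1(lagrange_poly([1, 0, -1, -2], 0)))   # 3/8
```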
The required modifications are given in Appendix C and an explicit derivation for the 4-step AM method is given in Appendix D.\n\n2.1 The role of $\alpha$\n\nReplacing $\alpha$ in (11) by $y^{(s+1)}(\eta)$, with $\eta \in (t_{i-s+1}, t_{i+1})$, makes the variance of the integrator coincide exactly with the local truncation error of the underlying deterministic method.⁴\nThis is of course of limited utility unless higher derivatives of $y(t)$ are available, and even if they are, $\eta$ is itself unknowable in general. However it is possible to estimate the integrator variance in a systematic way by using backward difference approximations [14] to the required derivative at $t_{i+1}$. We show this by expanding the $s$-step Adams-Bashforth iterator as\n\n$y_{i+1} = y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j} + h^{s+1} C^{AB}_s \, y^{(s+1)}(\eta), \qquad \eta \in [t_{i-s+1}, t_{i+1}]$\n$= y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j} + h^{s+1} C^{AB}_s \, y^{(s+1)}(t_{i+1}) + O(h^{s+2})$\n$= y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j} + h^{s+1} C^{AB}_s \, f^{(s)}(t_{i+1}) + O(h^{s+2}) \qquad \text{since } y' = f$\n$= y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j} + h^{s+1} C^{AB}_s \left[ h^{-s} \sum_{k=0}^{s-1+p} \gamma_{k,s-1+p} f_{i-k} + O(h^p) \right] + O(h^{s+2})$\n$= y_i + h \sum_{j=0}^{s-1} \beta^{AB}_{j,s} f_{i-j} + h C^{AB}_s \sum_{k=0}^{s} \gamma_{k,s} f_{i-k} + O(h^{s+2}) \qquad \text{if we set } p = 1 \qquad (13)$\n\nwhere $\beta^{AB}_{\cdot,s}$ are the set of coefficients and $C^{AB}_s$ the local truncation error constant for the $s$-step Adams-Bashforth method, and $\gamma_{\cdot,s-1+p}$ are the set of backward difference coefficients for estimating the $s$th derivative of $f$ to order $O(h^p)$ [14].\nIn other words, the constant $\alpha$ can be substituted with $h^{-s} \sum_{k=0}^{s} \gamma_{k,s} f_{i-k}$, using already available function values and to adequate order. 
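To make the substitution concrete: at the lowest order ($p = 1$) the backward-difference coefficients for the $s$th derivative are the alternating binomial coefficients, e.g. $(1, -3, 3, -1)$ for $s = 3$, so $\alpha$ can be estimated from the $s + 1$ most recently stored function values. A minimal sketch, assuming a scalar problem; the function name and interface are ours, not the paper's:

```python
from math import comb

def alpha_estimate(f_vals, h):
    """Estimate alpha = y^(s+1)(eta) = f^(s) near t_{i+1} from the s+1 most
    recent function evaluations f_vals = [f_i, f_{i-1}, ..., f_{i-s}],
    using the lowest-order (p = 1) backward difference."""
    s = len(f_vals) - 1
    diff = sum((-1) ** k * comb(s, k) * f_vals[k] for k in range(s + 1))
    return diff / h ** s

# For f(t) = t^3 the third derivative is exactly 6, and the s = 3 backward
# difference reproduces it exactly (the O(h) remainder vanishes for cubics):
h, t = 0.5, 2.0
fs = [(t - k * h) ** 3 for k in range(4)]   # f_i, f_{i-1}, f_{i-2}, f_{i-3}
print(alpha_estimate(fs, h))                # 6.0
```

For $s = 3$ the resulting step standard deviation in Proposition 2 is then `abs(3/8 * h**4 * alpha)`.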
It is worth noting that collecting the coefficients $\beta^{AB}_{\cdot,s}$ and $\gamma_{\cdot,s}$ results in an expression equivalent to the Adams-Bashforth method of order $s+1$; therefore, this procedure is in effect employing two integrators of different orders and estimating the truncation error from the difference of the two.⁵ This principle is similar to the classical Milne Device [12], which pairs an AB and an AM iterator to achieve the same thing. Using the Milne Device to generate a value for the error variance is also straightforward within our framework, but requires two evaluations of $f$ at each iteration (one of which immediately goes to waste) instead of the approach presented here, which only requires one.\n\n⁴We do not claim that this is the only possible way of modelling the numerical error in the solver. The question of how to do this accurately is an open problem in general, and is particularly challenging in the multi-dimensional case. In many real world problems different noise scales will be appropriate for different dimensions and – especially in 'hierarchical' models arising from higher-order ODEs – non-Gaussian noise is to be expected. That said, the Gaussian assumption as a first order approximation for numerical error is present in virtually all work on this subject and goes all the way back to [8]. We adopt this premise throughout, whilst noting this interesting unresolved issue.\n⁵An explicit derivation of this for $s = 3$ is given in Appendix E.\n\n3 Convergence of the probabilistic Adams-Bashforth integrator\n\nWe now give the main result of our paper, which demonstrates that the convergence properties of the probabilistic Adams-Bashforth integrator match those of its deterministic counterpart.\nTheorem 3. Consider the $s$-step deterministic Adams-Bashforth integrator given in Proposition 1, which is of order $s$. 
Then the probabilistic integrator constructed in Proposition 2 has the same mean square error as its deterministic counterpart. In particular\n\n$\max_{0 \leq kh \leq T} E|Y_k - y_k|^2 \leq K h^{2s}$\n\nwhere $Y_k \equiv y(t_k)$ denotes the true solution, $y_k$ the numerical solution, and $K$ is a positive real number depending on $T$ but independent of $h$.\n\nThe proof of this theorem is given in Appendix F, and follows a similar line of reasoning to that given for a one-step probabilistic Euler integrator in [4]. In particular, we deduce the convergence of the algorithm by extrapolating from the local error. The additional complexity arises due to the presence of the stochastic part, which means we cannot rely directly on the theory of difference equations and the representations of their solutions. Instead, following [15], we rewrite the defining $s$-step recurrence equation as a one-step recurrence equation in a higher dimensional space.\n\n4 Implementation\n\nWe now have an implementable algorithm for an $s$-step probabilistic Adams-Bashforth integrator. Firstly, an accurate initialisation is required for the first $s$ iterations – this can be achieved with, for example, a Runge-Kutta method of sufficiently high order.⁶ Secondly, at iteration $i$, the preceding $s$ stored function evaluations are used to find the posterior mean and variance of $y_{i+1}$. The integrator then advances by generating a realisation of the posterior measure derived in Proposition 2. Following [1], a Monte Carlo repetition of this procedure with different random seeds can then be used as an effective way of generating propagated uncertainty estimates at any time $0 < T < \infty$.\n\n4.1 Example – Chua circuit\n\nThe Chua circuit [16] is the simplest electronic circuit that exhibits chaotic behaviour, and has been the subject of extensive study – in both the mathematics and electronics communities – for over 30 years. 
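The recipe above can be sketched as follows for $s = 3$. This is a scalar, illustrative implementation under our own design choices (classical RK4 rather than the paper's RKF solver for initialisation, and the backward-difference estimate of Section 2.1 for the step standard deviation):

```python
import numpy as np

AB3 = (23/12, -4/3, 5/12)   # Adams-Bashforth coefficients for s = 3
C3 = 3/8                    # AB3 local truncation error constant

def rk4_step(f, t, y, h):
    # One classical Runge-Kutta step, used only to initialise the multistep scheme.
    k1 = f(y, t)
    k2 = f(y + h * k1 / 2, t + h / 2)
    k3 = f(y + h * k2 / 2, t + h / 2)
    k4 = f(y + h * k3, t + h)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def prob_ab3(f, y0, t0, t_end, h, rng):
    # Probabilistic 3-step Adams-Bashforth integrator for a scalar ODE y' = f(y, t).
    n = int(round((t_end - t0) / h))        # assumes n >= 4
    ts = t0 + h * np.arange(n + 1)
    ys = np.empty(n + 1); ys[0] = y0
    fs = np.empty(n + 1); fs[0] = f(y0, ts[0])
    for i in range(3):                      # initialisation phase
        ys[i + 1] = rk4_step(f, ts[i], ys[i], h)
        fs[i + 1] = f(ys[i + 1], ts[i + 1])
    for i in range(3, n):
        mean = ys[i] + h * (AB3[0] * fs[i] + AB3[1] * fs[i - 1] + AB3[2] * fs[i - 2])
        # Backward-difference estimate of alpha = y''''(eta), as in Section 2.1.
        alpha = (fs[i] - 3 * fs[i - 1] + 3 * fs[i - 2] - fs[i - 3]) / h ** 3
        sd = abs(C3 * h ** 4 * alpha)       # posterior sd = local truncation error
        ys[i + 1] = mean + sd * rng.standard_normal()
        fs[i + 1] = f(ys[i + 1], ts[i + 1])
    return ts, ys

# Sanity check on y' = -y, y(0) = 1, whose exact solution is exp(-t):
ts, ys = prob_ab3(lambda y, t: -y, 1.0, 0.0, 1.0, 0.01, np.random.default_rng(0))
print(abs(ys[-1] - np.exp(-1.0)))           # small: both solver error and noise are high order
```

Repeating the final call with different seeds gives the Monte Carlo ensemble of trajectories described in the text.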
Readers interested in this rich topic are directed to [17] and the references therein. The defining characteristic of chaotic systems is their unpredictable long-term sensitivity to tiny changes in initial conditions, which also manifests itself in the sudden amplification of error introduced by any numerical scheme. It is therefore of interest to understand the limitations of a given numerical method applied to such a problem – namely the point at which the solution can no longer be taken to be a meaningful approximation of the ground truth. Probabilistic integrators allow us to do this in a natural way [1].\nThe Chua system is given by $x' = \alpha(y - (1+h_1)x - h_3 x^3)$, $y' = x - y + z$, $z' = -\beta y - \gamma z$. We use parameter values $\alpha = 1.4157$, $\beta = 0.02944201$, $\gamma = 0.322673579$, $h_1 = 0.0197557699$, $h_3 = 0.0609273571$ and initial conditions $x_0 = 0$, $y_0 = 0.003$, $z_0 = 0.005$. This particular choice is taken from 'Attractor CE96' in [18]. Using the probabilistic version of the Adams-Bashforth integrator with $s > 1$, it is possible to delay the point at which the numerical path diverges from the truth, with effectively no additional evaluations of $f$ required compared to the one-step method. This is demonstrated in Figure 1. Our approach is therefore able to combine the benefits of classical higher-order methods with the additional insight into solution uncertainty provided by a probabilistic method.\n\n4.2 Example – Lotka-Volterra model\n\nWe now apply the probabilistic integrator to a simple periodic predator-prey model given by the system $x' = \alpha x - \beta xy$, $y' = \delta xy - \gamma y$ for parameters $\alpha = 1$, $\beta = 0.3$, $\delta = 1$ and $\gamma = 0.7$. 
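For reference, the two test systems can be written down directly. Note that the minus signs and the assignment of the Greek-letter parameters below are reconstructed from the standard cubic-Chua and Lotka-Volterra forms (the extraction of the original dropped these symbols), so this is an assumption-laden sketch rather than verbatim from the paper:

```python
def chua_rhs(state, alpha=1.4157, beta=0.02944201, gamma=0.322673579,
             h1=0.0197557699, h3=0.0609273571):
    # Cubic Chua circuit; sign placement reconstructed from the standard form.
    x, y, z = state
    return (alpha * (y - (1 + h1) * x - h3 * x ** 3),
            x - y + z,
            -beta * y - gamma * z)

def lotka_volterra_rhs(state, alpha=1.0, beta=0.3, delta=1.0, gamma=0.7):
    # Predator-prey model x' = alpha x - beta x y, y' = delta x y - gamma y.
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)

print(chua_rhs((0.0, 0.003, 0.005)))   # derivative at the paper's initial condition
```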
We demonstrate the convergence behaviour stated in Theorem 3 empirically.\n\n⁶We use a (packaged) adaptive Runge-Kutta-Fehlberg solver of 7th order with 8th order error control.\n\nFigure 1: Time series for the $x$-component in the Chua circuit model described in Section 4.1, solved 20 times for $0 \leq t \leq 1000$ using an $s$-step probabilistic AB integrator with $s = 1$ (top), $s = 3$ (middle), $s = 5$ (bottom). Step-size remains $h = 0.01$ throughout. Wall-clock time for each simulation was close to constant ($\pm 10$ per cent – the difference primarily accounted for by the RKF initialisation procedure).\n\nThe left-hand plot in Figure 2 shows the sample mean of the absolute error of 200 realisations of the probabilistic integrator plotted against step-size, on a log-log scale. The differing orders of convergence of the probabilistic integrators are easily deduced from the slopes of the lines shown. The right-hand plot shows the actual error value (no logarithm or absolute value taken) of the same 200 realisations, plotted individually against step-size. This plot shows that the error in the one-step integrator is consistently positive, whereas for the two- and three-step integrators it is approximately centred around 0. (This is also visible with the same data if the plot is zoomed to more closely examine the range with small $h$.) Though this phenomenon can be expected to be somewhat problem-dependent, it is certainly an interesting observation which may have implications for bias reduction in a Bayesian inverse problem setting.\n\n[Figure 2: two panels plotting $\log_{10}|\mathrm{Error}|$ and Error respectively against $\log_{10} h$; legend gives the number of steps $s = 1, \ldots, 5$.]\n\nFigure 2: Empirical error analysis for the $x$-component of 200 realisations of the probabilistic AB integrator as applied to the Lotka-Volterra model described in Section 4.2. The left-hand plot shows the convergence rates for AB integrators of orders 1-5, while the right-hand plot shows the distribution of error around zero for integrators of orders 1-3.\n\n5 Conclusion\n\nWe have given a derivation of the Adams-Bashforth and Adams-Moulton families of linear multistep ODE integrators, making use of a Gaussian process framework, which we then extend to develop their probabilistic counterparts.\nWe have shown that the derived family of probabilistic integrators result in a posterior mean at each step that exactly coincides with the corresponding deterministic integrator, with the posterior standard deviation equal to the deterministic method's local truncation error. We have given the general forms of the construction of these new integrators to arbitrary order. Furthermore, we have investigated their theoretical properties and provided a rigorous proof of their rates of convergence. Finally, we have demonstrated the use and computational efficiency of probabilistic Adams-Bashforth methods by implementing the solvers up to fifth order and providing example solutions of a chaotic system, as well as empirically verifying the convergence rates in a Lotka-Volterra model.\nWe hope the ideas presented here will add to the arsenal of any practitioner who uses numerical methods in their scientific analyses, and contribute a further tool in the emerging field of probabilistic numerical methods.\n\nReferences\n\n[1] O. A. CHKREBTII, D. A. CAMPBELL, B. CALDERHEAD, and M. A. GIROLAMI. Bayesian Solution Uncertainty Quantification for Differential Equations. Bayesian Analysis, 2016.\n[2] P. HENNIG and S. 
HAUBERG. Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics. In Proc. of the 17th Int. Conf. on Artificial Intelligence and Statistics (AISTATS), Vol. 33. JMLR W&CP, 2014.\n[3] M. SCHOBER, D. K. DUVENAUD, and P. HENNIG. Probabilistic ODE Solvers with Runge-Kutta Means. In Z. GHAHRAMANI, M. WELLING, C. CORTES, N. D. LAWRENCE, and K. Q. WEINBERGER, editors, Advances in Neural Information Processing Systems 27, pp. 739–747. Curran Associates, Inc., 2014.\n[4] P. R. CONRAD, M. GIROLAMI, S. SÄRKKÄ, A. STUART, and K. ZYGALAKIS. Statistical Analysis of Differential Equations: Introducing Probability Measures on Numerical Solutions. Statistics and Computing, 2016.\n[5] M. C. KENNEDY and A. O'HAGAN. Bayesian Calibration of Computer Models. Journal of the Royal Statistical Society: Series B, 63(3):425–464, 2001.\n[6] P. DIACONIS. Bayesian Numerical Analysis. In J. BERGER and S. GUPTA, editors, Statistical Decision Theory and Related Topics IV, Vol. 1, pp. 163–175. Springer, 1988.\n[7] P. HENNIG, M. A. OSBORNE, and M. GIROLAMI. Probabilistic Numerics and Uncertainty in Computations. Proc. R. Soc. A, 471(2179):20150142, 2015.\n[8] J. SKILLING. Bayesian Numerical Analysis. In J. W. T. GRANDY and P. W. MILONNI, editors, Physics and Probability, pp. 207–222. Cambridge University Press, 1993.\n[9] M. GIROLAMI. Bayesian Inference for Differential Equations. Theor. Comp. Sci., 408(1):4–16, 2008.\n[10] A. ISERLES. A First Course in the Numerical Analysis of Differential Equations. Cambridge University Press, 2nd ed., 2008.\n[11] E. HAIRER, S. NØRSETT, and G. WANNER. Solving Ordinary Differential Equations I: Nonstiff Problems. Springer Series in Computational Mathematics. Springer, 2008.\n[12] J. BUTCHER. Numerical Methods for Ordinary Differential Equations: Second Edition. Wiley, 2008.\n[13] C. RASMUSSEN and C. WILLIAMS. Gaussian Processes for Machine Learning. University Press Group Limited, 2006.\n[14] B. FORNBERG. Generation of Finite Difference Formulas on Arbitrarily Spaced Grids. Mathematics of Computation, 51(184):699–706, 1988.\n[15] E. BUCKWAR and R. WINKLER. Multistep Methods for SDEs and Their Application to Problems with Small Noise. SIAM J. Numer. Anal., 44(2):779–803, 2006.\n[16] L. O. CHUA. The Genesis of Chua's Circuit. Archiv für Elektronik und Übertragungstechnik, 46(4):250–257, 1992.\n[17] L. O. CHUA. Chua Circuit. Scholarpedia, 2(10):1488, 2007.\n[18] E. BILOTTA and P. PANTANO. A Gallery of Chua Attractors. World Scientific, 2008.\n\nKZ was partially supported by a grant from the Simons Foundation and by the Alan Turing Institute under the EPSRC grant EP/N510129/1. Part of this work was done during the author's stay at the Newton Institute for the programme Stochastic Dynamical Systems in Biology: Numerical Methods and Applications.\n", "award": [], "sourceid": 2140, "authors": [{"given_name": "Onur", "family_name": "Teymur", "institution": "Imperial College London"}, {"given_name": "Kostas", "family_name": "Zygalakis", "institution": "University of Edinburgh"}, {"given_name": "Ben", "family_name": "Calderhead", "institution": "Imperial College"}]}