Paper ID: 2140

Title: Probabilistic Linear Multistep Methods

The paper extends earlier work on probabilistic numerics for ODE solvers to linear multistep methods, providing a probabilistic counterpart of the existing algorithms. The corresponding deterministic methods see real practical use, especially in problems where very high accuracy is required or f() is expensive to evaluate.
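For context, the deterministic family the paper builds on can be sketched in a few lines. The following is a minimal two-step Adams-Bashforth iteration; the test ODE y' = -y, the step size, and the Euler bootstrap are illustrative choices for this sketch, not details taken from the paper.

```python
import numpy as np

def adams_bashforth2(f, y0, t0, h, n_steps):
    """Two-step Adams-Bashforth: y_{k+1} = y_k + h*(3/2 f_k - 1/2 f_{k-1}).
    The first step is bootstrapped with a forward Euler step."""
    ys = [y0]
    fs = [f(t0, y0)]
    ys.append(ys[0] + h * fs[0])          # Euler bootstrap
    fs.append(f(t0 + h, ys[1]))
    for k in range(1, n_steps):
        y_next = ys[k] + h * (1.5 * fs[k] - 0.5 * fs[k - 1])
        ys.append(y_next)
        fs.append(f(t0 + (k + 1) * h, y_next))
    return np.array(ys)

# y' = -y, y(0) = 1; exact solution is exp(-t)
ys = adams_bashforth2(lambda t, y: -y, 1.0, 0.0, 0.01, 100)
```

The endpoint ys[-1] approximates exp(-1) with the second-order global error expected of AB2.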

Technical quality: The method is very clear and elegant. The experimental evaluation could be improved by adding more context: how does the method perform relative to alternatives (probabilistic Runge-Kutta with potentially different parameters, and perhaps relevant non-probabilistic methods)? This would help clarify the trade-offs: what the method is good for, and when another alternative would be better. As the paper reads, the proposed method seems best for everything, yet Matlab, for one, recommends the Runge-Kutta-based ode45 as the default rather than the Adams-Bashforth-based ode113.

Novelty: The proposed algorithm is a novel combination of existing techniques in probabilistic numerics and the existing deterministic algorithm. There do not seem to be significant broader novel contributions beyond the proposed algorithm.

Impact: The presented experiments look very impressive, but the practical impact potential is unclear and seems marginal. Clarifying the strengths and weaknesses of the method (see above) would clearly help in making the case for its impact.

Clarity: The paper is very clearly written and easy to follow. It would be interesting to see a version of Fig. 1 including error bars estimated from the method; it seems that currently only the estimated means are ever used. More emphasis could also be put on explaining the big picture of when the method is actually useful.

Other comments/questions:
1. In Eq. (1), why does y depend on theta but not f?
2. It would be nice to know the source of the variance seen in Fig. 1.

2-Confident (read it all; understood it all reasonably well)

%%% UPDATE %%% Thank you for your detailed response. In discussion, several points were raised by a meta-reviewer about the connection between this work and the work of Schober et al. in NIPS 2014. In particular, it was suggested that the two methods may be near identical, albeit with different derivations. I shall not repeat the discussion here as I have only a limited understanding and I do not wish to make statements that are incorrect. I do hope these points will be made available to the author(s) by the meta-reviewer(s).

At the outset this paper excited me, although I am not an expert in ODEs. It is clearly well written and technically proficient. However, the nature of the concerns that have been raised makes it difficult to argue strongly in favour of the paper. As a result, my confidence score and corresponding novelty scores have now significantly decreased from the initial review stage.

I would like to add that it would have been much better if the issue of equivalence with Schober et al. 2014 had been put to the authors during the initial review stage, so that there was an opportunity to provide a rebuttal. (A comparison was requested by one reviewer, but at that stage the argument that the two works may be near identical was not put forward.) The unfortunate situation appears to be due to the structure of the NIPS review process and I do sympathise with the author(s)' predicament. %%% END %%%

This paper develops a probabilistic version of the classical Adams-Bashforth and Adams-Moulton numerical integrators. The authors propose a distribution over (stochastic) solutions of the ordinary differential equation (ODE) that is intended to provide a probabilistic quantification of epistemic uncertainty due to the presence of numerical error. This paper is in line with other recent research into probabilistic numerical methods.

This is an outstanding paper. The author(s) should be warmly congratulated on a timely, intelligent and thought-provoking contribution to this emerging literature. The only substantive negative comment that I can make is that there is no frequentist assessment (empirical or theoretical) of the coverage properties of the proposed distributional quantification of epistemic uncertainty. However, that would not be easily achieved within the confines of 8 pages.

In general terms, the fundamentalist Bayesian could reject the idea because information is "thrown away" outside the s-length moving window. They could also object because the prior measure on the solution is not placed on an infinite-dimensional function space at the outset, but is instead defined on the finite-dimensional distributions in a somewhat ad hoc manner (relative to the fundamentalist Bayesian approach). The paper presumably aims to strike a balance between being "fully" Bayesian and being a computational pragmatist. Perhaps there is room to raise and address this trade-off in the discussion?

Some minor comments:
- Please provide a reference for the Picard-Lindelöf theorem (page 2).
- On page 2, lines 55-56, I find the description "marginal measure given evaluations of the vector field" rather odd, since it sounds like you are in fact conditioning on evaluations of the vector field. My guess is that the equation that follows this description is not the one that was intended to be written here.
- Again on page 2, rather than \mathbb{Z}^+, do you mean \mathbb{N}? As far as I can tell, zero is excluded from this statement, so \mathbb{N} = {1,2,3,...} would be more standard. (I may well be mistaken; perhaps some clarification could be made.)
- I wonder how standard it is to refer to (e.g.) Eqn. 10 as a Gaussian "process" prior. Would not simply a "Gaussian prior" suffice here, given there doesn't seem to be a continuous time variable present at this point?
- I believe references [2] and [3] are both now published, so these references should be updated.
- In the appendix a bold-face `f' is used, whereas in the main text it is not bold.

1-Less confident (might not have understood significant parts)

This paper uses the Gaussian process framework to present a derivation of the Adams-Bashforth and Adams-Moulton families of linear multistep ODE integrators and extends them to be probabilistic. The posterior of the resulting family of probabilistic integrators has mean (at each step) equal to the corresponding deterministic integrator and standard deviation corresponding to its truncation error.

I can't really comment on the novelty of the method presented here or the appropriateness of the experiments, as this is outside my area of expertise. Still, this paper seems to present an interesting result, namely the extension of the Adams-family LMMs to the probabilistic framework given by Gaussian processes. The (theoretical) proof of their convergence rates, as well as its (empirical) verification, is also a nice contribution.

Minor:
- Figure 2a is missing a legend (I assume it's a simple enumeration 1-5)

1-Less confident (might not have understood significant parts)

The paper presents a rigorous derivation of the probabilistic linear multistep Adams-Bashforth and Adams-Moulton ODE integrators from Gaussian processes. A probabilistic framework is provided to obtain numerical Adams-Bashforth and Adams-Moulton integrators up to an arbitrary order. A proof is given that the same convergence characteristics apply to probabilistic Adams-Bashforth integrators as to their deterministic counterparts. A practical implementation of a probabilistic Adams-Bashforth integrator has been empirically verified to match convergence expectations.

The paper provides a rigorous theoretical derivation of the probabilistic Adams-Bashforth and Adams-Moulton numerical integrators. Detailed literature referencing puts the paper into context, followed by a formal introduction of linear multistep methods. The authors also strike a very good balance between stating what is necessary to follow the theorems and concepts and providing enough detail in the Appendix to follow the proofs in full. Assumptions and practices are always well motivated or sufficiently referenced when necessary. Therefore, clarity and logical flow are maintained throughout the paper at a well-chosen level of detail. Probabilistic multistep methods have the potential to estimate the degree of certainty of approximated numerical solutions. Probabilistic numerical implementations are a significant improvement over their deterministic counterparts: they maintain the same convergence characteristics while providing additional information about the quality of the solution. Therefore, they are of great interest to the NIPS community and might have the potential to be pioneering.

2-Confident (read it all; understood it all reasonably well)

In line with recent developments in probabilistic numerics, this paper constructs a probabilistic derivation for Adams-Moulton numerical solvers for ODEs. The development of these algorithms is important since it completes the understanding of the assumptions underlying these methods, in addition to being able to quantify uncertainty in the resultant answers.

Overall, an important contribution to an exciting field. Comments:

- The clarity of the probabilistic interpretation could be improved, which is an important factor for this work to be taken up. The derivation of the method follows the numerical procedure closely, by placing distributions over quantities that would have been computed in the original algorithm. The insight gained from treating numerical procedures as an inference problem generally relies on placing well-defined distributions over all quantities, and then deriving the implied distributions by marginalisation. From the presentation in this paper, it is not immediately clear to me that there even is a consistent joint distribution over all variables, something which is important for a method attempting to be Bayesian. Additionally, presenting the method in this form would make the properties of the priors in the model more explicit, and easier to critique and improve on.
- A picture would also help the explanation of Adams-family solvers (something along the lines of http://imgur.com/J46NIrV?), perhaps in conjunction with a visualisation of relevant posteriors. Currently the paper lacks intuition and illustration of how the probabilistic method behaves.
- The experiments provide a nice proof of concept, together with a neat illustration of how the probabilistic features of this model can predict when small errors accumulate to the point of no return in chaotic systems.
- Finally, it is unclear how the ground truth for estimating the empirical errors in Figure 2 is created.

Overall, a good contribution to the exciting field of probabilistic numerics. The paper is already a useful technical contribution, but could benefit a lot from additional clarity in presenting the assumptions and behaviour of the underlying probabilistic model.

Additional points: The equation in line 56 seems odd, since it marginalises out the prior over function evaluations, rather than doing any conditioning.

2-Confident (read it all; understood it all reasonably well)

This paper tries to derive a new approach to numerically solving ODEs. The idea is to construct a GP which has as its mean the deterministic Adams-Bashforth method but additionally provides variance estimates for the value of the solution y(t) at a given t.

This is a very interesting submission which, however, leaves me with mixed feelings. The fundamental question that I think is unaddressed is the following: the uncertainty of the estimator does not stem from measurement error and is purely epistemic. What exactly, then, is the rationale behind the chosen augmented bases in Eqs. (11) and (12)? Why does this make sense, and what are other possibilities for augmenting the bases (or why are there none)? In particular, a useless 'probabilistic numerical method' is constructible by taking any deterministic numerical method to define the location of a Gaussian and arbitrarily setting a standard deviation for that Gaussian. The paper would gain immensely by explicitly showing how its construction is not arbitrary in this sense. A less important question is the following: if I'm not completely mistaken, all predictive densities are Gaussian - is Monte Carlo really necessary in this case, or is an analytical solution possible?

=== After rebuttal and reviewer discussion ===

My rather fundamental question about the arbitrariness of the probabilistic extension of a deterministic method was not properly addressed: so the SD of the probabilistic formulation coincides with the local truncation error - why should it? What is the interpretation of that? Related to this: the paper conveys the impression of not using priors and not solving a Bayesian inverse problem (as it says it is not concerned with a Bayesian treatment of θ). Apart from the comparison to the closely related probabilistic Runge-Kutta method, methods based on Monte Carlo (MC) are not even mentioned, though they have existed for quite some time [1]. Not comparing to (state-of-the-art) MC makes the paper look a bit strange, or even dishonest, considering MC's pervasiveness.

[1] Arnold et al. (2013). Linear multistep methods, particle filtering and sequential Monte Carlo. Inverse Problems, Volume 29, Issue 8.
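One way to see why sampling is used even though each conditional step is Gaussian: with a nonlinear vector field f, the marginal at a later time is a pushforward of Gaussians through repeated nonlinear maps, which is generally not Gaussian and has no closed form, so Monte Carlo over trajectories is the natural estimator. The sketch below uses a probabilistically perturbed Euler step on a logistic vector field; the step rule, noise scale, and test problem are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_path(f, y0, h, n_steps, noise_sd):
    """Sample one trajectory: each step is conditionally Gaussian
    (deterministic Euler mean plus a Gaussian perturbation), but
    composing steps through a nonlinear f makes the endpoint marginal
    non-Gaussian in general, hence the resort to Monte Carlo."""
    y = y0
    for _ in range(n_steps):
        y = y + h * f(y) + noise_sd * rng.normal()
    return y

f = lambda y: y * (1.0 - y)  # logistic vector field (nonlinear)
endpoints = np.array([sample_path(f, 0.1, 0.05, 60, 1e-3)
                      for _ in range(500)])
est_mean, est_sd = endpoints.mean(), endpoints.std()
```

The empirical mean lands near the true solution of the logistic ODE at t = 3 (about 0.69 for y(0) = 0.1), while the empirical spread summarises the accumulated per-step uncertainty.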

1-Less confident (might not have understood significant parts)