Paper ID | 5208
---|---
Title | Neural Jump Stochastic Differential Equations

I very much like this approach, and it is well explained. Many of the existing approaches are overly complex cobblings of discrete-time and continuous-time ideas. By contrast, this is a clean solution that is computationally reasonable. The experimental results show a good variety of uses of the proposed method.

The experiments are where the paper could be most improved:

- The details of the MLPs (number of hidden units, number of layers, training method) are completely omitted.
- The paper does not compare with Neural ODE, particularly the variant in which the authors of that paper used an RNN to encode the starting latent value.
- The paper does not compare against other methods in Section 4.3. The neural Hawkes process can be easily augmented (by adding another MLP from the hidden state) to predict marks (locations) associated with each event.

Overall, I like the paper, but it really should supply more information on the experimental set-ups so that the significance of the effects shown can be judged. The "Reproducibility Response" claims this information is in the paper, but it is in neither the paper nor the supplementary material.

One question that should be addressed in the paper:

- Why are the RNN and Neural JSDE better than a Poisson process on Poisson data? If this is because of regularization, then this further underscores the need for clear experimental set-up descriptions, so that this can be judged for all experiments and all methods.
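To make the Poisson question above concrete: for a homogeneous Poisson process with constant rate λ, observed on [0, T] with N events, the log-likelihood is N log λ − λT, which is maximized at λ = N/T. The sketch below checks this numerically. The function names and the numbers are illustrative assumptions, not taken from the paper.

```python
import math

def poisson_loglik(n_events: int, horizon: float, rate: float) -> float:
    """Log-likelihood of an event sequence with n_events on [0, horizon]
    under a homogeneous Poisson process with the given constant rate."""
    return n_events * math.log(rate) - rate * horizon

def poisson_mle(n_events: int, horizon: float) -> float:
    """Closed-form maximum-likelihood estimate of the constant rate."""
    return n_events / horizon

# Illustrative numbers (assumed): 40 events observed over 10 time units.
n_events, horizon = 40, 10.0
rate_hat = poisson_mle(n_events, horizon)

# Compare the MLE against a rate that is too low and one that is too high.
candidates = [0.5 * rate_hat, rate_hat, 2.0 * rate_hat]
logliks = [poisson_loglik(n_events, horizon, r) for r in candidates]
```

Within the constant-rate family the fitted Poisson process is already the likelihood maximizer on its training data, so a more flexible model scoring better on Poisson data suggests a difference in fitting or regularization, which is what the question asks the authors to explain.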

This paper makes a notable extension to the Neural ODE work by augmenting the continuous ODE with discontinuities introduced by discrete events. The technical difficulty is that, in the presence of a discontinuity, the left and right limits of the adjoint vectors differ. The authors propose to handle the jumps of the adjoint vectors by lifting them from their right limit to their left limit. In terms of model structure, both the dynamics function and the jump function are parameterized by MLPs, which are also used to model the conditional intensity and the emission. Various methods are compared empirically, showing the advantages of the proposed approach in accuracy, parsimony of the model, and the capability of handling events with real-valued features.

The paper is nicely written and well motivated, and the relevant background is presented clearly. I quite enjoyed reading it. Significance-wise, I can see the proposed model having applications in a wide range of time series prediction tasks, such as financial or retail demand time series where spikes are triggered by certain events. Overall, a very good paper.

Detailed comments and questions:

1. In Section 3.1, it would be good to spell out the full generative model. For example, this would make it clearer how different marks are generated, and similarly for "events with real-valued features."
2. The original Neural ODE can also be used to model inhomogeneous Poisson processes. How does this method compare to it empirically?
3. For the experiments, it would be good to repeat each one multiple times and report the standard deviations of the results. Also, how does the running time compare to, for example, a baseline RNN and RMTPP [20]?
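The model structure summarized in this review (continuous flow between events, a discontinuous jump at each event, and an intensity read off the latent state) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: simple closed-form functions stand in for the MLPs, forward Euler stands in for the ODE solver, and all names are hypothetical.

```python
import numpy as np

def softplus(x):
    """Smooth positive map, used here to keep the intensity positive."""
    return np.log1p(np.exp(x))

def simulate_latent(z0, event_times, t_end, dynamics, jump, dt=1e-2):
    """Euler-integrate z between events and apply a jump at each event time.

    Returns a list of (t, z) pairs; at an event time the list holds two
    entries with the same t: the left limit, then the right limit.
    """
    z = np.array(z0, dtype=float)
    t = 0.0
    path = [(t, z.copy())]
    for t_event in list(event_times) + [t_end]:
        # Continuous flow up to the next event (or the end of the window).
        while t < t_event - 1e-12:
            h = min(dt, t_event - t)
            z = z + h * dynamics(z)
            t += h
            path.append((t, z.copy()))
        if t_event < t_end:
            # Discontinuity: move from the left limit to the right limit.
            z = z + jump(z)
            path.append((t, z.copy()))
    return path

# Stand-ins for the learned networks (assumptions for illustration only).
dynamics = lambda z: -z                    # exponential decay between events
jump = lambda z: 0.5 * np.ones_like(z)     # fixed upward jump at each event

path = simulate_latent([1.0], event_times=[0.5], t_end=1.0,
                       dynamics=dynamics, jump=jump)
# Conditional intensity evaluated along the path from the latent state.
rates = [float(softplus(z.sum())) for _, z in path]
```

The duplicated time stamp at each event is the point of the sketch: the state has distinct left and right limits there, which is the same structure the review describes for the adjoint vectors in the backward pass.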

This paper proposes Neural Jump Stochastic Differential Equations, a general framework for modeling temporal event sequences. The authors demonstrate state-of-the-art performance on prediction tasks. The paper is well written and the model is interesting.

My concern lies in the experiment section. The model does not show a performance improvement on the MIMIC dataset, and on the Stack Overflow dataset the improvement is incremental; it is questionable whether it is statistically significant. Another evaluation metric is event time prediction; see the experiments in [20]. This metric would also demonstrate the predictive power of the proposed model.

A relevant paper that should be cited:

[1] A Stochastic Differential Equation Framework for Guiding Online User Activities in Closed Loop, Wang et al, AISTATS 2018

The derivations in Section 3.1 should mention [1], where the same equations have been derived.

----------------------------

Thanks to the authors for the response; it addressed my concerns. I changed the score accordingly.