__ Summary and Contributions__: The authors focus on the problem of inferring structural and dynamical properties of diffusion processes on graphs from a set of observations. To address this problem, they develop a new methodology based on a rigorous analytical characterization of diffusion on graphs. They demonstrate on synthetic and real-world data that the proposed methodology enables inference of both structural and dynamical properties of diffusion on graphs, with better performance than state-of-the-art methods.

__ Strengths__: - The paper provides clear mathematical proofs supporting the construction of a new methodology for diffusion processes on graphs.
- The authors propose a new algorithm for studying diffusion processes on graphs and demonstrate, on both synthetic and real-world data, that their method achieves better performance than state-of-the-art methods.

__ Weaknesses__: The authors named their method Neural Mean-Field Dynamics, but here the notion of neural network is defined in a very generic sense (a discrete nonlinear dynamical system of order 1, cf. equation (14)), far from the common conception of an RNN used in ML. Moreover, it seems that the use of neural networks in the whole procedure is just one of many insights on which the method relies. As such, I wonder whether this will be of interest to the NeurIPS community, and I think this work, of extraordinary quality, would be better suited for publication in network science conferences, for instance.

__ Correctness__: Claims, method and methodology look correct.

__ Clarity__: The paper is well written.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: Update: I thank the authors for their response. However, I still see a large distance between the common notion of an RNN and the one used here. As such, I still have concerns about the relevance for the NeurIPS audience and keep my score at 5.
----------------------
- In a typical RNN, inputs can be time-varying signals; here, what are termed inputs are simply the initial conditions of the dynamical system.
- Can equations (14) be mapped onto standard RNN architectures such as a vanilla RNN or an LSTM, with x and h interpreted in terms of hidden states, gates, or other components?
- Could the statement in lines 170-171 be expanded, with a more precise definition of \epsilon than "remaining nonlinear part" and a fuller definition of the "first layer of the RNN"?
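
To make the contrast in the first point concrete, here is a minimal sketch in plain Python. It is not the paper's Eq. (14): the scalar cell, the weights `w`, `u`, `b`, and both function names are hypothetical and chosen only for illustration. A vanilla RNN consumes a new input at every step, whereas an autonomous order-1 recursion is driven only by its initial condition.

```python
import math

def vanilla_rnn(h0, inputs, w=0.8, u=0.5, b=0.1):
    """Scalar vanilla RNN: h_{t+1} = tanh(w*h_t + u*x_t + b), one input per step."""
    h, states = h0, [h0]
    for x_t in inputs:
        h = math.tanh(w * h + u * x_t + b)
        states.append(h)
    return states

def autonomous_recursion(h0, n_steps, w=0.8, b=0.1):
    """Order-1 nonlinear dynamical system: h_{t+1} = tanh(w*h_t + b).

    The only 'input' is the initial condition h0 (hypothetical toy model).
    """
    h, states = h0, [h0]
    for _ in range(n_steps):
        h = math.tanh(w * h + b)
        states.append(h)
    return states

# Same initial state; only the RNN sees a time-varying signal.
driven = vanilla_rnn(0.2, [1.0, -1.0, 0.5, 0.0])
free = autonomous_recursion(0.2, 4)
```

With an all-zero input sequence the two trajectories coincide in this toy setting, which is one way to read the point above: the recursion has the form of an RNN, but without a time-varying input signal.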

__ Summary and Contributions__: This paper proposes to model the diffusion on a network with a mean-field framework, which is derived from the Mori-Zwanzig formalism. The hard-to-solve memory term in the Mori-Zwanzig equation is approximated by a differential neural network. The differential equation is solved with Euler-forward discretization.
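
As a rough illustration of the discretization mentioned above, the following sketch applies forward Euler to an equation of the form dx/dt = drift(x) + memory(history). This is an assumption-laden toy, not the paper's scheme: `euler_forward`, `drift`, and `memory` are hypothetical names, and `memory` is only a placeholder for whatever learned approximation of the memory term is used.

```python
def euler_forward(x0, drift, memory, h, n_steps):
    """Integrate dx/dt = drift(x) + memory(history) with forward Euler.

    `memory` receives the list of past states and stands in for a learned
    approximation of the memory term (hypothetical placeholder).
    """
    history = [x0]
    for _ in range(n_steps):
        x = history[-1]
        # One explicit Euler step of size h.
        history.append(x + h * (drift(x) + memory(history)))
    return history

# Toy usage: linear decay with no memory contribution.
traj = euler_forward(1.0, drift=lambda x: -x, memory=lambda hist: 0.0,
                     h=0.1, n_steps=10)
```

In this toy case each step multiplies the state by (1 - h), so the trajectory can be checked against a closed form.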

__ Strengths__: i) The problem considered in this work is of high relevance.
ii) The application of the Mori-Zwanzig formalism is in general appealing and novel. The resulting differential equation (Thm. 2) is of an interesting type and offers many future research directions.
iii) The experimental section underlines the claims of interpretable results (Table 1).

__ Weaknesses__: i) Many design choices appear rather arbitrary in section 3.2, e.g. why the split between linear and nonlinear parts for \epsilon(\cdot)? Since \epsilon is a black box, can we not get rid of the linear part from the beginning?
ii) The connection to optimal control (Sec. 3.2) is confusing and seems out of context.
a) Learning of weights as an optimal control problem is not new and was e.g. discussed in “Maximum Principle Based Algorithms for Deep Learning”. Such an optimal control connection can be done for any parameter inference problem.
b) Why is the Pontryagin maximum principle (PMP) introduced? From my understanding, the parameters are trained by minimizing loss function (18a) with gradient descent. However, if PMP is used, the solution quality heavily depends on the solver chosen for the PMP problem, e.g. shooting methods. A discussion of this is missing. If PMP is not used (line 206 in particular suggests it is not), I would propose removing this section, since it does not strengthen the main claim of the paper: deriving a mean-field framework from the Mori-Zwanzig formalism.
iii) Why does the MAE of NMF in Fig. 1 decrease with increasing time horizon? I would expect the error to increase with increasing horizon, as it does for InfluLearner. The proposed model is autoregressive (Eq. 14a, b), and (small) errors at the beginning should accumulate into larger errors at later time steps.
iv) Minor: Broader Scope just repeats the Intro/Conclusion.
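
The accumulation effect referred to in point iii) can be sketched with a toy autoregressive rollout. This is only an illustration under stated assumptions, not the paper's Eq. (14): the scalar growth model, the rates 0.5 and 0.52, and the helper names `rollout` and `euler_step` are all hypothetical.

```python
def rollout(step, x0, n_steps):
    """Iterate x_{k+1} = step(x_k) autoregressively and return the trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1]))
    return xs

def euler_step(rate, h):
    """Forward Euler step for the toy model dx/dt = rate * x."""
    return lambda x: x + h * rate * x

# Reference dynamics versus a model with a slightly wrong rate.
true_traj = rollout(euler_step(0.5, 0.1), 1.0, 50)
model_traj = rollout(euler_step(0.52, 0.1), 1.0, 50)

# Absolute error at each horizon; each step feeds on the previous prediction.
abs_err = [abs(a - b) for a, b in zip(true_traj, model_traj)]
```

In this toy growing recursion the gap widens at every step. Other dynamics may behave differently, which is why an explanation of the trend in Fig. 1 would be helpful.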
--- POST REBUTTAL ---
I read the author response and the other reviews carefully. The two main issues I raised, namely i) the paper is hard to follow and ii) the weak motivation of the central Eq. 10, are shared by the other reviewers.
I agree with the authors' comment that one could find some clues throughout the paper as to why Eq. 10 has this particular form, but this is rather hard to do because the paper is hard to read.
Furthermore, I still wonder how the paper benefits from the relation between standard backprop and the Hamiltonian. I believe that the evaluation is correct, but why should this be in the paper?
I would recommend that the authors rewrite the paper and motivate the central design choices more clearly. In addition, I agree with Reviewer #2 that the paper would benefit more from being published at a network science conference.
Consequently, I keep my score unchanged.

__ Correctness__: The derivations and theoretical claims seem correct.

__ Clarity__: i) The paper is rather hard to read and many terms/symbols are used without proper introduction, e.g. in Thm. 1, z(t)=[x(t); e(t)] is used without introducing e(t).
ii) The experimental section is missing further details, e.g. initialization? Early stopping?

__ Relation to Prior Work__: • The related work section is properly written.
• The novelty of this work is clear.

__ Reproducibility__: No

__ Additional Feedback__: The theoretical contribution of this paper is valuable. However, the writing of this paper does not meet the quality standards. I suggest rewriting and resubmitting the paper to an upcoming venue.

__ Summary and Contributions__: In this paper, the authors propose the neural mean-field dynamics (NMF) framework to solve the prediction (future infection states of nodes) and network inference (connectivity and strength of impact between nodes) problems. Specifically, they use the Mori-Zwanzig formalism to derive a generalized Langevin equation (GLE). This GLE is then approximated by a deep neural network. Experimental results show that the proposed model outperforms existing approaches in different tasks.

__ Strengths__: 1. The investigated problem is important and interesting. In addition, the proposed method is technically sound with theoretical proofs.
2. Evaluation is implemented on multiple generated datasets with different graph models and different distributions, and a real dataset from Sina Weibo.
3. The proposed model does not require early adopters or network structures, and the network structure can be inferred from the proposed model, which differs from most existing methods.

__ Weaknesses__: 1. The utilization of the approximation in (10) is not properly validated. For example, the error between the approximation of the deep neural network and the original Mori-Zwanzig memory term is not evaluated.
2. In the numerical experiments section, different baselines are compared in different tasks. However, the choice of baselines for each task is not well justified. For example, InfluLearner is only compared in the task of infection probability and influence function estimation. By combining it with the classical greedy algorithm, it could also be compared in the influence maximization task. Thus, the choice of compared algorithms for each task needs more discussion.
3. The technical details in this paper are a bit hard to follow. It would be better to give a neural network diagram or a pseudo-code algorithm to help readers better understand the details of the proposed framework.
4. In line 216, it is said that 1,000 source sets are generated. However, in line 224, the MAE is averaged over only 100 source sets, which contradicts the earlier description.
5. There are many typos in this paper, e.g.,
- Line 40: “and and reture”
- Line 127 and Line 142: “Appendix ??”

__ Correctness__: Yes, both claims and method are correct.

__ Clarity__: Most of the paper is well written; however, the technical details are difficult to follow.

__ Relation to Prior Work__: yes

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper studies diffusion in various networks. The main problem studied is the inference and estimation of when a particular node is infected and with what probability.

__ Strengths__: The paper is generally well written and I understood the main motivations and contributions quite well.
I am not in the best position to judge the competitiveness or the novelty of the method, however.

__ Weaknesses__: A general comment from an outside perspective:
Can you include in the introduction how your results would affect networks encountered in daily life like social networks or disease infection networks? In what ways does knowledge of timing and probability of infection affect me in my daily usage of social networks?
In other words, describe some real-world applications of your results.

__ Correctness__: I have not checked this.

__ Clarity__: Yes.

__ Relation to Prior Work__: I cannot judge this.

__ Reproducibility__: No

__ Additional Feedback__: I am an emergency reviewer for this paper, which, unfortunately, is not at all in my field. So I am not in the best position to rate this paper. I have provided some general comments from an outsider's perspective.
My current rating is based on the fact that the paper felt "pleasing" to read, and I understood the main motivations quite well. Since I cannot judge the novelty or the accuracy of the method, I have assumed these factors to be rated highly.
==============================================================
POST-REBUTTAL: After the author response and other reviews, I am keeping my score unchanged.