__ Summary and Contributions__: The paper proposes to correct errors produced by PDE solvers by applying a neural network to the outputs of the (differentiable) PDE solver after each integration step and optimizing the parameters of the neural network using backpropagation (through time) to compute the gradients with respect to those parameters. The experiments show that the proposed method works well.

__ Strengths__: The idea is simple and the experimental results are good.

__ Weaknesses__: The idea seems straightforward, and it feels like it may have been tried before. However, after searching the literature, I was not able to find this way of using neural networks, so the idea appears to be novel.

__ Correctness__: Mostly correct, although I think the paper could better motivate the problem being solved. What would be a real use case for the considered problem statement?

__ Clarity__: Mostly well written, although I did not completely understand the PRE approach. Does one use prior knowledge in designing the PRE corrections? What is trained in the PRE approach?

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: - Is there a benefit to using the differentiable PDE solver? It might be interesting to compare the proposed approach with simply training a neural network to solve the PDE (without the differentiable solver).
- Line 52: What do you mean by explicit and implicit solvers?
- Do the steps of the differentiable simulator correspond to time steps? For example, in Figure 1, does t=200 correspond to the 200th step of the solver? The text says the number of steps is up to 128, as shown in Fig. 1.
- Line 177: What is "look-ahead trajectory per iteration"? What kind of iteration do you mean?
- Line 202: It would be helpful to provide more details on the "constrained least-squares corrector".
- If I understood the idea correctly, the computational graph of the model is a recurrent model with n consecutive blocks, where each block is a combination of a differentiable solver and a conv net, and the conv nets share parameters across iterations. Is training this model difficult because of vanishing and exploding gradients?
- It would be helpful to draw the computational graph.
- Paragraph starting on line 269: Do I understand correctly that in this experiment steps do not correspond to time but to iterations of a solver? It can be helpful to emphasise this more (maybe even in the introduction).
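To make the question about the computational graph concrete, here is a minimal toy sketch of the recurrent structure I have in mind (the solver dynamics, network, and all names here are illustrative stand-ins, not the paper's actual components): n consecutive blocks, each a differentiable solver step followed by a learned correction with parameters shared across all blocks, so backpropagation through time runs through every block.

```python
import numpy as np

def solver_step(s, dt=0.1):
    """Toy stand-in for one differentiable solver step (here: ds/dt = -s)."""
    return s + dt * (-s)

def correction(s, theta):
    """Stand-in for the conv net; the same theta is shared across all blocks."""
    return theta * s

def unroll(s0, theta, n_steps):
    """Recurrent computational graph: n consecutive (solver -> correction) blocks.

    During training, gradients with respect to theta would flow backwards
    through every one of these blocks -- exactly the setting in which
    vanishing/exploding gradients can arise.
    """
    s = s0
    for _ in range(n_steps):
        s = solver_step(s)            # differentiable solver block
        s = s + correction(s, theta)  # shared-parameter correction block
    return s

final_state = unroll(np.ones(4), theta=0.05, n_steps=128)
```

With 128 unrolled steps, as mentioned in the text, the backward pass traverses 128 such blocks, which is why I raise the vanishing/exploding question.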

__ Summary and Contributions__: The authors propose to learn neural networks that correct (i.e., reduce the discretization errors of) PDE solutions. Their idea is to use differentiable PDE solvers during training so that the correction network can be trained in an end-to-end manner, that is, both the correction step and the solution update step (by the solver) are taken into account during training. They provide extensive experimental results on several use cases.

__ Strengths__: The technical motivation is clear, the method is reasonable, and the extensive experiments are convincing.
The proposed method is a decent application of differentiable solvers and will be useful for applications where we need to solve PDEs for many settings.

__ Weaknesses__: The computational burden could be discussed more. The main motivation for correcting coarse solutions is to achieve good fidelity with fewer computational resources, so readers will be very interested in how the method actually reduces the computational burden of solving PDEs. The discussion of runtime, which is detailed in the appendix, could be part of the main text. Moreover, such an analysis should be provided for every type of experiment conducted.

__ Correctness__: The experiments are convincing for the types of problems tackled there.

__ Clarity__: The statement of the paper is basically clear, but some points in the main text can be improved; some important details are deferred to the appendix.
- What kinds of differentiable PDE solvers were adopted in the experiments?
- How were the test datasets created? Currently the paper only says "test data sets whose parameter distributions differ from the ones of the training data set," but this should be elaborated, as it is important for assessing the adequacy of the experiments.

__ Relation to Prior Work__: The related work section is too diffuse. It should first focus precisely on *previous* studies, i.e., ones working on solution correction. Isn't there any research proposing methods to correct coarse solutions of PDEs (or ODEs), not necessarily using neural networks? This lack of discussion makes the paper less convincing in terms of the novelty and significance of the proposed method. If there are no previous studies directly comparable to the current one, this should be stated explicitly.

__ Reproducibility__: Yes

__ Additional Feedback__: Line 33-34: "We show that neural networks can only achieve optimal performance if they take the reaction of the solver into account."
I think this is an overclaim. We cannot conclude either "only" or "optimal" from the experiments.
Really minor points below.
Line 130: typo, T s_t --> T r_t ?
The broader impact section: the NeurIPS template says, "Use unnumbered first level headings for this section, which should go at the end of the paper."
-----
[After rebuttal]
Thank you for the rebuttal. It is basically understandable, but I leave my score unchanged at 6 because I cannot judge whether the main potential drawback (i.e., the lack of detailed discussion of the computational burden) would be satisfactorily addressed in a revised version.

__ Summary and Contributions__: The paper proposes a method to learn a correction function, formulated as a neural network, to reduce the numerical error of PDE solvers. The main contribution is integrating the solver into the training process, thereby allowing the correction function to interact with the PDE solver during training.

__ Strengths__: The empirical evaluation is impressive. Several complex PDEs are tested to demonstrate the performance of the proposed error correction method.

__ Weaknesses__: There is no theoretical analysis of the proposed method. In the proposed method, after training, the correction is a function of the numerical solution at the current time; this may not be valid. In principle, the error should depend on the whole trajectory.

__ Correctness__: The empirical methodology of this paper is self-contained under the assumption that the correction is a function of the current solution. However, I suspect this assumption may not hold.

__ Clarity__: The paper is very hard to read. The idea of the paper is easy to understand, but too many conceptual interpretations make it difficult to grasp the main point. The paper could be significantly simplified by introducing proper mathematical formulas.

__ Relation to Prior Work__: The difference from previous works is properly discussed.

__ Reproducibility__: No

__ Additional Feedback__: After the rebuttal, the whole picture is clearer to me, but the details are still confusing. I am not sure whether my questions can be well addressed in the revision. I change my score to 5.

__ Summary and Contributions__: Problem:
This paper focuses on PDE solver correction. It considers a fine-grained simulation as the reference and a coarse-grained simulation as the source. The goal is to learn a correction to the source with which the source simulation matches the reference. I can see many meaningful applications, such as weather forecasting, where the reference simulation is hard to obtain.
Technique:
The authors point out that the corrections affect the PDE trajectory, so training data collected under a "non-interacting" setting is not suitable for inference. To resolve this problem, the authors propose a solver-in-the-loop scheme, which backpropagates gradients through the PDE solver on the source manifold to account for the influence of the corrector itself.
Summary:
This is a very well-motivated work; the experiments are very well conducted; the high-level idea is well explained and the logic is well organized.
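The interaction described above can be made concrete with a tiny scalar sketch (the dynamics, the linear corrector theta*s, and the finite-difference gradient are illustrative assumptions, not the paper's actual solvers or training procedure): because the corrector acts inside the rollout, each correction changes the states it later sees, and a gradient taken through the solver accounts for exactly that feedback.

```python
def coarse_step(s):
    """Toy source (coarse) dynamics: decays too quickly."""
    return 0.90 * s

def reference_step(s):
    """Toy reference (fine) dynamics the corrected rollout should match."""
    return 0.95 * s

def rollout_loss(theta, s0=1.0, n=16):
    """Solver-in-the-loop objective: the correction theta*s is applied inside
    the rollout, so every correction alters the states seen at later steps."""
    s, r, loss = s0, s0, 0.0
    for _ in range(n):
        s = coarse_step(s) + theta * s   # corrected source trajectory
        r = reference_step(r)            # reference trajectory
        loss += (s - r) ** 2
    return loss

def grad(theta, eps=1e-6):
    """Finite-difference stand-in for backpropagating through the solver."""
    return (rollout_loss(theta + eps) - rollout_loss(theta - eps)) / (2 * eps)
```

In this toy setting the correction theta = 0.05 exactly closes the gap between the coarse and reference dynamics, and the rollout gradient at theta = 0 points toward that value; a corrector trained on pre-computed, non-interacting data would never see the states its own corrections produce.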

__ Strengths__: Topic:
This work identifies an important practical problem (corrections themselves change the PDE trajectory) and proposes an effective solution to it.
Evaluation:
The authors conduct a series of systematic experiments on multiple types of PDEs, and almost all give the desired results.
Significance:
The techniques proposed here are especially suitable for PDE simulations where the reference can be observed yet is very hard to simulate accurately (such as weather forecasting).

__ Weaknesses__: Writing:
It is hard to understand the notation and math without looking at the appendix. There are discrepancies between the main text and the appendix; e.g., line 132 in the main text defines the correction operator output as (original input + correction), while in the appendix the correction operator output contains only the correction (line 25).
Experiment Cost:
It would be interesting to compare the cost of training the corrector versus directly running the simulation on the reference manifold, even though the latter might be cheaper.

__ Correctness__: The claims, methods, and empirical methodology seem correct to me.

__ Clarity__: This paper is well written, although there seems to be minor inconsistencies in terms of notation.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: Questions:
1) In line 25 of the appendix, the correction is defined as a function of the current state s, regardless of the number of time steps since initialization (line 21). Can you explain why? Intuitively, the longer the simulation runs on the source manifold, the larger the deviation from the reference would be, no?