The author feedback caused the reviewers to re-evaluate some of their initial opinions. R2 increased their score, and R3 (who is an expert) carefully re-evaluated their opinion, although they did not change their score. (Since the authors speculated about this reviewer's situation, I want to add that R3 has no evident conflict of interest, nor are they an author of any of the related works they pointed out.) I largely follow the argument of R3. I understand that the authors are unhappy with the way this review is phrased, but it does raise important issues: there are a lot of design decisions here, not all of which can just be swept under the rug. Of course, the proposed method is specifically designed for the NODE setting, and that's fine. However, as R3 rightly points out, the experiments are done in a way that does not allow any generalization beyond a specific combination of network and solver. As we are still in the early days of NODE research, it seems unlikely that even interested readers would be willing to commit themselves to such specific choices only to achieve the speedup reported in the rebuttal. Nevertheless, the other reviewers, perhaps rightly, point out that the paper does contain interesting results that should be published. Indeed, we *are* in the early days of NODE research, and there is scope to test out ideas even if their eventual utility is still a bit unclear. The paper can thus be accepted. However, in light of what I wrote above and the criticism by R3, I want to *strongly urge* the authors to clarify and expand the scope of what we as a community can learn from the paper. In particular, how can the results shown in this paper be transferred to other base solvers?