Four knowledgeable reviewers agree that the paper is of high technical quality, that it is novel, and that its claims are well supported by empirical evidence; together, this puts it comfortably above the bar for publication at NeurIPS. In their initial reviews, R1 and R4 raised some technical concerns, primarily about evaluation: evaluating on incident instead of cumulative deaths, using longer time horizons, and evaluating probabilistic calibration. The authors included these results in their rebuttal; the reviewers were convinced by this additional evidence and raised their scores. The authors are encouraged to take the reviewer comments into account in the final version, in particular to acknowledge the use of additional features compared to the baseline models (R1), to include additional related work (R3), and to clarify the scheme used to snapshot data for hindcasting and justify the fairness of the comparison to pre-registered forecasts (R3).